From: Eli Zaretskii <eliz@HIDDEN>
To: Stephen Berman <stephen.berman@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN
Date: Thu, 30 Oct 2025 13:25:49 +0200
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes

> From: Stephen Berman <stephen.berman@HIDDEN>
> Cc: 79724 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>
> Date: Thu, 30 Oct 2025 11:28:30 +0100
>
> On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:
>
> > From: eliz@HIDDEN
> > --text follows this line--
> > As the subject says, how can a user easily search for raw bytes in a
> > buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> > bytes and either remove or replace them?
> >
> > To reproduce, start "emacs -Q" then insert a raw byte by typing
> >
> >   C-x 8 RET 3fffe0 RET
> >
> > Then try to come up with a regexp that finds only the raw byte.
> >
> > This is important when one has a buffer which could include raw bytes,
> > and wants to json-serialize it, in which case there's a need to remove
> > raw bytes or replace them with something that will avoid signaling an
> > error from the serialization code.
> >
> > The only way I found is to examine the buffer one character at a time
> > using charset-after.  But this is tedious and inefficient.
> >
> > I seem to be unable to find a way to express this with regexps.  The
> > naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> > finds ASCII letters and nothing else).  Nothing else I tried worked,
> > including the recipe from the ELisp manual:
> >
> >   4. If the end points of a range are raw 8-bit bytes (*note Text
> >      Representations::), or if the range start is ASCII and the end
> >      is a raw byte (as in ‘[a-\377]’), the range will match only
> >      ASCII characters and raw 8-bit bytes, but not non-ASCII
> >      characters.  This feature is intended for searching text in
> >      unibyte buffers and strings.
> >
> > In a buffer that includes only ASCII characters and a raw byte, typing
> > "C-M-s [a-\377]" signals an error "Failing regexp search."
> >
> > Is there a solution for this job that I'm missing?
>
> This seems to work, at least for your examples (also if I add them to
> the HELLO buffer):
>
> C-M-s [^[:ascii:][:print:]]

Thanks, but that will also find other codepoints.  (If it doesn't,
it's a separate bug.)  And anyway, how would one decide this should
work based on the documentation of [:print:]?
bug-gnu-emacs@HIDDEN:bug#79724; Package emacs.
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN
Date: Thu, 30 Oct 2025 12:09:58 +0100
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes

> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?

  (re-search-forward (rx (in (#x3fff80 . #x3fffff))))

assuming it's a multibyte buffer which is almost always the case.
If you want to find all non-Unicode values, including the embarrassing
range in #x110000..#x3fff7f that we don't speak about, maybe you'd like

  (re-search-forward (rx (not (in (0 . #x10ffff)))))

or if you prefer skip-chars-forward,

  (skip-chars-forward "\0-\x10ffff")

etc.

> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work

Actually you almost nailed it, it's just a char escape matter: it's
either \u with four, \U with eight or \x with any number of hex digits.
(Or just use rx.)
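
The first search form above can be wrapped into a small helper that removes or replaces raw bytes in one pass. This is only a minimal sketch, assuming a multibyte buffer and the raw-byte character range #x3fff80..#x3fffff quoted above; the name `my-replace-raw-bytes` is hypothetical and not part of Emacs.

```elisp
;;; -*- lexical-binding: t -*-
(require 'rx)

(defun my-replace-raw-bytes (replacement)
  "Replace each raw byte in the current buffer with the string REPLACEMENT.
Assumes a multibyte buffer.  Return the number of replacements made.
This is a hypothetical helper for illustration, not an Emacs function."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      ;; In a multibyte buffer, raw bytes #x80..#xFF are represented
      ;; by the characters #x3fff80..#x3fffff.
      (while (re-search-forward (rx (in (#x3fff80 . #x3fffff))) nil t)
        ;; FIXEDCASE and LITERAL are both t, so REPLACEMENT is
        ;; inserted verbatim.
        (replace-match replacement t t)
        (setq count (1+ count))))
    count))
```

Calling `(my-replace-raw-bytes "")` would delete the raw bytes; passing a string such as "?" would substitute a placeholder instead.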
From: Stephen Berman <stephen.berman@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>
Date: Thu, 30 Oct 2025 11:28:30 +0100
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes

On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:

> From: eliz@HIDDEN
> --text follows this line--
> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?
>
> To reproduce, start "emacs -Q" then insert a raw byte by typing
>
>   C-x 8 RET 3fffe0 RET
>
> Then try to come up with a regexp that finds only the raw byte.
>
> This is important when one has a buffer which could include raw bytes,
> and wants to json-serialize it, in which case there's a need to remove
> raw bytes or replace them with something that will avoid signaling an
> error from the serialization code.
>
> The only way I found is to examine the buffer one character at a time
> using charset-after.  But this is tedious and inefficient.
>
> I seem to be unable to find a way to express this with regexps.  The
> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> finds ASCII letters and nothing else).  Nothing else I tried worked,
> including the recipe from the ELisp manual:
>
>   4. If the end points of a range are raw 8-bit bytes (*note Text
>      Representations::), or if the range start is ASCII and the end
>      is a raw byte (as in ‘[a-\377]’), the range will match only
>      ASCII characters and raw 8-bit bytes, but not non-ASCII
>      characters.  This feature is intended for searching text in
>      unibyte buffers and strings.
>
> In a buffer that includes only ASCII characters and a raw byte, typing
> "C-M-s [a-\377]" signals an error "Failing regexp search."
>
> Is there a solution for this job that I'm missing?

This seems to work, at least for your examples (also if I add them to
the HELLO buffer):

C-M-s [^[:ascii:][:print:]]

Steve Berman
From: Eli Zaretskii <eliz@HIDDEN>
To: 79724 <at> debbugs.gnu.org
Cc: Mattias Engdegård <mattiase@HIDDEN>, monnier@HIDDEN
Date: Thu, 30 Oct 2025 12:22:34 +0200
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes

Let me add Mattias as well to the discussion, since he made some of
the changes in this area.

> Cc: Stefan Monnier <monnier@HIDDEN>
> Date: Thu, 30 Oct 2025 11:34:18 +0200
> From: Eli Zaretskii <eliz@HIDDEN>
>
> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?
>
> To reproduce, start "emacs -Q" then insert a raw byte by typing
>
>   C-x 8 RET 3fffe0 RET
>
> Then try to come up with a regexp that finds only the raw byte.
>
> This is important when one has a buffer which could include raw bytes,
> and wants to json-serialize it, in which case there's a need to remove
> raw bytes or replace them with something that will avoid signaling an
> error from the serialization code.
>
> The only way I found is to examine the buffer one character at a time
> using charset-after.  But this is tedious and inefficient.
>
> I seem to be unable to find a way to express this with regexps.  The
> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> finds ASCII letters and nothing else).  Nothing else I tried worked,
> including the recipe from the ELisp manual:
>
>   4. If the end points of a range are raw 8-bit bytes (*note Text
>      Representations::), or if the range start is ASCII and the end
>      is a raw byte (as in ‘[a-\377]’), the range will match only
>      ASCII characters and raw 8-bit bytes, but not non-ASCII
>      characters.  This feature is intended for searching text in
>      unibyte buffers and strings.
>
> In a buffer that includes only ASCII characters and a raw byte, typing
> "C-M-s [a-\377]" signals an error "Failing regexp search."
>
> Is there a solution for this job that I'm missing?  If so, we should at
> least document it.  If there's no solution currently, I think we
> should add something to make it easier.
>
>
> In GNU Emacs 31.0.50 (build 1458, i686-pc-mingw32) of 2025-10-30 built
>  on ELIZ-PC
> Repository revision: 06b3f11cb8f040d192a91972b40eab8c85a2cc5b
> Repository branch: master
> Windowing system distributor 'Microsoft Corp.', version 10.0.26100
> System Description: Microsoft Windows 10 Enterprise (v10.0.2009.26100.6899)
>
> Configured using:
>  'configure -C --prefix=/d/usr --with-wide-int
>  --without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
>  -gdwarf-4 -g3''
>
> Configured features:
> ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY
> PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
> TREE_SITTER WEBP XPM ZLIB
>
> Important settings:
>   value of $LANG: ENG
>   locale-coding-system: cp1252
>
> Major mode: Lisp Interaction
>
> Minor modes in effect:
>   tooltip-mode: t
>   global-eldoc-mode: t
>   eldoc-mode: t
>   show-paren-mode: t
>   electric-indent-mode: t
>   mouse-wheel-mode: t
>   tool-bar-mode: t
>   menu-bar-mode: t
>   file-name-shadow-mode: t
>   global-font-lock-mode: t
>   font-lock-mode: t
>   blink-cursor-mode: t
>   minibuffer-nonselected-mode: t
>   minibuffer-regexp-mode: t
>   line-number-mode: t
>   indent-tabs-mode: t
>   transient-mark-mode: t
>   auto-composition-mode: t
>   auto-encryption-mode: t
>   auto-compression-mode: t
>
> Load-path shadows:
> None found.
>
> Features:
> (shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
> dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
> rfc6068 epg-config gnus-util text-property-search time-date subr-x
> mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
> mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
> mail-utils warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip
> cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
> elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
> term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
> regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
> prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
> timer select scroll-bar mouse jit-lock font-lock syntax font-core
> term/tty-colors frame minibuffer nadvice seq simple cl-generic
> indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
> tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
> romanian slovak czech european ethiopic indian cyrillic chinese
> composite emoji-zwj charscript charprop case-table epa-hook
> jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
> theme-loaddefs faces cus-face macroexp files window text-properties
> overlay sha1 md5 base64 format env code-pages mule custom widget keymap
> hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
> move-toolbar make-network-process tty-child-frames emacs)
>
> Memory information:
> ((conses 16 46888 16793) (symbols 48 6655 0) (strings 16 16703 2197)
>  (string-bytes 1 346778) (vectors 16 9844)
>  (vector-slots 8 115806 11013) (floats 8 23 6) (intervals 40 310 75)
>  (buffers 928 10))
Date: Thu, 30 Oct 2025 11:34:18 +0200
Message-Id: <86ikfwn36t.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 31.0.50; No easy way of searching a buffer for raw bytes
X-Debbugs-Cc: Mattias Engdegård <mattiase@HIDDEN>
X-Debbugs-Cc: Stefan Monnier <monnier@HIDDEN>
From: eliz@HIDDEN
--text follows this line--
As the subject says, how can a user easily search for raw bytes in a
buffer? Or how can a Lisp program quickly scan a buffer to find raw
bytes and either remove or replace them?

To reproduce, start "emacs -Q" then insert a raw byte by typing

  C-x 8 RET 3fffe0 RET

Then try to come up with a regexp that finds only the raw byte.

This is important when one has a buffer which could include raw bytes,
and wants to json-serialize it, in which case there's a need to remove
raw bytes or replace them with something that will avoid signaling an
error from the serialization code.

The only way I found is to examine the buffer one character at a time
using charset-after. But this is tedious and inefficient.
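
For reference, a minimal sketch of such a character-at-a-time scan
could look like the following. It assumes a multibyte buffer, where
charset-after reports raw bytes as belonging to the eight-bit charset.
The function name my-replace-raw-bytes and its REPLACEMENT argument are
made up for illustration and are not part of Emacs.

```elisp
;; Hypothetical helper, not part of Emacs: walk the accessible portion
;; of the current buffer and replace each raw byte with REPLACEMENT.
(defun my-replace-raw-bytes (replacement)
  "Replace every raw byte in the current buffer with string REPLACEMENT.
Return the number of raw bytes replaced."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; In a multibyte buffer, a raw byte is reported as `eight-bit'.
        (if (eq (charset-after (point)) 'eight-bit)
            (progn
              (delete-char 1)
              (insert replacement)
              (setq count (1+ count)))
          (forward-char 1))))
    count))
```

Calling (my-replace-raw-bytes "") removes the raw bytes, while
(my-replace-raw-bytes "?") substitutes a placeholder. Every character
is still visited once from Lisp, which is the inefficiency described
above.
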

I seem to be unable to find a way to express this with regexps. The
naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
finds ASCII letters and nothing else). Nothing else I tried worked,
including the recipe from the ELisp manual:

  4. If the end points of a range are raw 8-bit bytes (*note Text
     Representations::), or if the range start is ASCII and the end
     is a raw byte (as in ‘[a-\377]’), the range will match only
     ASCII characters and raw 8-bit bytes, but not non-ASCII
     characters. This feature is intended for searching text in
     unibyte buffers and strings.

In a buffer that includes only ASCII characters and a raw byte, typing
"C-M-s [a-\377]" signals an error "Failing regexp search".

Is there a solution for this job that I'm missing? If so, we should at
least document it. If there's no solution currently, I think we
should add something to make it easier.

In GNU Emacs 31.0.50 (build 1458, i686-pc-mingw32) of 2025-10-30 built
on ELIZ-PC
Repository revision: 06b3f11cb8f040d192a91972b40eab8c85a2cc5b
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 10.0.26100
System Description: Microsoft Windows 10 Enterprise (v10.0.2009.26100.6899)
Configured using:
'configure -C --prefix=/d/usr --with-wide-int
--without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
-gdwarf-4 -g3''
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY
PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XPM ZLIB
Important settings:
value of $LANG: ENG
locale-coding-system: cp1252
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
minibuffer-nonselected-mode: t
minibuffer-regexp-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
rfc6068 epg-config gnus-util text-property-search time-date subr-x
mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
move-toolbar make-network-process tty-child-frames emacs)
Memory information:
((conses 16 46888 16793) (symbols 48 6655 0) (strings 16 16703 2197)
(string-bytes 1 346778) (vectors 16 9844)
(vector-slots 8 115806 11013) (floats 8 23 6) (intervals 40 310 75)
(buffers 928 10))