X-Loop: help-debbugs@HIDDEN
Subject: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: monnier@HIDDEN, bug-gnu-emacs@HIDDEN
Resent-Date: Thu, 30 Oct 2025 09:35:01 +0000
Resent-Message-ID: <handler.79724.B.17618168993028 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 79724
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: 79724 <at> debbugs.gnu.org
Cc: Stefan Monnier <monnier@HIDDEN>
X-Debbugs-Original-To: bug-gnu-emacs@HIDDEN
X-Debbugs-Original-Xcc: Stefan Monnier <monnier@HIDDEN>
Received: via spool by submit <at> debbugs.gnu.org id=B.17618168993028
(code B ref -1); Thu, 30 Oct 2025 09:35:01 +0000
Received: (at submit) by debbugs.gnu.org; 30 Oct 2025 09:34:59 +0000
Received: from localhost ([127.0.0.1]:34182 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1vEP3a-0000ml-Mu
for submit <at> debbugs.gnu.org; Thu, 30 Oct 2025 05:34:59 -0400
Received: from lists.gnu.org ([2001:470:142::17]:57270)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1vEP3S-0000mO-HV
for submit <at> debbugs.gnu.org; Thu, 30 Oct 2025 05:34:51 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1vEP3F-0005t1-8e
for bug-gnu-emacs@HIDDEN; Thu, 30 Oct 2025 05:34:39 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <eliz@HIDDEN>) id 1vEP3C-0006XD-NS
for bug-gnu-emacs@HIDDEN; Thu, 30 Oct 2025 05:34:36 -0400
Date: Thu, 30 Oct 2025 11:34:18 +0200
Message-Id: <86ikfwn36t.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)
As the subject says, how can a user easily search for raw bytes in a
buffer? Or how can a Lisp program quickly scan a buffer to find raw
bytes and either remove or replace them?
To reproduce, start "emacs -Q" then insert a raw byte by typing
C-x 8 RET 3fffe0 RET
Then try to come up with a regexp that finds only the raw byte.
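The same setup can be reproduced from Lisp. The following is only a minimal sketch: it assumes that the character inserted by the key sequence above is code point #x3fffe0, which is how a multibyte buffer represents the raw byte #xE0, and the function name `my-raw-byte-demo` is made up for illustration.

```elisp
(defun my-raw-byte-demo ()
  "Return the character following \"abc\" in a temporary buffer.
Hypothetical helper for illustration, not part of Emacs."
  (with-temp-buffer
    ;; Temporary buffers are multibyte by default; inserting the
    ;; character #x3fffe0 puts the raw byte #xE0 into the text.
    (insert "abc" #x3fffe0 "def")
    (goto-char 4)
    (char-after)))
```

Passing the returned character to `multibyte-char-to-unibyte` yields the underlying byte value.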
This is important when one has a buffer which could include raw bytes,
and wants to json-serialize it, in which case there's a need to remove
raw bytes or replace them with something that will avoid signaling an
error from the serialization code.
The only way I found is to examine the buffer one character at a time
using charset-after. But this is tedious and inefficient.
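For reference, the character-by-character approach might look roughly like the sketch below. It assumes that `charset-after` reports the `eight-bit` charset for raw bytes; the function name `my-replace-raw-bytes` and its REPLACEMENT argument are hypothetical and exist only for this illustration.

```elisp
(defun my-replace-raw-bytes (replacement)
  "Replace each raw byte in the current buffer with string REPLACEMENT.
Return the number of replacements made.
Hypothetical helper sketching the character-by-character scan."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (if (eq (charset-after (point)) 'eight-bit)
            (progn
              ;; Raw byte: delete it and insert REPLACEMENT, which
              ;; leaves point just after the inserted text.
              (delete-char 1)
              (insert replacement)
              (setq count (1+ count)))
          ;; Any other character: step over it.
          (forward-char 1))))
    count))
```

Calling it as `(my-replace-raw-bytes "\ufffd")` before serializing would substitute U+FFFD for every raw byte, but it still visits every character of the buffer, which is the inefficiency described above.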
I seem to be unable to find a way to express this with regexps. The
naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
finds ASCII letters and nothing else). Nothing else I tried worked,
including the recipe from the ELisp manual:
4. If the end points of a range are raw 8-bit bytes (*note Text
Representations::), or if the range start is ASCII and the end
is a raw byte (as in ‘[a-\377]’), the range will match only
ASCII characters and raw 8-bit bytes, but not non-ASCII
characters. This feature is intended for searching text in
unibyte buffers and strings.
In a buffer that includes only ASCII characters and a raw byte, typing
"C-M-s [a-\377]" signals the error "Failing regexp search".
Is there a solution for this job that I'm missing? If so, we should at
least document it. If there's no solution currently, I think we
should add something to make it easier.
In GNU Emacs 31.0.50 (build 1458, i686-pc-mingw32) of 2025-10-30 built
on ELIZ-PC
Repository revision: 06b3f11cb8f040d192a91972b40eab8c85a2cc5b
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 10.0.26100
System Description: Microsoft Windows 10 Enterprise (v10.0.2009.26100.6899)
Configured using:
'configure -C --prefix=/d/usr --with-wide-int
--without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
-gdwarf-4 -g3''
Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY
PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XPM ZLIB
Important settings:
value of $LANG: ENG
locale-coding-system: cp1252
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
minibuffer-nonselected-mode: t
minibuffer-regexp-mode: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
rfc6068 epg-config gnus-util text-property-search time-date subr-x
mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
move-toolbar make-network-process tty-child-frames emacs)
Memory information:
((conses 16 46888 16793) (symbols 48 6655 0) (strings 16 16703 2197)
(string-bytes 1 346778) (vectors 16 9844)
(vector-slots 8 115806 11013) (floats 8 23 6) (intervals 40 310 75)
(buffers 928 10))
X-Loop: help-debbugs@HIDDEN
Subject: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Thu, 30 Oct 2025 10:23:02 +0000
Resent-Message-ID: <handler.79724.B79724.176181977413036 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 79724
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: 79724 <at> debbugs.gnu.org
Cc: Mattias Engdegård <mattiase@HIDDEN>, monnier@HIDDEN
Received: via spool by 79724-submit <at> debbugs.gnu.org id=B79724.176181977413036
(code B ref 79724); Thu, 30 Oct 2025 10:23:02 +0000
Received: (at 79724) by debbugs.gnu.org; 30 Oct 2025 10:22:54 +0000
Received: from localhost ([127.0.0.1]:34419 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1vEPnx-0003OB-Dn
for submit <at> debbugs.gnu.org; Thu, 30 Oct 2025 06:22:53 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:41086)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1vEPns-0003Ne-O7
for 79724 <at> debbugs.gnu.org; Thu, 30 Oct 2025 06:22:50 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
id 1vEPnj-0004fP-Jb; Thu, 30 Oct 2025 06:22:39 -0400
Date: Thu, 30 Oct 2025 12:22:34 +0200
Message-Id: <86frb0n0yd.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <86ikfwn36t.fsf@HIDDEN> (message from Eli Zaretskii on Thu, 30
Oct 2025 11:34:18 +0200)
References: <86ikfwn36t.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Spam-Score: -3.3 (---)
Let me add Mattias as well to the discussion, since he made some of
the changes in this area.
X-Loop: help-debbugs@HIDDEN
Subject: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes
Resent-From: Stephen Berman <stephen.berman@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Thu, 30 Oct 2025 10:29:02 +0000
Resent-Message-ID: <handler.79724.B79724.176182012614769 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 79724
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>
Received: via spool by 79724-submit <at> debbugs.gnu.org id=B79724.176182012614769
(code B ref 79724); Thu, 30 Oct 2025 10:29:02 +0000
Received: (at 79724) by debbugs.gnu.org; 30 Oct 2025 10:28:46 +0000
Received: from localhost ([127.0.0.1]:34462 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1vEPte-0003q8-BP
for submit <at> debbugs.gnu.org; Thu, 30 Oct 2025 06:28:46 -0400
Received: from mout.gmx.net ([212.227.15.19]:53087)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.84_2) (envelope-from <stephen.berman@HIDDEN>)
id 1vEPtY-0003p5-0k
for 79724 <at> debbugs.gnu.org; Thu, 30 Oct 2025 06:28:41 -0400
Received: from strobelfssd ([88.130.62.64]) by mail.gmx.net (mrgmx005
[212.227.17.190]) with ESMTPSA (Nemesis) id 1M2wL0-1vHhgS1vLQ-008O59; Thu, 30
Oct 2025 11:28:32 +0100
From: Stephen Berman <stephen.berman@HIDDEN>
In-Reply-To: <86ikfwn36t.fsf@HIDDEN>
References: <86ikfwn36t.fsf@HIDDEN>
Date: Thu, 30 Oct 2025 11:28:30 +0100
Message-ID: <875xbwof8x.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -0.7 (/)
X-Spam-Score: -1.7 (-)
On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:
> As the subject says, how can a user easily search for raw bytes in a
> buffer? Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?
>
> To reproduce, start "emacs -Q" then insert a raw byte by typing
>
> C-x 8 RET 3fffe0 RET
>
> Then try to come up with a regexp that finds only the raw byte.
>
> This is important when one has a buffer which could include raw bytes,
> and wants to json-serialize it, in which case there's a need to remove
> raw bytes or replace them with something that will avoid signaling an
> error from the serialization code.
>
> The only way I found is to examine the buffer one character at a time
> using charset-after. But this is tedious and inefficient.
>
> I seem to be unable to find a way to express this with regexps. The
> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> finds ASCII letters and nothing else). Nothing else I tried worked,
> including the recipe from the ELisp manual:
>
> 4. If the end points of a range are raw 8-bit bytes (*note Text
> Representations::), or if the range start is ASCII and the end
> is a raw byte (as in ‘[a-\377]’), the range will match only
> ASCII characters and raw 8-bit bytes, but not non-ASCII
> characters. This feature is intended for searching text in
> unibyte buffers and strings.
>
> In a buffer that includes only ASCII characters and a raw byte, typing
> "C-M-s [a-\377]" signals the error "Failing regexp search".
>
> Is there a solution for this job that I'm missing?
This seems to work, at least for your examples (also if I add them to
the HELLO buffer):
C-M-s [^[:ascii:][:print:]]
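From Lisp, the same regexp could be used along these lines. This is a sketch only: the helper name `my-search-non-ascii-non-printing` is invented here, and the character class is not necessarily limited to raw bytes, since it may also match other characters that are neither ASCII nor printable.

```elisp
(defun my-search-non-ascii-non-printing ()
  "Search forward from point for a non-ASCII, non-printing character.
Return the position where the match starts, or nil if there is none.
Hypothetical helper for illustration, not part of Emacs."
  ;; The third argument t makes a failed search return nil
  ;; instead of signaling an error.
  (when (re-search-forward "[^[:ascii:][:print:]]" nil t)
    (match-beginning 0)))
```

On success, point is left just after the matched character, so the function can be called in a loop to visit every match.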
Steve Berman
X-Loop: help-debbugs@HIDDEN
Subject: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes
Resent-From: Mattias Engdegård <mattias.engdegard@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Thu, 30 Oct 2025 11:11:01 +0000
Resent-Message-ID: <handler.79724.B79724.176182261624169 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 79724
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN
Received: via spool by 79724-submit <at> debbugs.gnu.org id=B79724.176182261624169
(code B ref 79724); Thu, 30 Oct 2025 11:11:01 +0000
Received: (at 79724) by debbugs.gnu.org; 30 Oct 2025 11:10:16 +0000
Received: from localhost ([127.0.0.1]:34703 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1vEQXn-0006Hk-CJ
for submit <at> debbugs.gnu.org; Thu, 30 Oct 2025 07:10:15 -0400
Received: from mail-lj1-x234.google.com ([2a00:1450:4864:20::234]:52727)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
(Exim 4.84_2) (envelope-from <mattias.engdegard@HIDDEN>)
id 1vEQXg-0006Gk-Ry
for 79724 <at> debbugs.gnu.org; Thu, 30 Oct 2025 07:10:11 -0400
Received: by mail-lj1-x234.google.com with SMTP id
38308e7fff4ca-37a1267c45dso5672481fa.1
for <79724 <at> debbugs.gnu.org>; Thu, 30 Oct 2025 04:10:08 -0700 (PDT)
X-Received: by 2002:a2e:bea5:0:b0:378:ebd7:ad0 with SMTP id
38308e7fff4ca-37a052e66c8mr20876421fa.17.1761822600451;
Thu, 30 Oct 2025 04:10:00 -0700 (PDT)
Received: from smtpclient.apple (c188-150-186-155.bredband.tele2.se.
[188.150.186.155]) by smtp.gmail.com with ESMTPSA id
2adb3069b0e04-59301f50996sm4474975e87.36.2025.10.30.04.09.59
(version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
Thu, 30 Oct 2025 04:10:00 -0700 (PDT)
Content-Type: text/plain;
charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.15\))
From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?= <mattias.engdegard@HIDDEN>
In-Reply-To: <86frb0n0yd.fsf@HIDDEN>
Date: Thu, 30 Oct 2025 12:09:58 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <1BFAF201-1A4D-4B6D-867F-5EE337A6C9C4@HIDDEN>
References: <86ikfwn36t.fsf@HIDDEN> <86frb0n0yd.fsf@HIDDEN>
X-Mailer: Apple Mail (2.3654.120.0.1.15)

> As the subject says, how can a user easily search for raw bytes in a
> buffer? Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?

  (re-search-forward (rx (in (#x3fff80 . #x3fffff))))

assuming it's a multibyte buffer, which is almost always the case.

If you want to find all non-Unicode values, including the embarrassing
range in #x110000..#x3fff7f that we don't speak about, maybe you'd like

  (re-search-forward (rx (not (in (0 . #x10ffff)))))

or, if you prefer skip-chars-forward,

  (skip-chars-forward "\0-\x10ffff")

etc.
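
As an illustration only, the first search above could be wrapped in a small helper that replaces every raw byte in a multibyte buffer. The function name `my-replace-raw-bytes` and the U+FFFD default are assumptions made for this sketch; neither comes from the thread or from Emacs itself.

```elisp
;; Hypothetical helper: the name and the default replacement are
;; assumptions of this sketch, not an existing Emacs API.
(defun my-replace-raw-bytes (&optional replacement)
  "Replace each raw byte in the current multibyte buffer with REPLACEMENT.
REPLACEMENT is a string and defaults to U+FFFD.
Return the number of replacements made."
  (let ((replacement (or replacement "\ufffd"))
        (case-fold-search nil)
        (count 0))
    (save-excursion
      (goto-char (point-min))
      ;; In multibyte text, raw bytes #x80..#xFF are represented by the
      ;; character codes #x3fff80..#x3fffff.
      (while (re-search-forward (rx (in (#x3fff80 . #x3fffff))) nil t)
        ;; FIXEDCASE and LITERAL are both t: insert REPLACEMENT verbatim.
        (replace-match replacement t t)
        (setq count (1+ count))))
    count))
```

A possible use, for example before handing the text to a serializer that rejects raw bytes, would be `(my-replace-raw-bytes "?")` in the buffer in question.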

> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work

Actually you almost nailed it, it's just a char escape matter: it's
either \u with four, \U with eight or \x with any number of hex digits.
(Or just use rx.)
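
The effect of the escape lengths on the bracket expression quoted above can be checked directly, which may also explain why that attempt matched ASCII letters. This is a small sketch; `my-naive-class` is a hypothetical name used only here.

```elisp
;; `my-naive-class' is a hypothetical name used only in this sketch.
(defconst my-naive-class "[\u3fff00-\u3fffff]"
  "The bracket expression from the quoted report, as the Lisp reader sees it.")

;; "\u" takes exactly four hex digits, so the reader produces the nine
;; characters [ U+3FFF 0 0 - U+3FFF f f ], and the range in the class
;; runs from ?0 to U+3FFF.
(length my-naive-class)               ; 9
(string-match-p my-naive-class "a")   ; 0, since ?a lies between ?0 and U+3FFF

;; "\x" takes every hex digit that follows it, so this is one character.
(length "\x3fff80")                   ; 1
```

Note also that the raw-byte codes start at #x3fff80 rather than #x3fff00, which is the lower bound used in the rx form earlier in this message.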
X-Loop: help-debbugs@HIDDEN
Subject: bug#79724: 31.0.50; No easy way of searching a buffer for raw bytes
Resent-From: Eli Zaretskii <eliz@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@HIDDEN
Resent-Date: Thu, 30 Oct 2025 11:27:02 +0000
Resent-Message-ID: <handler.79724.B79724.176182357028479 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 79724
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Stephen Berman <stephen.berman@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN
Date: Thu, 30 Oct 2025 13:25:49 +0200
Message-Id: <86ecqkmy0y.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
In-Reply-To: <875xbwof8x.fsf@HIDDEN> (message from Stephen Berman on Thu, 30
Oct 2025 11:28:30 +0100)
References: <86ikfwn36t.fsf@HIDDEN> <875xbwof8x.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
> From: Stephen Berman <stephen.berman@HIDDEN>
> Cc: 79724 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>
> Date: Thu, 30 Oct 2025 11:28:30 +0100
>
> On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:
>
> > From: eliz@HIDDEN
> > --text follows this line--
> > As the subject says, how can a user easily search for raw bytes in a
> > buffer? Or how can a Lisp program quickly scan a buffer to find raw
> > bytes and either remove or replace them?
> >
> > To reproduce, start "emacs -Q" then insert a raw byte by typing
> >
> > C-x 8 RET 3fffe0 RET
> >
> > Then try to come up with a regexp that finds only the raw byte.
> >
> > This is important when one has a buffer which could include raw bytes,
> > and wants to json-serialize it, in which case there's a need to remove
> > raw bytes or replace them with something that will avoid signaling an
> > error from the serialization code.
> >
> > The only way I found is to examine the buffer one character at a time
> > using charset-after. But this is tedious and inefficient.
> >
> > I seem to be unable to find a way to express this with regexps. The
> > naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> > finds ASCII letters and nothing else). Nothing else I tried worked,
> > including the recipe from the ELisp manual:
> >
> > 4. If the end points of a range are raw 8-bit bytes (*note Text
> > Representations::), or if the range start is ASCII and the end
> > is a raw byte (as in ‘[a-\377]’), the range will match only
> > ASCII characters and raw 8-bit bytes, but not non-ASCII
> > characters. This feature is intended for searching text in
> > unibyte buffers and strings.
> >
> > In a buffer that includes only ASCII characters and a raw byte, typing
> > "C-M-s [a-\377]" signals an error "Failing regexp search".
> >
> > Is there a solution for this job that I'm missing?
>
> This seems to work, at least for your examples (also if I add them to
> the HELLO buffer):
>
> C-M-s [^[:ascii:][:print:]]

Thanks, but that will also find other codepoints. (If it doesn't,
it's a separate bug.) And anyway, how would one decide that this should
work based on the documentation of [:print:]?
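
For comparison, here is a rough sketch of the two approaches discussed in this thread for locating raw bytes: the character-at-a-time scan with charset-after described in the quoted report, and a regexp scan over the code range #x3fff80..#x3fffff proposed elsewhere in the thread. Both function names are hypothetical, and the sketch assumes a multibyte buffer.

```elisp
;; Both function names are hypothetical, chosen for this sketch.
(defun my-raw-byte-positions-slow ()
  "Return buffer positions of raw bytes, testing one character at a time."
  (let (positions)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; Raw bytes belong to the `eight-bit' charset.
        (when (eq (charset-after) 'eight-bit)
          (push (point) positions))
        (forward-char 1)))
    (nreverse positions)))

(defun my-raw-byte-positions ()
  "Return buffer positions of raw bytes, using one regexp search per hit."
  (let ((case-fold-search nil)
        positions)
    (save-excursion
      (goto-char (point-min))
      ;; Raw bytes occupy the character codes #x3fff80..#x3fffff.
      (while (re-search-forward (rx (in (#x3fff80 . #x3fffff))) nil t)
        (push (match-beginning 0) positions)))
    (nreverse positions)))
```

Unlike a class built from [:ascii:] and [:print:], the explicit range names the raw-byte codes directly, so it does not depend on how those character classes treat other codepoints.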
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.