GNU bug report logs - #79724
31.0.50; No easy way of searching a buffer for raw bytes

Package: emacs; Reported by: Eli Zaretskii <eliz@HIDDEN>; dated Thu, 30 Oct 2025 09:35:01 UTC; Maintainer for emacs is bug-gnu-emacs@HIDDEN.

Message received at 79724 <at> debbugs.gnu.org:


Date: Thu, 30 Oct 2025 13:25:49 +0200
Message-Id: <86ecqkmy0y.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Stephen Berman <stephen.berman@HIDDEN>
In-Reply-To: <875xbwof8x.fsf@HIDDEN> (message from Stephen Berman on Thu, 30
 Oct 2025 11:28:30 +0100)
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw
 bytes
References: <86ikfwn36t.fsf@HIDDEN> <875xbwof8x.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 79724
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

> From: Stephen Berman <stephen.berman@HIDDEN>
> Cc: 79724 <at> debbugs.gnu.org,  Stefan Monnier <monnier@HIDDEN>
> Date: Thu, 30 Oct 2025 11:28:30 +0100
> 
> On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:
> 
> > From: eliz@HIDDEN
> > --text follows this line--
> > As the subject says, how can a user easily search for raw bytes in a
> > buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> > bytes and either remove or replace them?
> >
> > To reproduce, start "emacs -Q" then insert a raw byte by typing
> >
> >   C-x 8 RET 3fffe0 RET
> >
> > Then try to come up with a regexp that finds only the raw byte.
> >
> > This is important when one has a buffer which could include raw bytes,
> > and wants to json-serialize it, in which case there's a need to remove
> > raw bytes or replace them with something that will avoid signaling an
> > error from the serialization code.
> >
> > The only way I found is to examine the buffer one character at a time
> > using charset-after.  But this is tedious and inefficient.
> >
> > I seem to be unable to find a way to express this with regexps.  The
> > naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> > finds ASCII letters and nothing else).  Nothing else I tried worked,
> > including the recipe from the ELisp manual:
> >
> >        4. If the end points of a range are raw 8-bit bytes (*note Text
> >           Representations::), or if the range start is ASCII and the end
> >           is a raw byte (as in ‘[a-\377]’), the range will match only
> >           ASCII characters and raw 8-bit bytes, but not non-ASCII
> >           characters.  This feature is intended for searching text in
> >           unibyte buffers and strings.
> >
> > In a buffer that includes only ASCII characters and a raw byte, typing
> > "C-M-s [a-\377]" signals an error "Failing regexp search".
> >
> > Is there a solution for this job that I'm missing?
> 
> This seems to work, at least for your examples (also if I add them to
> the HELLO buffer):
> 
> C-M-s [^[:ascii:][:print:]]

Thanks, but that will also find other codepoints.  (If it doesn't,
it's a separate bug.)  And anyway, how would one decide this should
work based on the documentation of [:print:]?





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#79724; Package emacs. Full text available.

Message received at 79724 <at> debbugs.gnu.org:


Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw
 bytes
From: Mattias Engdegård <mattias.engdegard@HIDDEN>
In-Reply-To: <86frb0n0yd.fsf@HIDDEN>
Date: Thu, 30 Oct 2025 12:09:58 +0100
Message-Id: <1BFAF201-1A4D-4B6D-867F-5EE337A6C9C4@HIDDEN>
References: <86ikfwn36t.fsf@HIDDEN> <86frb0n0yd.fsf@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, monnier@HIDDEN

> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?

  (re-search-forward (rx (in (#x3fff80 . #x3fffff))))

assuming it's a multibyte buffer, which is almost always the case.

If you want to find all non-Unicode values, including the embarrassing
range in #x110000..#x3fff7f that we don't speak about, maybe you'd like

  (re-search-forward (rx (not (in (0 . #x10ffff)))))

or if you prefer skip-chars-forward,

  (skip-chars-forward "\0-\x10ffff")

etc.

> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work

Actually you almost nailed it, it's just a char escape matter: it's
either \u with four, \U with eight or \x with any number of hex digits.
(Or just use rx.)
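
For the use case in the report (cleaning a buffer before JSON
serialization), the rx form above can be wrapped in a small loop.  The
following is a minimal sketch, not part of Emacs: the function name
`my-replace-raw-bytes' and the default replacement U+FFFD are
assumptions made for illustration, and it assumes a multibyte buffer
as noted above.

```elisp
;; Hypothetical helper; the name and the default replacement are
;; illustrative assumptions, not an existing Emacs API.
(defun my-replace-raw-bytes (&optional replacement)
  "Replace each raw byte in the current multibyte buffer with REPLACEMENT.
REPLACEMENT is a string and defaults to U+FFFD.
Return the number of raw bytes that were replaced."
  (let ((replacement (or replacement "\ufffd"))
        (count 0))
    (save-excursion
      (goto-char (point-min))
      ;; In a multibyte buffer, raw bytes #x80..#xff are represented
      ;; as the characters #x3fff80..#x3fffff.
      (while (re-search-forward (rx (in (#x3fff80 . #x3fffff))) nil t)
        ;; FIXEDCASE and LITERAL are both t, so REPLACEMENT is
        ;; inserted verbatim.
        (replace-match replacement t t)
        (setq count (1+ count))))
    count))
```

Passing an empty string as REPLACEMENT removes the raw bytes instead
of substituting them.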





Information forwarded to bug-gnu-emacs@HIDDEN:
bug#79724; Package emacs. Full text available.

Message received at 79724 <at> debbugs.gnu.org:


From: Stephen Berman <stephen.berman@HIDDEN>
To: Eli Zaretskii <eliz@HIDDEN>
Subject: Re: bug#79724: 31.0.50; No easy way of searching a buffer for raw
 bytes
In-Reply-To: <86ikfwn36t.fsf@HIDDEN>
References: <86ikfwn36t.fsf@HIDDEN>
Date: Thu, 30 Oct 2025 11:28:30 +0100
Message-ID: <875xbwof8x.fsf@HIDDEN>
Cc: 79724 <at> debbugs.gnu.org, Stefan Monnier <monnier@HIDDEN>

On Thu, 30 Oct 2025 11:34:18 +0200 Eli Zaretskii <eliz@HIDDEN> wrote:

> From: eliz@HIDDEN
> --text follows this line--
> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?
>
> To reproduce, start "emacs -Q" then insert a raw byte by typing
>
>   C-x 8 RET 3fffe0 RET
>
> Then try to come up with a regexp that finds only the raw byte.
>
> This is important when one has a buffer which could include raw bytes,
> and wants to json-serialize it, in which case there's a need to remove
> raw bytes or replace them with something that will avoid signaling an
> error from the serialization code.
>
> The only way I found is to examine the buffer one character at a time
> using charset-after.  But this is tedious and inefficient.
>
> I seem to be unable to find a way to express this with regexps.  The
> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> finds ASCII letters and nothing else).  Nothing else I tried worked,
> including the recipe from the ELisp manual:
>
>        4. If the end points of a range are raw 8-bit bytes (*note Text
>           Representations::), or if the range start is ASCII and the end
>           is a raw byte (as in ‘[a-\377]’), the range will match only
>           ASCII characters and raw 8-bit bytes, but not non-ASCII
>           characters.  This feature is intended for searching text in
>           unibyte buffers and strings.
>
> In a buffer that includes only ASCII characters and a raw byte, typing
> "C-M-s [a-\377]" signals an error "Failing regexp search".
>
> Is there a solution for this job that I'm missing?

This seems to work, at least for your examples (also if I add them to
the HELLO buffer):

C-M-s [^[:ascii:][:print:]]

Steve Berman




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#79724; Package emacs. Full text available.

Message received at 79724 <at> debbugs.gnu.org:


Date: Thu, 30 Oct 2025 12:22:34 +0200
Message-Id: <86frb0n0yd.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: 79724 <at> debbugs.gnu.org
In-Reply-To: <86ikfwn36t.fsf@HIDDEN> (message from Eli Zaretskii on Thu, 30
 Oct 2025 11:34:18 +0200)
Subject: Re: bug#79724: 31.0.50;
 No easy way of searching a buffer for raw bytes
References: <86ikfwn36t.fsf@HIDDEN>
Cc: Mattias Engdegård <mattiase@HIDDEN>,
 monnier@HIDDEN

Let me add Mattias as well to the discussion, since he made some of
the changes in this area.

> Cc: Stefan Monnier <monnier@HIDDEN>
> Date: Thu, 30 Oct 2025 11:34:18 +0200
> From: Eli Zaretskii <eliz@HIDDEN>
> 
> As the subject says, how can a user easily search for raw bytes in a
> buffer?  Or how can a Lisp program quickly scan a buffer to find raw
> bytes and either remove or replace them?
> 
> To reproduce, start "emacs -Q" then insert a raw byte by typing
> 
>   C-x 8 RET 3fffe0 RET
> 
> Then try to come up with a regexp that finds only the raw byte.
> 
> This is important when one has a buffer which could include raw bytes,
> and wants to json-serialize it, in which case there's a need to remove
> raw bytes or replace them with something that will avoid signaling an
> error from the serialization code.
> 
> The only way I found is to examine the buffer one character at a time
> using charset-after.  But this is tedious and inefficient.
> 
> I seem to be unable to find a way to express this with regexps.  The
> naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
> finds ASCII letters and nothing else).  Nothing else I tried worked,
> including the recipe from the ELisp manual:
> 
>        4. If the end points of a range are raw 8-bit bytes (*note Text
>           Representations::), or if the range start is ASCII and the end
>           is a raw byte (as in ‘[a-\377]’), the range will match only
>           ASCII characters and raw 8-bit bytes, but not non-ASCII
>           characters.  This feature is intended for searching text in
>           unibyte buffers and strings.
> 
> In a buffer that includes only ASCII characters and a raw byte, typing
> "C-M-s [a-\377]" signals an error "Failing regexp search".
> 
> Is there a solution for this job that I'm missing?  If so, we should at
> least document it.  If there's no solution currently, I think we
> should add something to make it easier.
> 
> 
> In GNU Emacs 31.0.50 (build 1458, i686-pc-mingw32) of 2025-10-30 built
>  on ELIZ-PC
> Repository revision: 06b3f11cb8f040d192a91972b40eab8c85a2cc5b
> Repository branch: master
> Windowing system distributor 'Microsoft Corp.', version 10.0.26100
> System Description: Microsoft Windows 10 Enterprise (v10.0.2009.26100.6899)
> 
> Configured using:
>  'configure -C --prefix=/d/usr --with-wide-int
>  --without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
>  -gdwarf-4 -g3''
> 
> Configured features:
> ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY
> PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
> TREE_SITTER WEBP XPM ZLIB
> 
> Important settings:
>   value of $LANG: ENG
>   locale-coding-system: cp1252
> 
> Major mode: Lisp Interaction
> 
> Minor modes in effect:
>   tooltip-mode: t
>   global-eldoc-mode: t
>   eldoc-mode: t
>   show-paren-mode: t
>   electric-indent-mode: t
>   mouse-wheel-mode: t
>   tool-bar-mode: t
>   menu-bar-mode: t
>   file-name-shadow-mode: t
>   global-font-lock-mode: t
>   font-lock-mode: t
>   blink-cursor-mode: t
>   minibuffer-nonselected-mode: t
>   minibuffer-regexp-mode: t
>   line-number-mode: t
>   indent-tabs-mode: t
>   transient-mark-mode: t
>   auto-composition-mode: t
>   auto-encryption-mode: t
>   auto-compression-mode: t
> 
> Load-path shadows:
> None found.
> 
> Features:
> (shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
> dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
> rfc6068 epg-config gnus-util text-property-search time-date subr-x
> mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
> mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
> mail-utils warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip
> cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
> elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
> term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
> regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
> prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
> timer select scroll-bar mouse jit-lock font-lock syntax font-core
> term/tty-colors frame minibuffer nadvice seq simple cl-generic
> indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
> tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
> romanian slovak czech european ethiopic indian cyrillic chinese
> composite emoji-zwj charscript charprop case-table epa-hook
> jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
> theme-loaddefs faces cus-face macroexp files window text-properties
> overlay sha1 md5 base64 format env code-pages mule custom widget keymap
> hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
> move-toolbar make-network-process tty-child-frames emacs)
> 
> Memory information:
> ((conses 16 46888 16793) (symbols 48 6655 0) (strings 16 16703 2197)
>  (string-bytes 1 346778) (vectors 16 9844)
>  (vector-slots 8 115806 11013) (floats 8 23 6) (intervals 40 310 75)
>  (buffers 928 10))
> 
> 
> 
> 




Information forwarded to bug-gnu-emacs@HIDDEN:
bug#79724; Package emacs. Full text available.

Message received at submit <at> debbugs.gnu.org:


Date: Thu, 30 Oct 2025 11:34:18 +0200
Message-Id: <86ikfwn36t.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: bug-gnu-emacs@HIDDEN
Subject: 31.0.50; No easy way of searching a buffer for raw bytes
X-Debbugs-Cc: Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattiase@HIDDEN>
X-Debbugs-Cc: Stefan Monnier <monnier@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

As the subject says, how can a user easily search for raw bytes in a
buffer?  Or how can a Lisp program quickly scan a buffer to find raw
bytes and either remove or replace them?

To reproduce, start "emacs -Q" then insert a raw byte by typing

  C-x 8 RET 3fffe0 RET

Then try to come up with a regexp that finds only the raw byte.

This is important when one has a buffer which could include raw bytes,
and wants to json-serialize it, in which case there's a need to remove
raw bytes or replace them with something that will avoid signaling an
error from the serialization code.

The only way I found is to examine the buffer one character at a time
using charset-after.  But this is tedious and inefficient.
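
For reference, a minimal sketch of that character-at-a-time approach
could look like the following.  The function name my-delete-raw-bytes
is hypothetical, and the sketch assumes that charset-after reports the
eight-bit charset for raw bytes in a multibyte buffer:

```elisp
;; Hypothetical helper, not part of Emacs: walk the buffer one character
;; at a time and delete every character whose charset is `eight-bit'.
(defun my-delete-raw-bytes ()
  "Delete all raw 8-bit bytes from the accessible portion of the buffer."
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      ;; Assumption: raw bytes in a multibyte buffer belong to `eight-bit'.
      (if (eq (charset-after) 'eight-bit)
          (delete-char 1)
        (forward-char 1)))))
```

Evaluating (my-delete-raw-bytes) in the buffer from the recipe above
should delete the raw byte and leave everything else alone.  It calls
charset-after once per character, which is the inefficiency in question.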

I seem to be unable to find a way to express this with regexps.  The
naïve way would be "[\u3fff00-\u3fffff]", but that doesn't work (it
finds ASCII letters and nothing else).  Nothing else I tried worked,
including the recipe from the ELisp manual:

       4. If the end points of a range are raw 8-bit bytes (*note Text
          Representations::), or if the range start is ASCII and the end
          is a raw byte (as in ‘[a-\377]’), the range will match only
          ASCII characters and raw 8-bit bytes, but not non-ASCII
          characters.  This feature is intended for searching text in
          unibyte buffers and strings.

In a buffer that includes only ASCII characters and a raw byte, typing
"C-M-s [a-\377]" signals an error "Failing regexp search".

Is there a solution for this job that I'm missing?  If so, we should at
least document it.  If there's no solution currently, I think we
should add something to make it easier.


In GNU Emacs 31.0.50 (build 1458, i686-pc-mingw32) of 2025-10-30 built
 on ELIZ-PC
Repository revision: 06b3f11cb8f040d192a91972b40eab8c85a2cc5b
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 10.0.26100
System Description: Microsoft Windows 10 Enterprise (v10.0.2009.26100.6899)

Configured using:
 'configure -C --prefix=/d/usr --with-wide-int
 --without-native-compilation --enable-checking=yes,glyphs 'CFLAGS=-O0
 -gdwarf-4 -g3''

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES NOTIFY W32NOTIFY
PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: ENG
  locale-coding-system: cp1252

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  minibuffer-nonselected-mode: t
  minibuffer-regexp-mode: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug lisp-mnt message mailcap yank-media puny
dired dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg
rfc6068 epg-config gnus-util text-property-search time-date subr-x
mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader sendmail
mail-parse rfc2231 rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils warnings icons cl-loaddefs cl-lib rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
move-toolbar make-network-process tty-child-frames emacs)

Memory information:
((conses 16 46888 16793) (symbols 48 6655 0) (strings 16 16703 2197)
 (string-bytes 1 346778) (vectors 16 9844)
 (vector-slots 8 115806 11013) (floats 8 23 6) (intervals 40 310 75)
 (buffers 928 10))




Acknowledgement sent to Eli Zaretskii <eliz@HIDDEN>:
New bug report received and forwarded. Copy sent to monnier@HIDDEN, bug-gnu-emacs@HIDDEN. Full text available.
Report forwarded to monnier@HIDDEN, bug-gnu-emacs@HIDDEN:
bug#79724; Package emacs. Full text available.
Last modified: Thu, 30 Oct 2025 11:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.