Received: (at 81115) by debbugs.gnu.org; 30 May 2026 14:41:02 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 30 10:41:01 2026 Received: from localhost ([127.0.0.1]:42821 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1wTKs1-0002HU-64 for submit <at> debbugs.gnu.org; Sat, 30 May 2026 10:41:01 -0400 Received: from mail-244123.protonmail.ch ([109.224.244.123]:14051) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from <pipcet@HIDDEN>) id 1wTKry-0002HB-5D for 81115 <at> debbugs.gnu.org; Sat, 30 May 2026 10:40:59 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1780152051; x=1780411251; bh=5a2m086o4EkS5pCOQacZKv85ZrNRI80Ybw6+3oWXkaU=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=IfojatZx1mE2v/Vz6Kas3wc7Km3D7dBFzGUriR/ZxTpmZQ+JzwW8uWQ0ZLii1ciiv b+glMwh4rPYjPzZJn8dnzeD3AUq5M0NPvntrjBk0iQJpS6Nh2YiTqKUEZO1Hc/hZrr LplxbLJniEGt+PxzYSIZbPuq/EEO4xSFZ5Ucbp4X23G3/T7Npyt/ffk4VUXs86zoOg N8YfEnM8mSz2ZjFfh3WC+d2Jow/7B0MhbHgsV4RHd25ByuBWtVJF1oxGGZh0vwvOhf CeHn0lcO9Chb+RH5ruuo+hPfoJ0iYqZjo4PUOUc/spFz6FOfZl0wqmRDnaxU7USDyc JyahnQUeVw64A== Date: Sat, 30 May 2026 14:40:44 +0000 To: Eli Zaretskii <eliz@HIDDEN> From: Pip Cet <pipcet@HIDDEN> Subject: Re: bug#81115: 32.0.50; Improving FOR_EACH_TAIL Message-ID: <87fr392c4l.fsf@HIDDEN> In-Reply-To: <8633z917z7.fsf@HIDDEN> References: <87ik8b7h4c.fsf@HIDDEN> <8633z917z7.fsf@HIDDEN> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 68f96963f5a240d0dba80871a70148a72c65e80d MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 81115 Cc: =?utf-8?Q?Mattias_Engdeg=C3=A5rd?= <mattiase@HIDDEN>, Paul Eggert <eggert@HIDDEN>, Helmut Eller <eller.helmut@HIDDEN>, Stefan Monnier <monnier@HIDDEN>, 81115 <at> debbugs.gnu.org X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -1.7 (-) "Eli Zaretskii" <eliz@HIDDEN> writes: >> Date: Mon, 25 May 2026 13:28:17 +0000 >> From: Pip Cet via "Bug reports for GNU Emacs, >> the Swiss army knife of text editors" <bug-gnu-emacs@HIDDEN> >> >> This is a proposal to improve the implementation of our FOR_EACH_TAIL >> macro, by reducing the impact of cycle detection and quit checks: the >> common case is a short proper list, so let's optimize for that! >> >> Here's the proposed code: >> >> #define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit)=09=09=09\ >> ((tail) =3D XCDR (tail),=09=09=09=09=09=09\ >> (__builtin_expect (BASE_EQ (tail, li.tortoise), 0)=09=09=09\ >> || (__builtin_expect (!((++li.l) & (FOR_EACH_TAIL_THRESHOLD-1)), 0)= =09\ >> =09&& (((check_quit) ? maybe_quit () : (void) 0, false)=09=09\ >> =09 || (POWER_OF_2 (li.l) && (li.tortoise =3D (tail), false)))))) >> >> First, correctness: both algorithms work by checking that the i-th >> element we're looking at isn't the same that we saw k_i steps k_i ago >> (the old algorithm skips some checks, but that doesn't matter). To >> ensure we detect cycles starting arbitarily late, we need to ensure (i - >> k_i) is unbounded. To ensure we detect cycles of any given length, we >> need to ensure that (k_i) is also unbounded, and reaches multiples of >> each natural number infinitely often. Both properties are unchanged by >> the modification of a finite number of (k_i), which is all I've done: >> the old sequence is >> >> 1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, .... >> >> (the last check before a "1" is skipped) >> >> the new sequence is >> >> 1, 2, 3, ..., 255, 256, 1, 2, ..., 255, 256, 1, 2, ..., 511, 512, ... >> >> (no checks are skipped) >> >> In practice, the new algorithm reduces the memory overhead of cycle >> checking to two words (easily kept in registers), rather than four >> words, and eliminates all quit checking for lists shorter than 256 >> elements. >> >> That "256" is a tunable parameter, but there is a weak-ish argument for >> using 256: x86 CPUs have special instructions to access the least >> significant byte of a register, and those may save a few bytes in the >> generated code. >> >> The existing algorithm, on current systems, uses 4 words of storage: one >> for the "tortoise" object, one for a 64-bit counter, and two more for an >> 80-bit counter that is split across two variables. >> >> The new algorithm uses 2 words of storage: one for the "tortoise" object >> and one for a simple counter indicating how many iterations have >> happened. It is incremented once per iteration and never otherwise >> modified, and it is used only to check whether it is a multiple of 256 >> or a power of two. >> >> The existing algorithm checks quits very often when operating on short >> lists: in the first 256 iterations, there are 7 calls to maybe_quit, and >> 4 of them happen even for a list of 30 elements. >> >> The new algorithm checks quits only after 256 iterations, and then once >> per 256 iterations. >> >> The structure of the new algorithm is as follows: >> >> 1. cycle check; unlikely branch to cold code if cycle detected >> 2. increment the counter >> 3. check the least significant byte of the counter >> 4. continue the loop if !=3D 0 (likely) >> 5. check quit >> 6. if the counter is a power of two, advance the tortoise >> 7. continue the loop >> >> The old algorithm did this: >> >> 1. decrement the LSB 16 bits of the split counter >> 2. if nonzero, perform cycle check, then continue the loop >> 3. check quit >> 4. decrement the MSB 64 bits of the split counter >> 5. if nonzero, continue the loop >> 6. reset the split counter to the next power of two >> 7. advance the tortoise >> 8. continue the loop >> >> The new algorithm sometimes does sometimes consider elements in a cycle >> several times, as does the old one. That the new algorithm, in the worst >> case, considers the same element 255 times rather than twice seems a >> small price to pay. It should be noted that this happens only for >> dotted lists with a linear and a circular part, not for simple short >> cycles which are now detected after the first loop. >> >> The old step 6, in particular, is gone entirely: no bit shifts are >> necessary, we don't have to reinitialize a count-down timer; instead, we >> rely on the very fast POWER_OF_2 macro. >> >> These changes preserve the precise detection of cycle length in Brent's >> algorithm, which Fnthcdr relies on for performance. However, Fnthcdr >> previously assumed that simple short cycles weren't detected as early as >> the new code does, leading to integer overflow. >> >> If necessary, we can split the changes into three phases: Phase 1 is >> rewriting the counter logic to use a single counter. Phase 2 is >> reducing the frequency of quit checks and tortoise advancement to happen >> only from the 256th iteration. Phase 3 is adding __builtin_expect >> expressions, which make the C code less readable but improve the >> generated code. >> >> A final improvement is the termination condition: the old code used >> !CONSP (tail), which is logically correct but misleading: the usual case >> is NILP (tail), and telling the compiler so explicitly does improve the >> code somewhat. We still need to handle !NILP (tail) && !CONSP (tail), >> but that is unlikely. >> >> Further improvements are desperately needed in this area: we need to >> move to checking only a single flag in maybe_quit (Helmut has a patch, >> IIRC), and not to call probably_quit when quit is inhibited, for >> example. slow_eq should be removed again once we reimplement a >> positioning mechanism based on cons cells, or we should give SWP their >> own tag to speed up EQ in the common case. >> >> These questions still need to be answered: >> >> 1. Is 256 the right multiple to use for quit checks and tortoise >> advancement? >> >> 2. Do we want to use __builtin_expect here, and if so, do we want to >> hide it behind a LIKELY macro as most other projects do? Can we add a >> "probability" weight to LIKELY to facilitate optimization further? >> >> For debug builds only, we probably want to call maybe_quit at least once >> in each FOR_EACH_TAIL, to catch bugs when quitting happens. For >> production builds, the threshold of 256 is so high that these unsafe >> quits will happen very rarely. However, I think this change should also >> involve renaming FOR_EACH_TAIL to FOR_EACH_TAIL_QUIT and >> FOR_EACH_TAIL_NOQUIT depending on whether quitting is safe, so it'll >> have to happen separately. >> >> I've not been able to perform reliable benchmarks of this, and it's not >> quite clear what we should be benchmarking: obviously there is some >> slowdown for circular lists, but that's intentional; in exchange, the >> lists we're actually seeing in this macro should be faster, but the >> effect is likely to be too small to measure reliably on hardware which >> changes its clock frequency unpredictably. However, looking at the >> disassembled code does indicate that it has improved significantly. >> >> Full patch follows: > > Thanks, I've added a few people who might have comments or suggestions > to this. Thank you! I tried following https://debbugs.gnu.org/Reporting.html#xcc ; I guess that didn't work, then, sorry about that. I'll try using a pseudo header rather than a standard mail header next time. (I'd selected the same people, plus Helmut, whom I've included in this message). Pip
bug-gnu-emacs@HIDDEN:bug#81115; Package emacs.
Full text available.
Received: (at 81115) by debbugs.gnu.org; 30 May 2026 10:56:43 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 30 06:56:42 2026
Received: from localhost ([127.0.0.1]:38722 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1wTHMv-0004a6-Dj
for submit <at> debbugs.gnu.org; Sat, 30 May 2026 06:56:42 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:59894)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.84_2) (envelope-from <eliz@HIDDEN>) id 1wTHMs-0004Zk-TR
for 81115 <at> debbugs.gnu.org; Sat, 30 May 2026 06:56:40 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <eliz@HIDDEN>)
id 1wTHMm-0002lo-0Z; Sat, 30 May 2026 06:56:32 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From:
Date; bh=1xC3GZ16l8qkawv+deawVcJUuLYZKL/A6sPy7p3JSgc=; b=DTeuMw2NRqjn8HXDSmXX
98Vj/q+9SWz8xRxPFBdxDD0C0WiTpeVABvposhM5sdKWL6H3yPfna0ffzVFxaV/3o/bZzulVsKhZk
Mic+YIExgdMoLBrnLTOE7iCfiZrMwbRA/JijgY0Mdog+wwT5xv0qM9TJ8lN1YL06u0S+rvTnfDRkB
DkIycItIKrl2CZr+gPzR/U1nScuCJLqP/LXQ9jNoqph6WoggP/KHwEYRmiUNaOFCr7TQC4sqMQok/
tCUY8rbjjWmVmvmh5kye6CO8ANxk6E+Nf0qQd2i5VUG31/3ucdTr3MM3T6RBE7Is46oPbaYGHTEGq
zRWFIoHPaSG8Cw==;
Date: Sat, 30 May 2026 13:56:28 +0300
Message-Id: <8633z917z7.fsf@HIDDEN>
From: Eli Zaretskii <eliz@HIDDEN>
To: Pip Cet <pipcet@HIDDEN>, Stefan Monnier <monnier@HIDDEN>,
Mattias =?utf-8?Q?Engdeg=C3=A5rd?= <mattiase@HIDDEN>,
Paul Eggert <eggert@HIDDEN>
In-Reply-To: <87ik8b7h4c.fsf@HIDDEN> (bug-gnu-emacs@HIDDEN)
Subject: Re: bug#81115: 32.0.50; Improving FOR_EACH_TAIL
References: <87ik8b7h4c.fsf@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 81115
Cc: 81115 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)
> Date: Mon, 25 May 2026 13:28:17 +0000
> From: Pip Cet via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs@HIDDEN>
>
> This is a proposal to improve the implementation of our FOR_EACH_TAIL
> macro, by reducing the impact of cycle detection and quit checks: the
> common case is a short proper list, so let's optimize for that!
>
> Here's the proposed code:
>
> #define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit) \
> ((tail) = XCDR (tail), \
> (__builtin_expect (BASE_EQ (tail, li.tortoise), 0) \
> || (__builtin_expect (!((++li.l) & (FOR_EACH_TAIL_THRESHOLD-1)), 0) \
> && (((check_quit) ? maybe_quit () : (void) 0, false) \
> || (POWER_OF_2 (li.l) && (li.tortoise = (tail), false))))))
>
> First, correctness: both algorithms work by checking that the i-th
> element we're looking at isn't the same that we saw k_i steps k_i ago
> (the old algorithm skips some checks, but that doesn't matter). To
> ensure we detect cycles starting arbitarily late, we need to ensure (i -
> k_i) is unbounded. To ensure we detect cycles of any given length, we
> need to ensure that (k_i) is also unbounded, and reaches multiples of
> each natural number infinitely often. Both properties are unchanged by
> the modification of a finite number of (k_i), which is all I've done:
> the old sequence is
>
> 1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, ....
>
> (the last check before a "1" is skipped)
>
> the new sequence is
>
> 1, 2, 3, ..., 255, 256, 1, 2, ..., 255, 256, 1, 2, ..., 511, 512, ...
>
> (no checks are skipped)
>
> In practice, the new algorithm reduces the memory overhead of cycle
> checking to two words (easily kept in registers), rather than four
> words, and eliminates all quit checking for lists shorter than 256
> elements.
>
> That "256" is a tunable parameter, but there is a weak-ish argument for
> using 256: x86 CPUs have special instructions to access the least
> significant byte of a register, and those may save a few bytes in the
> generated code.
>
> The existing algorithm, on current systems, uses 4 words of storage: one
> for the "tortoise" object, one for a 64-bit counter, and two more for an
> 80-bit counter that is split across two variables.
>
> The new algorithm uses 2 words of storage: one for the "tortoise" object
> and one for a simple counter indicating how many iterations have
> happened. It is incremented once per iteration and never otherwise
> modified, and it is used only to check whether it is a multiple of 256
> or a power of two.
>
> The existing algorithm checks quits very often when operating on short
> lists: in the first 256 iterations, there are 7 calls to maybe_quit, and
> 4 of them happen even for a list of 30 elements.
>
> The new algorithm checks quits only after 256 iterations, and then once
> per 256 iterations.
>
> The structure of the new algorithm is as follows:
>
> 1. cycle check; unlikely branch to cold code if cycle detected
> 2. increment the counter
> 3. check the least significant byte of the counter
> 4. continue the loop if != 0 (likely)
> 5. check quit
> 6. if the counter is a power of two, advance the tortoise
> 7. continue the loop
>
> The old algorithm did this:
>
> 1. decrement the LSB 16 bits of the split counter
> 2. if nonzero, perform cycle check, then continue the loop
> 3. check quit
> 4. decrement the MSB 64 bits of the split counter
> 5. if nonzero, continue the loop
> 6. reset the split counter to the next power of two
> 7. advance the tortoise
> 8. continue the loop
>
> The new algorithm sometimes does sometimes consider elements in a cycle
> several times, as does the old one. That the new algorithm, in the worst
> case, considers the same element 255 times rather than twice seems a
> small price to pay. It should be noted that this happens only for
> dotted lists with a linear and a circular part, not for simple short
> cycles which are now detected after the first loop.
>
> The old step 6, in particular, is gone entirely: no bit shifts are
> necessary, we don't have to reinitialize a count-down timer; instead, we
> rely on the very fast POWER_OF_2 macro.
>
> These changes preserve the precise detection of cycle length in Brent's
> algorithm, which Fnthcdr relies on for performance. However, Fnthcdr
> previously assumed that simple short cycles weren't detected as early as
> the new code does, leading to integer overflow.
>
> If necessary, we can split the changes into three phases: Phase 1 is
> rewriting the counter logic to use a single counter. Phase 2 is
> reducing the frequency of quit checks and tortoise advancement to happen
> only from the 256th iteration. Phase 3 is adding __builtin_expect
> expressions, which make the C code less readable but improve the
> generated code.
>
> A final improvement is the termination condition: the old code used
> !CONSP (tail), which is logically correct but misleading: the usual case
> is NILP (tail), and telling the compiler so explicitly does improve the
> code somewhat. We still need to handle !NILP (tail) && !CONSP (tail),
> but that is unlikely.
>
> Further improvements are desperately needed in this area: we need to
> move to checking only a single flag in maybe_quit (Helmut has a patch,
> IIRC), and not to call probably_quit when quit is inhibited, for
> example. slow_eq should be removed again once we reimplement a
> positioning mechanism based on cons cells, or we should give SWP their
> own tag to speed up EQ in the common case.
>
> These questions still need to be answered:
>
> 1. Is 256 the right multiple to use for quit checks and tortoise
> advancement?
>
> 2. Do we want to use __builtin_expect here, and if so, do we want to
> hide it behind a LIKELY macro as most other projects do? Can we add a
> "probability" weight to LIKELY to facilitate optimization further?
>
> For debug builds only, we probably want to call maybe_quit at least once
> in each FOR_EACH_TAIL, to catch bugs when quitting happens. For
> production builds, the threshold of 256 is so high that these unsafe
> quits will happen very rarely. However, I think this change should also
> involve renaming FOR_EACH_TAIL to FOR_EACH_TAIL_QUIT and
> FOR_EACH_TAIL_NOQUIT depending on whether quitting is safe, so it'll
> have to happen separately.
>
> I've not been able to perform reliable benchmarks of this, and it's not
> quite clear what we should be benchmarking: obviously there is some
> slowdown for circular lists, but that's intentional; in exchange, the
> lists we're actually seeing in this macro should be faster, but the
> effect is likely to be too small to measure reliably on hardware which
> changes its clock frequency unpredictably. However, looking at the
> disassembled code does indicate that it has improved significantly.
>
> Full patch follows:
Thanks, I've added a few people who might have comments or suggestions
to this.
> >From 26cdf0d948dd7ff86c9a9f9602b23dee3464ee66 Mon Sep 17 00:00:00 2001
> From: Pip Cet <pipcet@HIDDEN>
> Date: Mon, 25 May 2026 09:10:23 +0000
> Subject: [PATCH 1/2] Improve FOR_EACH_TAIL (bug#XXXXX)
>
> * src/lisp.h (struct for_each_tail_internal): Reduce to two words.
> (FOR_EACH_TAIL_BASIC): Add compiler hint to indicate that tail is most
> likely Qnil after the loop and a non-nil non-cons is unlikely.
> (FOR_EACH_TAIL_STEP_CYCLEP): Rewrite.
> ---
> src/lisp.h | 48 ++++++++++++++++++++++++++++++++----------------
> 1 file changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/src/lisp.h b/src/lisp.h
> index d4978460f68..4a566f1aa66 100644
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -5833,7 +5833,8 @@ #define AUTO_STR_WITH_LEN(str, len) \
> set TAIL to the current cons. If the loop exits normally,
> set TAIL to the terminating non-cons, typically nil. The loop body
> should not modify the list’s top level structure other than by
> - perhaps deleting the current cons. */
> + perhaps deleting the current cons. In practice, this macro rarely
> + quits. */
>
> #define FOR_EACH_TAIL(tail) \
> FOR_EACH_TAIL_INTERNAL (tail, circular_list (tail), true)
> @@ -5847,16 +5848,17 @@ #define FOR_EACH_TAIL_SAFE(tail) \
> /* Iterator intended for use only within FOR_EACH_TAIL_INTERNAL. */
> struct for_each_tail_internal
> {
> + /* The object we're comparing to. */
> Lisp_Object tortoise;
> - intptr_t max, n;
> - unsigned short int q;
> + /* How many iterations of the algorithm have happened. */
> + ptrdiff_t l;
> };
>
> /* Like FOR_EACH_TAIL (LIST), except evaluate CYCLE if a cycle is
> found, and check for quit if CHECK_QUIT. This is an internal macro
> intended for use only by the above macros.
>
> - Use Brent’s teleporting tortoise-hare algorithm. See:
> + Modifies Brent’s teleporting tortoise-hare algorithm. See:
> Brent RP. BIT. 1980;20(2):176-184. doi:10.1007/BF01933190
> https://maths-people.anu.edu.au/~brent/pd/rpb051i.pdf
>
> @@ -5871,20 +5873,34 @@ #define FOR_EACH_TAIL_INTERNAL(tail, cycle, check_quit) \
> FOR_EACH_TAIL_STEP_CYCLEP (tail, check_quit) \
> ? (cycle) : (void) 0)
>
> -#define FOR_EACH_TAIL_BASIC(tail, stepper) \
> - for (struct for_each_tail_internal li = { tail, 2, 0, 2 }; \
> - CONSP (tail); stepper)
> +#define FOR_EACH_TAIL_BASIC(tail, stepper) \
> + for (struct for_each_tail_internal li = { tail, 0 }; \
> + CONSP (tail) || (__builtin_expect (!NILP (tail), 0), false); \
> + stepper)
>
> -/* Step TAIL and return whether a cycle has been detected.
> - If CHECK_QUIT then check for quit occasionally. */
> -#define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit) \
> - ((tail) = XCDR (tail), \
> - ((--li.q != 0 \
> - || ((check_quit) ? maybe_quit () : (void) 0, 0 < --li.n) \
> - || (li.q = li.n = li.max <<= 1, li.n >>= USHRT_WIDTH, \
> - li.tortoise = (tail), false)) \
> - && BASE_EQ (tail, li.tortoise)))
> +/* How many iterations until we start advancing the tortoise, which is
> + necessary to detect dotted lists which end in a cycle. Also, number
> + of iterations until we check for quits. Must be a power of two. */
>
> +#define FOR_EACH_TAIL_THRESHOLD 256
> +
> +static_assert (POWER_OF_2 (FOR_EACH_TAIL_THRESHOLD));
> +
> +/* Step TAIL and return whether a cycle has been detected. If
> + CHECK_QUIT then check for quit occasionally. We only check for quits
> + once every 256 iterations, and we don't advance the tortoise until
> + that happens for the first time.
> +
> + The C comma operator is used here to avoid statement expressions: the
> + expression evaluates to the value of BASE_EQ (tail, li.tortoise), but
> + has additional side effects, possibly exiting nonlocally. */
> +
> +#define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit) \
> + ((tail) = XCDR (tail), \
> + (__builtin_expect (BASE_EQ (tail, li.tortoise), 0) \
> + || (__builtin_expect (!((++li.l) & (FOR_EACH_TAIL_THRESHOLD-1)), 0) \
> + && (((check_quit) ? maybe_quit () : (void) 0, false) \
> + || (POWER_OF_2 (li.l) && (li.tortoise = (tail), false))))))
>
> /* Do a `for' loop over alist values. */
>
> --
> 2.54.0
>
> >From c3a53e7458ec123bd903563c4c6bda6954ccdffe Mon Sep 17 00:00:00 2001
> From: Pip Cet <pipcet@HIDDEN>
> Date: Mon, 25 May 2026 11:28:38 +0000
> Subject: [PATCH 2/2] Avoid relying on FOR_EACH_TAIL internals in 'Fnthcdr'
> (bug#XXXXX)
>
> The new FOR_EACH_TAIL code detects simple cycles sooner than the old
> code did, leading to integer overflows.
>
> * src/fns.c (Fnthcdr): Avoid integer overflow if cycle is detected
> early.
> ---
> src/fns.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/fns.c b/src/fns.c
> index e3d297102ca..2d9a3a3619b 100644
> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -1824,7 +1824,10 @@ DEFUN ("nthcdr", Fnthcdr, Snthcdr, 2, 2, 0,
> mpz_export (&iz, NULL, -1, sizeof iz, 0, 0, mpz[0]);
> num += iz;
> }
> - num += cycle_length - large_num % cycle_length;
> + if (num < cycle_length)
> + num += cycle_length;
> + num -= large_num % cycle_length;
> + eassume (num >= 0);
> }
> num %= cycle_length;
>
> --
> 2.54.0
>
>
>
>
>
bug-gnu-emacs@HIDDEN:bug#81115; Package emacs.
Full text available.
Received: (at submit) by debbugs.gnu.org; 25 May 2026 13:28:42 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon May 25 09:28:41 2026
Received: from localhost ([127.0.0.1]:47431 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1wRVMG-0007UX-Dx
for submit <at> debbugs.gnu.org; Mon, 25 May 2026 09:28:41 -0400
Received: from lists1p.gnu.org ([2001:470:142::17]:41688)
by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.84_2) (envelope-from <pipcet@HIDDEN>)
id 1wRVMC-0007UD-4W
for submit <at> debbugs.gnu.org; Mon, 25 May 2026 09:28:38 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
by lists1p.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <pipcet@HIDDEN>)
id 1wRVM6-0003cy-Ei
for bug-gnu-emacs@HIDDEN; Mon, 25 May 2026 09:28:30 -0400
Received: from mail-06.mail-europe.com ([85.9.210.45])
by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
(Exim 4.90_1) (envelope-from <pipcet@HIDDEN>)
id 1wRVM2-0003sC-Uj
for bug-gnu-emacs@HIDDEN; Mon, 25 May 2026 09:28:30 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com;
s=protonmail3; t=1779715701; x=1779974901;
bh=Jw5Yqq4obsurlDgBKL1frkC2hjNMUrOr7p/M6H7EuVY=;
h=Date:To:From:Subject:Message-ID:Feedback-ID:From:To:Cc:Date:
Subject:Reply-To:Feedback-ID:Message-ID:BIMI-Selector;
b=doF0x6Hi7DOIccPs3e1Nd3PHatnEs/Vg5r82zG6vVs7gNIbDHYXHdPnDuWSblpRW+
r9JFtCpjOqQBUW3lFYM1M081vJiIEys80IMoNBi7kwqHV+MaEchBFPzKDT4Z91tVRZ
+JyvnUd2g11AIOFGp8O0imF5L65NcFTqDHkspOg4GXf4PXj85EgsfBaguMQNhDeEeg
Hnur71nR4DEutZIaw8qQSlbMcyIbMpXBt6uE1T6Yo6dIEy34EVeHbnXhx60nvr+TTt
D3uH7P0QdrJIT3O5u40Q9VaSvS7Fep4t0YFimAZGUK1VztHRWhxDXt+iWJmQ8Iy6gU
H05SctesokkEg==
Date: Mon, 25 May 2026 13:28:17 +0000
To: bug-gnu-emacs@HIDDEN
From: Pip Cet <pipcet@HIDDEN>
Subject: 32.0.50; Improving FOR_EACH_TAIL
Message-ID: <87ik8b7h4c.fsf@HIDDEN>
Feedback-ID: 112775352:user:proton
X-Pm-Message-ID: 5c6cb3e74ce27cac18c2f0e38bfc17235b6fd401
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Received-SPF: pass client-ip=85.9.210.45; envelope-from=pipcet@HIDDEN;
helo=mail-06.mail-europe.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001,
SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 1.7 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org",
has NOT identified this incoming email as spam. The original
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
the administrator of that system for details.
Content preview: This is a proposal to improve the implementation of our
FOR_EACH_TAIL
macro, by reducing the impact of cycle detection and quit checks: the common
case is a short proper list, so let's optimize for th [...]
Content analysis details: (1.7 points, 10.0 required)
pts rule name description
---- ---------------------- --------------------------------------------------
-0.0 SPF_HELO_PASS SPF: HELO matches SPF record
1.0 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail)
0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
provider (pipcet[at]protonmail.com)
0.7 SPOOFED_FREEMAIL No description available.
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)
This is a proposal to improve the implementation of our FOR_EACH_TAIL
macro, by reducing the impact of cycle detection and quit checks: the
common case is a short proper list, so let's optimize for that!
Here's the proposed code:
#define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit)=09=09=09\
((tail) =3D XCDR (tail),=09=09=09=09=09=09\
(__builtin_expect (BASE_EQ (tail, li.tortoise), 0)=09=09=09\
|| (__builtin_expect (!((++li.l) & (FOR_EACH_TAIL_THRESHOLD-1)), 0)=09\
=09&& (((check_quit) ? maybe_quit () : (void) 0, false)=09=09\
=09 || (POWER_OF_2 (li.l) && (li.tortoise =3D (tail), false))))))
First, correctness: both algorithms work by checking that the i-th
element we're looking at isn't the same that we saw k_i steps k_i ago
(the old algorithm skips some checks, but that doesn't matter). To
ensure we detect cycles starting arbitarily late, we need to ensure (i -
k_i) is unbounded. To ensure we detect cycles of any given length, we
need to ensure that (k_i) is also unbounded, and reaches multiples of
each natural number infinitely often. Both properties are unchanged by
the modification of a finite number of (k_i), which is all I've done:
the old sequence is
1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7, 8, ....
(the last check before a "1" is skipped)
the new sequence is
1, 2, 3, ..., 255, 256, 1, 2, ..., 255, 256, 1, 2, ..., 511, 512, ...
(no checks are skipped)
In practice, the new algorithm reduces the memory overhead of cycle
checking to two words (easily kept in registers), rather than four
words, and eliminates all quit checking for lists shorter than 256
elements.
That "256" is a tunable parameter, but there is a weak-ish argument for
using 256: x86 CPUs have special instructions to access the least
significant byte of a register, and those may save a few bytes in the
generated code.
The existing algorithm, on current systems, uses 4 words of storage: one
for the "tortoise" object, one for a 64-bit counter, and two more for an
80-bit counter that is split across two variables.
The new algorithm uses 2 words of storage: one for the "tortoise" object
and one for a simple counter indicating how many iterations have
happened. It is incremented once per iteration and never otherwise
modified, and it is used only to check whether it is a multiple of 256
or a power of two.
The existing algorithm checks quits very often when operating on short
lists: in the first 256 iterations, there are 7 calls to maybe_quit, and
4 of them happen even for a list of 30 elements.
The new algorithm checks quits only after 256 iterations, and then once
per 256 iterations.
The structure of the new algorithm is as follows:
1. cycle check; unlikely branch to cold code if cycle detected
2. increment the counter
3. check the least significant byte of the counter
4. continue the loop if !=3D 0 (likely)
5. check quit
6. if the counter is a power of two, advance the tortoise
7. continue the loop
The old algorithm did this:
1. decrement the LSB 16 bits of the split counter
2. if nonzero, perform cycle check, then continue the loop
3. check quit
4. decrement the MSB 64 bits of the split counter
5. if nonzero, continue the loop
6. reset the split counter to the next power of two
7. advance the tortoise
8. continue the loop
The new algorithm sometimes does sometimes consider elements in a cycle
several times, as does the old one. That the new algorithm, in the worst
case, considers the same element 255 times rather than twice seems a
small price to pay. It should be noted that this happens only for
dotted lists with a linear and a circular part, not for simple short
cycles which are now detected after the first loop.
The old step 6, in particular, is gone entirely: no bit shifts are
necessary, we don't have to reinitialize a count-down timer; instead, we
rely on the very fast POWER_OF_2 macro.
These changes preserve the precise detection of cycle length in Brent's
algorithm, which Fnthcdr relies on for performance. However, Fnthcdr
previously assumed that simple short cycles weren't detected as early as
the new code does, leading to integer overflow.
If necessary, we can split the changes into three phases: Phase 1 is
rewriting the counter logic to use a single counter. Phase 2 is
reducing the frequency of quit checks and tortoise advancement to happen
only from the 256th iteration. Phase 3 is adding __builtin_expect
expressions, which make the C code less readable but improve the
generated code.
A final improvement is the termination condition: the old code used
!CONSP (tail), which is logically correct but misleading: the usual case
is NILP (tail), and telling the compiler so explicitly does improve the
code somewhat. We still need to handle !NILP (tail) && !CONSP (tail),
but that is unlikely.
Further improvements are desperately needed in this area: we need to
move to checking only a single flag in maybe_quit (Helmut has a patch,
IIRC), and not to call probably_quit when quit is inhibited, for
example. slow_eq should be removed again once we reimplement a
positioning mechanism based on cons cells, or we should give SWP their
own tag to speed up EQ in the common case.
These questions still need to be answered:
1. Is 256 the right multiple to use for quit checks and tortoise
advancement?
2. Do we want to use __builtin_expect here, and if so, do we want to
hide it behind a LIKELY macro as most other projects do? Can we add a
"probability" weight to LIKELY to facilitate optimization further?
For debug builds only, we probably want to call maybe_quit at least once
in each FOR_EACH_TAIL, to catch bugs when quitting happens. For
production builds, the threshold of 256 is so high that these unsafe
quits will happen very rarely. However, I think this change should also
involve renaming FOR_EACH_TAIL to FOR_EACH_TAIL_QUIT and
FOR_EACH_TAIL_NOQUIT depending on whether quitting is safe, so it'll
have to happen separately.
I've not been able to perform reliable benchmarks of this, and it's not
quite clear what we should be benchmarking: obviously there is some
slowdown for circular lists, but that's intentional; in exchange, the
lists we're actually seeing in this macro should be faster, but the
effect is likely to be too small to measure reliably on hardware which
changes its clock frequency unpredictably. However, looking at the
disassembled code does indicate that it has improved significantly.
Full patch follows:
From 26cdf0d948dd7ff86c9a9f9602b23dee3464ee66 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@HIDDEN>
Date: Mon, 25 May 2026 09:10:23 +0000
Subject: [PATCH 1/2] Improve FOR_EACH_TAIL (bug#XXXXX)
* src/lisp.h (struct for_each_tail_internal): Reduce to two words.
(FOR_EACH_TAIL_BASIC): Add compiler hint to indicate that tail is most
likely Qnil after the loop and a non-nil non-cons is unlikely.
(FOR_EACH_TAIL_STEP_CYCLEP): Rewrite.
---
src/lisp.h | 48 ++++++++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 16 deletions(-)
diff --git a/src/lisp.h b/src/lisp.h
index d4978460f68..4a566f1aa66 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -5833,7 +5833,8 @@ #define AUTO_STR_WITH_LEN(str, len)=09=09=09=09=09\
set TAIL to the current cons. If the loop exits normally,
set TAIL to the terminating non-cons, typically nil. The loop body
should not modify the list=E2=80=99s top level structure other than by
- perhaps deleting the current cons. */
+ perhaps deleting the current cons. In practice, this macro rarely
+ quits. */
=20
#define FOR_EACH_TAIL(tail) \
FOR_EACH_TAIL_INTERNAL (tail, circular_list (tail), true)
@@ -5847,16 +5848,17 @@ #define FOR_EACH_TAIL_SAFE(tail) \
/* Iterator intended for use only within FOR_EACH_TAIL_INTERNAL. */
struct for_each_tail_internal
{
+ /* The object we're comparing to. */
Lisp_Object tortoise;
- intptr_t max, n;
- unsigned short int q;
+ /* How many iterations of the algorithm have happened. */
+ ptrdiff_t l;
};
=20
/* Like FOR_EACH_TAIL (LIST), except evaluate CYCLE if a cycle is
found, and check for quit if CHECK_QUIT. This is an internal macro
intended for use only by the above macros.
=20
- Use Brent=E2=80=99s teleporting tortoise-hare algorithm. See:
+ Modifies Brent=E2=80=99s teleporting tortoise-hare algorithm. See:
Brent RP. BIT. 1980;20(2):176-184. doi:10.1007/BF01933190
https://maths-people.anu.edu.au/~brent/pd/rpb051i.pdf
=20
@@ -5871,20 +5873,34 @@ #define FOR_EACH_TAIL_INTERNAL(tail, cycle, check_q=
uit)=09=09=09\
=09=09 FOR_EACH_TAIL_STEP_CYCLEP (tail, check_quit)=09\
=09=09 ? (cycle) : (void) 0)
=20
-#define FOR_EACH_TAIL_BASIC(tail, stepper)=09=09=09\
- for (struct for_each_tail_internal li =3D { tail, 2, 0, 2 };=09\
- CONSP (tail); stepper)
+#define FOR_EACH_TAIL_BASIC(tail, stepper)=09=09=09=09\
+ for (struct for_each_tail_internal li =3D { tail, 0 };=09=09=09\
+ CONSP (tail) || (__builtin_expect (!NILP (tail), 0), false);=09\
+ stepper)
=20
-/* Step TAIL and return whether a cycle has been detected.
- If CHECK_QUIT then check for quit occasionally. */
-#define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit)=09=09\
- ((tail) =3D XCDR (tail),=09=09=09=09=09\
- ((--li.q !=3D 0=09=09=09=09=09=09\
- || ((check_quit) ? maybe_quit () : (void) 0, 0 < --li.n)=09\
- || (li.q =3D li.n =3D li.max <<=3D 1, li.n >>=3D USHRT_WIDTH,=09\
-=09 li.tortoise =3D (tail), false))=09=09=09=09\
- && BASE_EQ (tail, li.tortoise)))
+/* How many iterations until we start advancing the tortoise, which is
+ necessary to detect dotted lists which end in a cycle. Also, number
+ of iterations until we check for quits. Must be a power of two. */
=20
+#define FOR_EACH_TAIL_THRESHOLD 256
+
+static_assert (POWER_OF_2 (FOR_EACH_TAIL_THRESHOLD));
+
+/* Step TAIL and return whether a cycle has been detected. If
+ CHECK_QUIT then check for quit occasionally. We only check for quits
+ once every 256 iterations, and we don't advance the tortoise until
+ that happens for the first time.
+
+ The C comma operator is used here to avoid statement expressions: the
+ expression evaluates to the value of BASE_EQ (tail, li.tortoise), but
+ has additional side effects, possibly exiting nonlocally. */
+
+#define FOR_EACH_TAIL_STEP_CYCLEP(tail, check_quit)=09=09=09\
+ ((tail) =3D XCDR (tail),=09=09=09=09=09=09\
+ (__builtin_expect (BASE_EQ (tail, li.tortoise), 0)=09=09=09\
+ || (__builtin_expect (!((++li.l) & (FOR_EACH_TAIL_THRESHOLD-1)), 0)=09=
\
+=09&& (((check_quit) ? maybe_quit () : (void) 0, false)=09=09\
+=09 || (POWER_OF_2 (li.l) && (li.tortoise =3D (tail), false))))))
=20
/* Do a `for' loop over alist values. */
=20
--=20
2.54.0
From c3a53e7458ec123bd903563c4c6bda6954ccdffe Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@HIDDEN>
Date: Mon, 25 May 2026 11:28:38 +0000
Subject: [PATCH 2/2] Avoid relying on FOR_EACH_TAIL internals in 'Fnthcdr'
(bug#XXXXX)
The new FOR_EACH_TAIL code detects simple cycles sooner than the old
code did, leading to integer overflows.
* src/fns.c (Fnthcdr): Avoid integer overflow if cycle is detected
early.
---
src/fns.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/fns.c b/src/fns.c
index e3d297102ca..2d9a3a3619b 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -1824,7 +1824,10 @@ DEFUN ("nthcdr", Fnthcdr, Snthcdr, 2, 2, 0,
=09 mpz_export (&iz, NULL, -1, sizeof iz, 0, 0, mpz[0]);
=09 num +=3D iz;
=09}
- num +=3D cycle_length - large_num % cycle_length;
+ if (num < cycle_length)
+=09num +=3D cycle_length;
+ num -=3D large_num % cycle_length;
+ eassume (num >=3D 0);
}
num %=3D cycle_length;
=20
--=20
2.54.0
Pip Cet <pipcet@HIDDEN>:bug-gnu-emacs@HIDDEN.
Full text available.bug-gnu-emacs@HIDDEN:bug#81115; Package emacs.
Full text available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.