GNU bug report logs - #80021
suboptimal Lisp_Object tag check (TAGGEDP)

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Wed, 17 Dec 2025 13:59:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

To reply to this bug, email your comments to 80021 AT debbugs.gnu.org.
There is no need to reopen the bug first.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Wed, 17 Dec 2025 13:59:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 17 Dec 2025 13:59:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>
Cc: Pip Cet <pipcet <at> protonmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Wed, 17 Dec 2025 14:58:13 +0100

[Message part 1 (text/plain, inline)]

The obvious way to test whether a Lisp_Object X has the tag bits T (assuming USE_LSB_TAG) would be

  (X & 0b111) == T

but the current expression used in TAGGEDP is

  (((unsigned)X - T) & 0b111) == 0

because for x86, GCC compiles the two expressions to

  MOV r,X
  AND r,7
  CMP r,T
  JE tag_matched

and

  LEA r,[X-T]
  TEST r,7
  JE tag_matched

respectively, because the latter is shorter (in bytes) and so makes more efficient use of cache and i-fetch (and possibly decode) bandwidth. (The cast to unsigned just prods GCC to save a REX prefix byte in LEA.) Previous discussion at

  https://lists.gnu.org/r/emacs-devel/2018-08/msg00828.html

However, once it's in cache the runtime should be the same: the MOV is free so in both cases we end up with a single ALU op before the fused compare-and-branch macro-op. More importantly, it's common to have multiple type tests for the same object in sequence and now the 'clever' code becomes less so:

  if (TAGGEDP(X,T1)) goto is_T1;
  else if (TAGGEDP(X,T2)) goto is_T2;

becomes

  MOV r,X
  AND r,7
  CMP r,T1
  JE is_T1
  CMP r,T2
  JE is_T2

and

  LEA r,[X-T1]
  TEST r,7
  JE is_T1
  LEA r,[X-T2]
  TEST r,7
  JE is_T2

and the second version is likely a cycle slower if the first test fails.

Of course this whole business is entirely x86-specific: on other platforms, the 'obvious' code is as good or better.
It's also compiler-specific: recent versions of Clang normalise and usually emit the 'obvious' code for both.

Thus we should keep the current 'clever' TAGGEDP for GCC on x86, for now, and use the 'obvious' form for other compilers and targets. Suggested patch attached.

Addendum: FIXNUMP can also be coded as

  TAGGEDP(X,0b010) || TAGGEDP(X,0b010)

which seems daft but could be better when testing other types at the same time, as in FIXNUMP(X) || FLOATP(X), because it exposes the tag to CSE.

[taggedp.diff (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Wed, 17 Dec 2025 14:42:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 Pip Cet <pipcet <at> protonmail.com>
Subject: Re: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Wed, 17 Dec 2025 09:40:51 -0500

> Thus we should keep the current 'clever' TAGGEDP for GCC on x86, for now,
> and use the 'obvious' form for other compilers and targets. Suggested
> patch attached.

FWIW, I'd prefer to use the same code for all architectures.
There's already way too many `#if`s in that code.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Wed, 17 Dec 2025 19:50:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Wed, 17 Dec 2025 19:48:59 +0000

Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

> The obvious way to test whether a Lisp_Object X has the tag bits T (assuming USE_LSB_TAG) would be
>
>   (X & 0b111) == T
>
> but the current expression used in TAGGEDP is
>
>   (((unsigned)X - T) & 0b111) == 0
>
> because for x86, GCC compiles the two expressions to
>
>   MOV r,X
>   AND r,7
>   CMP r,T
>   JE tag_matched
>
> and
>
>   LEA r,[X-T]
>   TEST r,7
>   JE tag_matched

IIRC, the initial idea was that the untagged pointer would now be in
"r", ready to be used, but I'm not sure GCC performs that optimization;
today's major platforms, AArch64 and x86, both have unscaled memory
offsets (LDU on AArch64, ordinary loads on x86) so it doesn't matter
much how the tag bits are removed or whether that happens once or twice.

> respectively, because the latter is shorter (in bytes) and so makes
> more efficient use of cache and i-fetch (and possibly decode)
> bandwidth. (The cast to unsigned just prods GCC to save a REX prefix
> byte in LEA.) Previous discussion at

Casting to unsigned does of course prevent the re-use of r on 64-bit
machines, so maybe we should carefully inspect the assembler code around
the call sites with and without the cast.

>   https://lists.gnu.org/r/emacs-devel/2018-08/msg00828.html
>
> However, once it's in cache the runtime should be the same: the MOV is
> free so in both cases we end up with a single ALU op before the fused
> compare-and-branch macro-op. More importantly, it's common to have
> multiple type tests for the same object in sequence and now the
> 'clever' code becomes less so:
>
>   if (TAGGEDP(X,T1)) goto is_T1;
>   else if (TAGGEDP(X,T2)) goto is_T2;
>
> becomes
>
>   MOV r,X
>   AND r,7
>   CMP r,T1
>   JE is_T1
>   CMP r,T2
>   JE is_T2
>
> and
>
>   LEA r,[X-T1]
>   TEST r,7
>   JE is_T1
>   LEA r,[X-T2]
>   TEST r,7
>   JE is_T2
>
> and the second version is likely a cycle slower if the first test fails.

A whole cycle?! Color me surprised. I fail to see the hazard which would
cause such a drastic slowdown. Does lea somehow not rename the register
as efficiently as mov does?

On x86, checking two tags at once is probably done most compactly by
generating a 32-bit mask, then left-shifting by the value of the lisp
word (which ignores all but the LSB 5 bits) and checking the MSB of the
result: (((mask) << (lisp_word)) < 0).

> Of course this whole business is entirely x86-specific: on other platforms, the 'obvious' code is as good or better.

I think of it as the equivalent of fsincos: you're usually going to need
both, so you might as well compute what we optimistically assume is the
untagged (aligned) pointer while checking the tag. I don't think current
CPUs care much about an aligned pointer being present in a register for
prefetching or speculation.

> It's also compiler-specific: recent versions of Clang normalise and usually emit the 'obvious' code for both.

That's probably not a bad decision on their part, as many such clever
optimizations are no longer worth it and normalizing them is a net gain.

> Thus we should keep the current 'clever' TAGGEDP for GCC on x86, for
> now, and use the 'obvious' form for other compilers and
> targets. Suggested patch attached.

Given the weird 32-bit cast, I agree, though maybe we should simply use
the version without the cast in the general case (or maybe in all cases:
GCC is now smart enough to use a bit test for FIXNUMP(x) || FLOATP(x),
so maybe its CSE has also improved and it can reuse the pointer?).

> Addendum: FIXNUMP can also be coded as
>
>   TAGGEDP(X,0b010) || TAGGEDP(X,0b010)
>
> which seems daft but could be better when testing other types at the same time, as in FIXNUMP(X) || FLOATP(X), because it exposes the tag to CSE.

I think you mean TAGGEDP(X, 0b010) || TAGGEDP(X, 0b110)?  I think that
might be easier for humans to understand, particularly if they're not
fully aware of the double fixnum tag.

(I'm still planning to distinguish conses and "fat conses" based on the
4th tag bit; it would not help to change all other users of TAGGEDP to
check two tags instead of one).

Pip

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Thu, 18 Dec 2025 05:59:02 GMT) Full text and rfc822 format available.

Message #14 received at 80021 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 80021 <at> debbugs.gnu.org, mattias.engdegard <at> gmail.com, eggert <at> cs.ucla.edu,
 pipcet <at> protonmail.com
Subject: Re: bug#80021: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Thu, 18 Dec 2025 07:57:53 +0200

> Cc: 80021 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, pipcet <at> protonmail.com
> Date: Wed, 17 Dec 2025 09:40:51 -0500
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> > Thus we should keep the current 'clever' TAGGEDP for GCC on x86, for now,
> > and use the 'obvious' form for other compilers and targets. Suggested
> > patch attached.
> 
> FWIW, I'd prefer to use the same code for all architectures.
> There's already way too many `#if`s in that code.

I tend to agree.  In addition to the proliferation of #ifdef's, the
probability is quite low that we will always have experts on board who
understand the intricacies and fine points of this stuff on such a low
level.  So this will become a maintenance headache, given the
relatively high rate of introducing new processors and versions of
processors into the industry.

So unless this change has a dramatic effect on performance in
real-life usage of Emacs (as opposed to synthetic benchmarks), I'd
rather we didn't install this kind of changes.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Thu, 18 Dec 2025 16:51:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Thu, 18 Dec 2025 17:50:30 +0100

17 dec. 2025 kl. 20.48 skrev Pip Cet <pipcet <at> protonmail.com>:

> IIRC, the initial idea was that the untagged pointer would now be in
> "r", ready to be used, but I'm not sure GCC performs that optimization;
> today's major platforms, AArch64 and x86, both have unscaled memory
> offsets (LDU on AArch64, ordinary loads on x86) so it doesn't matter
> much how the tag bits are removed or whether that happens once or twice.

Indeed, the construct doesn't seem to be very useful that way, even if the truncating cast is dropped.

>> and the second version is likely a cycle slower if the first test fails.
> 
> A whole cycle?!

No, sorry about that, there's indeed no dependence there. There is an extra instruction to be tracked which adds to the execution resource pressure, and the code becomes longer, but there is no latency directly due to dependence.

> On x86, checking two tags at once is probably done most compactly by
> generating a 32-bit mask, then left-shifting by the value of the lisp
> word (which ignores all but the LSB 5 bits) and checking the MSB of the
> result: (((mask) << (lisp_word)) < 0).

That's indeed what compilers typically generate once they understand what the test is about (with some variations) but, not with the 'clever' test.

> I think of it as the equivalent of fsincos: you're usually going to need
> both, so you might as well compute what we optimistically assume is the
> untagged (aligned) pointer while checking the tag.

But we often don't need the untagged pointer at that point, and it doesn't cost much (not even latency as there's no dependence; see above) even when we do.

>> It's also compiler-specific: recent versions of Clang normalise and usually emit the 'obvious' code for both.
> 
> That's probably not a bad decision on their part, as many such clever
> optimizations are no longer worth it and normalizing them is a net gain.

Definitely, it shouldn't matter how we formulate the condition.

>> Thus we should keep the current 'clever' TAGGEDP for GCC on x86, for
>> now, and use the 'obvious' form for other compilers and
>> targets. Suggested patch attached.
> 
> Given the weird 32-bit cast, I agree, though maybe we should simply use
> the version without the cast in the general case (or maybe in all cases:
> GCC is now smart enough to use a bit test for FIXNUMP(x) || FLOATP(x),
> so maybe its CSE has also improved and it can reuse the pointer?).

I wouldn't mind using (X&7)==T everywhere and trusting GCC to do its job, but the reason for the current code was apparently dissatisfaction with its ability to do just that. The current code is clearly smaller but not faster (on the old machine used for testing).

>>  TAGGEDP(X,0b010) || TAGGEDP(X,0b010)
>> 
>> which seems daft but could be better when testing other types at the same time, as in FIXNUMP(X) || FLOATP(X), because it exposes the tag to CSE.
> 
> I think you mean TAGGEDP(X, 0b010) || TAGGEDP(X, 0b110)?

Yes, sorry about the typo.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Fri, 19 Dec 2025 17:14:02 GMT) Full text and rfc822 format available.

Notification sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
bug acknowledged by developer. (Fri, 19 Dec 2025 17:14:02 GMT) Full text and rfc822 format available.

Message #22 received at 80021-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 80021-done <at> debbugs.gnu.org, Pip Cet <pipcet <at> protonmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Fri, 19 Dec 2025 09:13:00 -0800

[Message part 1 (text/plain, inline)]

On 2025-12-18 08:50, Mattias Engdegård wrote:
> I wouldn't mind using (X&7)==T everywhere and trusting GCC to do its job, but the reason for the current code was apparently dissatisfaction with its ability to do just that. The current code is clearly smaller but not faster (on the old machine used for testing).

The 2018 benchmarks[1] that prompted the tricky TAGGEDP business were 
done on an AMD Phenom II X4 910e, a circa-2010 "Deneb" product that is 
no longer a good representative of what Emacs currently runs on (though 
it's still the CPU in my old office desktop at UCLA!). Also, they 
predate native compilation.

I just now did similar benchmarks, this time with yesterday's Emacs 
master (commit 5e6df127a96fe10b3f659626d64e9ba13fd12873) running Ubuntu 
25.10 which uses gcc (Ubuntu 15.2.0-4ubuntu4), and on an Intel Xeon 
W-1350, a circa-2021 "Rocket Lake-S" product that is a better 
representative nowadays. Ubuntu's GCC, unlike the 2018 benchmarks, uses 
-fstack-protector. And the new benchmarks are based on current (not 
2018) Emacs sources and these use native compilation.

On these benchmarks the tricky TAGGEDP business still shrank the temacs 
text segment (in this case by 0.6%), but on the "make compile-always" 
benchmark they made Emacs 3.0% slower.

So let's just go back to using (X&7)==T everywhere, as you suggest. This 
should address the comments on this bug report. I installed the attached 
patch to do that, and am boldly closing the bug report.

[1]: https://lists.gnu.org/r/emacs-devel/2018-08/msg00845.html

[0001-Revert-to-simpler-and-we-hope-faster-TAGGEDP.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#80021; Package emacs. (Sat, 20 Dec 2025 17:20:01 GMT) Full text and rfc822 format available.

Message #25 received at 80021 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 80021 <at> debbugs.gnu.org, Pip Cet <pipcet <at> protonmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: suboptimal Lisp_Object tag check (TAGGEDP)
Date: Sat, 20 Dec 2025 18:18:51 +0100

19 dec. 2025 kl. 18.13 skrev Paul Eggert <eggert <at> cs.ucla.edu>:

> So let's just go back to using (X&7)==T everywhere, as you suggest. This should address the comments on this bug report. I installed the attached patch to do that, and am boldly closing the bug report.

Thank you!

Tying up loose ends for completeness: FIXNUMP is currently defined as

[1]  (X&3)==0b10

and I decided to check whether

[2]  (X&7)==0b010 || (X&7)==0b110

might be better as synthetic tests show that it should.

While [2] does result in slightly smaller code (2 KiB), there is a tiny performance regression that could be a ghost (cache line placement artefact) or not. On closer inspection code generation disappoints a bit so perhaps the compiler devs should be contacted first.

There's also FIXNATP, currently using the straightforward

[3]  FIXNUMP(X) && XFIXNUM(X)>=0

but it could also be written (for USE_LSB_TAG)

[4]  FIXNUMP(X) && X>=0
[5]  (X & 0x8000000000000003) == 0b10

which both result in the same code, only has a single branch, and seem smaller and faster on ARM64 where that big constant fits in a small immediate.
Finally,

[5]  (ROTATE_LEFT(X,1)&7)==0b010

is very compact on x86 and is also single-branch. In summary, nothing conclusive yet.

This bug report was last modified 22 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #80021 suboptimal Lisp_Object tag check (TAGGEDP)

GNU bug report logs - #80021
suboptimal Lisp_Object tag check (TAGGEDP)