GNU bug report logs - #78620
[PATCH] Make obarray's buckets a Lisp_Vector

Previous Next

Package: emacs;

Reported by: Helmut Eller <eller.helmut <at> gmail.com>

Date: Wed, 28 May 2025 12:05:04 UTC

Severity: normal

Tags: notabug, patch, wontfix

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 78620 in the body.
You can then email your comments to 78620 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Wed, 28 May 2025 12:05:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Helmut Eller <eller.helmut <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 28 May 2025 12:05:05 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Wed, 28 May 2025 14:03:29 +0200
[Message part 1 (text/plain, inline)]
Making the Lisp_Obarray.buckets field a proper Lisp_Vector instead of a
plain Lisp_Object[] would make some things simpler: some #ifdef HAVE_MPS
can be removed and the dumper is a bit simpler too.

The patch introduces functions vector_ref/vector_set which are analogues
to AREF/ASET; maybe that is contentious.

[0001-Make-obarray-s-buckets-a-Lisp_Vector.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Wed, 28 May 2025 13:24:05 GMT) Full text and rfc822 format available.

Message #8 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: 78620 <at> debbugs.gnu.org
Subject: Re: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Wed, 28 May 2025 13:23:23 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes:

> Making the Lisp_Obarray.buckets field a proper Lisp_Vector instead of a
> plain Lisp_Object[] would make some things simpler: some #ifdef HAVE_MPS
> can be removed and the dumper is a bit simpler too.

Or we could make it a Lisp_Object, and use the generic vectorlike code.
I really do not understand your reasoning for wanting to do the
opposite, and use more raw pointers in heap objects.  What is the
advantage?

It's less type-safe, cannot be done in many places, and generally makes
hacking harder.  The code size impact is small on x86 and should be
minimal on aarch64.

I realize there's no consensus for following my suggestion (if it's on
the heap, and thus fixable, use a Lisp_Object; if you need a raw
pointer, it should only ever be in a local variable).  But I don't think
there's any consensus for doing the opposite either.

But all this may well mean I'm missing the point here.

> The patch introduces functions vector_ref/vector_set which are analogues
> to AREF/ASET; maybe that is contentious.

Again, I don't understand the impulse behind avoiding Lisp_Objects.

As for less fundamental things that should probably be fixed:

I notice that vector_size doesn't mask out the GC mark bit, but it is
used instead of gc_asize in the new AREF.  So the new AREF is
problematic when used during GC.

The pdumper code should probably use strong weight for the bucket
vector, not WEIGHT_NORMAL, in order to avoid changing the dump layout
unpredictably.

Since size_bits and the size of the new vectors are redundant, it'd be
nice to assert things are as they should be in obarray_size.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Wed, 28 May 2025 16:50:02 GMT) Full text and rfc822 format available.

Message #11 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 78620 <at> debbugs.gnu.org
Subject: Re: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Wed, 28 May 2025 18:49:34 +0200
[Message part 1 (text/plain, inline)]
On Wed, May 28 2025, Pip Cet wrote:

> "Helmut Eller" <eller.helmut <at> gmail.com> writes:
>
>> Making the Lisp_Obarray.buckets field a proper Lisp_Vector instead of a
>> plain Lisp_Object[] would make some things simpler: some #ifdef HAVE_MPS
>> can be removed and the dumper is a bit simpler too.
>
> Or we could make it a Lisp_Object, and use the generic vectorlike code.
> I really do not understand your reasoning for wanting to do the
> opposite, and use more raw pointers in heap objects.  What is the
> advantage?

The advantage is better static type checking and better documentation.
Compare the field declarations:

  struct Lisp_Vector *buckets;

with
  
  Lisp_Object buckets;

Which one describes the data structure better?  Clearly the first.

[...]
> I realize there's no consensus for following my suggestion (if it's on
> the heap, and thus fixable, use a Lisp_Object; if you need a raw
> pointer, it should only ever be in a local variable).  But I don't think
> there's any consensus for doing the opposite either.

I think that in structures, raw pointers to objects on the GC heap will
not go away.  So we can as well take advantage of them.

[...]
> As for less fundamental things that should probably be fixed:
>
> I notice that vector_size doesn't mask out the GC mark bit, but it is
> used instead of gc_asize in the new AREF.  So the new AREF is
> problematic when used during GC.
[...]
> The pdumper code should probably use strong weight for the bucket
> vector, not WEIGHT_NORMAL, in order to avoid changing the dump layout
> unpredictably.
[...]
> Since size_bits and the size of the new vectors are redundant, it'd be
> nice to assert things are as they should be in obarray_size.

Valid points.  See amended patch below.

[0001-Make-obarray-s-buckets-a-Lisp_Vector.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Wed, 28 May 2025 19:15:01 GMT) Full text and rfc822 format available.

Message #14 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: 78620 <at> debbugs.gnu.org
Subject: Re: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Wed, 28 May 2025 19:14:03 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes:

> On Wed, May 28 2025, Pip Cet wrote:
>
>> "Helmut Eller" <eller.helmut <at> gmail.com> writes:
>>
>>> Making the Lisp_Obarray.buckets field a proper Lisp_Vector instead of a
>>> plain Lisp_Object[] would make some things simpler: some #ifdef HAVE_MPS
>>> can be removed and the dumper is a bit simpler too.
>>
>> Or we could make it a Lisp_Object, and use the generic vectorlike code.
>> I really do not understand your reasoning for wanting to do the
>> opposite, and use more raw pointers in heap objects.  What is the
>> advantage?
>
> The advantage is better static type checking

Not much of an advantage, given how often we cast pointers in C, and we
give up any hope of automated dynamic type checking.

I'd prefer to dynamically check heap structures have the right type in
(another user of) the shared tracing code you suggested.  It might not
have been obvious, but, to me, edge naming implies edge typechecking,
and this is what I wrote:

    case PVEC_OVERLAY:
      EDGES (EDGE_PLIST (plist));
    case PVEC_FINALIZER:
      EDGES (EDGE_FUNCTION (function));
    case PVEC_SYMBOL_WITH_POS:
      EDGES (EDGE_BARE_SYMBOL (sym),
	     EDGE_FIXNUM (pos));

Please ignore that the edge list should be closer to the struct
definition.  Let's focus on the benefits of having a simple option to
trace the heap (continuously in a background thread if we must) and
verify that every finalizer's "function" field satisfies FUNCTIONP
(let's pretend the nil case doesn't happen right now).  I don't see how
we could even do that statically.

So we can do better than limited static typechecking if we use
Lisp_Objects and predicates rather than C pointers and C types, IMHO.

> and better documentation.

Not really:

> Compare the field declarations:
>
>   struct Lisp_Vector *buckets;
>
> with
>
>   Lisp_Object buckets;
>
> Which one describes the data structure better?  Clearly the first.

Neither one describes it particularly well, though.  Is buckets allowed
to be NULL?  What's the size of buckets?  Is it a real Lisp_Vector or a
pointer to a vectorlike cast to struct Lisp_Vector * for convenience?

So we need to compare

struct Lisp_Vector *buckets; /* A real Lisp vector, never NULL.  */

to

Lisp_Object buckets; /* A vector.  */

>> I realize there's no consensus for following my suggestion (if it's on
>> the heap, and thus fixable, use a Lisp_Object; if you need a raw
>> pointer, it should only ever be in a local variable).  But I don't think
>> there's any consensus for doing the opposite either.
>
> I think that in structures, raw pointers to objects on the GC heap will
> not go away.
> So we can as well take advantage of them.

I think we'd be giving up the advantages of strict,
dynamically-verified, rich predicates, and all we get is plain old C
typechecks which can be fooled by a simple pointer cast (or, worse,
cannot be, necessitating many more near-equivalent definitions).

> Valid points.  See amended patch below.

My opinion (which doesn't count for anything): Replacing Lisp_Object
*buckets by struct Lisp_Vector *buckets is an improvement.  Replacing
it by Lisp_Object buckets would be a more significant improvement.
Modifying the generic code doesn't improve things: more definitions to
learn, and ASIZE/vector_size are confusingly different now.

> +  eassert ((1 << o->size_bits) == (size_t) vector_size (o->buckets));

will break for o->size_bits = 32, I think.

My suggestion would be to apply this without introducing the API
doublets, spelling out v->contents[idx] as we do elsewhere.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Thu, 29 May 2025 15:03:02 GMT) Full text and rfc822 format available.

Message #17 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: 78620 <at> debbugs.gnu.org
Cc: Pip Cet <pipcet <at> protonmail.com>, Helmut Eller <eller.helmut <at> gmail.com>
Subject: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Thu, 29 May 2025 17:02:17 +0200
(I just saw this bug by chance -- if you want my comments on code I wrote, CC: me explicitly)

Thank you for caring about the obarray internals. The reason why the obarray and hash-table implementations don't use Lisp allocations for their tables is that with the current GC, it's much more efficient to allocate and free them explicitly to handle growth.

MPS may alter the balance but the concern remains real: eager freeing of memory immediately when it becomes dead is inherently more memory-efficient than waiting for it to be collected, and since the freed memory can be taken into use at once it improves locality.

This is the reason why it's written that way, and plenty of measurements were made at the time. There may still be good arguments for changing the structure but be careful, and don't do it for aesthetics alone.

(And no, I'm not very happy with the internal obarray representation which was basically kept unchanged when the new obarray objects were added. Especially the way symbols in each bucket are chained together with ->next pointers, that's a bit unfortunate. But it works, and is a lot faster than it was before.)





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Thu, 29 May 2025 16:37:02 GMT) Full text and rfc822 format available.

Message #20 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Helmut Eller <eller.helmut <at> gmail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: Pip Cet <pipcet <at> protonmail.com>, 78620 <at> debbugs.gnu.org
Subject: Re: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Thu, 29 May 2025 18:35:52 +0200
On Thu, May 29 2025, Mattias Engdegård wrote:

[...]
> This is the reason why it's written that way, and plenty of
> measurements were made at the time. There may still be good arguments
> for changing the structure but be careful, and don't do it for
> aesthetics alone.

Okay, then I withdraw my proposal.

Helmut




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78620; Package emacs. (Fri, 30 May 2025 04:20:03 GMT) Full text and rfc822 format available.

Message #23 received at 78620 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Helmut Eller <eller.helmut <at> gmail.com>
Cc: mattias.engdegard <at> gmail.com, 78620 <at> debbugs.gnu.org, pipcet <at> protonmail.com
Subject: Re: bug#78620: [PATCH] Make obarray's buckets a Lisp_Vector
Date: Fri, 30 May 2025 07:19:43 +0300
tags 78620 notabug wontfix
close 78620
thanks

> Cc: Pip Cet <pipcet <at> protonmail.com>, 78620 <at> debbugs.gnu.org
> From: Helmut Eller <eller.helmut <at> gmail.com>
> Date: Thu, 29 May 2025 18:35:52 +0200
> 
> On Thu, May 29 2025, Mattias Engdegård wrote:
> 
> [...]
> > This is the reason why it's written that way, and plenty of
> > measurements were made at the time. There may still be good arguments
> > for changing the structure but be careful, and don't do it for
> > aesthetics alone.
> 
> Okay, then I withdraw my proposal.

I'm therefore closing this bug.




Added tag(s) wontfix and notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 30 May 2025 04:20:03 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 78620 <at> debbugs.gnu.org and Helmut Eller <eller.helmut <at> gmail.com> Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 30 May 2025 04:20:04 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 27 Jun 2025 11:24:08 GMT) Full text and rfc822 format available.

This bug report was last modified 5 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.