GNU bug report logs -
#78162
[PATCH] hash-table-contains-p: Avoid creating a symbol on every call
Previous Next
To reply to this bug, email your comments to 78162 AT debbugs.gnu.org.
There is no need to reopen the bug first.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Wed, 30 Apr 2025 10:04:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Daniel Mendler <mail <at> daniel-mendler.de>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Wed, 30 Apr 2025 10:04:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Tags: patch
For performance reasons avoid creating a symbol on every call to
`hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
[0001-hash-table-contains-p-Avoid-creating-a-symbol-on-eve.patch (text/patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Wed, 30 Apr 2025 11:42:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 78162 <at> debbugs.gnu.org (full text, mbox):
"Daniel Mendler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
> Tags: patch
>
> For performance reasons avoid creating a symbol on every call to
> `hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
If the performance of hash-table-contains-p is an issue, we should
reimplement it in C, which can use magic values which aren't accessible
from Lisp.
Exposing the magic value to Lisp will cause trouble, even if it's "just"
a defconst as in your patch. The current code is somewhat inefficient
but seems to me to be correct. Even using an uninterned symbol using
its special read syntax (#:missing) may be a problem because of the
position table the reader sometimes keeps.
I say the code "seems" to be correct because depending on how the
defsubst is inlined, an environment may be accessible to the debugger in
which the magic value is leaked. Byte-compiling the function results in
code which never leaks the value to Lisp, so in practice there shouldn't
be a problem once subr.elc exists. (As the problem is the defun, not
the bytecode, turning the defsubst into a defun wouldn't help).
My guess is most code using hash-table-contains-p will be inefficient
anyway, because it will either attempt to retrieve the entry, to create
a new entry, or to remove the entry, all of which mean we have to hash
the key again. All of that can be optimized, of course.
Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Wed, 30 Apr 2025 14:01:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 78162 <at> debbugs.gnu.org (full text, mbox):
Pip Cet <pipcet <at> protonmail.com> writes:
> "Daniel Mendler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
>
>> Tags: patch
>>
>> For performance reasons avoid creating a symbol on every call to
>> `hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
>
> If the performance of hash-table-contains-p is an issue, we should
> reimplement it in C, which can use magic values which aren't accessible
> from Lisp.
For a proper hash table I expect this function to be efficient and not
allocate. A magic value becomes a problem if it is actually used in
Lisp. Is this likely? This should not happen by accident, only if
`hash-table--missing` is used intentionally. Note that the actual symbol
which is used for the check is uninterned.
> Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Thu, 01 May 2025 09:39:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 78162 <at> debbugs.gnu.org (full text, mbox):
"Daniel Mendler" <mail <at> daniel-mendler.de> writes:
> Pip Cet <pipcet <at> protonmail.com> writes:
>
>> "Daniel Mendler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
>>
>>> Tags: patch
>>>
>>> For performance reasons avoid creating a symbol on every call to
>>> `hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
>>
>> If the performance of hash-table-contains-p is an issue, we should
>> reimplement it in C, which can use magic values which aren't accessible
>> from Lisp.
>
> For a proper hash table I expect this function to be efficient and not
> allocate.
C would work, then. As it avoids both your performance concerns and my
concerns about leaking magic values to Lisp, how about this?
From 2a0103a8e67268eed0860baca397a0713335ba0d Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Implement 'hash-table-contains-p' in C (bug#78162)
* lisp/emacs-lisp/byte-opt.el (side-effect-free-fns): Add
'hash-table-contains-p'.
* lisp/subr.el (hash-table-contains-p): Function removed.
* src/fns.c (Fhash_table_contains_p): New function.
(syms_of_fns): Register it.
* test/lisp/subr-tests.el (hash-table-contains-p): Move...
* test/src/fns-tests.el (test-hash-table-contains-p): ...here.
---
lisp/emacs-lisp/byte-opt.el | 2 +-
lisp/subr.el | 7 -------
src/fns.c | 12 ++++++++++++
test/lisp/subr-tests.el | 12 ------------
test/src/fns-tests.el | 12 ++++++++++++
5 files changed, 25 insertions(+), 20 deletions(-)
diff --git a/lisp/emacs-lisp/byte-opt.el b/lisp/emacs-lisp/byte-opt.el
index 652c79e9c93..f93946635a0 100644
--- a/lisp/emacs-lisp/byte-opt.el
+++ b/lisp/emacs-lisp/byte-opt.el
@@ -1761,7 +1761,7 @@ byte-optimize-set
compare-strings concat copy-alist copy-hash-table copy-sequence elt
equal equal-including-properties
featurep get
- gethash hash-table-count hash-table-rehash-size
+ gethash hash-table-contains-p hash-table-count hash-table-rehash-size
hash-table-rehash-threshold hash-table-size hash-table-test
hash-table-weakness
length length< length= length>
diff --git a/lisp/subr.el b/lisp/subr.el
index a5c850ebe5e..6a0487e96ee 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -7436,13 +7436,6 @@ string-trim
(declare (important-return-value t))
(string-trim-left (string-trim-right string trim-right) trim-left))
-(defsubst hash-table-contains-p (key table)
- "Return non-nil if TABLE has an element with KEY."
- (declare (side-effect-free t)
- (important-return-value t))
- (let ((missing (make-symbol "missing")))
- (not (eq (gethash key table missing) missing))))
-
;; The initial anchoring is for better performance in searching matches.
(defconst regexp-unmatchable "\\`a\\`"
"Standard regexp guaranteed not to match any string at all.")
diff --git a/src/fns.c b/src/fns.c
index 21916b6fb46..a3b45232700 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -5866,6 +5866,17 @@ DEFUN ("gethash", Fgethash, Sgethash, 2, 3, 0,
}
+DEFUN ("hash-table-contains-p", Fhash_table_contains_p, Shash_table_contains_p,
+ 2, 2, 0,
+ doc: /* Return non-nil if TABLE has an element with KEY. */)
+ (Lisp_Object key, Lisp_Object table)
+{
+ struct Lisp_Hash_Table *h = check_hash_table (table);
+ ptrdiff_t i = hash_find (h, key);
+ return i >= 0 ? Qt : Qnil;
+}
+
+
DEFUN ("puthash", Fputhash, Sputhash, 3, 3, 0,
doc: /* Associate KEY with VALUE in hash table TABLE.
If KEY is already present in table, replace its current value with
@@ -6686,6 +6697,7 @@ syms_of_fns (void)
defsubr (&Shash_table_p);
defsubr (&Sclrhash);
defsubr (&Sgethash);
+ defsubr (&Shash_table_contains_p);
defsubr (&Sputhash);
defsubr (&Sremhash);
defsubr (&Smaphash);
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index 024cbe85bba..f3adfcaed62 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -1493,17 +1493,5 @@ subr--subst-char-in-string
(props-out (object-intervals out)))
(should (equal props-out props-in))))))))
-(ert-deftest hash-table-contains-p ()
- (let ((h (make-hash-table)))
- (should-not (hash-table-contains-p 'problems h))
- (should-not (hash-table-contains-p 'cookie h))
- (should-not (hash-table-contains-p 'milk h))
- (puthash 'problems 99 h)
- (puthash 'cookie nil h)
- (puthash 'milk 'missing h)
- (should (hash-table-contains-p 'problems h))
- (should (hash-table-contains-p 'cookie h))
- (should (hash-table-contains-p 'milk h))))
-
(provide 'subr-tests)
;;; subr-tests.el ends here
diff --git a/test/src/fns-tests.el b/test/src/fns-tests.el
index e6abcdc34e7..d3e9512434a 100644
--- a/test/src/fns-tests.el
+++ b/test/src/fns-tests.el
@@ -1149,6 +1149,18 @@ test-hash-table-wrong-keywords
(should (make-hash-table :rehash-threshold 123)) ; obsolete and ignored
(should-error (make-hash-table :some-random-keyword 123)))
+(ert-deftest test-hash-table-contains-p ()
+ (let ((h (make-hash-table)))
+ (should-not (hash-table-contains-p 'problems h))
+ (should-not (hash-table-contains-p 'cookie h))
+ (should-not (hash-table-contains-p 'milk h))
+ (puthash 'problems 99 h)
+ (puthash 'cookie nil h)
+ (puthash 'milk 'missing h)
+ (should (hash-table-contains-p 'problems h))
+ (should (hash-table-contains-p 'cookie h))
+ (should (hash-table-contains-p 'milk h))))
+
(ert-deftest test-remhash ()
(let ((h (make-hash-table))
(val "anything"))
--
2.48.1
> A magic value becomes a problem if it is actually used in Lisp. Is
> this likely? This should not happen by accident, only if
> `hash-table--missing` is used intentionally.
I think it's reasonable to build a hash table associating each symbol's
name to its value, in order to detect which symbol values change as code
is executed. With your patch, this code will result in a hash table
which has a false negative for (hash-table-contains-p
"hash-table--missing" h):
(defun dump-obarray ()
(let ((h (make-hash-table :test 'equal)))
(mapatoms (lambda (sym) (when (boundp sym) (puthash (symbol-name sym) (symbol-value sym) h))))
h))
(let ((old (dump-obarray))
(new (dump-obarray)))
(maphash (lambda (key value)
(if (hash-table-contains-p key old)
(unless (eq (gethash key old) (gethash key new))
(message "value of %S changed" key))
(message "%S newly defined to be %S" key value)))
new))
With the C implementation, this would work, of course.
> Note that the actual symbol which is used for the check is uninterned.
I understand that it's traditional to use uninterned symbols for cases
like this one, but strings or conses (depending on whether you want
extra information) would work just as well, and be a bit cheaper.
Floats would work and be cheapest, but might break if our representation
of them changes, as it has in my local tree.
In any case, IMHO, it's not the magic value that's the problem here,
it's that it's stored as a global symbol's value and thus becomes
reachable via the obarray.
Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Thu, 01 May 2025 15:38:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 78162 <at> debbugs.gnu.org (full text, mbox):
Pip Cet <pipcet <at> protonmail.com> writes:
> "Daniel Mendler" <mail <at> daniel-mendler.de> writes:
>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>>> "Daniel Mendler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
>>>
>>>> Tags: patch
>>>>
>>>> For performance reasons avoid creating a symbol on every call to
>>>> `hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
>>>
>>> If the performance of hash-table-contains-p is an issue, we should
>>> reimplement it in C, which can use magic values which aren't accessible
>>> from Lisp.
>>
>> For a proper hash table I expect this function to be efficient and not
>> allocate.
>
> C would work, then. As it avoids both your performance concerns and my
> concerns about leaking magic values to Lisp, how about this?
Looks good. Thanks.
> Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sat, 10 May 2025 10:20:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 78162 <at> debbugs.gnu.org (full text, mbox):
> Cc: 78162 <at> debbugs.gnu.org, Stefan Kangas <stefankangas <at> gmail.com>
> Date: Thu, 01 May 2025 17:37:19 +0200
> From: Daniel Mendler via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> Pip Cet <pipcet <at> protonmail.com> writes:
>
> > "Daniel Mendler" <mail <at> daniel-mendler.de> writes:
> >
> >> Pip Cet <pipcet <at> protonmail.com> writes:
> >>
> >>> "Daniel Mendler via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
> >>>
> >>>> Tags: patch
> >>>>
> >>>> For performance reasons avoid creating a symbol on every call to
> >>>> `hash-table-contains-p'. See also https://github.com/emacs-compat/compat/issues/71.
> >>>
> >>> If the performance of hash-table-contains-p is an issue, we should
> >>> reimplement it in C, which can use magic values which aren't accessible
> >>> from Lisp.
> >>
> >> For a proper hash table I expect this function to be efficient and not
> >> allocate.
> >
> > C would work, then. As it avoids both your performance concerns and my
> > concerns about leaking magic values to Lisp, how about this?
>
> Looks good. Thanks.
Stefan, any comments to the patch?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sun, 11 May 2025 05:32:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 78162 <at> debbugs.gnu.org (full text, mbox):
> In any case, IMHO, it's not the magic value that's the problem here,
> it's that it's stored as a global symbol's value and thus becomes
> reachable via the obarray.
Then an alternative would be the simpler patch below?
> Stefan, any comments to the patch?
At this point we're firmly in bikeshed territory.
I agree with Daniel that the function should not allocate, but I also
agree with Pip Cet that the performance of the function shouldn't matter
very much, so maybe my ELisp version provides a middle ground?
[ Ideally, this function would be replaced with one which doesn't return
just `t` but an actual "handle" on the hash-table slot, so that we can
then access that slot without redoing the key lookup. It would allow
doing thing like `push` or `incf` on a hash-table entry without
looking up the key twice.
But doing so without allocating an object is hard with the current
hash-table design (and would come with its own downsides). ]
Stefan
diff --git a/lisp/subr.el b/lisp/subr.el
index 79288921bed..962ae61cfd8 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -7436,11 +7436,11 @@
(declare (important-return-value t))
(string-trim-left (string-trim-right string trim-right) trim-left))
-(defsubst hash-table-contains-p (key table)
+(let ((missing (make-symbol "missing")))
+ (defsubst hash-table-contains-p (key table)
"Return non-nil if TABLE has an element with KEY."
(declare (side-effect-free t)
(important-return-value t))
- (let ((missing (make-symbol "missing")))
(not (eq (gethash key table missing) missing))))
;; The initial anchoring is for better performance in searching matches.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sun, 11 May 2025 06:03:02 GMT)
Full text and
rfc822 format available.
Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs <at> gnu.org> writes:
> [ Ideally, this function would be replaced with one which doesn't return
> just `t` but an actual "handle" on the hash-table slot, so that we can
> then access that slot without redoing the key lookup. It would allow
> doing thing like `push` or `incf` on a hash-table entry without
> looking up the key twice.
> But doing so without allocating an object is hard with the current
> hash-table design (and would come with its own downsides). ]
A micro-optimization could be for the hash table to remember the last
key lookup, so that something like this hits a fast path for the second
lookup of the same key:
(when (hash-table-contains-p key table)
(push 'x (gethash table key)))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sun, 11 May 2025 08:03:02 GMT)
Full text and
rfc822 format available.
Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):
On Mai 11 2025, Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
> diff --git a/lisp/subr.el b/lisp/subr.el
> index 79288921bed..962ae61cfd8 100644
> --- a/lisp/subr.el
> +++ b/lisp/subr.el
> @@ -7436,11 +7436,11 @@
> (declare (important-return-value t))
> (string-trim-left (string-trim-right string trim-right) trim-left))
>
> -(defsubst hash-table-contains-p (key table)
> +(let ((missing (make-symbol "missing")))
> + (defsubst hash-table-contains-p (key table)
> "Return non-nil if TABLE has an element with KEY."
> (declare (side-effect-free t)
> (important-return-value t))
> - (let ((missing (make-symbol "missing")))
Or use '#:missing instead of (make-symbol "missing").
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sun, 11 May 2025 08:03:03 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Sun, 11 May 2025 09:46:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 78162 <at> debbugs.gnu.org (full text, mbox):
"Stefan Monnier" <monnier <at> iro.umontreal.ca> writes:
>> In any case, IMHO, it's not the magic value that's the problem here,
>> it's that it's stored as a global symbol's value and thus becomes
>> reachable via the obarray.
>
> Then an alternative would be the simpler patch below?
I tried hard to find something wrong with it, but couldn't.
The debugging experience when single-stepping through that function
isn't very good, but that's a separate bug (the debugger should probably
bind print-gensym).
I have no opinion on make-symbol vs #: syntax. As it's all evaluated
prior to dumping, it doesn't matter much how we construct the symbol: we
dump the final byte code, which looks good to me.
> [ Ideally, this function would be replaced with one which doesn't return
> just `t` but an actual "handle" on the hash-table slot, so that we can
> then access that slot without redoing the key lookup. It would allow
> doing thing like `push` or `incf` on a hash-table entry without
> looking up the key twice.
> But doing so without allocating an object is hard with the current
> hash-table design (and would come with its own downsides). ]
I think the low-hanging fruit is probably changing the hash table
implementation so it caches the last key looked up and its index, but
even that minor complication hasn't been necessary so far, I think.
This might change if we make sxhash more resilient against hash
collisions by increasing SXHASH_MAX_DEPTH or SXHASH_MAX_LENGTH.
Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Mon, 12 May 2025 08:18:01 GMT)
Full text and
rfc822 format available.
Message #38 received at 78162 <at> debbugs.gnu.org (full text, mbox):
On Sun, May 11 2025, Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:
> [ Ideally, this function would be replaced with one which doesn't return
> just `t` but an actual "handle" on the hash-table slot, so that we can
> then access that slot without redoing the key lookup. It would allow
> doing thing like `push` or `incf` on a hash-table entry without
> looking up the key twice.
> But doing so without allocating an object is hard with the current
> hash-table design (and would come with its own downsides). ]
Maybe something like the hashtable-update! function from R6RS would
work:
(hashtable-update! hashtable key proc default)
Proc should accept one argument, should return a single value, and
should not mutate hashtable. The hashtable-update! procedure applies
proc to the value in hashtable associated with key, or to default if
hashtable does not contain an association for key. The hashtable is
then changed to associate key with the value returned by proc.
The behavior of hashtable-update! is equivalent to the following
code, but may be implemented more efficiently in cases where the
implementation can avoid multiple lookups of the same key:
(hashtable-set! hashtable key
(proc (hashtable-ref hashtable key default)))
A problem with hashtable-update! seems to be that the proc argument is
often a lambda expression and would allocate. Though, people who care
about that could perhaps use dynamic variables to avoid allocation a
closure. On the positive side, this it doesn't need an explicit
"handle".
Helmut
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Mon, 12 May 2025 08:19:02 GMT)
Full text and
rfc822 format available.
Reply sent
to
Stefan Monnier <monnier <at> iro.umontreal.ca>
:
You have taken responsibility.
(Mon, 12 May 2025 21:22:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Daniel Mendler <mail <at> daniel-mendler.de>
:
bug acknowledged by developer.
(Mon, 12 May 2025 21:22:02 GMT)
Full text and
rfc822 format available.
Message #46 received at 78162-done <at> debbugs.gnu.org (full text, mbox):
>> Then an alternative would be the simpler patch below?
> I tried hard to find something wrong with it, but couldn't.
Push to `master`, closing,
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#78162
; Package
emacs
.
(Tue, 13 May 2025 09:00:02 GMT)
Full text and
rfc822 format available.
Message #49 received at 78162-done <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>>> Then an alternative would be the simpler patch below?
>> I tried hard to find something wrong with it, but couldn't.
>
> Push to `master`, closing,
Thanks. I've ported back the function to Compat on the emacs-31 branch.
> Stefan
This bug report was last modified 11 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.