GNU bug report logs - #51089
28.0.60; Using read-symbol-shorthands (("-" . "foo-")) shouldn't shadow the '-' symbol


Package: emacs;

Reported by: João Távora <joaotavora <at> gmail.com>

Date: Thu, 7 Oct 2021 20:46:02 UTC

Severity: normal

Found in version 28.0.60

Done: João Távora <joaotavora <at> gmail.com>

Bug is archived. No further changes may be made.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#51089; Package emacs. (Thu, 07 Oct 2021 20:46:02 GMT)

Acknowledgement sent to João Távora <joaotavora <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 07 Oct 2021 20:46:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org, rms <at> gnu.org, eliz <at> gnu.org
Subject: 28.0.60; Using read-symbol-shorthands (("-" . "foo-")) shouldn't
 shadow the '-' symbol
Date: Thu, 07 Oct 2021 21:44:57 +0100
First reported in

    https://lists.gnu.org/archive/html/emacs-devel/2021-09/msg02297.html

Richard Stallman writes:

> It looks like "-" as a shorthand prefix should not rename `-'.  I can
> imagine various ways to fix this, some more general and some less
> general.

This is true.  We must fix this before importing Magnar Sveen's dash.el
library (which uses the very short '-' prefix) and any of its users in a
way that avoids the namespace pollution.

As Richard states, there are various ways to fix this.  I discuss
briefly 2 of them.  The first doesn't have any drawbacks in my opinion,
the second has a small one, but it is reasonably easy to overcome.

(1) The very simplest fix (and perhaps the most correct one) is just to
special-case the case of the '-' prefix and the '-' function.  That's
because the '-' symbol is the only one that exactly matches the
name-prefix separator that is used in Emacs.  In other words, I can't
think of any other "legitimate" use case for the shorthands feature that
would shadow a similar one-character symbol.  For example using

   (("/" . "some-longhand/"))

would shadow the '/' symbol but count as "the user knows what she's
doing".


(2) Another natural, more generic, way would be to demand that the
shorthand in the 'car's of the elements of read-symbol-shorthands is
strictly shorter than the form about to be renamed.  In lread.c, I think
it would amount to this:

diff --git a/src/lread.c b/src/lread.c
index 07580d11d1..2950abf982 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -4666,7 +4666,7 @@ oblookup_considering_shorthand (Lisp_Object obarray, const char *in,
 	 version of the symbol name with xrealloc.  This isn't
 	 strictly needed, but it could later be used as a way for
 	 multiple transformations on a single symbol name.  */
-      if (sh_prefix_size <= size_byte
+      if (sh_prefix_size < size_byte
 	  && memcmp (SSDATA (sh_prefix), in, sh_prefix_size) == 0)
 	{
 	  ptrdiff_t lh_prefix_size = SBYTES (lh_prefix);

However, this would also forbid another proposed use for shorthands,
which is to use them to rename whole collections of symbols and allow them
to be used without a prefix.  An example would be the 'cl-' family of
symbols, which have "grown" a 'cl-' prefix some versions ago, but which
some people would still like to use without that prefix.  Examples are

   cl-loop
   cl-first
   cl-plusp

Currently, it is possible to use shorthands to do

  (("cl-loop" . "loop")
   ("cl-first" . "first")
   ("cl-plusp" . "plusp"))

With the proposed fix above, it wouldn't be.

Therefore, I propose that for this use case, we introduce new syntax in
read-symbol-shorthands.  A trailing '$' character in the shorthand
portion would mean that it's OK to bypass that "strictly shorter"
limitation and rename the whole name.  This could be done with something
like this:

diff --git a/src/lread.c b/src/lread.c
index 07580d11d1..1bcaf5c64f 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -4658,6 +4658,13 @@ oblookup_considering_shorthand (Lisp_Object obarray, const char *in,
       if (!STRINGP (sh_prefix) || !STRINGP (lh_prefix))
 	continue;
       ptrdiff_t sh_prefix_size = SBYTES (sh_prefix);
+      bool replace_whole = false;
+
+      if (SCHARS (sh_prefix) == (size + 1) &&
+	  SSDATA (sh_prefix)[sh_prefix_size - 1] == '$') {
+	replace_whole = true;
+	sh_prefix_size--;
+      }
 
       /* Compare the prefix of the transformation pair to the symbol
 	 name.  If a match occurs, do the renaming and exit the loop.
@@ -4666,7 +4673,7 @@ oblookup_considering_shorthand (Lisp_Object obarray, const char *in,
 	 version of the symbol name with xrealloc.  This isn't
 	 strictly needed, but it could later be used as a way for
 	 multiple transformations on a single symbol name.  */
-      if (sh_prefix_size <= size_byte
+      if ((sh_prefix_size < size_byte || replace_whole)
 	  && memcmp (SSDATA (sh_prefix), in, sh_prefix_size) == 0)
 	{
 	  ptrdiff_t lh_prefix_size = SBYTES (lh_prefix);

In addition to the C-side change, the Elisp code dealing with
read-symbol-shorthands prefixes needs to be changed to account for the
trailing "$".  Fortunately, there isn't much of that code around.

João
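
To make the effect of option (2) in the message above concrete, here is a
minimal standalone C sketch of the length check.  It is not the lread.c
code: the helper rename_with_shorthand and its `strict' flag are
hypothetical names introduced only for illustration, with `strict'
selecting the proposed `<' comparison instead of the current `<='.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Emacs.  If NAME starts with the
   shorthand prefix SH, write the longhand form (LH followed by the
   rest of NAME) into OUT and return true.  When STRICT is true, SH
   must be strictly shorter than NAME, mirroring the `<' comparison
   in the proposed diff; otherwise `<=' is used.  */
static bool
rename_with_shorthand (const char *name, const char *sh, const char *lh,
                       bool strict, char *out, size_t outsize)
{
  size_t name_len = strlen (name);
  size_t sh_len = strlen (sh);
  bool length_ok = strict ? sh_len < name_len : sh_len <= name_len;

  /* The length test comes first, so memcmp never reads past NAME.  */
  if (!length_ok || memcmp (sh, name, sh_len) != 0)
    return false;

  snprintf (out, outsize, "%s%s", lh, name + sh_len);
  return true;
}
```

Under these assumptions, a non-strict call with name "-" and the pair
("-" . "foo-") produces "foo-", and "cl-loop" with ("cl-loop" . "loop")
produces "loop".  A strict call leaves both names alone, while "-some" is
still renamed to "foo-some".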





Message #8 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: 28.0.60; Using read-symbol-shorthands (("-" . "foo-")) shouldn't
 shadow the '-' symbol
Date: Fri, 08 Oct 2021 08:49:30 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Date: Thu, 07 Oct 2021 21:44:57 +0100
> 
> (1) The very simplest fix (and perhaps the most correct one) is just to
> special-case the case of the '-' prefix and the '-' function.  That's
> because the '-' symbol is the only one that exactly matches the
> name-prefix separator that is used in Emacs.  In other words, I can't
> think of any other "legitimate" use case for the shorthands feature that
> would shadow a similar one-character symbol.  For example using
> 
>    (("/" . "some-longhand/"))
> 
> would shadow the '/' symbol but count as "the user knows what she's
> doing".

Shouldn't we disallow doing this when the car is any existing symbol?
Why do you think the above use with "/" is legitimate, and when could
it be useful?  And how is "/" different from "-"?  The claim that we
use '-' as the name separator is not 100% true: some symbols use
other separators: '/', '+', '>', etc.

Here's another idea: disallow the car from being a string that
includes only punctuation characters.  WDYT?

> (2) Another natural, more generic, way would be to demand that the
> shorthand in the 'car's of the elements of read-symbol-shorthands is
> strictly shorter than the form about to be renamed.  In lread.c, I think
> it would amount to this:

I don't think I like this artificial restriction.  I'm aware that you
are convinced that's how this feature should be used, but hard-coding
your personal opinions, not necessarily shared by others, in such
low-level code is something we should avoid, I think.  And that's
besides the fact that this changes sensitive parts of Emacs on the
release branch.





Message #11 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: 28.0.60; Using read-symbol-shorthands (("-" . "foo-"))
 shouldn't shadow the '-' symbol
Date: Fri, 08 Oct 2021 08:43:46 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> Here's another idea: disallow the car from being a string that
> includes only punctuation characters.  WDYT?

That doesn't work, right?  That's precisely what we want to include in
dash.el.

    (("-" . "magnar-dash-"))


So that e.g. the existing 

    (defun -some ...)

in dash.el can be read as if it had been written:

    (defun magnar-dash-some ...)

Maybe you meant something else?  Maybe you meant "disallow renaming
when the thing to be renamed includes only punctuation characters"?

If so, then I think I agree.  It'd be just a generalization of what I
suggested.

>> (2) Another natural, more generic, way would be to demand that the
>> shorthand in the 'car's of the elements of read-symbol-shorthands is
>> strictly shorter than the form about to be renamed.  In lread.c, I think
>> it would amount to this:
>
> I don't think I like this artificial restriction.

I proposed and coded what I thought you had explicitly agreed with in

    https://lists.gnu.org/archive/html/emacs-devel/2021-10/msg00100.html

> I'm aware that you are convinced that's how this feature should be
> used, but hard-coding your personal opinions, not necessarily shared
> by others, in such low-level code is something we should avoid, I
> think.  And that's besides the fact that this changes sensitive parts
> of Emacs on the release branch.

I have absolutely no idea what "personal opinions" you are referring to.

* If you're talking about the "shorthand being shorter than the
  longhand", I think I've said that's simply what usually happens, and
  not my opinion on how to use this.  But it's not 100% so: in the original
  posting of this bug I presented an example use case where I'd like
  'cl-loop' to be used as 'loop': the shorthand is longer there.

* If you're talking about something else, please clarify.

At any rate, shorthand/longhand size comparison is NOT what's at stake
here.  In this proposal, what's at stake is to require the shorthand to
be "strictly shorter" than the thing being renamed, rather than the
current "of less or equal length" (it's clear that the thing being
renamed can't be shorter than the shorthand).

This solves the '-' shadowing problem but breaks the idea of using
cl-lib.el symbols without a prefix (someone had this idea early on, and
I think it's not unthinkable).  So this proposal added a mechanism to
re-allow that.

João





Message #14 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: 28.0.60; Using read-symbol-shorthands (("-" . "foo-"))
 shouldn't shadow the '-' symbol
Date: Fri, 08 Oct 2021 14:04:52 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Cc: 51089 <at> debbugs.gnu.org,  rms <at> gnu.org
> Date: Fri, 08 Oct 2021 08:43:46 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Here's another idea: disallow the car from being a string that
> > includes only punctuation characters.  WDYT?
> 
> That doesn't work, right?  That's precisely what we want to include in
> dash.el.
> 
>     (("-" . "magnar-dash-"))
> 
> 
> So that e.g. the existing 
> 
>     (defun -some ...)
> 
> in dash.el can be read as if it had been written:
> 
>     (defun magnar-dash-some ...)
> 
> Maybe you meant something else?  Maybe you meant "disallow renaming
> when the thing to be renamed includes only punctuation characters"?

Yes, sorry.

> If so, then I think I agree.  It'd be just a generalization of what I
> suggested.

Then let's go for it.

> >> (2) Another natural, more generic, way would be to demand that the
> >> shorthand in the 'car's of the elements of read-symbol-shorthands is
> >> strictly shorter than the form about to be renamed.  In lread.c, I think
> >> it would amount to this:
> >
> > I don't think I like this artificial restriction.
> 
> I proposed and coded what I thought you had explicitly agreed with in
> 
>     https://lists.gnu.org/archive/html/emacs-devel/2021-10/msg00100.html

That wasn't really an agreement, was it?





Message #17 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
Subject: Re: 28.0.60; Using read-symbol-shorthands (("-" . "foo-")) shouldn't
 shadow the '-' symbol
Date: Fri, 8 Oct 2021 15:02:32 +0100
On Fri, Oct 8, 2021 at 12:05 PM Eli Zaretskii <eliz <at> gnu.org> wrote:

>
> > If so, then I think I agree.  it'd be just a generalization of what I
> > suggested.
>
> Then let's go for it.
>

OK.

I'll come up with a C patch when I have time, if no one beats me to it (what
is the easiest C-way of checking that every character in a multibyte string
is punctuation?  Call into Elisp?)


> > >> (2) Another natural, more generic, way would be to demand that the
> > >> shorthand in the 'car's of the elements of read-symbol-shorthands is
> > >> strictly shorter than the form about to be renamed.  In lread.c, I think
> > >> it would amount to this:
> > >
> > > I don't think I like this artificial restriction.
> >
> > I proposed and coded what I thought you had explicitly agreed with in
> >
> >     https://lists.gnu.org/archive/html/emacs-devel/2021-10/msg00100.html
>
> That wasn't really an agreement, was it?
>

Sounded like one :-)  At any rate, it was different from the negative points
you make now, which I still suspect stem from some later misunderstanding.
It'd be nice to know that you understand what this alternative does and, if
it indeed was not a misunderstanding, to understand why you think it's
"artificial" and which "personal opinions about the use of this feature" of
mine you were referring to.

It's not that your "no punctuation only" fix doesn't cut it for now (I
think it does), but someday we might want to enhance it and maybe use a
technique similar to (2).

Thanks,
João

Message #20 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: 28.0.60; Using read-symbol-shorthands (("-" . "foo-")) shouldn't
 shadow the '-' symbol
Date: Fri, 08 Oct 2021 19:01:40 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Date: Fri, 8 Oct 2021 15:02:32 +0100
> Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
> 
> what is 
> the easiest C-way of checking if every character in a multibyte string is
> punctuation?  Call into Elisp?

No need to call into Lisp.  See alphanumericp and friends in
character.c for how to do it.  We need a new function like those.  I
will post the general categories of punctuation characters in a while.

Thanks.





Message #23 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: joaotavora <at> gmail.com
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#51089: 28.0.60;
 Using read-symbol-shorthands (("-" . "foo-")) shouldn't shadow the
 '-' symbol
Date: Fri, 08 Oct 2021 22:07:22 +0300
> Date: Fri, 08 Oct 2021 19:01:40 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
> 
> > From: João Távora <joaotavora <at> gmail.com>
> > Date: Fri, 8 Oct 2021 15:02:32 +0100
> > Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
> > 
> > what is 
> > the easiest C-way of checking if every character in a multibyte string is
> > punctuation?  Call into Elisp?
> 
> No need to call into Lisp.  See alphanumericp and friends in
> character.c for how to do it.  We need a new function like those.  I
> will post the general categories of punctuation characters in a while.

Actually, do we care about non-ASCII characters in this context?
Because if we only care about ASCII punctuation, it would be easier to
provide an explicit string made of such characters and see if the
character to test is in the string.  WDYT?





Message #26 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Sat, 09 Oct 2021 12:21:36 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Fri, 08 Oct 2021 19:01:40 +0300
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
>> 
>> > From: João Távora <joaotavora <at> gmail.com>
>> > Date: Fri, 8 Oct 2021 15:02:32 +0100
>> > Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
>> > 
>> > what is 
>> > the easiest C-way of checking if every character in a multibyte string is
>> > punctuation?  Call into Elisp?
>> 
>> No need to call into Lisp.  See alphanumericp and friends in
>> character.c for how to do it.  We need a new function like those.  I
>> will post the general categories of punctuation characters in a while.
>
> Actually, do we care about non-ASCII characters in this context?
> Because if we only care about ASCII punctuation, it would be easier to
> provide an explicit string made of such characters and see if the
> character to test is in the string.  WDYT?

You mean iterate the string being analysed by bytes, and use strchr() on
byte_i and a constant you'll provide me?  And if we reach a byte that's
part of a multibyte one, we bail, knowing that it's not all ASCII
punctuation...  Is that the idea?  Should work, yes.

João
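
The byte-wise approach described in the message above could look roughly
like the following C sketch.  It is only an illustration under stated
assumptions: the function name all_ascii_punctuation and the constant
ascii_punct_set are hypothetical, and the set of characters is a
placeholder rather than the one Emacs uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder set of ASCII punctuation, chosen for illustration only.  */
static const char ascii_punct_set[] = "*+-/<=>";

/* Hypothetical helper: return true if each of the NBYTES bytes in BUF
   is one of the characters in ascii_punct_set.  */
static bool
all_ascii_punctuation (const char *buf, size_t nbytes)
{
  for (size_t i = 0; i < nbytes; i++)
    {
      unsigned char c = (unsigned char) buf[i];

      /* A byte >= 0x80 is part of a multibyte sequence, so the name is
         not all ASCII punctuation and we bail.  NUL is rejected
         explicitly because strchr would match the set's terminator.  */
      if (c == '\0' || c >= 0x80 || strchr (ascii_punct_set, c) == NULL)
        return false;
    }
  return true;
}
```

With this placeholder set, the function accepts "-" and "->>" and rejects
"-some" as well as any name containing a non-ASCII byte.  Note that it
also returns true for an empty buffer, which a real caller may want to
handle separately.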





Message #29 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Sat, 09 Oct 2021 14:52:52 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Cc: 51089 <at> debbugs.gnu.org,  rms <at> gnu.org
> Date: Sat, 09 Oct 2021 12:21:36 +0100
> 
> > Actually, do we care about non-ASCII characters in this context?
> > Because if we only care about ASCII punctuation, it would be easier to
> > provide an explicit string made of such characters and see if the
> > character to test is in the string.  WDYT?
> 
> You mean iterate the string being analysed by bytes, and use strchr() on
> byte_i and a constant you'll provide me?  And if we reach a byte that's
> part of a multibyte one, we bail, knowing that it's not all ASCII
> punctuation...  Is that the idea?  Should work, yes.

Yes.  But I think we could use strcspn for an easier, one-line, test
of the same.





Message #32 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Sun, 10 Oct 2021 14:42:18 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: João Távora <joaotavora <at> gmail.com>
>> Cc: 51089 <at> debbugs.gnu.org,  rms <at> gnu.org
>> Date: Sat, 09 Oct 2021 12:21:36 +0100
>> 
>> > Actually, do we care about non-ASCII characters in this context?
>> > Because if we only care about ASCII punctuation, it would be easier to
>> > provide an explicit string made of such characters and see if the
>> > character to test is in the string.  WDYT?
>> 
>> You mean iterate the string being analysed by bytes, and use strchr() on
>> byte_i and a constant you'll provide me?  And if we reach a byte that's
>> part of a multibyte one, we bail, knowing that it's not all ASCII
>> punctuation...  Is that the idea?  Should work, yes.
>
> Yes.  But I think we could use strcspn for an easier, one-line, test
> of the same.

OK.  I've now pushed a failing test that should pass once the fix is in
place.  I'll follow up with that later, after reading up on strcspn().
If someone wants to beat me to it, I recommend reusing the existing
boolean variable skip_shorthand in lread.c.  A tweak to the manual
describing this exception is probably also needed.

João





Message #35 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" . "foo-"))
 shouldn't shadow the '-' symbol
Date: Mon, 11 Oct 2021 01:18:43 +0100
On Sat, Oct 9, 2021 at 12:53 PM Eli Zaretskii <eliz <at> gnu.org> wrote:

>
> > You mean iterate the string being analysed by bytes, and use strchr() on
> > byte_i and a constant you'll provide me?  And if we reach a byte that's
> > part of a multibyte one, we bail, knowing that it's not all ASCII
> > punctuation...  Is that the idea?  Should work, yes.
>
> Yes.  But I think we could use strcspn for an easier, one-line, test
> of the same.
>

I tried to use strcspn() to discover if a C string is entirely comprised
of punctuation (as is required by your idea), but couldn't. That function
deals with prefixes, it's not a "every()" kind of operation.  If you're
seeing something clever to do with it, please tell me, because I'm not.

João

Message #38 received at 51089 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089 <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Mon, 11 Oct 2021 01:49:46 +0100
João Távora <joaotavora <at> gmail.com> writes:

> On Sat, Oct 9, 2021 at 12:53 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>  > You mean iterate the string being analysed by bytes, and use strchr() on
>  > byte_i and a constant you'll provide me?  And if we reach a byte that's
>  > part of a multibyte one, we bail, knowing that it's not all ASCII
>  > punctuation...  Is that the idea?  Should work, yes.
>
>  Yes.  But I think we could use strcspn for an easier, one-line, test
>  of the same.
>
> I tried to use strcspn() to discover if a C string is entirely comprised
> of punctuation (as is required by your idea), but couldn't. That function
> deals with prefixes, it's not a "every()" kind of operation.  If you're
> seeing something clever to do with it, please tell me, because I'm not.

Seems I didn't try very hard :-)  Using strspn() instead of strcspn() does
work.  Patch below, with a constant string of ASCII punctuation that
you'll probably want to tweak.

João

diff --git a/doc/lispref/symbols.texi b/doc/lispref/symbols.texi
index 9c33e2c8ec..5494b042e5 100644
--- a/doc/lispref/symbols.texi
+++ b/doc/lispref/symbols.texi
@@ -675,6 +675,11 @@ Shorthands
 
 This variable may only be set in file-local variables (@pxref{File Variables, ,
 Local Variables in Files, emacs, The GNU Emacs Manual}).
+
+As an exception to the above rule, symbol forms comprised entirely of
+ASCII punctuation are exempt from this transformation.  This avoids
+shadowing important symbols like @code{-} or @code{/} when using
+these strings as shorthand prefixes.
 @end defvar
 
 Here's an example of shorthands usage in a hypothetical string
diff --git a/src/lread.c b/src/lread.c
index 07580d11d1..8d23761a4b 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -3805,7 +3805,9 @@ read1 (Lisp_Object readcharfun, int *pch, bool first_in_list)
 	      ptrdiff_t longhand_bytes = 0;
 
 	      Lisp_Object tem;
-	      if (skip_shorthand)
+
+	      if (skip_shorthand ||
+		  strspn(read_buffer, "!@#$%&^*_+-/=<>") >= nbytes)
 		tem = oblookup (obarray, read_buffer, nchars, nbytes);
 	      else
 		tem = oblookup_considering_shorthand (obarray, read_buffer,
diff --git a/test/lisp/progmodes/elisp-mode-tests.el b/test/lisp/progmodes/elisp-mode-tests.el
index e816d3c1b0..ebdfe5f067 100644
--- a/test/lisp/progmodes/elisp-mode-tests.el
+++ b/test/lisp/progmodes/elisp-mode-tests.el
@@ -1094,7 +1094,6 @@ elisp-shorthand-escape
     (should (unintern "f-test4---"))))
 
 (ert-deftest elisp-dont-shadow-punctuation-only-symbols ()
-  :expected-result :failed ;  bug#51089
   (let* ((shorthanded-form '(- 42 (-foo 42)))
          (expected-longhand-form '(- 42 (fooey-foo 42)))
          (observed (let ((read-symbol-shorthands
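
For readers comparing the two library functions mentioned in this
exchange, here is a small standalone C sketch of the difference.  The
helper names only_chars_from and first_byte_in_set are hypothetical and
exist only for illustration; the character set in the usage note is the
one from the patch above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* strspn returns the length of the initial run of NAME consisting
   only of bytes found in SET.  The whole string qualifies when that
   run extends to the terminating NUL, which gives the "every byte is
   in SET" test.  */
static bool
only_chars_from (const char *name, const char *set)
{
  return name[strspn (name, set)] == '\0';
}

/* strcspn returns the length of the initial run of NAME containing
   no byte from SET.  A result of 0 on a non-empty string therefore
   only says that the first byte is in SET; it says nothing about the
   bytes that follow.  */
static bool
first_byte_in_set (const char *name, const char *set)
{
  return name[0] != '\0' && strcspn (name, set) == 0;
}
```

With the set "!@#$%&^*_+-/=<>", only_chars_from accepts "-" and "->>" but
rejects "-foo", whereas first_byte_in_set accepts "-foo" as well, which is
why strcspn() alone cannot express the "all punctuation" test.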





Reply sent to João Távora <joaotavora <at> gmail.com>:
You have taken responsibility. (Mon, 11 Oct 2021 21:37:01 GMT)

Notification sent to João Távora <joaotavora <at> gmail.com>:
bug acknowledged by developer. (Mon, 11 Oct 2021 21:37:01 GMT)

Message #43 received at 51089-done <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51089-done <at> debbugs.gnu.org, Richard Stallman <rms <at> gnu.org>
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Mon, 11 Oct 2021 22:36:14 +0100
João Távora <joaotavora <at> gmail.com> writes:
>>  Yes.  But I think we could use strcspn for an easier, one-line, test
>>  of the same.
> Seems I didn't try very hard :-)  Using strspn() instead of strcspn() does
> work.  Patch below, with a constant string of ASCII punctuation that
> you'll probably want to tweak.

Hello Eli,

I took the initiative and pushed the patch to emacs-28 after settling on
the string:

   "^*+-/<=>_|"

I chose this string after analysing the Emacs Lisp symbols that are only
punctuation.  I used this Elisp form (very inefficient, but did the job).

   (let ((all-punctuation))
     (mapatoms (lambda (symbol)
                 (when (with-temp-buffer
                         (insert (symbol-name symbol))
                         (goto-char (point-min))
                         (skip-syntax-forward "_")
                         (eobp))
                   (push symbol all-punctuation))))
     all-punctuation)

This returns the following symbols, which I think is the full set of
"protected" ones:

   ;; => (& * + - / < = > _ | ** ++ -- -/ -> /= <= <> >= || ¬ --- --> ->>)

One of them isn't ASCII, but as we discussed it's likely not used as a
shorthand prefix.

Closing this bug.  Let me know if I should reopen.
João





Message #46 received at 51089-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: 51089-done <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#51089: 28.0.60; Using read-symbol-shorthands (("-" .
 "foo-")) shouldn't shadow the '-' symbol
Date: Tue, 12 Oct 2021 16:22:11 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Cc: 51089-done <at> debbugs.gnu.org,  Richard Stallman <rms <at> gnu.org>
> Date: Mon, 11 Oct 2021 22:36:14 +0100
> 
> João Távora <joaotavora <at> gmail.com> writes:
> >>  Yes.  But I think we could use strcspn for an easier, one-line, test
> >>  of the same.
> > Seems I didn't try very hard :-)  Using strspn() instead of strcspn() does
> > work.  Patch below, with a constant string of ASCII punctuation that
> > you'll probably want to tweak.
> 
> Hello Eli,
> 
> I took the initiative and pushed the patch to emacs-28 after settling on
> the string:
> 
>    "^*+-/<=>_|"
> 
> I chose this string after analysing the Emacs Lisp symbols that are only
> punctuation.  I used this Elisp form (very inefficient, but did the job).
> 
>    (let ((all-punctuation))
>      (mapatoms (lambda (symbol)
>                  (when (with-temp-buffer
>                          (insert (symbol-name symbol))
>                          (goto-char (point-min))
>                          (skip-syntax-forward "_")
>                          (eobp))
>                    (push symbol all-punctuation))))
>      all-punctuation)
> 
> This returns the following symbols, which I think is the full set of
> "protected" ones:
> 
>    ;; => (& * + - / < = > _ | ** ++ -- -/ -> /= <= <> >= || ¬ --- --> ->>)
> 
> One of them isn't ASCII, but as we discussed it's likely not used as a
> shorthand prefix.

Thanks, looks okay to me (modulo some minor changes in wording of
comments and documentation).




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 10 Nov 2021 12:24:12 GMT)
