GNU bug report logs - #55738
character escape bugs in the reader


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Tue, 31 May 2022 11:34:01 UTC

Severity: normal

Fixed in version 29.1

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#55738; Package emacs. (Tue, 31 May 2022 11:34:02 GMT)

Acknowledgement sent to Mattias Engdegård <mattiase <at> acm.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 31 May 2022 11:34:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: bug-gnu-emacs <at> gnu.org
Subject: character escape bugs in the reader
Date: Tue, 31 May 2022 13:33:09 +0200
Some character escape oddities observed in the Emacs reader:

1. ?\LF => -1

This is clearly a bug (no character literal should be -1) and an artefact of the underlying implementation.
The correct value should be 10.
(In string literals \LF is ignored entirely, as documented.)

2. The Control modifier (\C- or \^) is nonidempotent. For example,
?\C-a => 1
?\C-\C-a => #x4000001

Similarly, "\C-\C-a" signals a reader error.

This too is an artefact of the implementation. The correct value should be as if only a single control modifier were present, e.g. ?\C-\C-a => 1.

3. Control-space yields NUL in strings but not as a char literal:
"\C-SPC" => "NUL"
"\^SPC"  => "NUL"
?\C-SPC => #x4000020
?\^SPC  => #x4000020

Emacs takes a conservative stance and normally only generates control characters from upper and lower case ASCII letters and the symbols ?@[\]^_ because that agrees with custom and suffices for all C0 controls. Since most terminals also map Control-SPC to NUL, it would be more consistent to do so in both string and character literals.
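
The rule described above, and the way it leads to the results in items 2 and 3, can be modelled in a few lines of Emacs Lisp. This is only a sketch under stated assumptions: `my-apply-control` and `my-ctrl-bit` are hypothetical names, not functions in Emacs; the Control bit value #x4000000 is inferred from the results quoted in this report; and the sketch does not try to handle arguments that already carry other modifier bits.

```elisp
;; Hypothetical model of a single application of \C- or \^.
;; Not the code in src/lread.c, just an illustration of the rule.

(defconst my-ctrl-bit #x4000000
  "Assumed value of the Control modifier bit (bit 26).")

(defun my-apply-control (c)
  "Return character code C with one Control modifier applied."
  (cond
   ;; Control-? is DEL by convention.
   ((= c ?\?) 127)
   ;; ASCII letters of either case, and @ [ \ ] ^ _, become C0 controls.
   ((or (<= ?a c ?z) (<= ?@ c ?_))
    (logand c 31))
   ;; Anything else keeps its code and gains the Control bit.
   (t (logior c my-ctrl-bit))))

(my-apply-control ?a)                      ; => 1
(my-apply-control (my-apply-control ?a))   ; => #x4000001
(my-apply-control ?\s)                     ; => #x4000020
(my-apply-control ?@)                      ; => 0
```

The second call shows why stacking the modifier is not idempotent under this rule: the first application yields 1, which is no longer a letter or one of the listed symbols, so the second application falls through to the last branch and sets the Control bit instead.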

The first two bugs are straightforward to fix (I have a patch) and doing so is unlikely to cause any harm.
I honestly don't think making ?\C-SPC => 0 would either (because of how key binding works) but we should investigate further just in case.





Message #8 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: 55738 <at> debbugs.gnu.org
Subject: bug#55738: character escape bugs in the reader
Date: Wed, 1 Jun 2022 12:00:21 +0200
Suggested patch. It does not address the third bug above (\C-SPC).

[0001-Fix-reader-char-escape-bugs-bug-55738.patch (application/octet-stream, attachment)]

Message #11 received at 55738 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Wed, 01 Jun 2022 15:42:47 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

> Make the character literal ?\LF (linefeed) generate 10, not -1.
>
> Ensure that Control escape sequences in character literals are
> idempotent: ?\C-\C-a and ?\^\^a mean the same thing as ?\C-a and ?\^a,
> generating the control character with value 1.  "\C-\C-a" no longer
> signals an error.

I think both changes make sense.

> * src/lread.c (read_escape): Make nonrecursive and only combine
> the base char with modifiers at the end, creating control chars
> if applicable.  Remove the `stringp` argument; assume character
> literal syntax.  Never return -1.
> (read_string_literal): Handle string-specific escape semantics here
> and simplify.

That also sounds good.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Message #14 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Wed, 1 Jun 2022 19:56:46 +0200
On 1 June 2022 at 15:42, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> I think both changes make sense.

Thank you, now in master.

What to do with ?\C-SPC is less clear.  Actually there are no immediate plans to do anything about it at all, although the behaviour is a bit incongruent (and undocumented).






Message #17 received at 55738 <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Wed, 01 Jun 2022 23:48:42 +0300
Mattias Engdegård [2022-06-01 19:56 +0200] wrote:

> Thank you, now in master.

Thanks, but I think this patch gave rise to the attached build error.

-- 
Basil

$ uname -a
Linux tia 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux
$ cat /etc/debian_version 
bookworm/sid
$ echo $LANG $XMODIFIERS
en_IE.UTF-8 @im=ibus
$ gcc-12 --version | head -1
gcc-12 (Debian 12.1.0-2) 12.1.0
$ make --version | head -2
GNU Make 4.3
Built for x86_64-pc-linux-gnu

[make.txt.gz (application/gzip, attachment)]

Message #20 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Wed, 1 Jun 2022 22:53:36 +0200
On 1 June 2022 at 22:48, Basil L. Contovounesios <contovob <at> tcd.ie> wrote:

> Thanks, but I think this patch gave rise to the attached build error.

Yes, so I found out when trying to bootstrap after having pushed it. (Next time I'll do things in the opposite order.)
Reverted for now. Sorry!





Message #23 received at 55738 <at> debbugs.gnu.org:

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Thu, 02 Jun 2022 00:05:59 +0300
Mattias Engdegård [2022-06-01 22:53 +0200] wrote:

> Reverted for now. Sorry!

No worries, thanks for working on this!

-- 
Basil




Message #26 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, Stefan Kangas <stefankangas <at> gmail.com>,
 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Thu, 2 Jun 2022 17:12:22 +0200
Looks like there is code like

                     (setq bits (+ bits ?\C-\^@))

where the author wanted just the 'control' bit, and even

  (should (equal (kbd "C-RET") [?\C-\C-m]))

and while these are debatable in style (I'd prefer ?\C-\0 to produce the control bit), the risk of breaking external code is too great.

Thus I'm scaling back ambitions a bit and have committed a much-reduced patch that just deals with the ?\LF part.
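
For code that only wants the bare Control bit, one way to avoid depending on how the reader treats stacked Control escapes is to spell the bit out numerically. The following is a minimal sketch, not code from the Emacs sources: `my-control-bit` and `my-add-control-bit` are hypothetical names, and the assumption that the Control modifier is bit 26 (#x4000000) is taken from the values quoted earlier in this report.

```elisp
;; Hypothetical names for illustration only.

(defconst my-control-bit (ash 1 26)
  "Assumed Control modifier bit in an integer event; equals #x4000000.")

(defun my-add-control-bit (event)
  "Return integer EVENT with the Control modifier bit set."
  ;; logior rather than + so that applying it twice changes nothing.
  (logior event my-control-bit))

(my-add-control-bit ?\r)                        ; => #x400000d
(my-add-control-bit (my-add-control-bit ?\r))   ; => #x400000d
```

Using `logior` instead of `+` is a deliberate choice here: adding the bit to a value that already has it would carry into the next bit, whereas a bitwise OR is harmless when repeated.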





Message #29 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, Stefan Kangas <stefankangas <at> gmail.com>,
 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Fri, 3 Jun 2022 13:07:08 +0200
> Thus I'm scaling back ambitions a bit and have committed a much-reduced patch that just deals with the ?\LF part.

?\LF now signals an error because it's practically always a mistake; see https://lists.gnu.org/archive/html/emacs-devel/2022-06/msg00140.html for some context.
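
With backslash-newline after `?` rejected, a linefeed character has to be written some other way. A few spellings that all read as 10 are shown below as a small self-check; the constant name `my-linefeed` is hypothetical and only there for illustration.

```elisp
;; Hypothetical constant; any of the spellings below can be used directly.
(defconst my-linefeed ?\n
  "The linefeed character, code 10.")

;; Escape, Control-j in both notations, and the plain number.
(list my-linefeed ?\C-j ?\^j #x0a)   ; => (10 10 10 10)
```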





Message #32 received at 55738 <at> debbugs.gnu.org:

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>,
 Lars Ingebrigtsen <larsi <at> gnus.org>, 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Sat, 18 Jun 2022 08:54:47 +0200
Mattias Engdegård <mattiase <at> acm.org> writes:

>   (should (equal (kbd "C-RET") [?\C-\C-m]))
>
> and while these are debatable in style [...]

I don't know what other style you have in mind, but feel free to fix
this to use a better style, if possible.  Thanks.




Message #35 received at 55738 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>,
 Lars Ingebrigtsen <larsi <at> gnus.org>, 55738 <at> debbugs.gnu.org
Subject: Re: bug#55738: character escape bugs in the reader
Date: Sat, 18 Jun 2022 11:34:19 +0200
On 18 June 2022 at 08:54, Stefan Kangas <stefankangas <at> gmail.com> wrote:

> I don't know what other style you have in mind, but feel free to fix
> this to use a better style, if possible.

Done, thank you for the reminder. Ideally we should try to do away with the old TTY-centric coupling between Control-m, RET, 13, and the <return> key (etc) but that's for another day.





bug Marked as fixed in versions 29.1. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Tue, 28 Jun 2022 21:23:02 GMT)

Reply sent to Stefan Kangas <stefankangas <at> gmail.com>:
You have taken responsibility. (Wed, 06 Sep 2023 01:57:03 GMT)

Notification sent to Mattias Engdegård <mattiase <at> acm.org>:
bug acknowledged by developer. (Wed, 06 Sep 2023 01:57:03 GMT)

Message #42 received at 55738-done <at> debbugs.gnu.org:

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 55738-done <at> debbugs.gnu.org,
 Lars Ingebrigtsen <larsi <at> gnus.org>
Subject: Re: bug#55738: character escape bugs in the reader
Date: Tue, 5 Sep 2023 18:56:12 -0700
Mattias Engdegård <mattiase <at> acm.org> writes:

> On 18 June 2022 at 08:54, Stefan Kangas <stefankangas <at> gmail.com> wrote:
>
>> I don't know what other style you have in mind, but feel free to fix
>> this to use a better style, if possible.
>
> Done, thank you for the reminder. Ideally we should try to do away with the old
> TTY-centric coupling between Control-m, RET, 13, and the <return> key (etc) but
> that's for another day.

I guess there's nothing more to do here, so I'm closing this bug.

Please reopen if I missed something.

Thanks.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 04 Oct 2023 11:24:19 GMT)
