GNU bug report logs - #53260
char-syntax differs in interpreter and bytecode

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 14 Jan 2022 16:44:02 UTC

Severity: normal

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 53260 in the body.
You can then email your comments to 53260 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Fri, 14 Jan 2022 16:44:02 GMT)

Acknowledgement sent to Mattias Engdegård <mattiase <at> acm.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 14 Jan 2022 16:44:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: bug-gnu-emacs <at> gnu.org
Subject: char-syntax differs in interpreter and bytecode
Date: Fri, 14 Jan 2022 17:43:00 +0100

Fchar_syntax and the bytecode Bchar_syntax differ:

Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
Bchar_syntax converts arguments to multibyte. Fchar_syntax does not.

The last property can be used to get different behaviour:

(let ((cs (byte-compile (lambda (x) (char-syntax x)))))
  (with-temp-buffer
    (let ((st (make-syntax-table)))
      (set-buffer-multibyte nil)
      (modify-syntax-entry 128 "_" st)
      (set-syntax-table st)
      (list (funcall cs 128) (char-syntax 128)))))
-> (119 95)
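
To make the (119 95) result concrete, here is a small stand-alone C model of the two code paths. It is not Emacs source: every `toy_` name is hypothetical, the one-entry table stands in for the syntax table built in the reproduction, and the default of word syntax is an assumption chosen to match the 119 in the report. The raw-byte offset 0x3FFF00 is consistent with the 0x3fffff figure quoted later in the thread for byte 255.

```c
#include <assert.h>

/* Toy stand-ins for illustration only; none of this is Emacs code. */
enum { TOY_WORD = 'w', TOY_SYMBOL = '_' };  /* codes 119 and 95 */

/* Hypothetical syntax table: code point 128 is a symbol constituent,
   every other code point is assumed to default to word syntax. */
static int toy_table_lookup(int c)
{
    return c == 128 ? TOY_SYMBOL : TOY_WORD;
}

/* Map a unibyte value to a multibyte character: ASCII stays as is,
   0x80..0xFF become raw-byte characters 0x3FFF80..0x3FFFFF. */
static int toy_make_char_multibyte(int byte)
{
    assert(byte >= 0 && byte <= 0xFF);
    return byte < 0x80 ? byte : byte + 0x3FFF00;
}

/* Interpreter-style path: index the table with the argument as given. */
static int toy_char_syntax_interpreted(int c)
{
    return toy_table_lookup(c);
}

/* Bytecode-style path in a unibyte buffer: convert first, then index. */
static int toy_char_syntax_bytecode(int c)
{
    return toy_table_lookup(toy_make_char_multibyte(c));
}
```

In this model the bytecode-style path looks up 0x3FFF80 and never sees the entry stored at 128, while the interpreter-style path does, which reproduces the pair reported above.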

Not sure how to expose the presence or absence of SETUP_BUFFER_SYNTAX_TABLE. Suggestions?

And, most importantly, what would be the correct code?

(I suppose char-syntax is rare enough that we could call Fchar_syntax from Bchar_syntax and thus avoid any future divergence.)





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sat, 15 Jan 2022 08:37:01 GMT)

Message #8 received at 53260 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 53260 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode
Date: Sat, 15 Jan 2022 09:36:16 +0100

Mattias Engdegård <mattiase <at> acm.org> writes:

> Fchar_syntax and the bytecode Bchar_syntax differ:
>
> Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
> Bchar_syntax converts arguments to multibyte. Fchar_syntax does not.

[...]

> And, most importantly, what would be the correct code?

Hm.  Perhaps Stefan has an opinion; added to the CCs.

> (I suppose char-syntax is rare enough that we could call Fchar_syntax
> from Bchar_syntax and thus avoid any future divergence.)

Used 172 times in-core, which isn't that rare...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sat, 15 Jan 2022 14:47:01 GMT)

Message #11 received at 53260 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 53260 <at> debbugs.gnu.org
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode
Date: Sat, 15 Jan 2022 09:46:30 -0500

Lars Ingebrigtsen [2022-01-15 09:36:16] wrote:
> Mattias Engdegård <mattiase <at> acm.org> writes:
>> Fchar_syntax and the bytecode Bchar_syntax differ:
>> Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
>> Bchar_syntax converts arguments to multibyte.  Fchar_syntax does not.
> [...]
>> And, most importantly, what would be the correct code?
> Hm.  Perhaps Stefan has an opinion; added to the CCs.

My past opinion is in its docstring:

    If you’re trying to determine the syntax of characters in the buffer,
    this is probably the wrong function to use, because it can’t take
    ‘syntax-table’ text properties into account.  Consider using
    ‘syntax-after’ instead.

The "can't" is because `char-syntax` doesn't know where the char comes from.

>> (I suppose char-syntax is rare enough that we could call Fchar_syntax
>> from Bchar_syntax and thus avoid any future divergence.)
> Used 172 times in-core, which isn't that rare...

I think he meant "rare" w.r.t dynamic count rather than static count.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sat, 15 Jan 2022 17:30:03 GMT)

Message #14 received at 53260 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53260 <at> debbugs.gnu.org
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode
Date: Sat, 15 Jan 2022 18:29:41 +0100

On 15 Jan 2022, at 15:46, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:

>    If you’re trying to determine the syntax of characters in the buffer,
>    this is probably the wrong function to use, because it can’t take
>    ‘syntax-table’ text properties into account.  Consider using
>    ‘syntax-after’ instead.
> 
> The "can't" is because `char-syntax` doesn't know where the char comes from.

This is true and it leaves a narrower use for `char-syntax` in mode-specific code -- i.e., when syntax-table text properties do not need to be taken into account.

I propose we do the following:

1. Remove SETUP_BUFFER_SYNTAX_TABLE() from Fchar_syntax because as far as I can tell it has no effect at all.

2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be indexed by byte value (so that char 255 in the buffer corresponds to entry 255 in the syntax table rather than entry 0x3fffff)?

3. Now both implementations are identical. Replace the one in the byte-code interpreter with a call to Fchar_syntax.

> I think he meant "rare" w.r.t dynamic count rather than static count.

Yes, that's right.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sat, 15 Jan 2022 17:59:01 GMT)

Message #17 received at 53260 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: larsi <at> gnus.org, 53260 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode
Date: Sat, 15 Jan 2022 19:57:48 +0200

> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Sat, 15 Jan 2022 18:29:41 +0100
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53260 <at> debbugs.gnu.org
> 
> 2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be indexed by byte value (so that char 255 in the buffer corresponds to entry 255 in the syntax table rather than entry 0x3fffff)?

I don't think we want to support unibyte buffers which have some text
that is syntactically significant.  A unibyte buffer is just a stream
of raw bytes, they are not characters.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sat, 15 Jan 2022 22:52:01 GMT)

Message #20 received at 53260 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53260 <at> debbugs.gnu.org
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode
Date: Sat, 15 Jan 2022 17:51:26 -0500

> 1. Remove SETUP_BUFFER_SYNTAX_TABLE() from Fchar_syntax because as far as
> I can tell it has no effect at all.

Sounds good.

> 2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be
> the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be
> indexed by byte value (so that char 255 in the buffer corresponds to entry
> 255 in the syntax table rather than entry 0x3fffff)?

Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
points) not by bytes, so we need to convert the byte into a char
before indexing.

> 3. Now both implementations are identical. Replace the one in the byte-code
> interpreter with a call to Fchar_syntax.

Sounds good.
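
One way to read points 2 and 3 together with the reply above is: convert a byte taken from a unibyte buffer into a character before indexing, and do that in a single function that the bytecode handler also calls. The sketch below illustrates that shape only. All `toy_` names, the `buffer_is_multibyte` parameter, and the one-entry table are assumptions made for the example; it is not the patch that was eventually pushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Map a unibyte value to a character: ASCII is unchanged, 0x80..0xFF
   become raw-byte characters, so byte 255 maps to 0x3FFFFF. */
static int toy_unibyte_to_char(int byte)
{
    assert(byte >= 0 && byte <= 0xFF);
    return byte < 0x80 ? byte : byte + 0x3FFF00;
}

/* Hypothetical char-indexed table: only the raw-byte character for
   byte 255 has symbol syntax ('_'); everything else is word ('w'). */
static int toy_table_lookup(int c)
{
    return c == 0x3FFFFF ? '_' : 'w';
}

/* The single implementation: in a unibyte buffer, convert the byte
   to a character first, then index the table by character. */
static int toy_char_syntax(int c, bool buffer_is_multibyte)
{
    if (!buffer_is_multibyte && c <= 0xFF)
        c = toy_unibyte_to_char(c);
    return toy_table_lookup(c);
}

/* The bytecode handler delegates instead of duplicating the logic,
   so the two entry points cannot drift apart. */
static int toy_bytecode_char_syntax(int c, bool buffer_is_multibyte)
{
    return toy_char_syntax(c, buffer_is_multibyte);
}
```

With delegation there is only one place where the conversion decision is made, which is the property point 3 is after.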


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Sun, 16 Jan 2022 11:06:02 GMT)

Message #23 received at 53260 <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53260 <at> debbugs.gnu.org
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode [PATCH]
Date: Sun, 16 Jan 2022 12:04:51 +0100

On 15 Jan 2022, at 23:51, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:

> Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
> points) not by bytes, so we need to convert the byte into a char
> before indexing.

Sure, I'm happy to do it either way. Chars retrieved from unibyte buffers or strings really should be converted to multibyte before being used with char-syntax; unibyte buffers are not very common, but unibyte strings are slightly more so.

[0001-Fix-Fchar_syntax-for-non-ASCII-in-unibyte-buffers.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53260; Package emacs. (Thu, 20 Jan 2022 09:31:02 GMT)

Message #26 received at 53260 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 53260 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode [PATCH]
Date: Thu, 20 Jan 2022 10:30:23 +0100

Mattias Engdegård <mattiase <at> acm.org> writes:

> Sure, I'm happy to do it either way. Chars retrieved from unibyte
> buffers or strings really should be converted to multibyte before being
> used with char-syntax; unibyte buffers are not very common, but unibyte
> strings are slightly more so.

Makes sense to me.  Unless Stefan has any further comments, please go
ahead and push.





Reply sent to Mattias Engdegård <mattiase <at> acm.org>:
You have taken responsibility. (Thu, 20 Jan 2022 10:48:02 GMT)

Notification sent to Mattias Engdegård <mattiase <at> acm.org>:
bug acknowledged by developer. (Thu, 20 Jan 2022 10:48:02 GMT)

Message #31 received at 53260-done <at> debbugs.gnu.org:

From: Mattias Engdegård <mattiase <at> acm.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 53260-done <at> debbugs.gnu.org
Subject: Re: bug#53260: char-syntax differs in interpreter and bytecode [PATCH]
Date: Thu, 20 Jan 2022 11:47:01 +0100

On 20 Jan 2022, at 10:30, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Makes sense to me.  Unless Stefan has any further comments, please go
> ahead and push.

Thank you, pushed with a necessary modification: SETUP_BUFFER_SYNTAX_TABLE() is indeed necessary in Fchar_syntax because syntax.c has its own local #define SYNTAX() and doesn't use the one in syntax.h. Lovely.
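
The pitfall described here can be sketched in isolation. The message only says that syntax.c defines its own SYNTAX(); the mechanism below, where the file-local macro reads a cached table pointer that the setup macro must refresh, is an assumption made for illustration, and all `toy_`/`TOY_` names are hypothetical rather than Emacs identifiers.

```c
#include <assert.h>

/* Hypothetical buffer holding a small syntax table indexed by ASCII code. */
struct toy_buffer { const char *syntax_table; };

static struct toy_buffer *toy_current_buffer;

/* Cached global state, assumed to be what the file-local macro consults. */
static struct { const char *current_syntax_table; } toy_gl_state;

/* Refresh the cached table pointer from the current buffer. */
#define TOY_SETUP_BUFFER_SYNTAX_TABLE() \
    (toy_gl_state.current_syntax_table = toy_current_buffer->syntax_table)

/* File-local lookup: reads the cache, not the current buffer. */
#define TOY_SYNTAX(c) (toy_gl_state.current_syntax_table[(c)])

/* Correct variant: refresh the cache before every lookup. */
static int toy_char_syntax(int c)
{
    assert(toy_current_buffer);
    assert(c >= 0 && c < 128);
    TOY_SETUP_BUFFER_SYNTAX_TABLE();
    return TOY_SYNTAX(c);
}

/* Variant with the setup call removed: it answers from whatever table
   was cached last, which may belong to a different buffer. */
static int toy_char_syntax_without_setup(int c)
{
    assert(c >= 0 && c < 128);
    return TOY_SYNTAX(c);
}
```

Under that assumption, dropping the setup call is only harmless if the lookup macro reads the current buffer directly, which is why the removal looked like a no-op when reasoning from the header's definition.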





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 17 Feb 2022 12:24:07 GMT)
