GNU bug report logs - #53260: char-syntax differs in interpreter and bytecode
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 53260 in the body.
You can then email your comments to 53260 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Fri, 14 Jan 2022 16:44:02 GMT)
Acknowledgement sent to Mattias Engdegård <mattiase <at> acm.org>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 14 Jan 2022 16:44:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Fchar_syntax and the bytecode Bchar_syntax differ:
Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
Bchar_syntax converts arguments to multibyte. Fchar_syntax does not.
The last property can be used to get different behaviour:
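  (let ((cs (byte-compile (lambda (x) (char-syntax x)))))
    (with-temp-buffer
      (let ((st (make-syntax-table)))
        (set-buffer-multibyte nil)         ; make the buffer unibyte
        (modify-syntax-entry 128 "_" st)   ; give char 128 symbol syntax
        (set-syntax-table st)
        ;; Byte-compiled call first, then the interpreted call.
        (list (funcall cs 128) (char-syntax 128)))))
  -> (119 95)   ; i.e. ?w from the bytecode, ?_ from the interpreter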
Not sure how to expose the presence or absence of SETUP_BUFFER_SYNTAX_TABLE. Suggestions?
And, most importantly, what would be the correct code?
(I suppose char-syntax is rare enough that we could call Fchar_syntax from Bchar_syntax and thus avoid any future divergence.)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sat, 15 Jan 2022 08:37:01 GMT)
Message #8 received at 53260 <at> debbugs.gnu.org:
Mattias Engdegård <mattiase <at> acm.org> writes:
> Fchar_syntax and the bytecode Bchar_syntax differ:
>
> Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
> Bchar_syntax converts arguments to multibyte. Fchar_syntax does not.
[...]
> And, most importantly, what would be the correct code?
Hm. Perhaps Stefan has an opinion; added to the CCs.
> (I suppose char-syntax is rare enough that we could call Fchar_syntax
> from Bchar_syntax and thus avoid any future divergence.)
Used 172 times in-core, which isn't that rare...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sat, 15 Jan 2022 14:47:01 GMT)
Message #11 received at 53260 <at> debbugs.gnu.org:
Lars Ingebrigtsen [2022-01-15 09:36:16] wrote:
> Mattias Engdegård <mattiase <at> acm.org> writes:
>> Fchar_syntax and the bytecode Bchar_syntax differ:
>> Fchar_syntax calls SETUP_BUFFER_SYNTAX_TABLE. Bchar_syntax does not.
>> Bchar_syntax converts arguments to multibyte. Fchar_syntax does not.
> [...]
>> And, most importantly, what would be the correct code?
> Hm. Perhaps Stefan has an opinion; added to the CCs.
My past opinion is in its docstring:
If you’re trying to determine the syntax of characters in the buffer,
this is probably the wrong function to use, because it can’t take
‘syntax-table’ text properties into account. Consider using
‘syntax-after’ instead.
The "can't" is because `char-syntax` doesn't know where the char comes from.
>> (I suppose char-syntax is rare enough that we could call Fchar_syntax
>> from Bchar_syntax and thus avoid any future divergence.)
> Used 172 times in-core, which isn't that rare...
I think he meant "rare" w.r.t dynamic count rather than static count.
Stefan
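The docstring advice quoted in this message can be illustrated with a small sketch. It assumes a character whose syntax is overridden by a `syntax-table' text property. `my-syntax-char-at' and `my-compare-syntax-lookups' are hypothetical names, not Emacs functions, and `parse-sexp-lookup-properties' is bound to t because `syntax-after' only consults text properties when that variable is non-nil.

```elisp
;; Hypothetical helper (not part of Emacs): return the syntax designator
;; char at POS, taking `syntax-table' text properties into account.
(defun my-syntax-char-at (pos)
  (let ((parse-sexp-lookup-properties t))
    (syntax-class-to-char (syntax-class (syntax-after pos)))))

(defun my-compare-syntax-lookups ()
  "Return (CHAR-SYNTAX-RESULT SYNTAX-AFTER-RESULT) for a property-marked char."
  (with-temp-buffer
    (insert "a")
    ;; Override the table entry for this one occurrence: punctuation.
    (put-text-property 1 2 'syntax-table (string-to-syntax "."))
    (list (char-syntax (char-after 1))   ; sees only the syntax table
          (my-syntax-char-at 1))))       ; sees the text property

(my-compare-syntax-lookups)
;; => (119 46), i.e. (?w ?.)
```

`char-syntax' receives only the character ?a, so it reports word syntax from the table, while the position-based lookup picks up the punctuation property.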
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sat, 15 Jan 2022 17:30:03 GMT)
Message #14 received at 53260 <at> debbugs.gnu.org:
15 jan. 2022 kl. 15.46 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> If you’re trying to determine the syntax of characters in the buffer,
> this is probably the wrong function to use, because it can’t take
> ‘syntax-table’ text properties into account. Consider using
> ‘syntax-after’ instead.
>
> The "can't" is because `char-syntax` doesn't know where the char comes from.
This is true, and it leaves a narrower use for `char-syntax` in mode-specific code -- i.e., when syntax-table text properties do not need to be taken into account.
I propose we do the following:
1. Remove SETUP_BUFFER_SYNTAX_TABLE() from Fchar_syntax because as far as I can tell it has no effect at all.
2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be indexed by byte value (so that char 255 in the buffer corresponds to entry 255 in the syntax table rather than entry 0x3fffff)?
3. Now both implementations are identical. Replace the one in the byte-code interpreter with a call to Fchar_syntax.
> I think he meant "rare" w.r.t dynamic count rather than static count.
Yes, that's right.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sat, 15 Jan 2022 17:59:01 GMT)
Message #17 received at 53260 <at> debbugs.gnu.org:
> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Sat, 15 Jan 2022 18:29:41 +0100
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53260 <at> debbugs.gnu.org
>
> 2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be indexed by byte value (so that char 255 in the buffer corresponds to entry 255 in the syntax table rather than entry 0x3fffff)?
I don't think we want to support unibyte buffers which have some text
that is syntactically significant. A unibyte buffer is just a stream
of raw bytes; they are not characters.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sat, 15 Jan 2022 22:52:01 GMT)
Message #20 received at 53260 <at> debbugs.gnu.org:
> 1. Remove SETUP_BUFFER_SYNTAX_TABLE() from Fchar_syntax because as far as
> I can tell it has no effect at all.
Sounds good.
> 2. Remove make_char_multibyte(c) from Bchar_syntax because it seems to be
> the wrong thing to do: in a unibyte buffer, wouldn't the syntax table be
> indexed by byte value (so that char 255 in the buffer corresponds to entry
> 255 in the syntax table rather than entry 0x3fffff)?
Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
points) not by bytes, so we need to convert the byte into a char
before indexing.
> 3. Now both implementations are identical. Replace the one in the byte-code
> interpreter with a call to Fchar_syntax.
Sounds good.
Stefan
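To make the indexing point concrete, here is a minimal sketch using `unibyte-char-to-multibyte', the Lisp-level conversion from a byte to a character. It is assumed here to correspond to what the C-level conversion does for a raw byte. The example runs in an ordinary multibyte temporary buffer; `my-byte-vs-char-syntax' is a hypothetical name.

```elisp
;; ASCII bytes map to themselves; raw bytes 128..255 correspond to the
;; characters #x3FFF80..#x3FFFFF.
(list (unibyte-char-to-multibyte ?a)
      (unibyte-char-to-multibyte 128)
      (unibyte-char-to-multibyte 255))
;; => (97 4194176 4194303), i.e. (97 #x3FFF80 #x3FFFFF)

(defun my-byte-vs-char-syntax ()
  "Return syntax chars for entry 128 and for the raw-byte char of byte 128."
  (with-temp-buffer                        ; multibyte by default
    (let ((st (make-syntax-table)))
      (modify-syntax-entry 128 "_" st)     ; sets the entry for char 128 only
      (set-syntax-table st)
      (list (char-syntax 128)
            (char-syntax (unibyte-char-to-multibyte 128))))))

(my-byte-vs-char-syntax)
;; => (95 119) with the default syntax of raw-byte characters
```

The two results are the same pair of values as in the original report: the entry at index 128 and the entry for the raw-byte character are distinct slots in the char table.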
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Sun, 16 Jan 2022 11:06:02 GMT)
Message #23 received at 53260 <at> debbugs.gnu.org:
15 jan. 2022 kl. 23.51 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> Doesn't sound right: char tables are indexed by chars (i.e. Unicode code
> points) not by bytes, so we need to convert the byte into a char
> before indexing.
Sure, I'm happy to do it either way. Chars retrieved from unibyte buffers or strings really should be converted to multibyte before being used with char-syntax; unibyte buffers are not very common, but strings are slightly more so.
[0001-Fix-Fchar_syntax-for-non-ASCII-in-unibyte-buffers.patch (application/octet-stream, attachment)]
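For the string case mentioned in this message, a minimal sketch of converting before the lookup might look as follows. `my-unibyte-string-syntaxes' is a hypothetical helper, not part of the attached patch or of Emacs; it relies on `string-to-multibyte', which turns bytes 128..255 into raw-byte characters and leaves multibyte strings unchanged.

```elisp
;; Hypothetical helper: syntax chars of STRING's elements, looked up
;; after converting the string to its multibyte representation.
(defun my-unibyte-string-syntaxes (string)
  (mapcar #'char-syntax (string-to-multibyte string)))

(let ((s (unibyte-string ?a 200)))
  (list (multibyte-string-p s)               ; unibyte string
        (aref s 1)                           ; element as a byte
        (aref (string-to-multibyte s) 1)))   ; element as a raw-byte char
;; => (nil 200 4194248), i.e. the last element is #x3FFFC8

;; The results depend on the current buffer's syntax table.
(my-unibyte-string-syntaxes (unibyte-string ?a 200))
```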
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#53260; Package emacs. (Thu, 20 Jan 2022 09:31:02 GMT)
Message #26 received at 53260 <at> debbugs.gnu.org:
Mattias Engdegård <mattiase <at> acm.org> writes:
> Sure, I'm happy to do it either way. Chars retrieved from unibyte
> buffers or strings really should be converted to multibyte before being
> used with char-syntax; unibyte buffers are not very common, but strings
> are slightly more so.
Makes sense to me. Unless Stefan has any further comments, please go
ahead and push.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Reply sent to Mattias Engdegård <mattiase <at> acm.org>: You have taken responsibility. (Thu, 20 Jan 2022 10:48:02 GMT)
Notification sent to Mattias Engdegård <mattiase <at> acm.org>: bug acknowledged by developer. (Thu, 20 Jan 2022 10:48:02 GMT)
Message #31 received at 53260-done <at> debbugs.gnu.org:
20 jan. 2022 kl. 10.30 skrev Lars Ingebrigtsen <larsi <at> gnus.org>:
> Makes sense to me. Unless Stefan has any further comments, please go
> ahead and push.
Thank you, pushed with a necessary modification: SETUP_BUFFER_SYNTAX_TABLE() is indeed necessary in Fchar_syntax because syntax.c has its own local #define SYNTAX() and doesn't use the one in syntax.h. Lovely.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 17 Feb 2022 12:24:07 GMT)