GNU bug report logs -
#63535
Master branch: Error in forw_comment (syntax.c) handling of escaped LFs
Previous Next
Reported by: Alan Mackenzie <acm <at> muc.de>
Date: Tue, 16 May 2023 10:58:02 UTC
Severity: normal
Done: Alan Mackenzie <acm <at> muc.de>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63535 in the body.
You can then email your comments to 63535 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 10:58:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Alan Mackenzie <acm <at> muc.de>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Tue, 16 May 2023 10:58:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello, Emacs.
In the master branch:
Consider the following C++ Mode buffer:
// comment \
comment line 2
line_3();
.. The backslash at the end of line 1 extends the comment into line 2.
Put point at // on L1, and do:
M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))
.. s gets the parse state of the inside of the comment.
Now put point at EOL 1, between the backslash and the LF. Do
M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)
.. This ought to leave point at BOL 2, since the syntax before the LF at
EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves
point wrongly at BOL 3.
#########################################################################
The reason for this bug is at L+42 of forw_comment (in syntax.c). There
we have
&& !(comment_end_can_be_escaped && char_quoted (from, from_byte))
.. Checking char_quoted is wrong. Instead the function should check the
current parse state.
--
Alan Mackenzie (Nuremberg, Germany).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 14:04:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 63535 <at> debbugs.gnu.org (full text, mbox):
On Tue, May 16, 2023 at 10:57:40 +0000, Alan Mackenzie wrote:
> Hello, Emacs.
> In the master branch:
> Consider the following C++ Mode buffer:
> // comment \
> comment line 2
> line_3();
> .. The backslash at the end of line 1 extends the comment into line 2.
> Put point at // on L1, and do:
> M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))
> .. s gets the parse state of the inside of the comment.
> Now put point at EOL 1, between the backslash and the LF. Do
> M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)
> .. This ought to leave point at BOL 2, since the syntax before the LF at
> EOL 1 is that of a C++ comment, otherwise neutral. Instead, it leaves
> point wrongly at BOL 3.
> #########################################################################
> The reason for this bug is at L+42 of forw_comment (in syntax.c). There
> we have
> && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> .. Checking char_quoted is wrong. Instead the function should check the
> current parse state.
And here is a patch which fixes it. I will apply this patch to master
soon if I don't hear any objection.
diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..76d9f16e4ed 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
&& SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
&& (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
(nesting > 0 && --nesting == 0) : nesting < 0)
- && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+ && !(comment_end_can_be_escaped &&
+ (((prev_syntax & 0xff) == Sescape)
+ || ((prev_syntax & 0xff) == Scharquote))))
/* We have encountered a comment end of the same style
as the comment sequence which began this comment
section. */
--
Alan Mackenzie (Nuremberg, Germany).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 15:44:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 63535 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 16 May 2023 10:57:40 +0000
> From: Alan Mackenzie <acm <at> muc.de>
>
> Hello, Emacs.
>
> In the master branch:
Is it different on emacs-29?
> && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> .. Checking char_quoted is wrong. Instead the function should check the
> current parse state.
Why not both? IOW, please explain why char_quoted is not TRT here.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 16:16:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 63535 <at> debbugs.gnu.org (full text, mbox):
Hello, Eli.
On Tue, May 16, 2023 at 18:43:59 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 10:57:40 +0000
> > From: Alan Mackenzie <acm <at> muc.de>
> > Hello, Emacs.
> > In the master branch:
> Is it different on emacs-29?
No, the bug has been there since ?2016, having been coded, almost
certainly, by me. ;-( The context in 2016 was making an escaped NL in
a C++ line comment continue the comment's fontification onto the next
line. The (then) new variable comment-end-can-be-escaped configured the
effect of the backslash at EOL.
I have been assuming that it is too unimportant a bug to go into
emacs-29 at this late stage.
> > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> > .. Checking char_quoted is wrong. Instead the function should check the
> > current parse state.
> Why not both? IOW, please explain why char_quoted is not TRT here.
Because parse-partial-sexp is not scanning the backslash. The scan
starts one character after the backslash, and the syntactic effect of
that backslash is not in the OLDSTATE argument to parse-partial-sexp.
--
Alan Mackenzie (Nuremberg, Germany)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 16:30:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 63535 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 16 May 2023 16:15:24 +0000
> Cc: 63535 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
>
> > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > .. Checking char_quoted is wrong. Instead the function should check the
> > > current parse state.
>
> > Why not both? IOW, please explain why char_quoted is not TRT here.
>
> Because parse-partial-sexp is not scanning the backslash. The scan
> starts one character after the backslash, and the syntactic effect of
> that backslash is not in the OLDSTATE argument to parse-partial-sexp.
Sorry, I still don't follow: char_quoted doesn't call
parse-partial-sexp, AFAICT. So why does it matter what
parse-partial-sexp does when we are discussing why char_quoted is not
TRT?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 17:00:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 63535 <at> debbugs.gnu.org (full text, mbox):
Hello, Eli.
On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 16:15:24 +0000
> > Cc: 63535 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>
> > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
> > > > .. Checking char_quoted is wrong. Instead the function should check the
> > > > current parse state.
> > > Why not both? IOW, please explain why char_quoted is not TRT here.
> > Because parse-partial-sexp is not scanning the backslash. The scan
> > starts one character after the backslash, and the syntactic effect of
> > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
> Sorry, I still don't follow: char_quoted doesn't call
> parse-partial-sexp, AFAICT.
parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
> So why does it matter what parse-partial-sexp does when we are
> discussing why char_quoted is not TRT?
parse-partial-sexp is the context in which the bug becomes evident. If,
in the C++ line comment with escaped NL, you start parse-partial-sexp at
the NL, it behaves as though the scan started at the backslash. This is
the bug.
The cause of the bug is the use of char_quoted at line 42 of
forw_comment.
--
Alan Mackenzie (Nuremberg, Germany).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Tue, 16 May 2023 17:51:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 63535 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 16 May 2023 16:58:55 +0000
> Cc: 63535 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
>
> On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > > Date: Tue, 16 May 2023 16:15:24 +0000
> > > Cc: 63535 <at> debbugs.gnu.org
> > > From: Alan Mackenzie <acm <at> muc.de>
>
> > > > > && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > > > .. Checking char_quoted is wrong. Instead the function should check the
> > > > > current parse state.
>
> > > > Why not both? IOW, please explain why char_quoted is not TRT here.
>
> > > Because parse-partial-sexp is not scanning the backslash. The scan
> > > starts one character after the backslash, and the syntactic effect of
> > > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
>
> > Sorry, I still don't follow: char_quoted doesn't call
> > parse-partial-sexp, AFAICT.
>
> parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
>
> > So why does it matter what parse-partial-sexp does when we are
> > discussing why char_quoted is not TRT?
>
> parse-partial-sexp is the context in which the bug becomes evident. If,
> in the C++ line comment with escaped NL, you start parse-partial-sexp at
> the NL, it behaves as though the scan started at the backslash. This is
> the bug.
>
> The cause of the bug is the use of char_quoted at line 42 of
> forw_comment.
Thanks, let's see what Stefan has to say about this.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Wed, 17 May 2023 22:02:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 63535 <at> debbugs.gnu.org (full text, mbox):
Hi Alan,
> diff --git a/src/syntax.c b/src/syntax.c
> index e9e04e2d638..76d9f16e4ed 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
> && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
> && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
> (nesting > 0 && --nesting == 0) : nesting < 0)
> - && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
> + && !(comment_end_can_be_escaped &&
> + (((prev_syntax & 0xff) == Sescape)
> + || ((prev_syntax & 0xff) == Scharquote))))
> /* We have encountered a comment end of the same style
> as the comment sequence which began this comment
> section. */
AFAIK this is your code, so you should know better, but AFAICT
`prev_syntax` is not updated in the loop, so it only reflects the syntax
before the beginning of the scanned text, rather than anything near `from`.
Are you sure this is right?
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Mon, 22 May 2023 15:00:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 63535 <at> debbugs.gnu.org (full text, mbox):
Hello, Stefan.
On Wed, May 17, 2023 at 18:01:32 -0400, Stefan Monnier wrote:
> Hi Alan,
[ .... ]
> AFAIK this is your code, so you should know better, but AFAICT
> `prev_syntax` is not updated in the loop, so it only reflects the syntax
> before the beginning of the scanned text, rather than anything near `from`.
> Are you sure this is right?
Thanks, you are correct, the patch was not good.
It turned out to be quite tricky to get working. As well as
forw_comment, I had to amend scan_sexps_forward to make it return a
quoted state to its caller when this happens at the limit of the scan.
I think the following patch is better. Would you please have a look at
it, in the hope I haven't made any other silly mistakes. Thanks!
diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..94b2ac2b591 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2338,13 +2338,16 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
return 0;
}
c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+ prev_syntax = syntax;
syntax = SYNTAX_WITH_FLAGS (c);
code = syntax & 0xff;
if (code == Sendcomment
&& SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
&& (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
(nesting > 0 && --nesting == 0) : nesting < 0)
- && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+ && !(comment_end_can_be_escaped
+ && ((prev_syntax & 0xff) == Sescape
+ || (prev_syntax & 0xff) == Scharquote)))
/* We have encountered a comment end of the same style
as the comment sequence which began this comment
section. */
@@ -2368,7 +2371,11 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
if (from == stop) continue; /* Failure */
- }
+ c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+ prev_syntax = syntax;
+ syntax = Smax;
+ code = syntax;
+ }
inc_both (&from, &from_byte);
UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -3349,7 +3356,14 @@ do { prev_from = from; \
are invalid now. Luckily, the `done' doesn't use them
and the INC_FROM sets them to a sane value without
looking at them. */
- if (!found) goto done;
+ if (!found)
+ {
+ if ((prev_from_syntax & 0xff) == Sescape
+ || (prev_from_syntax & 0xff) == Scharquote)
+ goto endquoted;
+ else
+ goto done;
+ }
INC_FROM;
state->incomment = 0;
state->comstyle = 0; /* reset the comment style */
> Stefan
--
Alan Mackenzie (Nuremberg, Germany).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#63535
; Package
emacs
.
(Mon, 22 May 2023 15:17:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 63535 <at> debbugs.gnu.org (full text, mbox):
> I think the following patch is better. Would you please have a look at
> it, in the hope I haven't made any other silly mistakes. Thanks!
I don't see any silly mistake there, sorry.
Stefan
PS: It does remind me that we really should do ourselves a favor and get rid
of the distinction between `Sescape` and `Scharquote`.
IIRC there's a risk of backward incompatibility, so it has to be done
progressively, but we should start the process. E.g. first declare one of the
two as obsolete, then emit a warning when we see it being used, ...
Reply sent
to
Alan Mackenzie <acm <at> muc.de>
:
You have taken responsibility.
(Mon, 22 May 2023 16:17:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Alan Mackenzie <acm <at> muc.de>
:
bug acknowledged by developer.
(Mon, 22 May 2023 16:17:01 GMT)
Full text and
rfc822 format available.
Message #37 received at 63535-done <at> debbugs.gnu.org (full text, mbox):
Hello, Stefan.
On Mon, May 22, 2023 at 11:16:40 -0400, Stefan Monnier wrote:
> > I think the following patch is better. Would you please have a look at
> > it, in the hope I haven't made any other silly mistakes. Thanks!
> I don't see any silly mistake there, sorry.
Thanks! I've committed the patch, and I'm closing the bug.
> Stefan
> PS: It does remind me that we really should do ourselves a favor and get rid
> of the distinction between `Sescape` and `Scharquote`.
> IIRC there's a risk of backward incompatibility, so it has to be done
> progressively, but we should start the process. E.g. first declare one of the
> two as obsolete, then emit a warning when we see it being used, ...
Yes. I think Sescape should be the survivor. I don't know if
Scharquote is used at all in Emacs code.
--
Alan Mackenzie (Nuremberg, Germany).
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Tue, 20 Jun 2023 11:24:06 GMT)
Full text and
rfc822 format available.
This bug report was last modified 1 year and 326 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.