GNU bug report logs -
#62717
29.0.60; c-ts-mode does not indent the first line in a function after RET
Previous Next
To reply to this bug, email your comments to 62717 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Fri, 07 Apr 2023 19:50:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Daniel Martín <mardani29 <at> yahoo.es>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Fri, 07 Apr 2023 19:50:01 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
emacs -Q
C-x b sample.c RET
M-x c-ts-mode RET
int main() {
If I press RET at the end of the line, the point is not indented.
The Tree-sitter tree for the code is
(translation_unit
(function_definition type: (primitive_type)
declarator:
(function_declarator declarator: (identifier)
parameters: (parameter_list ( )))
body: (compound_statement { })))
If I insert a closing bracket, that is:
int main() {
}
The newline indents correctly. The Tree-sitter tree is now
(function_definition type: (primitive_type)
declarator:
(function_declarator declarator: (identifier)
parameters: (parameter_list ( )))
body: (compound_statement { }))
It seems that Tree-sitter parses both code snippets without errors and
Emacs should automatically indent after RET in the first case.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sat, 08 Apr 2023 07:17:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 62717 <at> debbugs.gnu.org (full text, mbox):
On 07/04/2023 22:48, Daniel Martín via Bug reports for GNU Emacs, the
Swiss army knife of text editors wrote:
> emacs -Q
> C-x b sample.c RET
> M-x c-ts-mode RET
>
> int main() {
>
> If I press RET at the end of the line, the point is not indented.
>
> The Tree-sitter tree for the code is
>
> (translation_unit
> (function_definition type: (primitive_type)
> declarator:
> (function_declarator declarator: (identifier)
> parameters: (parameter_list ( )))
> body: (compound_statement { })))
>
> If I insert a closing bracket, that is:
>
> int main() {
> }
>
> The newline indents correctly. The Tree-sitter tree is now
>
> (function_definition type: (primitive_type)
> declarator:
> (function_declarator declarator: (identifier)
> parameters: (parameter_list ( )))
> body: (compound_statement { }))
>
> It seems that Tree-sitter parses both code snippets without errors and
> Emacs should automatically indent after RET in the first case.
I've looked at what nvim-treesitter does for indentation, and at least
one of the steps looks like this:
https://github.com/nvim-treesitter/nvim-treesitter/blob/584ccea56e2d37b31ba292da2b539e1a4bb411ca/lua/nvim-treesitter/indent.lua#L129-L134
If the current line is empty, look at the end of the previous line and
compute based on the node there.
I'm not sure how this meshes with the fact that tree-sitter inserts a
"virtual" closer node at the end of the previous line, but the approach
is worth examining.
Daniel, you posted about testing nvim-treesitter with several scenarios.
Does it do the right thing with this one?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sat, 08 Apr 2023 18:39:03 GMT)
Full text and
rfc822 format available.
Message #11 received at 62717 <at> debbugs.gnu.org (full text, mbox):
Dmitry Gutov <dmitry <at> gutov.dev> writes:
>
> I've looked at what nvim-treesitter does for indentation, and at least
> one of the steps looks like this:
>
> https://github.com/nvim-treesitter/nvim-treesitter/blob/584ccea56e2d37b31ba292da2b539e1a4bb411ca/lua/nvim-treesitter/indent.lua#L129-L134
>
> If the current line is empty, look at the end of the previous line and
> compute based on the node there.
>
> I'm not sure how this meshes with the fact that tree-sitter inserts a
> "virtual" closer node at the end of the previous line, but the
> approach is worth examining.
>
> Daniel, you posted about testing nvim-treesitter with several
> scenarios. Does it do the right thing with this one?
Yes, it works well in this scenario. Inserting a new line automatically
adds indentation.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 04:12:04 GMT)
Full text and
rfc822 format available.
Message #14 received at 62717 <at> debbugs.gnu.org (full text, mbox):
On 08/04/2023 21:37, Daniel Martín wrote:
> Dmitry Gutov<dmitry <at> gutov.dev> writes:
>
>> I've looked at what nvim-treesitter does for indentation, and at least
>> one of the steps looks like this:
>>
>> https://github.com/nvim-treesitter/nvim-treesitter/blob/584ccea56e2d37b31ba292da2b539e1a4bb411ca/lua/nvim-treesitter/indent.lua#L129-L134
>>
>> If the current line is empty, look at the end of the previous line and
>> compute based on the node there.
>>
>> I'm not sure how this meshes with the fact that tree-sitter inserts a
>> "virtual" closer node at the end of the previous line, but the
>> approach is worth examining.
>>
>> Daniel, you posted about testing nvim-treesitter with several
>> scenarios. Does it do the right thing with this one?
> Yes, it works well in this scenario. Inserting a new line automatically
> adds indentation.
All right.
From reading the code, it looks like a semi-coincidence that this
example works fine: the algorithm is just different, looking for
indent/dedent nodes, there is nothing similar to our logic in
treesit--indent-1. Which can be good and bad, but it's likely that the
grammar (and tree-sitter itself) co-evolved together with that approach,
so it's no surprise the sharp edges match.
In particular, the virtual closer node seems to be skipped because the
search uses descendant_for_range, which seems to jump over zero-length
nodes.
We could try to hammer in that exception as a workaround, but the
resulting PARENT won't contain BOL anyway, and it's not 100% clear how
these fake nodes will look for other grammars. Indentation in
ruby-ts-mode, for example, won't magically start working right away in
the comparable situation (method definition without closer) because
there is also a missing body_statement node, requiring further changes
to indentation rules.
What does this mean for us? Short of reimplementing nvim-treesitter's
algorithm (and I haven't read Atom's or Zed's indentation code;
anybody's welcome to chime in with a summary of either), we could just
install the patch at the end of this message: it fixes this particular
case, in a bit hackish way, but at least it doesn't affect other languages.
Note that it still doesn't fix very similar cases, e.g.
int main () {
for (;;) {<RET>
(we need additional rules looking for ERROR nodes, like in nvim's
indent.scm), but in does fix
int main () {
for (;;) {}<RET>
and
int main () {
int foo;<RET>
I'm not sure, though, what is the big deal with adding the top-level
function's closing curly first thing before writing the body (after that
the parser starts working much better), so as far as I'm concerned this
patch is very optional. It does add some complexity, after all.
Adding Alan and Joao, who were interested in this scenario as well.
diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index 981c7766375..9aaa8b32c73 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -859,6 +859,18 @@ c-ts-mode--defun-skipper
(goto-char (match-end 0)))
(treesit-default-defun-skipper))
+(defun c-ts-base--before-indent (args)
+ (pcase-let ((`(,node ,parent ,bol) args))
+ (when (null node)
+ (let ((smallest-node (treesit-node-at (point))))
+ ;; "Virtual" closer curly added by the
+ ;; parser's error recovery.
+ (when (and (equal (treesit-node-type smallest-node) "}")
+ (equal (treesit-node-end smallest-node)
+ (treesit-node-start smallest-node)))
+ (setq parent (treesit-node-parent smallest-node)))))
+ (list node parent bol)))
+
(defun c-ts-mode-indent-defun ()
"Indent the current top-level declaration syntactically.
@@ -904,6 +916,8 @@ c-ts-base-mode
;; function_definitions, so we need to find the top-level node.
(setq-local treesit-defun-prefer-top-level t)
+ (add-function :filter-args treesit-indent-function
#'c-ts-base--before-indent)
+
;; Indent.
(when (eq c-ts-mode-indent-style 'linux)
(setq-local indent-tabs-mode t))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 05:19:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 62717 <at> debbugs.gnu.org (full text, mbox):
> Cc: Theodor Thornhill <theo <at> thornhill.no>, Yuan Fu <casouri <at> gmail.com>,
> 62717 <at> debbugs.gnu.org,
> João Távora <joaotavora <at> gmail.com>,
> Alan Mackenzie <acm <at> muc.de>
> Date: Sun, 9 Apr 2023 03:20:23 +0300
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> What does this mean for us? Short of reimplementing nvim-treesitter's
> algorithm (and I haven't read Atom's or Zed's indentation code;
> anybody's welcome to chime in with a summary of either), we could just
> install the patch at the end of this message: it fixes this particular
> case, in a bit hackish way, but at least it doesn't affect other languages.
>
> Note that it still doesn't fix very similar cases, e.g.
>
> int main () {
> for (;;) {<RET>
>
> (we need additional rules looking for ERROR nodes, like in nvim's
> indent.scm), but in does fix
>
> int main () {
> for (;;) {}<RET>
>
> and
>
> int main () {
> int foo;<RET>
>
> I'm not sure, though, what is the big deal with adding the top-level
> function's closing curly first thing before writing the body (after that
> the parser starts working much better), so as far as I'm concerned this
> patch is very optional. It does add some complexity, after all.
If the patch solves some of the problems, passes the test suite, and
doesn't introduce any regressions you see, I think you should install
it (unless someone here objects).
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 11:06:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 62717 <at> debbugs.gnu.org (full text, mbox):
Dmitry Gutov <dmitry <at> gutov.dev> writes:
>
> I'm not sure, though, what is the big deal with adding the top-level
> function's closing curly first thing before writing the body (after
> that the parser starts working much better), so as far as I'm
> concerned this patch is very optional. It does add some complexity,
> after all.
I think this problem also affects languages without curly braces like
Ruby or Python.
For example, if I insert this Ruby code in a buffer with ruby-ts-mode
enabled
def sample RET
The newline is not indented, but it is indented in ruby-mode.
>
> Adding Alan and Joao, who were interested in this scenario as well.
>
> diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
> index 981c7766375..9aaa8b32c73 100644
> --- a/lisp/progmodes/c-ts-mode.el
> +++ b/lisp/progmodes/c-ts-mode.el
Thanks for the patch. It works correctly and the existing c-ts-mode
tests pass. I suggest adding a test to prevent regressions:
diff --git a/test/lisp/progmodes/c-ts-mode-resources/indent.erts b/test/lisp/progmodes/c-ts-mode-resources/indent.erts
index 5cdefe2122c..221b3d809af 100644
--- a/test/lisp/progmodes/c-ts-mode-resources/indent.erts
+++ b/test/lisp/progmodes/c-ts-mode-resources/indent.erts
@@ -464,3 +464,17 @@ main (void)
|
}
=-=-=
+
+Name: Empty Line (Block Start)
+
+=-=
+int
+main (void)
+{
+|
+=-=
+int
+main (void)
+{
+ |
+=-=-=
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 16:31:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 62717 <at> debbugs.gnu.org (full text, mbox):
On 09/04/2023 14:05, Daniel Martín wrote:
> Dmitry Gutov <dmitry <at> gutov.dev> writes:
>
>>
>> I'm not sure, though, what is the big deal with adding the top-level
>> function's closing curly first thing before writing the body (after
>> that the parser starts working much better), so as far as I'm
>> concerned this patch is very optional. It does add some complexity,
>> after all.
>
> I think this problem also affects languages without curly braces like
> Ruby or Python.
Python -- no, because it doesn't have closers. And its indentation
doesn't use tree-sitter anyway.
> For example, if I insert this Ruby code in a buffer with ruby-ts-mode
> enabled
>
> def sample RET
>
> The newline is not indented, but it is indented in ruby-mode.
Right. I mentioned that in the previous email: a fix for ruby-ts-mode
will involve a similar (but slightly different) advice together with a
new indentation rule. I'm not quite certain yet we want to go there,
because a top-level function is more rare in Ruby than in C/C++. And,
again, as soon as there is at least once 'end' below the current line in
the buffer, things start working much better. The user can either type
is manually, or use ruby-end-mode.
>> Adding Alan and Joao, who were interested in this scenario as well.
>>
>> diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
>> index 981c7766375..9aaa8b32c73 100644
>> --- a/lisp/progmodes/c-ts-mode.el
>> +++ b/lisp/progmodes/c-ts-mode.el
>
> Thanks for the patch. It works correctly and the existing c-ts-mode
> tests pass. I suggest adding a test to prevent regressions:
>
> diff --git a/test/lisp/progmodes/c-ts-mode-resources/indent.erts b/test/lisp/progmodes/c-ts-mode-resources/indent.erts
> index 5cdefe2122c..221b3d809af 100644
> --- a/test/lisp/progmodes/c-ts-mode-resources/indent.erts
> +++ b/test/lisp/progmodes/c-ts-mode-resources/indent.erts
> @@ -464,3 +464,17 @@ main (void)
> |
> }
> =-=-=
> +
> +Name: Empty Line (Block Start)
> +
> +=-=
> +int
> +main (void)
> +{
> +|
> +=-=
> +int
> +main (void)
> +{
> + |
> +=-=-=
Good idea, thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 16:34:01 GMT)
Full text and
rfc822 format available.
Message #26 received at 62717 <at> debbugs.gnu.org (full text, mbox):
On 09/04/2023 08:19, Eli Zaretskii wrote:
>> Cc: Theodor Thornhill<theo <at> thornhill.no>, Yuan Fu<casouri <at> gmail.com>,
>> 62717 <at> debbugs.gnu.org,
>> João Távora<joaotavora <at> gmail.com>,
>> Alan Mackenzie<acm <at> muc.de>
>> Date: Sun, 9 Apr 2023 03:20:23 +0300
>> From: Dmitry Gutov<dmitry <at> gutov.dev>
>>
>> What does this mean for us? Short of reimplementing nvim-treesitter's
>> algorithm (and I haven't read Atom's or Zed's indentation code;
>> anybody's welcome to chime in with a summary of either), we could just
>> install the patch at the end of this message: it fixes this particular
>> case, in a bit hackish way, but at least it doesn't affect other languages.
>>
>> Note that it still doesn't fix very similar cases, e.g.
>>
>> int main () {
>> for (;;) {<RET>
>>
>> (we need additional rules looking for ERROR nodes, like in nvim's
>> indent.scm), but in does fix
>>
>> int main () {
>> for (;;) {}<RET>
>>
>> and
>>
>> int main () {
>> int foo;<RET>
>>
>> I'm not sure, though, what is the big deal with adding the top-level
>> function's closing curly first thing before writing the body (after that
>> the parser starts working much better), so as far as I'm concerned this
>> patch is very optional. It does add some complexity, after all.
> If the patch solves some of the problems, passes the test suite, and
> doesn't introduce any regressions you see, I think you should install
> it (unless someone here objects).
With this kind of change, it's hard to judge regression potential in
advance. I don't really write C/C++ myself with any regularity. And our
test suite doesn't really work incomplete parse trees, I think.
So if we want this in emacs-29, I think someone interested should try
running with this patch applied, at least for a little bit. Maybe just
wait for a couple of days, if Daniel is testing it already.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Sun, 09 Apr 2023 17:37:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 62717 <at> debbugs.gnu.org (full text, mbox):
> Date: Sun, 9 Apr 2023 19:33:20 +0300
> Cc: mardani29 <at> yahoo.es, theo <at> thornhill.no, casouri <at> gmail.com,
> 62717 <at> debbugs.gnu.org, joaotavora <at> gmail.com, acm <at> muc.de
> From: Dmitry Gutov <dmitry <at> gutov.dev>
>
> > If the patch solves some of the problems, passes the test suite, and
> > doesn't introduce any regressions you see, I think you should install
> > it (unless someone here objects).
>
> With this kind of change, it's hard to judge regression potential in
> advance. I don't really write C/C++ myself with any regularity. And our
> test suite doesn't really work incomplete parse trees, I think.
>
> So if we want this in emacs-29, I think someone interested should try
> running with this patch applied, at least for a little bit. Maybe just
> wait for a couple of days, if Daniel is testing it already.
The pretest will be out VSN. Maybe you should install this, and we
can then back it out if people complain. FWIW, I intend to use only
c-ts-mode when I the pretest is out, so if there are regressions, I
think I will see it soon enough.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Mon, 10 Apr 2023 00:37:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 62717 <at> debbugs.gnu.org (full text, mbox):
On 09/04/2023 20:37, Eli Zaretskii wrote:
>> Date: Sun, 9 Apr 2023 19:33:20 +0300
>> Cc:mardani29 <at> yahoo.es,theo <at> thornhill.no,casouri <at> gmail.com,
>> 62717 <at> debbugs.gnu.org,joaotavora <at> gmail.com,acm <at> muc.de
>> From: Dmitry Gutov<dmitry <at> gutov.dev>
>>
>>> If the patch solves some of the problems, passes the test suite, and
>>> doesn't introduce any regressions you see, I think you should install
>>> it (unless someone here objects).
>> With this kind of change, it's hard to judge regression potential in
>> advance. I don't really write C/C++ myself with any regularity. And our
>> test suite doesn't really work incomplete parse trees, I think.
>>
>> So if we want this in emacs-29, I think someone interested should try
>> running with this patch applied, at least for a little bit. Maybe just
>> wait for a couple of days, if Daniel is testing it already.
> The pretest will be out VSN. Maybe you should install this, and we
> can then back it out if people complain. FWIW, I intend to use only
> c-ts-mode when I the pretest is out, so if there are regressions, I
> think I will see it soon enough.
Okay, SGTM. I've pushed the patch to emacs-29.
Note that there remains a bunch of more complex cases that don't indent
well while there are no closing braces in the buffer.
Examples:
int main() {
for (;;) {<RET>
(including every variation where some chars are deleted from the end of
the second line), or
int main() {
if (2 == 2)<RET>
or
int main() {
if (2 == 2)
foo();
else<RET>
etc.
Enumerating every such case to create a special indentation logic seems
a little tiring. Though if 2-3 of them are determined to be the most
important ones, that might be doable.
But again, as long as there is at least one '}' after point, though,
indentation in all of these cases improves. Though perhaps not ideally
sometimes (e.g. for parenless if/else clauses the indentation starts out
without the additional level).
Anyway, the case described in the report should now be working. Whether
to close the bug or not, it's up to you and other interested parties.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#62717
; Package
emacs
.
(Mon, 10 Apr 2023 21:05:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 62717 <at> debbugs.gnu.org (full text, mbox):
> On Apr 9, 2023, at 5:36 PM, Dmitry Gutov <dmitry <at> gutov.dev> wrote:
>
> On 09/04/2023 20:37, Eli Zaretskii wrote:
>>> Date: Sun, 9 Apr 2023 19:33:20 +0300
>>> Cc:mardani29 <at> yahoo.es,theo <at> thornhill.no,casouri <at> gmail.com,
>>> 62717 <at> debbugs.gnu.org,joaotavora <at> gmail.com,acm <at> muc.de
>>> From: Dmitry Gutov<dmitry <at> gutov.dev>
>>>
>>>> If the patch solves some of the problems, passes the test suite, and
>>>> doesn't introduce any regressions you see, I think you should install
>>>> it (unless someone here objects).
>>> With this kind of change, it's hard to judge regression potential in
>>> advance. I don't really write C/C++ myself with any regularity. And our
>>> test suite doesn't really work incomplete parse trees, I think.
>>>
>>> So if we want this in emacs-29, I think someone interested should try
>>> running with this patch applied, at least for a little bit. Maybe just
>>> wait for a couple of days, if Daniel is testing it already.
>> The pretest will be out VSN. Maybe you should install this, and we
>> can then back it out if people complain. FWIW, I intend to use only
>> c-ts-mode when I the pretest is out, so if there are regressions, I
>> think I will see it soon enough.
>
> Okay, SGTM. I've pushed the patch to emacs-29.
>
> Note that there remains a bunch of more complex cases that don't indent well while there are no closing braces in the buffer.
>
> Examples:
>
> int main() {
> for (;;) {<RET>
>
> (including every variation where some chars are deleted from the end of the second line), or
>
> int main() {
> if (2 == 2)<RET>
>
> or
>
> int main() {
> if (2 == 2)
> foo();
> else<RET>
>
> etc.
>
> Enumerating every such case to create a special indentation logic seems a little tiring. Though if 2-3 of them are determined to be the most important ones, that might be doable.
>
> But again, as long as there is at least one '}' after point, though, indentation in all of these cases improves. Though perhaps not ideally sometimes (e.g. for parenless if/else clauses the indentation starts out without the additional level).
>
> Anyway, the case described in the report should now be working. Whether to close the bug or not, it's up to you and other interested parties.
Thanks for working on this. Maybe we can incorporate the indent.scm method by adding another indent engine alongside the current “simple-indent”.
Yuan
This bug report was last modified 1 year and 208 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.