GNU bug report logs - #59637
29.0.50; Should treesit-range-settings support the possibility of separate parser for each region?

Package: emacs;

Reported by: miha <at> kamnitnik.top

Date: Sun, 27 Nov 2022 17:12:01 UTC

Severity: normal

Found in version 29.0.50

To reply to this bug, email your comments to 59637 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#59637; Package emacs. (Sun, 27 Nov 2022 17:12:02 GMT)

Acknowledgement sent to miha <at> kamnitnik.top:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 27 Nov 2022 17:12:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: miha <at> kamnitnik.top
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; Should treesit-range-settings support the possibility of
 separate parser for each region?
Date: Sun, 27 Nov 2022 18:12:42 +0100
As far as I understand, the current behaviour of
treesit-parser-set-included-ranges is that the concatenation of text
from different regions in the same range set is considered as one
program. This means that for this html program

    <html>
      <script>
        /* comment start
      </script>
      <script>
        alert('hello');
      </script>
    </html>

treesitter would consider "alert('hello');" to be inside a comment and
the second script tag would contain an error about missing comment
end.

However, testing this in Firefox, it seems that the first script tag is
the erroneous one here and the alert function call isn't inside a
comment. So I guess the correct way to parse this html document would be
to have two instances of javascript parser, one for each region. On the
other hand, we should consider if this is worth the added complexity and
performance degradation.
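
To make the difference concrete, here is a toy model in Python. It is only a sketch and does not use tree-sitter or any Emacs API: the names `ends_in_comment` and `starts_in_comment` are made up for illustration, and the scanner only tracks `/* ... */` state (it ignores strings and `//` comments). It compares carrying parser state across regions, as a single parser over concatenated ranges would, with resetting state for each region.

```python
def ends_in_comment(text, in_comment=False):
    """Toy scanner: follow /* ... */ state through text, return the final state."""
    i = 0
    while i < len(text):
        if not in_comment and text.startswith("/*", i):
            in_comment = True
            i += 2
        elif in_comment and text.startswith("*/", i):
            in_comment = False
            i += 2
        else:
            i += 1
    return in_comment


def starts_in_comment(regions, shared_parser):
    """For each region, report whether it begins inside a block comment.

    shared_parser=True models one parser over the concatenated regions;
    shared_parser=False models a fresh parser for every region.
    """
    states = []
    in_comment = False
    for region in regions:
        if not shared_parser:
            in_comment = False  # a fresh parser knows nothing about earlier regions
        states.append(in_comment)
        in_comment = ends_in_comment(region, in_comment)
    return states


# Contents of the two <script> elements from the example above.
regions = ["\n    /* comment start\n  ", "\n    alert('hello');\n  "]

print(starts_in_comment(regions, shared_parser=True))   # [False, True]
print(starts_in_comment(regions, shared_parser=False))  # [False, False]
```

With a shared parser the second region starts inside the comment opened in the first; with one parser per region it does not, which matches what Firefox appears to do.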

Thanks and best regards.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59637; Package emacs. (Sun, 27 Nov 2022 17:30:02 GMT)

Message #8 received at 59637 <at> debbugs.gnu.org:

From: Stefan Kangas <stefankangas <at> gmail.com>
To: miha <at> kamnitnik.top, 59637 <at> debbugs.gnu.org
Cc: Yuan Fu <casouri <at> gmail.com>
Subject: Re: bug#59637: 29.0.50; Should treesit-range-settings support the
 possibility of separate parser for each region?
Date: Sun, 27 Nov 2022 09:28:56 -0800
miha--- via "Bug reports for GNU Emacs, the Swiss army knife of text
editors" <bug-gnu-emacs <at> gnu.org> writes:

> As far as I understand, the current behaviour of
> treesit-parser-set-included-ranges is that the concatenation of text
> from different regions in the same range set is considered as one
> program. This means that for this html program
>
>     <html>
>       <script>
>         /* comment start
>       </script>
>       <script>
>         alert('hello');
>       </script>
>     </html>
>
> treesitter would consider "alert('hello');" to be inside a comment and
> the second script tag would contain an error about missing comment
> end.
>
> However, testing this in Firefox, it seems that the first script tag is
> the erroneous one here and the alert function call isn't inside a
> comment. So I guess the correct way to parse this html document would be
> to have two instances of javascript parser, one for each region. On the
> other hand, we should consider if this is worth the added complexity and
> performance degradation.
>
> Thanks and best regards.

Copying in Yuan Fu.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59637; Package emacs. (Mon, 28 Nov 2022 22:52:01 GMT)

Message #11 received at 59637 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 59637 <at> debbugs.gnu.org, miha <at> kamnitnik.top
Subject: Re: bug#59637: 29.0.50; Should treesit-range-settings support the 
 possibility of separate parser for each region?
Date: Mon, 28 Nov 2022 14:51:30 -0800
Stefan Kangas <stefankangas <at> gmail.com> writes:

> miha--- via "Bug reports for GNU Emacs, the Swiss army knife of text
> editors" <bug-gnu-emacs <at> gnu.org> writes:
>
>> As far as I understand, the current behaviour of
>> treesit-parser-set-included-ranges is that the concatenation of text
>> from different regions in the same range set is considered as one
>> program. This means that for this html program
>>
>>     <html>
>>       <script>
>>         /* comment start
>>       </script>
>>       <script>
>>         alert('hello');
>>       </script>
>>     </html>
>>
>> treesitter would consider "alert('hello');" to be inside a comment and
>> the second script tag would contain an error about missing comment
>> end.
>>
>> However, testing this in Firefox, it seems that the first script tag is
>> the erroneous one here and the alert function call isn't inside a
>> comment. So I guess the correct way to parse this html document would be
>> to have two instances of javascript parser, one for each region. On the
>> other hand, we should consider if this is worth the added complexity and
>> performance degradation.
>>
>> Thanks and best regards.

Yeah, it makes sense, but as you say, the isolation comes at a cost, and I
don’t know if it can be justified right now because of the complexity of
assigning a different parser to each range, given that ranges can appear
and disappear as the user edits the buffer. Plus, the current framework
kind of assumes one parser for each language, so we would need some
non-trivial changes to make "one parser per range" work smoothly.
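
As a rough illustration of that bookkeeping problem, here is a toy Python sketch. It is not the treesit API: `sync_parsers`, `make_parser`, and the string "parsers" are hypothetical stand-ins. It keys parsers by the (start, end) position of their range and reuses a parser only when the exact same range is still present.

```python
def sync_parsers(parsers, ranges, make_parser):
    """Return ({range: parser}, number_created) for the current ranges.

    A parser is reused only if its exact (start, end) range still exists;
    parsers whose range vanished are dropped from the result.
    """
    synced = {}
    created = 0
    for rng in ranges:
        if rng in parsers:
            synced[rng] = parsers[rng]
        else:
            synced[rng] = make_parser(rng)
            created += 1
    return synced, created


_ids = iter(range(1000))


def make_parser(rng):
    # Hypothetical stand-in for creating a real parser restricted to rng.
    return f"parser-{next(_ids)}"


# Two embedded regions are found initially.
parsers, created_first = sync_parsers({}, [(10, 40), (60, 90)], make_parser)

# The user types 5 characters before the first region: both ranges shift.
parsers, created_after_edit = sync_parsers(
    parsers, [(15, 45), (65, 95)], make_parser
)

print(created_first, created_after_edit)  # 2 2
```

Because every edit before a region shifts its position, position-keyed lookup throws away and recreates every parser, so a real implementation would need some way to follow ranges through edits. That is part of the non-trivial change mentioned above.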

For now, I think it’s best to just turn off error highlighting and rely
on tree-sitter’s error recovery. I think that’s what everybody else
does.

In the future, if we make the framework more flexible and make "one
parser per range" easier to implement, we can try adding support for it.

>
> Copying in Yuan Fu.

Thanks :-)

Yuan




This bug report was last modified 1 year and 157 days ago.
