GNU bug report logs - #60656
30.0.50; tree-sitter: editing a buffer invalidates visited node instances


Package: emacs;

Reported by: Mickey Petersen <mickey <at> masteringemacs.org>

Date: Sun, 8 Jan 2023 11:09:02 UTC

Severity: normal

Found in version 30.0.50

To reply to this bug, email your comments to 60656 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#60656; Package emacs. (Sun, 08 Jan 2023 11:09:02 GMT)

Acknowledgement sent to Mickey Petersen <mickey <at> masteringemacs.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 08 Jan 2023 11:09:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mickey Petersen <mickey <at> masteringemacs.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.50; tree-sitter: editing a buffer invalidates visited node
 instances
Date: Sun, 08 Jan 2023 11:08:16 +0000
If you parse some text, retrieve a node -- using `treesit-node-at', for example -- and then edit the buffer, then the node you retrieved is marked outdated.

However, tree-sitter is capable of handling that, to a greater or lesser extent:

https://tree-sitter.github.io/tree-sitter/using-parsers#editing

It is therefore possible to refresh node instances that were created _before_ the edit. I suppose it could remain an explicit step: you enter a special form, and Emacs tracks the node instances issued inside that form and refreshes them when edits take place inside it.

As it stands, it is very hard to edit and maintain a node registry at the same time. (I'm using markers and overlays as a crude hack to work around it.)




In GNU Emacs 30.0.50 (build 6, x86_64-pc-linux-gnu, GTK+ Version
 3.24.20, cairo version 1.16.0) of 2023-01-02 built on mickey-work
Repository revision: c209802f7b3721a1b95113290934a23fee88f678
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12013000
System Description: Ubuntu 20.04.3 LTS




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60656; Package emacs. (Mon, 09 Jan 2023 03:58:01 GMT)

Message #8 received at 60656 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Mickey Petersen <mickey <at> masteringemacs.org>
Cc: 60656 <at> debbugs.gnu.org
Subject: Re: bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates 
 visited node instances
Date: Sun, 8 Jan 2023 19:57:32 -0800
Mickey Petersen <mickey <at> masteringemacs.org> writes:

> If you parse some text, retrieve a node -- using `treesit-node-at',
> for example -- and then edit the buffer, then the node you retrieved
> is marked outdated.
>
> However, tree-sitter is capable of handling that, to a greater or lesser extent:
>
> https://tree-sitter.github.io/tree-sitter/using-parsers#editing
>
> It is therefore possible to refresh node instances that were created
> _before_ the edit. I suppose it could remain an explicit step that you
> must enter a special form and then Emacs will track node instances
> issued inside that form and refresh them when edits take place inside
> of it.
>
> As it stands, it is very hard to edit and maintain a node registry at
> the same time. (I'm using markers and overlays as a crude hack to work
> around it.)

This is kind of a limitation of tree-sitter. The "node editing" isn’t
what you think it is (it fooled me too when I first read it).
Tree-sitter’s incremental parsing works roughly like this:

1. You have a parsed tree, TREE, corresponding to some TEXT.
2. You make some edit to the TEXT, e.g., TEXT’ = insert(TEXT, 1, "abc").
3. Now you need to "edit" the old tree with the _positions_ of your edit:
   edit(TREE, Insert(pos=1, len=3)). (Notice that this modifies the tree
   in place.)
4. You reparse using the edited tree and get a new tree:
   TREE’ = parse(TREE, TEXT’). (Notice that this returns a new tree.)

If you have a NODE from TREE, editing that node only updates its position
information. That corresponds to the edit(TREE, ...) step. There is no
equivalent of the parse(TREE, TEXT’) step for nodes: once the tree is
reparsed and a new tree is returned, none of the nodes in the old tree
get carried over to the new tree. In practice, tree-sitter reuses the old
tree’s data, but conceptually the old and new trees don’t share any nodes.

IOW, the editing feature for nodes is for a very specific situation,
where you have edited the parse tree but haven’t reparsed yet. In that
case, if you want your node’s positions to be correct, you edit the node.
But once you reparse, there is no way to "update" the old node into its
"equivalent" in the new tree.

I’m not sure whether tree-sitter is capable of doing what you want (after
all, the old and new trees share data). But currently it doesn’t expose
a feature to do that.

Yuan
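
The four steps in the message above can be modelled with a small toy
sketch. This is not tree-sitter's API: `Node`, `Tree`, `toy_parse` and
`edit_tree` are hypothetical names invented for the illustration, and the
"parser" merely splits text on whitespace. It only aims to show why a node
obtained before the reparse keeps correct positions after the edit step,
yet is not part of the tree that the reparse returns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, like handles into one tree
class Node:
    kind: str
    start: int
    end: int


@dataclass(eq=False)
class Tree:
    nodes: List[Node]


def toy_parse(text: str, old_tree: Optional[Tree] = None) -> Tree:
    """Mirror parse(TREE, TEXT'): always build a NEW tree with NEW nodes.

    A real incremental parser would reuse data from old_tree; this toy
    ignores it and makes one "word" node per whitespace-separated token.
    """
    nodes: List[Node] = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        nodes.append(Node("word", start, end))
        pos = end
    return Tree(nodes)


def edit_tree(tree: Tree, pos: int, inserted_len: int) -> None:
    """Mirror edit(TREE, Insert(pos, len)): shift positions in place, no reparse."""
    for node in tree.nodes:
        if node.end > pos:
            node.end += inserted_len
        if node.start >= pos:
            node.start += inserted_len


# Step 1: parse TEXT and keep a handle to the node for "bar".
text = "foo bar baz"
tree = toy_parse(text)
old_node = tree.nodes[1]

# Step 2: edit the text (0-based position 0 here, purely for illustration).
new_text = "abc " + text

# Step 3: tell the old tree where the edit happened; positions shift in place.
edit_tree(tree, pos=0, inserted_len=4)
positions_still_valid = new_text[old_node.start:old_node.end] == "bar"

# Step 4: reparse. The result is a different tree made of different nodes.
new_tree = toy_parse(new_text, old_tree=tree)
carried_over = any(n is old_node for n in new_tree.nodes)

print(positions_still_valid)  # True: the edit step fixed up the old node's range
print(carried_over)           # False: the old node is not in the new tree
```

In this model the only way to get from `old_node` to its counterpart is to
look it up again in `new_tree`, for example by its updated range.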




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60656; Package emacs. (Mon, 09 Jan 2023 08:59:01 GMT)

Message #11 received at 60656 <at> debbugs.gnu.org:

From: Mickey Petersen <mickey <at> masteringemacs.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60656 <at> debbugs.gnu.org
Subject: Re: bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates
 visited node instances
Date: Mon, 09 Jan 2023 08:56:57 +0000
Yuan Fu <casouri <at> gmail.com> writes:

> This is kind of a limitation of tree-sitter. The "node editing" isn’t
> like what you thought (it fooled me too when I first read it).
> Tree-sitter’s incremental parsing works roughly like this:
>
> 1. You have a parsed tree, TREE, corresponding to some TEXT
> 2. You make some edit to the TEXT, eg, TEXT’ = insert(TEXT, 1, "abc")
> 3. Now you need to "edit" the old tree with _positions_ of your edit:
> edit(TREE, Insert(pos=1, len=3)) (Notice that this modifies the tree in-place.)
> 4. You reparse the edited tree and get a new tree:
> TREE’ = parse(TREE, TEXT’) (Notice that this returns a new tree.)
>
> If you have a NODE from TREE, editing that node only updates position
> information. That corresponds to the edit(TREE, ...) step. There is no
> equivalent of the parse(TREE, TEXT’) step for nodes: once the tree is
> reparsed and a new tree is returned, none of the nodes in the old tree
> gets carried to the new tree. In practice, tree-sitter reuses old tree’s
> data, but conceptually the old and new tree don’t share any node.
>
> IOW, the editing feature for nodes is for very specific situations,
> where you edit the parse tree but didn’t reparse yet. In this case, if
> you want to make your node’s positions to be correct, you edit the node.
> But once you reparse, there is no way to somehow "update" this old node
> into its "equivalent" in the new tree.
>
> I’m not sure whether tree-sitter is capable of doing what you want (after
> all the old and new tree are sharing data). But currently it doesn’t
> expose the feature to do that.
>

That's a shame. The documentation is a little bit ambiguous then. But if the library returns a brand-new tree and thus nodes, then I can see why this won't work.

One possible workaround is that outdated nodes are proxies for their
underlying data (node type, range, text, anonymous/named) so that
their actual state is kept around. That will allow `equal' checks to
still succeed on an outdated and a "brand-new, but identical" node.

Food for thought.
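
A minimal sketch of that proxy idea, under stated assumptions: `LiveNode`,
`NodeSnapshot` and `snapshot` are hypothetical names for this illustration
and are not part of Emacs or tree-sitter. The snapshot copies the node's
type, range, text and named flag into a plain value, so two snapshots
compare equal even when the node handles they came from do not.

```python
from dataclasses import dataclass


class LiveNode:
    """Hypothetical stand-in for a parser node handle; compares by identity."""

    def __init__(self, kind: str, start: int, end: int, named: bool):
        self.kind = kind
        self.start = start
        self.end = end
        self.named = named


@dataclass(frozen=True)
class NodeSnapshot:
    """Plain data copied out of a node; compares by value and is hashable."""
    kind: str
    start: int
    end: int
    text: str
    named: bool


def snapshot(node: LiveNode, source: str) -> NodeSnapshot:
    """Copy the node's observable state so it outlives the node handle."""
    return NodeSnapshot(node.kind, node.start, node.end,
                        source[node.start:node.end], node.named)


source = "foo bar baz"
before = LiveNode("word", 4, 7, named=True)  # handle from the old tree
after = LiveNode("word", 4, 7, named=True)   # "same" node from a new tree

# The handles are distinct objects, so they are not equal...
handles_equal = before == after
# ...but their snapshots hold identical data and compare equal.
snapshots_equal = snapshot(before, source) == snapshot(after, source)

# Because snapshots are hashable, they can serve as keys in a node registry.
registry = {snapshot(before, source): "my annotation"}
found = registry.get(snapshot(after, source))
```

Since the range is part of the snapshot here, an edit earlier in the
buffer would shift the positions and break the equality; a registry built
this way would have to adjust or ignore the range in that case.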






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60656; Package emacs. (Mon, 09 Jan 2023 20:31:01 GMT)

Message #14 received at 60656 <at> debbugs.gnu.org:

From: Yuan Fu <casouri <at> gmail.com>
To: Mickey Petersen <mickey <at> masteringemacs.org>
Cc: 60656 <at> debbugs.gnu.org
Subject: Re: bug#60656: 30.0.50; tree-sitter: editing a buffer invalidates 
 visited node instances
Date: Mon, 9 Jan 2023 12:30:20 -0800
Mickey Petersen <mickey <at> masteringemacs.org> writes:

> That's a shame. The documentation is a little bit ambiguous then. But
> if the library returns a brand-new tree and thus nodes, then I can see
> why this won't work.

Yeah, I wish tree-sitter had it. Maybe you could raise an issue on
tree-sitter’s GitHub. The author seems to be rather busy, though.

> One possible workaround is that outdated nodes are proxies for their
> underlying data (node type, range, text, anonymous/named) so that
> their actual state is kept around. That will allow `equal' checks to
> still succeed on an outdated and a "brand-new, but identical" node.
>
> Food for thought.

If you can describe what high-level feature you want to accomplish (with
node update), maybe I can provide some suggestions.

Yuan




This bug report was last modified 1 year and 105 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.