GNU bug report logs - #62368
29.0.60; Evaluating predicates before creating captured nodes in treesit-query-capture

Previous Next

Package: emacs;

Reported by: Yuan Fu <casouri <at> gmail.com>

Date: Wed, 22 Mar 2023 04:50:01 UTC

Severity: normal

Found in version 29.0.60

Done: Yuan Fu <casouri <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 62368 in the body.
You can then email your comments to 62368 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to dgutov <at> yandex.ru, bug-gnu-emacs <at> gnu.org:
bug#62368; Package emacs. (Wed, 22 Mar 2023 04:50:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Yuan Fu <casouri <at> gmail.com>:
New bug report received and forwarded. Copy sent to dgutov <at> yandex.ru, bug-gnu-emacs <at> gnu.org. (Wed, 22 Mar 2023 04:50:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Bug Report Emacs <bug-gnu-emacs <at> gnu.org>
Subject: 29.0.60; Evaluating predicates before creating captured nodes in 
 treesit-query-capture
Date: Tue, 21 Mar 2023 21:49:37 -0700
[Message part 1 (text/plain, inline)]
X-Debbugs-CC: dgutov <at> yandex.ru

Dmitry, when you have time, could you try your benchmark in bug#60953
with this patch? I made predicates evaluate before we create any nodes,
so #equal and #match should be more efficient now, when there are a lot
of rejections. In the same time #pred is made slightly worst since they
now create a lisp node and discard it. (But this can be fixed with a
little more complexity.)

Yuan

[pred-first.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#62368; Package emacs. (Thu, 23 Mar 2023 00:43:02 GMT) Full text and rfc822 format available.

Message #8 received at 62368 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>, 62368 <at> debbugs.gnu.org
Subject: Re: bug#62368: 29.0.60; Evaluating predicates before creating
 captured nodes in treesit-query-capture
Date: Thu, 23 Mar 2023 02:42:20 +0200
Hi Yuan!

On 22/03/2023 06:49, Yuan Fu wrote:
> X-Debbugs-CC:dgutov <at> yandex.ru
> 
> Dmitry, when you have time, could you try your benchmark in bug#60953
> with this patch? I made predicates evaluate before we create any nodes,
> so #equal and #match should be more efficient now, when there are a lot
> of rejections. In the same time #pred is made slightly worst since they
> now create a lisp node and discard it. (But this can be fixed with a
> little more complexity.)

Thank you, I was curious what would the improvement be if we could delay 
allocation of node structures until :match is checked.

But for my benchmark the difference is on the order of 4-5%. It seems we 
are scraping the barrel in terms of improving allocations/reducing GC 
because according to 'benchmark-run', where the whole run of a 100 
iterations of the scenario takes ~1.1s, the time spent in GC is 0.150s. 
And the improved version takes like 1.04s, with 0.1s in GC.

So if you ask me, I think I'd prefer to hold off on applying this patch 
until we either find scenarios where the improvement is more 
significant, or we find and eliminate some other bigger bottleneck 
first, after which these 5% grow to become 10-20% or more, of remaining 
runtime. The current approach is pretty Lisp-y, so I kind of like it.

And there's the issue of #pred, of course, which which could swing the 
difference in the other direction (I didn't test any code which uses it).

We could also try a smaller change: where the initial list of conses for 
result is build with capture_id's in car's, and then substituted with 
capture_name if the predicates all match. Then tthe treesit_node 
pseudovectors would still be created eagerly, though.

Here's the current perf report for my benchmark, most of the time is 
spent in libtree-sitter:

  17.02%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_current_status              ◆
  10.94%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_next_sibling           ▒
   9.93%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_first_child            ▒
   9.55%  emacs         emacs                       [.] 
process_mark_stack                         ▒
   4.56%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_start_point                        ▒
   3.90%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_parent_node                 ▒
   3.69%  emacs         emacs                       [.] 
re_match_2_internal                        ▒
   3.08%  emacs         libtree-sitter.so.0.0       [.] 
ts_language_symbol_metadata                ▒
   1.61%  emacs         emacs                       [.] exec_byte_code 
                           ▒
   1.47%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_end_point                          ▒
   1.44%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_current_node                ▒
   1.13%  emacs         emacs                       [.] 
allocate_vectorlike                        ▒
   1.11%  emacs         emacs                       [.] sweep_strings 
                           ▒
   1.04%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_end_byte                           ▒
   0.94%  emacs         emacs                       [.] next_interval 
                           ▒
   0.91%  emacs         libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_parent                 ▒
   0.88%  emacs         emacs                       [.] 
lookup_char_property                       ▒
   0.81%  emacs         emacs                       [.] find_interval 
                           ▒
   0.68%  emacs         emacs                       [.] 
pdumper_marked_p_impl                      ▒
   0.67%  emacs         emacs                       [.] assq_no_quit 
                           ▒
   0.56%  emacs         libtree-sitter.so.0.0       [.] ts_node_symbol 
                           ▒
   0.56%  emacs         emacs                       [.] mark_char_table 
                           ▒
   0.55%  emacs         emacs                       [.] execute_charset 
                           ▒
   0.49%  emacs         libtree-sitter.so.0.0       [.] 
0x000000000001ae3e                         ▒
   0.49%  emacs         emacs                       [.] re_search_2 
                           ▒
   0.48%  emacs         emacs                       [.] funcall_subr 
                           ▒
   0.46%  emacs         libc.so.6                   [.] __strncmp_sse42 
                           ▒
   0.42%  emacs         libtree-sitter.so.0.0       [.] 
ts_language_public_symbol                  ▒
   0.41%  emacs         libtree-sitter.so.0.0       [.] 
ts_node_is_named                           ▒
   0.40%  emacs         libtree-sitter.so.0.0       [.] ts_node_new 
                           ▒
   0.34%  emacs         emacs                       [.] Fassq 
                           ▒
   0.34%  emacs         emacs                       [.] sweep_vectors




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#62368; Package emacs. (Thu, 23 Mar 2023 03:17:01 GMT) Full text and rfc822 format available.

Message #11 received at 62368 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 62368 <at> debbugs.gnu.org
Subject: Re: bug#62368: 29.0.60; Evaluating predicates before creating
 captured nodes in treesit-query-capture
Date: Wed, 22 Mar 2023 20:16:14 -0700

> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
> 
> Hi Yuan!
> 
> On 22/03/2023 06:49, Yuan Fu wrote:
>> X-Debbugs-CC:dgutov <at> yandex.ru
>> Dmitry, when you have time, could you try your benchmark in bug#60953
>> with this patch? I made predicates evaluate before we create any nodes,
>> so #equal and #match should be more efficient now, when there are a lot
>> of rejections. In the same time #pred is made slightly worst since they
>> now create a lisp node and discard it. (But this can be fixed with a
>> little more complexity.)
> 
> Thank you, I was curious what would the improvement be if we could delay allocation of node structures until :match is checked.
> 
> But for my benchmark the difference is on the order of 4-5%. It seems we are scraping the barrel in terms of improving allocations/reducing GC because according to 'benchmark-run', where the whole run of a 100 iterations of the scenario takes ~1.1s, the time spent in GC is 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
> 
> So if you ask me, I think I'd prefer to hold off on applying this patch until we either find scenarios where the improvement is more significant, or we find and eliminate some other bigger bottleneck first, after which these 5% grow to become 10-20% or more, of remaining runtime. The current approach is pretty Lisp-y, so I kind of like it.
> 
> And there's the issue of #pred, of course, which which could swing the difference in the other direction (I didn't test any code which uses it).
> 
> We could also try a smaller change: where the initial list of conses for result is build with capture_id's in car's, and then substituted with capture_name if the predicates all match. Then tthe treesit_node pseudovectors would still be created eagerly, though.

Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can keep this in our back sleeve for now ;-)

I think using symbols is fine for now, since I don’t think it would make much difference.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#62368; Package emacs. (Tue, 12 Sep 2023 00:07:01 GMT) Full text and rfc822 format available.

Message #14 received at 62368 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 62368 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#62368: 29.0.60; Evaluating predicates before creating
 captured nodes in treesit-query-capture
Date: Mon, 11 Sep 2023 17:05:48 -0700
Yuan Fu <casouri <at> gmail.com> writes:

>> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>>
>> Hi Yuan!
>>
>> On 22/03/2023 06:49, Yuan Fu wrote:
>>> X-Debbugs-CC:dgutov <at> yandex.ru Dmitry, when you have time, could you
>>> try your benchmark in bug#60953 with this patch? I made predicates
>>> evaluate before we create any nodes, so #equal and #match should be
>>> more efficient now, when there are a lot of rejections. In the same
>>> time #pred is made slightly worst since they now create a lisp node
>>> and discard it. (But this can be fixed with a little more
>>> complexity.)
>>
>> Thank you, I was curious what would the improvement be if we could
>> delay allocation of node structures until :match is checked.
>>
>> But for my benchmark the difference is on the order of 4-5%. It seems
>> we are scraping the barrel in terms of improving allocations/reducing
>> GC because according to 'benchmark-run', where the whole run of a 100
>> iterations of the scenario takes ~1.1s, the time spent in GC is
>> 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
>>
>> So if you ask me, I think I'd prefer to hold off on applying this
>> patch until we either find scenarios where the improvement is more
>> significant, or we find and eliminate some other bigger bottleneck
>> first, after which these 5% grow to become 10-20% or more, of
>> remaining runtime. The current approach is pretty Lisp-y, so I kind
>> of like it.
>>
>> And there's the issue of #pred, of course, which which could swing
>> the difference in the other direction (I didn't test any code which
>> uses it).
>>
>> We could also try a smaller change: where the initial list of conses
>> for result is build with capture_id's in car's, and then substituted
>> with capture_name if the predicates all match. Then tthe treesit_node
>> pseudovectors would still be created eagerly, though.
>
> Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can
> keep this in our back sleeve for now ;-)
>
> I think using symbols is fine for now, since I don’t think it would
> make much difference.

So should this bug be closed, then?




Reply sent to Yuan Fu <casouri <at> gmail.com>:
You have taken responsibility. (Tue, 12 Sep 2023 00:39:02 GMT) Full text and rfc822 format available.

Notification sent to Yuan Fu <casouri <at> gmail.com>:
bug acknowledged by developer. (Tue, 12 Sep 2023 00:39:02 GMT) Full text and rfc822 format available.

Message #19 received at 62368-done <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 62368-done <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#62368: 29.0.60; Evaluating predicates before creating
 captured nodes in treesit-query-capture
Date: Mon, 11 Sep 2023 17:37:49 -0700

> On Sep 11, 2023, at 5:05 PM, Stefan Kangas <stefankangas <at> gmail.com> wrote:
> 
> Yuan Fu <casouri <at> gmail.com> writes:
> 
>>> On Mar 22, 2023, at 5:42 PM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>>> 
>>> Hi Yuan!
>>> 
>>> On 22/03/2023 06:49, Yuan Fu wrote:
>>>> X-Debbugs-CC:dgutov <at> yandex.ru Dmitry, when you have time, could you
>>>> try your benchmark in bug#60953 with this patch? I made predicates
>>>> evaluate before we create any nodes, so #equal and #match should be
>>>> more efficient now, when there are a lot of rejections. In the same
>>>> time #pred is made slightly worst since they now create a lisp node
>>>> and discard it. (But this can be fixed with a little more
>>>> complexity.)
>>> 
>>> Thank you, I was curious what would the improvement be if we could
>>> delay allocation of node structures until :match is checked.
>>> 
>>> But for my benchmark the difference is on the order of 4-5%. It seems
>>> we are scraping the barrel in terms of improving allocations/reducing
>>> GC because according to 'benchmark-run', where the whole run of a 100
>>> iterations of the scenario takes ~1.1s, the time spent in GC is
>>> 0.150s. And the improved version takes like 1.04s, with 0.1s in GC.
>>> 
>>> So if you ask me, I think I'd prefer to hold off on applying this
>>> patch until we either find scenarios where the improvement is more
>>> significant, or we find and eliminate some other bigger bottleneck
>>> first, after which these 5% grow to become 10-20% or more, of
>>> remaining runtime. The current approach is pretty Lisp-y, so I kind
>>> of like it.
>>> 
>>> And there's the issue of #pred, of course, which which could swing
>>> the difference in the other direction (I didn't test any code which
>>> uses it).
>>> 
>>> We could also try a smaller change: where the initial list of conses
>>> for result is build with capture_id's in car's, and then substituted
>>> with capture_name if the predicates all match. Then tthe treesit_node
>>> pseudovectors would still be created eagerly, though.
>> 
>> Thank you very much! Yeah, it doesn’t seem to worth it. I guess we can
>> keep this in our back sleeve for now ;-)
>> 
>> I think using symbols is fine for now, since I don’t think it would
>> make much difference.
> 
> So should this bug be closed, then?

We can close this for now. Thanks Stefan.

Yuan





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 10 Oct 2023 11:24:14 GMT) Full text and rfc822 format available.

This bug report was last modified 198 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.