GNU bug report logs - #60691
29.0.60; Slow tree-sitter font-lock in ruby-ts-mode


Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Mon, 9 Jan 2023 17:36:02 UTC

Severity: normal

Found in version 29.0.60

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 60691 in the body.
You can then email your comments to 60691 AT debbugs.gnu.org in the normal way.



Report forwarded to dgutov <at> yandex.ru, bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Mon, 09 Jan 2023 17:36:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Juri Linkov <juri <at> linkov.net>:
New bug report received and forwarded. Copy sent to dgutov <at> yandex.ru, bug-gnu-emacs <at> gnu.org. (Mon, 09 Jan 2023 17:36:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Mon, 09 Jan 2023 19:16:12 +0200
X-Debbugs-Cc: Dmitry Gutov <dgutov <at> yandex.ru>

After more rules were added recently to ruby-ts--font-lock-settings,
font-lock became slow even on very small files.  Some measurements:

M-: (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (font-lock-ensure)))

M-x ruby-mode
(1.3564674989999999 0 0.0)

M-x ruby-ts-mode
(8.349582391999999 2 6.489918534000001)

This is not a problem when files are visited infrequently, but
becomes a problem for diff-syntax fontification that wants to
highlight simultaneously many files from git logs.
So a temporary measure would be not to enable ruby-ts-mode
in internal buffers:

(add-hook 'find-file-hook
          (lambda ()
            (when (and (eq major-mode 'ruby-mode)
                       ;; Only when not internal as from diff-syntax
                       (not (string-prefix-p " " (buffer-name))))
              (ruby-ts-mode))))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Mon, 09 Jan 2023 22:34:01 GMT) Full text and rfc822 format available.

Message #8 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>, 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 10 Jan 2023 00:33:12 +0200

Hi!

On 09/01/2023 19:16, Juri Linkov wrote:
> X-Debbugs-Cc: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> After more rules were added recently to ruby-ts--font-lock-settings,
> font-lock became slow even on very small files.  Some measurements:

If you saw a particular commit that made things slower, did you try 
reverting it? What was the performance after?

> M-: (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (font-lock-ensure)))
> 
> M-x ruby-mode
> (1.3564674989999999 0 0.0)
> 
> M-x ruby-ts-mode
> (8.349582391999999 2 6.489918534000001)

I have tried this scenario (which, to be frank, is pretty artificial, 
given that fontification is usually performed in chunks, not over the 
whole buffer).

Perhaps the results depend on a particular file. The ones I have tried 
(ruby.rb and ruby-after-operator-indent.rb) show only 2x difference (or 
less). The difference was in favor of ruby-mode, but given the 
difference in approaches I wouldn't be surprised if ruby-ts-mode incurs 
a fixed overhead somewhere.

> This is not a problem when files are visited infrequently, but
> becomes a problem for diff-syntax fontification that wants to
> highlight simultaneously many files from git logs.
> So a temporary measure would be not to enable ruby-ts-mode
> in internal buffers:

Is it common to try to highlight 1000 or even 100 files in one diff?

> (add-hook 'find-file-hook
>            (lambda ()
>              (when (and (eq major-mode 'ruby-mode)
>                         ;; Only when not internal as from diff-syntax
>                         (not (string-prefix-p " " (buffer-name))))
>                (ruby-ts-mode))))

Have you tried similar tests with other -ts- modes? Ones with complex 
font-lock rules in particular.

I've tried commenting out different rules in 
ruby-ts--font-lock-settings, but none of them seem to have particularly 
outsized impact. Performance seems, roughly, inversely proportional to 
the number of separate "features".

And if all ts modes turn out to have this problem, perhaps the place to 
improve this is inside some common code.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Tue, 10 Jan 2023 08:26:01 GMT) Full text and rfc822 format available.

Message #11 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 10 Jan 2023 10:10:53 +0200

>> After more rules were added recently to ruby-ts--font-lock-settings,
>> font-lock became slow even on very small files.  Some measurements:
>
> If you saw a particular commit that made things slower, did you try
> reverting it? What was the performance after?

No particular commit, just adding more rules degrades performance
gradually.

>> M-: (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (font-lock-ensure)))
>> M-x ruby-mode
>> (1.3564674989999999 0 0.0)
>> M-x ruby-ts-mode
>> (8.349582391999999 2 6.489918534000001)
>
> I have tried this scenario (which, to be frank, is pretty artificial, given
> that fontification is usually performed in chunks, not over the whole
> buffer).
>
> Perhaps the results depend on a particular file. The ones I have tried
> (ruby.rb and ruby-after-operator-indent.rb) show only 2x difference (or
> less). The difference was in favor of ruby-mode, but given the difference
> in approaches I wouldn't be surprised if ruby-ts-mode incurs a fixed
> overhead somewhere.

On test/lisp/progmodes/ruby-mode-resources/ruby.rb I see these numbers:

ruby-mode
(8.701560543000001 95 1.045961102)

ruby-ts-mode
(34.653148898000005 1464 16.904981779)

>> This is not a problem when files are visited infrequently, but
>> becomes a problem for diff-syntax fontification that wants to
>> highlight simultaneously many files from git logs.
>> So a temporary measure would be not to enable ruby-ts-mode
>> in internal buffers:
>
> Is it common to try to highlight 1000 or even 100 files in one diff?

100 is rare, but tens is pretty common, so this problem affects
only this specific case.

>> (add-hook 'find-file-hook
>>            (lambda ()
>>              (when (and (eq major-mode 'ruby-mode)
>>                         ;; Only when not internal as from diff-syntax
>>                         (not (string-prefix-p " " (buffer-name))))
>>                (ruby-ts-mode))))
>
> Have you tried similar tests with other -ts- modes? Ones with complex
> font-lock rules in particular.

I tried with c-ts-mode, and it's very fast.

> I've tried commenting out different rules in ruby-ts--font-lock-settings,
> but none of them seem to have particularly outsized impact. Performance
> seems, roughly, inversely proportional to the number of separate
> "features".

Indeed, this is what I see - no particular rule, only their number
affects performance.

> And if all ts modes turn out to have this problem, perhaps the place to
> improve this is inside some common code.

I noticed that while most library files are small (e.g.
libtree-sitter-c.so is 401,528 bytes),
libtree-sitter-ruby.so is 2,130,616 bytes.
That suggests it has more complex logic,
which might explain its performance.

In that case, if nothing can be done to improve performance,
please close this request.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Tue, 10 Jan 2023 14:11:01 GMT) Full text and rfc822 format available.

Message #14 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>, Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 10 Jan 2023 16:10:49 +0200

On 10/01/2023 10:10, Juri Linkov wrote:
>>> After more rules were added recently to ruby-ts--font-lock-settings,
>>> font-lock became slow even on very small files.  Some measurements:
>>
>> If you saw a particular commit that made things slower, did you try
>> reverting it? What was the performance after?
> 
> No particular commit, just adding more rules degrades performance
> gradually.

But I don't think I added that many rules recently. No more than a 
quarter anyway.

>>> M-: (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (font-lock-ensure)))
>>> M-x ruby-mode
>>> (1.3564674989999999 0 0.0)
>>> M-x ruby-ts-mode
>>> (8.349582391999999 2 6.489918534000001)
>>
>> I have tried this scenario (which, to be frank, is pretty artificial, given
>> that fontification is usually performed in chunks, not over the whole
>> buffer).
>>
>> Perhaps the results depend on a particular file. The ones I have tried
>> (ruby.rb and ruby-after-operator-indent.rb) show only 2x difference (or
>> less). The difference was in favor of ruby-mode, but given the difference
>> in approaches I wouldn't be surprised if ruby-ts-mode incurs a fixed
>> overhead somewhere.
> 
> On test/lisp/progmodes/ruby-mode-resources/ruby.rb I see these numbers:
> 
> ruby-mode
> (8.701560543000001 95 1.045961102)
> 
> ruby-ts-mode
> (34.653148898000005 1464 16.904981779)

Interesting. It's 12s vs 36s for me, as I've retested now.

>>> This is not a problem when files are visited infrequently, but
>>> becomes a problem for diff-syntax fontification that wants to
>>> highlight simultaneously many files from git logs.
>>> So a temporary measure would be not to enable ruby-ts-mode
>>> in internal buffers:
>>
>> Is it common to try to highlight 1000 or even 100 files in one diff?
> 
> 100 is rare, but tens is pretty common, so this problem affects
> only this specific case.

So it's a 0,8-3s delay in those cases? That's not ideal.

>>> (add-hook 'find-file-hook
>>>             (lambda ()
>>>               (when (and (eq major-mode 'ruby-mode)
>>>                          ;; Only when not internal as from diff-syntax
>>>                          (not (string-prefix-p " " (buffer-name))))
>>>                 (ruby-ts-mode))))
>>
>> Have you tried similar tests with other -ts- modes? Ones with complex
>> font-lock rules in particular.
> 
> I tried with c-ts-mode, and it's very fast.

Just how fast is it? The number of font-lock features it has is 
comparable (though a little smaller).

I've tried the same benchmark for it in admin/alloc-colors.c, and it 
comes out to

  (3.2004193190000003 30 0.9609690980000067)

Which seems comparable.

Not sure how to directly test the modes against each other, but if I 
enable ruby-ts-mode in the same file, the benchmark comes to 1s.

Or if I enable c-ts-mode in ruby.rb -- 16s.

>> I've tried commenting out different rules in ruby-ts--font-lock-settings,
>> but none of them seem to have particularly outsized impact. Performance
>> seems, roughly, inversely proportional to the number of separate
>> "features".
> 
> Indeed, this is what I see - no particular rule, only their number
> affects performance.
> 
>> And if all ts modes turn out to have this problem, perhaps the place to
>> improve this is inside some common code.
> 
> I noticed that while most library files are small, e.g.
> libtree-sitter-c.so is 401,528 bytes,
> libtree-sitter-ruby.so is 2,130,616 bytes
> that means that it has more complex logic
> that might explain its performance.

ruby is indeed one of the larger ones. Among the ones I have here 
compiled, it's exceeded only by cpp. 2.29 MB vs 2.12 MB.

But testing admin/alloc-colors.c with c++-ts-mode vs c-ts-mode gives 
very similar performance, so it's unlikely that the complexity of the 
grammar is directly responsible.

> In this case, when nothing could be done to improve performance,
> please close this request.

Perhaps Yuan has some further ideas. There are some strong oddities here:

- Some time into debugging and repeating the benchmark again and again, 
I get the "Pure Lisp storage overflowed" message. Just once per Emacs 
session. It doesn't seem to change much, so it might be unimportant.

- The profiler output looks like this:

  18050  75%                    - font-lock-fontify-syntactically-region
  15686  65%                     - treesit-font-lock-fontify-region
   3738  15%                        treesit--children-covering-range-recurse
    188   0%                        treesit-fontify-with-override

- When running the benchmark for the first time in a buffer (such as 
ruby.rb), the variable treesit--font-lock-fast-mode is usually changed 
to t. In one Emacs session, after I changed it to nil and re-ran the 
benchmark, the variable stayed nil, and the benchmark ran much faster 
(like 10s vs 36s).

In the next session, after I restarted Emacs, that didn't happen: it 
always stayed at t, even if I reset it to nil between runs. But if I 
comment out the block in treesit-font-lock-fontify-region that uses it

    ;; (when treesit--font-lock-fast-mode
    ;;   (setq nodes (treesit--children-covering-range-recurse
    ;;                (car nodes) start end (* 4 jit-lock-chunk-size))))

and evaluate the defun, the benchmark runs much faster again: 11s.

(But then I brought it all back, and re-ran the tests, and the variable 
stayed nil that time around; to sum up: the way it's turned on is unstable.)

Should treesit--font-lock-fast-mode be locally bound inside that 
function, so that it's reset between chunks? Or maybe the condition for 
its enabling should be tweaked? E.g. I don't think there are any 
particularly large or deep nodes in ruby.rb's parse tree. It's a very 
shallow file.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Tue, 10 Jan 2023 17:58:02 GMT) Full text and rfc822 format available.

Message #17 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: Yuan Fu <casouri <at> gmail.com>, 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 10 Jan 2023 19:50:44 +0200

>>> Is it common to try to highlight 1000 or even 100 files in one diff?
>> 100 is rare, but tens is pretty common, so this problem affects
>> only this specific case.
>
> So it's a 0,8-3s delay in those cases? That's not ideal.

The delay is noticeable, alas.

>> I noticed that while most library files are small, e.g.
>> libtree-sitter-c.so is 401,528 bytes,
>> libtree-sitter-ruby.so is 2,130,616 bytes
>> that means that it has more complex logic
>> that might explain its performance.
>
> ruby is indeed one of the larger ones. Among the ones I have here compiled,
> it's exceeded only by cpp. 2.29 MB vs 2.12 MB.

The winner is libtree-sitter-julia.so with 7.25 MB.
But regarding libtree-sitter-cpp.so I confirm it's 2.3 MB.
And c++-ts-mode is even faster than c-ts-mode.
On the same admin/alloc-colors.c:

c-mode
(33.378821569 1500 17.632000617)

c-ts-mode
(2.1949608069999997 34 0.4119784769999981)

c++-ts-mode
(2.0979403910000003 34 0.39749122499999956)

So size doesn't matter.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Wed, 11 Jan 2023 12:13:02 GMT) Full text and rfc822 format available.

Message #20 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>
Cc: Yuan Fu <casouri <at> gmail.com>, 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Wed, 11 Jan 2023 14:12:06 +0200

On 10/01/2023 19:50, Juri Linkov wrote:
>>>> Is it common to try to highlight 1000 or even 100 files in one diff?
>>> 100 is rare, but tens is pretty common, so this problem affects
>>> only this specific case.
>>
>> So it's a 0,8-3s delay in those cases? That's not ideal.
> 
> The delay is noticeable, alas.

Right. I'm somewhat worried about the processing speed of 
xref--collect-matches too. But that's probably only going to be 
noticeable after we add syntax-propertize-function to ruby-ts-mode.

>>> I noticed that while most library files are small, e.g.
>>> libtree-sitter-c.so is 401,528 bytes,
>>> libtree-sitter-ruby.so is 2,130,616 bytes
>>> that means that it has more complex logic
>>> that might explain its performance.
>>
>> ruby is indeed one of the larger ones. Among the ones I have here compiled,
>> it's exceeded only by cpp. 2.29 MB vs 2.12 MB.
> 
> The winner is libtree-sitter-julia.so with 7.25 MB.
> But regarding libtree-sitter-cpp.so I confirm it's 2.3 MB.
> And c++-ts-mode is even faster than c-ts-mode.

Yep.

> On the same admin/alloc-colors.c:
> 
> c-mode
> (33.378821569 1500 17.632000617)
> 
> c-ts-mode
> (2.1949608069999997 34 0.4119784769999981)
> 
> c++-ts-mode
> (2.0979403910000003 34 0.39749122499999956)
> 
> So size doesn't matter.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Wed, 11 Jan 2023 12:13:02 GMT) Full text and rfc822 format available.

Message #23 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Juri Linkov <juri <at> linkov.net>, Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Wed, 11 Jan 2023 14:12:33 +0200

Yuan? Just making sure you got this message.

On 10/01/2023 16:10, Dmitry Gutov wrote:
> Perhaps Yuan has some further ideas. There are some strong oddities here:
> 
> - Some time into debugging and repeating the benchmark again and again, 
> I get the "Pure Lisp storage overflowed" message. Just once per Emacs 
> session. It doesn't seem to change much, so it might be unimportant.
> 
> - The profiler output looks like this:
> 
>    18050  75%                    - font-lock-fontify-syntactically-region
>    15686  65%                     - treesit-font-lock-fontify-region
>     3738  15%                        treesit--children-covering-range-recurse
>      188   0%                        treesit-fontify-with-override
> 
> - When running the benchmark for the first time in a buffer (such as 
> ruby.rb), the variable treesit--font-lock-fast-mode is usually changed 
> to t. In one Emacs session, after I changed it to nil and re-ran the 
> benchmark, the variable stayed nil, and the benchmark ran much faster 
> (like 10s vs 36s).
> 
> In the next session, after I restarted Emacs, that didn't happen: it 
> always stayed at t, even if I reset it to nil between runs. But if I 
> comment out the block in treesit-font-lock-fontify-region that uses it
> 
>      ;; (when treesit--font-lock-fast-mode
>      ;;   (setq nodes (treesit--children-covering-range-recurse
>      ;;                (car nodes) start end (* 4 jit-lock-chunk-size))))
> 
> and evaluate the defun, the benchmark runs much faster again: 11s.
> 
> (But then I brought it all back, and re-ran the tests, and the variable 
> stayed nil that time around; to sum up: the way it's turned on is 
> unstable.)
> 
> Should treesit--font-lock-fast-mode be locally bound inside that 
> function, so that it's reset between chunks? Or maybe the condition for 
> its enabling should be tweaked? E.g. I don't think there are any 
> particularly large or deep nodes in ruby.rb's parse tree. It's a very 
> shallow file.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Thu, 12 Jan 2023 21:59:02 GMT) Full text and rfc822 format available.

Message #26 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Thu, 12 Jan 2023 13:58:48 -0800

Dmitry Gutov <dgutov <at> yandex.ru> writes:

> Yuan? Just making sure you got this message.

Sorry for the delay :-)

> On 10/01/2023 16:10, Dmitry Gutov wrote:
>> Perhaps Yuan has some further ideas. There are some strong oddities here:
>> - Some time into debugging and repeating the benchmark again and
>> again, I get the "Pure Lisp storage overflowed" message. Just once
>> per Emacs session. It doesn't seem to change much, so it might be
>> unimportant.

That sounds like 60653. The next time you encounter it, could you record
the output of M-x memory-usage and M-x memory-report? 

>> - The profiler output looks like this:
>>    18050  75%                    - font-lock-fontify-syntactically-region
>>    15686  65%                     - treesit-font-lock-fontify-region
>>     3738  15%                        treesit--children-covering-range-recurse
>>      188   0%                        treesit-fontify-with-override
>> - When running the benchmark for the first time in a buffer (such as
>> ruby.rb), the variable treesit--font-lock-fast-mode is usually
>> changed to t. In one Emacs session, after I changed it to nil and
>> re-ran the benchmark, the variable stayed nil, and the benchmark ran
>> much faster (like 10s vs 36s).
>> In the next session, after I restarted Emacs, that didn't happen: it
>> always stayed at t, even if I reset it to nil between runs. But if I
>> comment out the block in treesit-font-lock-fontify-region that uses
>> it
>>      ;; (when treesit--font-lock-fast-mode
>>      ;;   (setq nodes (treesit--children-covering-range-recurse
>>      ;;                (car nodes) start end (* 4 jit-lock-chunk-size))))
>> and evaluate the defun, the benchmark runs much faster again: 11s.
>> (But then I brought it all back, and re-ran the tests, and the
>> variable stayed nil that time around; to sum up: the way it's turned
>> on is unstable.)
>> Should treesit--font-lock-fast-mode be locally bound inside that
>> function, so that it's reset between chunks? Or maybe the condition
>> for its enabling should be tweaked? E.g. I don't think there are any
>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>> very shallow file.

Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
function that checks the maximum depth of a parse tree and the maximum
node span, and turn on the fast-mode if the depth is too large or a node
is too wide. And we do that check once before doing any fontification.

I’ll report back once I add it.

Yuan
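
For illustration, the check proposed above (walk the parse tree once, record its maximum depth and its widest node, and turn on fast mode only if either is excessive) might look roughly like the sketch below. This is a self-contained toy model in C, not code from treesit.c: the `toy_node` struct, the `toy_` function names, and the caller-supplied limits are all assumptions made for the example. The root is skipped when measuring span, on the assumption that it always covers the whole buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a parse-tree node.  Hypothetical; this is NOT the
   tree-sitter TSNode API.  */
struct toy_node
{
  size_t start;                     /* start position of the node's text */
  size_t end;                       /* end position (exclusive) */
  size_t n_children;                /* number of entries in CHILDREN */
  const struct toy_node *children;  /* array of child nodes, or NULL */
};

/* Statistics gathered in one pass over the tree.  */
struct toy_stats
{
  size_t max_depth;   /* depth of the deepest node; the root has depth 1 */
  size_t max_span;    /* widest END - START among non-root nodes */
};

/* Visit NODE and its descendants, updating STATS.  Recursive for
   brevity only.  */
static void
toy_walk (const struct toy_node *node, size_t depth, struct toy_stats *stats)
{
  assert (node->end >= node->start);
  if (depth > stats->max_depth)
    stats->max_depth = depth;
  /* Assumption: skip the root, which always spans the whole buffer.  */
  if (depth > 1 && node->end - node->start > stats->max_span)
    stats->max_span = node->end - node->start;
  for (size_t i = 0; i < node->n_children; i++)
    toy_walk (&node->children[i], depth + 1, stats);
}

/* Return true if the tree under ROOT is deeper than DEPTH_LIMIT or
   contains a non-root node wider than SPAN_LIMIT.  Both limits are
   hypothetical tuning knobs supplied by the caller.  */
static bool
toy_want_fast_mode (const struct toy_node *root,
                    size_t depth_limit, size_t span_limit)
{
  struct toy_stats stats = { 0, 0 };
  toy_walk (root, 1, &stats);
  return stats.max_depth > depth_limit || stats.max_span > span_limit;
}
```

The idea is to call `toy_want_fast_mode (&root, depth_limit, span_limit)` once before any fontification and cache the answer. A real implementation would presumably traverse with a tree-sitter cursor rather than recursion, so that a very deep tree cannot exhaust the C stack.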




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Thu, 12 Jan 2023 23:42:01 GMT) Full text and rfc822 format available.

Message #29 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 01:40:56 +0200

On 12/01/2023 23:58, Yuan Fu wrote:
> 
> Dmitry Gutov <dgutov <at> yandex.ru> writes:
> 
>> Yuan? Just making sure you got this message.
> 
> Sorry for the delay :-)
> 
>> On 10/01/2023 16:10, Dmitry Gutov wrote:
>>> Perhaps Yuan has some further ideas. There are some strong oddities here:
>>> - Some time into debugging and repeating the benchmark again and
>>> again, I get the "Pure Lisp storage overflowed" message. Just once
>>> per Emacs session. It doesn't seem to change much, so it might be
>>> unimportant.
> 
> That sounds like 60653. The next time you encounter it, could you record
> the output of M-x memory-usage and M-x memory-report?

Managed to reproduce this after running the test in a couple of 
different files.

But 'M-x memory-usage' says no such command, and 'M-x memory-report' 
ends up with this error:

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  memory-report--gc-elem(nil strings)
  memory-report--garbage-collect()
  memory-report()
  funcall-interactively(memory-report)
  #<subr call-interactively>(memory-report record nil)
  apply(#<subr call-interactively> memory-report (record nil))
  call-interactively@ido-cr+-record-current-command(#<subr call-interactively> memory-report record nil)
  apply(call-interactively@ido-cr+-record-current-command #<subr call-interactively> (memory-report record nil))
  call-interactively(memory-report record nil)
  command-execute(memory-report record)
  execute-extended-command(nil "memory-report" nil)
  funcall-interactively(execute-extended-command nil "memory-report" nil)
  #<subr call-interactively>(execute-extended-command nil nil)
  apply(#<subr call-interactively> execute-extended-command (nil nil))
  call-interactively@ido-cr+-record-current-command(#<subr call-interactively> execute-extended-command nil nil)
  apply(call-interactively@ido-cr+-record-current-command #<subr call-interactively> (execute-extended-command nil nil))
  call-interactively(execute-extended-command nil nil)
  command-execute(execute-extended-command)

garbage-collect's docstring says:

  However, if there was overflow in pure space, and Emacs was dumped
  using the "unexec" method, ‘garbage-collect’ returns nil, because
  real GC can’t be done.

I don't know if my Emacs was dumped using "unexec", though. ./configure 
says I'm using pdumper.

In case that matters, I'm testing the emacs-29 branch.

>>> - The profiler output looks like this:
>>>     18050  75%                    - font-lock-fontify-syntactically-region
>>>     15686  65%                     - treesit-font-lock-fontify-region
>>>      3738  15%                        treesit--children-covering-range-recurse
>>>       188   0%                        treesit-fontify-with-override
>>> - When running the benchmark for the first time in a buffer (such as
>>> ruby.rb), the variable treesit--font-lock-fast-mode is usually
>>> changed to t. In one Emacs session, after I changed it to nil and
>>> re-ran the benchmark, the variable stayed nil, and the benchmark ran
>>> much faster (like 10s vs 36s).
>>> In the next session, after I restarted Emacs, that didn't happen: it
>>> always stayed at t, even if I reset it to nil between runs. But if I
>>> comment out the block in treesit-font-lock-fontify-region that uses
>>> it
>>>       ;; (when treesit--font-lock-fast-mode
>>>       ;;   (setq nodes (treesit--children-covering-range-recurse
>>>       ;;                (car nodes) start end (* 4 jit-lock-chunk-size))))
>>> and evaluate the defun, the benchmark runs much faster again: 11s.
>>> (But then I brought it all back, and re-ran the tests, and the
>>> variable stayed nil that time around; to sum up: the way it's turned
>>> on is unstable.)
>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>> function, so that it's reset between chunks? Or maybe the condition
>>> for its enabling should be tweaked? E.g. I don't think there are any
>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>> very shallow file.
> 
> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
> function that checks the maximum depth of a parse tree and the maximum
> node span, and turn on the fast-mode if the depth is too large or a node
> is too wide. And we do that check once before doing any fontification.
> 
> I’ll report back once I add it.

Thanks!

And if the check can be fast enough, we could probably do it in the 
beginning of fontifying every chunk.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Fri, 13 Jan 2023 07:59:02 GMT) Full text and rfc822 format available.

Message #32 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: casouri <at> gmail.com, Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 09:57:52 +0200

> Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
> Date: Fri, 13 Jan 2023 01:40:56 +0200
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> Managed to reproduce this after running the test in a couple of 
> different files.
> 
> But 'M-x memory-usage' says no such command, and 'M-x memory-report' 
> ends up with this error:
> 
> Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
>    memory-report--gc-elem(nil strings)
>    memory-report--garbage-collect()
>    memory-report()

This means GC is disabled in this session at the time you invoke
memory-report.  Which shouldn't happen, of course.  It sounds like
your pure Lisp storage overflowed, and that disabled GC.

And I think I see the problem: we use build_pure_c_string in treesit.c
in places that we shouldn't.

Yuan, build_pure_c_string should only be used in places such as
syms_of_treesit, which are called just once, during dumping.  Look at
all the other calls to this function in the sources, and you will see
it.  In all other cases, you should do one of the following:

  . for strings whose text is fixed, define a variable, give it the
    value in syms_of_treesit using build_pure_c_string, then use that
    variable elsewhere in the source
  . for strings whose text depends on run-time information, use
    AUTO_STRING or build_string

This is a serious problem, and we should fix it ASAP.
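
To make the first option above concrete, here is a rough, self-contained toy model in C of why allocating a fixed string from a never-freed area on every call is a problem, while allocating it once at initialization and reusing the variable is not. Nothing here is Emacs code: `toy_build_pure_string`, `toy_syms_of_module`, the 64-byte arena, and the NULL-on-overflow behavior are all assumptions made for the sketch (real pure space behaves differently on overflow, as described above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy "pure space": a small fixed arena that is never freed.
   Hypothetical size, chosen so that overflow is easy to trigger.  */
enum { TOY_PURE_SIZE = 64 };
static char toy_pure[TOY_PURE_SIZE];
static size_t toy_pure_used;

/* Copy STR into the arena and return the copy, or NULL if the arena
   is full.  A toy analogue of a pure-string builder.  */
static const char *
toy_build_pure_string (const char *str)
{
  size_t len = strlen (str) + 1;
  if (len > TOY_PURE_SIZE - toy_pure_used)
    return NULL;
  char *dst = toy_pure + toy_pure_used;
  memcpy (dst, str, len);
  toy_pure_used += len;
  return dst;
}

/* Fixed string, allocated exactly once by the init function below.  */
static const char *toy_star;

/* Toy analogue of a syms_of_* function: runs once at startup.  */
static void
toy_syms_of_module (void)
{
  toy_star = toy_build_pure_string ("*");
}

/* Problematic pattern: every call consumes more of the arena.  */
static const char *
toy_pattern_leaky (void)
{
  return toy_build_pure_string ("*");
}

/* Suggested pattern: reuse the variable set up at initialization.  */
static const char *
toy_pattern_cached (void)
{
  return toy_star;
}
```

In this model, calling `toy_pattern_cached` any number of times leaves `toy_pure_used` unchanged, whereas repeated calls to `toy_pattern_leaky` eventually exhaust the arena.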

> garbage-collect's docstring says:
> 
>    However, if there was overflow in pure space, and Emacs was dumped
>    using the "unexec" method, ‘garbage-collect’ returns nil, because
>    real GC can’t be done.
> 
> I don't know if my Emacs was dumped using "unexec", though. ./configure 
> says I'm using pdumper.

The above text doesn't account for bugs ;-)  Functions that produce
objects in pure space are supposed to be called only during the build,
a.k.a. "when dumping", and for that the text is correct.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Fri, 13 Jan 2023 09:16:01 GMT) Full text and rfc822 format available.

Message #35 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Juri Linkov <juri <at> linkov.net>, 60691 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 01:15:09 -0800

> On Jan 12, 2023, at 11:57 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
>> Date: Fri, 13 Jan 2023 01:40:56 +0200
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>> 
>> Managed to reproduce this after running the test in a couple of 
>> different files.
>> 
>> But 'M-x memory-usage' says no such command, and 'M-x memory-report' 
>> ends up with this error:
>> 
>> Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
>>   memory-report--gc-elem(nil strings)
>>   memory-report--garbage-collect()
>>   memory-report()
> 
> This means GC is disabled in this session at the time you invoke
> memory-report.  Which shouldn't happen, of course.  It sounds like
> your pure Lisp storage overflowed, and that disabled GC.
> 
> And I think I see the problem: we use build_pure_c_string in treesit.c
> in places that we shouldn't.
> 
> Yuan, build_pure_c_string should only be used in places such as
> syms_of_treesit, which are called just once, during dumping.  Look at
> all the other calls to this function in the sources, and you will see
> it.  In all other cases, you should do one of the following:
> 
>  . for strings whose text is fixed, define a variable, give it the
>    value in syms_of_treesit using build_pure_c_string, then use that
>    variable elsewhere in the source

Can I define a bunch of static C variables and initialize them in syms_of_treesit, or do they all have to be Lisp variables? E.g.,

static Lisp_Object TREESIT_STAR;

...

void
syms_of_treesit (void)
{
...
TREESIT_STAR = build_pure_c_string ("*");
...
}

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Fri, 13 Jan 2023 11:52:02 GMT) Full text and rfc822 format available.

Message #38 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: juri <at> linkov.net, 60691 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 dgutov <at> yandex.ru
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 13:51:15 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Fri, 13 Jan 2023 01:15:09 -0800
> Cc: Dmitry Gutov <dgutov <at> yandex.ru>,
>  60691 <at> debbugs.gnu.org,
>  Juri Linkov <juri <at> linkov.net>,
>  Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > On Jan 12, 2023, at 11:57 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> > 
> >> Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
> >> Date: Fri, 13 Jan 2023 01:40:56 +0200
> >> From: Dmitry Gutov <dgutov <at> yandex.ru>
> >> 
> >> Managed to reproduce this after running the test in a couple of 
> >> different files.
> >> 
> >> But 'M-x memory-usage' says no such command, and 'M-x memory-report' 
> >> ends up with this error:
> >> 
> >> Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
> >>   memory-report--gc-elem(nil strings)
> >>   memory-report--garbage-collect()
> >>   memory-report()
> > 
> > This means GC is disabled in this session at the time you invoke
> > memory-report.  Which shouldn't happen, of course.  It sounds like
> > your pure Lisp storage overflowed, and that disabled GC.
> > 
> > And I think I see the problem: we use build_pure_c_string in treesit.c
> > in places that we shouldn't.
> > 
> > Yuan, build_pure_c_string should only be used in places such as
> > syms_of_treesit, which are called just once, during dumping.  Look at
> > all the other calls to this function in the sources, and you will see
> > it.  In all other cases, you should do one of the following:
> > 
> >  . for strings whose text is fixed, define a variable, give it the
> >    value in syms_of_treesit using build_pure_c_string, then use that
> >    variable elsewhere in the source
> 
> Can I define a bunch of static C variables and initialize them in syms_of_treesit, or do they all have to be Lisp variables? E.g.,
> 
> static Lisp_Object TREESIT_STAR;
> 
> ...
> 
> void
> syms_of_treesit (void)
> {
> ...
> TREESIT_STAR = build_pure_c_string ("*");
> ...
> }

Yes, of course.  Look, for example, how coding.c does that:

  /* A string that serves as name of the reusable work buffer, and as base
     name of temporary work buffers used for code-conversion operations.  */
  static Lisp_Object Vcode_conversion_workbuf_name;
  [...]
  void
  syms_of_coding (void)
  {
  [...]
    staticpro (&Vcode_conversion_workbuf_name);
    Vcode_conversion_workbuf_name = build_pure_c_string (" *code-conversion-work*");

But please keep the convention of naming such variables Vsome_thing,
both regarding the "V" and the fact that the name is otherwise
lower-case.

Thanks.
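
To illustrate why run-time calls to build_pure_c_string are a problem,
here is a toy model in plain C.  It is not Emacs code: toy_pure_string,
the 64-byte arena and the NULL-on-overflow behaviour are all assumptions
made for this sketch (the thread describes real Emacs as reporting the
overflow and disabling GC instead).  It only shows the shape of the
advice above: allocate once in the init function, reuse the variable
everywhere else.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a fixed-size arena that is never collected.
   The size is arbitrary and chosen only for this sketch.  */
enum { TOY_PURE_SIZE = 64 };
static char toy_pure[TOY_PURE_SIZE];
static size_t toy_pure_used;
static bool toy_pure_overflowed;

/* Copy STR into the arena and return the copy, or return NULL and set
   the overflow flag when the arena is full.  Nothing is ever freed.  */
static const char *
toy_pure_string (const char *str)
{
  size_t len = strlen (str) + 1;
  if (len > TOY_PURE_SIZE - toy_pure_used)
    {
      toy_pure_overflowed = true;
      return NULL;
    }
  char *copy = toy_pure + toy_pure_used;
  memcpy (copy, str, len);
  toy_pure_used += len;
  return copy;
}

/* Recommended shape: one allocation at start-up, reused afterwards.  */
static const char *toy_star;

static void
toy_init (void)
{
  toy_star = toy_pure_string ("*");
  assert (toy_star != NULL);
}

/* May be called any number of times; it consumes no arena memory.  */
static const char *
toy_good_lookup (void)
{
  return toy_star;
}

/* Anti-pattern: every call consumes two more bytes of the arena.  */
static const char *
toy_bad_lookup (void)
{
  return toy_pure_string ("*");
}
```

With this model, calling toy_good_lookup in a loop leaves toy_pure_used
unchanged, while calling toy_bad_lookup often enough eventually sets
toy_pure_overflowed.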




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 03:49:01 GMT) Full text and rfc822 format available.

Message #41 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: juri <at> linkov.net, 60691 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 dgutov <at> yandex.ru
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 19:48:40 -0800

> On Jan 13, 2023, at 3:51 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Yuan Fu <casouri <at> gmail.com>
>> Date: Fri, 13 Jan 2023 01:15:09 -0800
>> Cc: Dmitry Gutov <dgutov <at> yandex.ru>,
>> 60691 <at> debbugs.gnu.org,
>> Juri Linkov <juri <at> linkov.net>,
>> Stefan Monnier <monnier <at> iro.umontreal.ca>
>> 
>>> On Jan 12, 2023, at 11:57 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>> 
>>>> Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
>>>> Date: Fri, 13 Jan 2023 01:40:56 +0200
>>>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>>> 
>>>> Managed to reproduce this after running the test in a couple of 
>>>> different files.
>>>> 
>>>> But 'M-x memory-usage' says no such command, and 'M-x memory-report' 
>>>> ends up with this error:
>>>> 
>>>> Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
>>>>  memory-report--gc-elem(nil strings)
>>>>  memory-report--garbage-collect()
>>>>  memory-report()
>>> 
>>> This means GC is disabled in this session at the time you invoke
>>> memory-report.  Which shouldn't happen, of course.  It sounds like
>>> your pure Lisp storage overflowed, and that disabled GC.
>>> 
>>> And I think I see the problem: we use build_pure_c_string in treesit.c
>>> in places that we shouldn't.
>>> 
>>> Yuan, build_pure_c_string should only be used in places such as
>>> syms_of_treesit, which are called just once, during dumping.  Look at
>>> all the other calls to this function in the sources, and you will see
>>> it.  In all other cases, you should do one of the following:
>>> 
>>> . for strings whose text is fixed, define a variable, give it the
>>>   value in syms_of_treesit using build_pure_c_string, then use that
>>>   variable elsewhere in the source
>> 
>> Can I define a bunch of static C variables and initialize them in syms_of_treesit, or do they all have to be Lisp variables? E.g.,
>> 
>> static Lisp_Object TREESIT_STAR;
>> 
>> ...
>> 
>> void
>> syms_of_treesit (void)
>> {
>> ...
>> TREESIT_STAR = build_pure_c_string ("*");
>> ...
>> }
> 
> Yes, of course.  Look, for example, how coding.c does that:
> 
>  /* A string that serves as name of the reusable work buffer, and as base
>     name of temporary work buffers used for code-conversion operations.  */
>  static Lisp_Object Vcode_conversion_workbuf_name;
>  [...]
>  void
>  syms_of_coding (void)
>  {
>  [...]
>    staticpro (&Vcode_conversion_workbuf_name);
>    Vcode_conversion_workbuf_name = build_pure_c_string (" *code-conversion-work*");
> 
> But please keep the convention of naming such variables Vsome_thing,
> both regarding the "V" and the fact that the name is otherwise
> lower-case.

Thanks, I pushed a fix for it. I also used intern_c_string in some places like these:

intern_c_string (":?")
intern_c_string (":*")

I want to change them to use DEFSYM, but what should be the c name for them?

Yuan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 07:30:02 GMT) Full text and rfc822 format available.

Message #44 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: juri <at> linkov.net, 60691 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 dgutov <at> yandex.ru
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sat, 14 Jan 2023 09:29:08 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Fri, 13 Jan 2023 19:48:40 -0800
> Cc: dgutov <at> yandex.ru,
>  60691 <at> debbugs.gnu.org,
>  juri <at> linkov.net,
>  monnier <at> iro.umontreal.ca
> 
> Thanks, I pushed a fix for it. I also used intern_c_string in some places like these:
> 
> intern_c_string (":?")
> intern_c_string (":*")
> 
> I want to change them to use DEFSYM, but what should be the c name for them?

Yes, DEFSYM is better in such cases.  The C name can be QCquestion and
QCasterix, for example.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 07:52:01 GMT) Full text and rfc822 format available.

Message #47 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: juri <at> linkov.net, 60691 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 dgutov <at> yandex.ru
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 13 Jan 2023 23:51:05 -0800

> On Jan 13, 2023, at 11:29 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Yuan Fu <casouri <at> gmail.com>
>> Date: Fri, 13 Jan 2023 19:48:40 -0800
>> Cc: dgutov <at> yandex.ru,
>> 60691 <at> debbugs.gnu.org,
>> juri <at> linkov.net,
>> monnier <at> iro.umontreal.ca
>> 
>> Thanks, I pushed a fix for it. I also used intern_c_string in some places like these:
>> 
>> intern_c_string (":?")
>> intern_c_string (":*")
>> 
>> I want to change them to use DEFSYM, but what should be the c name for them?
> 
> Yes, DEFSYM is better in such cases.  The C name can be QCquestion and
> QCasterix, for example.

My worry is that they will conflict with, eg, symbol `question’ and `asterix’, if someone ever defines them in the C codebase. Is that not possible?

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 08:02:02 GMT) Full text and rfc822 format available.

Message #50 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: juri <at> linkov.net, 60691 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 dgutov <at> yandex.ru
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sat, 14 Jan 2023 10:01:12 +0200
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Fri, 13 Jan 2023 23:51:05 -0800
> Cc: dgutov <at> yandex.ru,
>  60691 <at> debbugs.gnu.org,
>  juri <at> linkov.net,
>  monnier <at> iro.umontreal.ca
> 
> > Yes, DEFSYM is better in such cases.  The C name can be QCquestion and
> > QCasterix, for example.
> 
> My worry is that they will conflict with, eg, symbol `question’ and `asterix’, if someone ever defines them in the C codebase. Is that not possible?

It's possible, but how can a symbol conflict in that case?  It will
just be reused.

But if you want to have treesit-specific symbols, you can use names
like QCasterix_treesit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 08:47:02 GMT) Full text and rfc822 format available.

Message #53 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Yuan Fu <casouri <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, dgutov <at> yandex.ru, 60691 <at> debbugs.gnu.org,
 monnier <at> iro.umontreal.ca, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sat, 14 Jan 2023 09:46:45 +0100
On Jan 13 2023, Yuan Fu wrote:

>> On Jan 13, 2023, at 11:29 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> 
>>> From: Yuan Fu <casouri <at> gmail.com>
>>> Date: Fri, 13 Jan 2023 19:48:40 -0800
>>> Cc: dgutov <at> yandex.ru,
>>> 60691 <at> debbugs.gnu.org,
>>> juri <at> linkov.net,
>>> monnier <at> iro.umontreal.ca
>>> 
>>> Thanks, I pushed a fix for it. I also used intern_c_string in some places like these:
>>> 
>>> intern_c_string (":?")
>>> intern_c_string (":*")
>>> 
>>> I want to change them to use DEFSYM, but what should be the c name for them?
>> 
>> Yes, DEFSYM is better in such cases.  The C name can be QCquestion and
>> QCasterix, for example.
>
> My worry is that they will conflict with, eg, symbol `question’ and `asterix’, if someone ever defines them in the C codebase. Is that not possible?

The C name of the symbol `question' would be Qquestion, without the C
(which stands for the `:' prefix).
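
The naming rule described here can be sketched as a small standalone C
helper.  This is a hypothetical illustration, not an Emacs function: in
the Emacs sources the C names are written by hand, and
conventional_c_name below is an invented name.  It assumes the usual
convention of a "Q" prefix, a "C" for a leading colon, and '_' in place
of '-'.  Characters such as '?' and '*' have no identifier equivalent,
which is why descriptive names like QCquestion are picked manually.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write into OUT (capacity OUT_SIZE) the C identifier that the
   convention suggests for the Lisp symbol NAME: a "Q" prefix, "C" for
   a leading ':', and '-' mapped to '_'.  Return 0 on success, or -1
   if OUT is too small.  Hypothetical helper for illustration only.  */
static int
conventional_c_name (const char *name, char *out, size_t out_size)
{
  assert (name != NULL && out != NULL);
  size_t n = 0;

  if (out_size < 2)
    return -1;
  out[n++] = 'Q';

  if (*name == ':')
    {
      /* Keep one byte free for the terminating NUL.  */
      if (n + 1 >= out_size)
        return -1;
      out[n++] = 'C';
      name++;
    }

  for (; *name != '\0'; name++)
    {
      if (n + 1 >= out_size)
        return -1;
      out[n++] = (*name == '-') ? '_' : *name;
    }

  out[n] = '\0';
  return 0;
}
```

Under these assumptions, "question" and ":question" map to different C
names, so the two symbols cannot collide at the C level.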

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sat, 14 Jan 2023 23:05:01 GMT) Full text and rfc822 format available.

Message #56 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, dgutov <at> yandex.ru, 60691 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Juri Linkov <juri <at> linkov.net>
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sat, 14 Jan 2023 15:03:58 -0800

> On Jan 14, 2023, at 12:46 AM, Andreas Schwab <schwab <at> linux-m68k.org> wrote:
> 
> On Jan 13 2023, Yuan Fu wrote:
> 
>>> On Jan 13, 2023, at 11:29 PM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>> 
>>>> From: Yuan Fu <casouri <at> gmail.com>
>>>> Date: Fri, 13 Jan 2023 19:48:40 -0800
>>>> Cc: dgutov <at> yandex.ru,
>>>> 60691 <at> debbugs.gnu.org,
>>>> juri <at> linkov.net,
>>>> monnier <at> iro.umontreal.ca
>>>> 
>>>> Thanks, I pushed a fix for it. I also used intern_c_string in some places like these:
>>>> 
>>>> intern_c_string (":?")
>>>> intern_c_string (":*")
>>>> 
>>>> I want to change them to use DEFSYM, but what should be the c name for them?
>>> 
>>> Yes, DEFSYM is better in such cases.  The C name can be QCquestion and
>>> QCasterix, for example.
>> 
>> My worry is that they will conflict with, eg, symbol `question’ and `asterix’, if someone ever defines them in the C codebase. Is that not possible?
> 
> The C name of the symbol `question' would be Qquestion, without the C
> (which stands for the `:' prefix).

Sorry, I meant `:question’, and `:asterix’. 

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Wed, 18 Jan 2023 06:51:01 GMT) Full text and rfc822 format available.

Message #59 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 17 Jan 2023 22:50:12 -0800
Yuan Fu <casouri <at> gmail.com> writes:

> Dmitry Gutov <dgutov <at> yandex.ru> writes:
>
>> Yuan? Just making sure you got this message.
>
> Sorry for the delay :-)
>
>> On 10/01/2023 16:10, Dmitry Gutov wrote:
>>> Perhaps Yuan has some further ideas. There are some strong oddities here:
>>> - Some time into debugging and repeating the benchmark again and
>>> again, I get the "Pure Lisp storage overflowed" message. Just once
>>> per Emacs session. It doesn't seem to change much, so it might be
>>> unimportant.
>
> That sounds like 60653. The next time you encounter it, could you record
> the output of M-x memory-usage and M-x memory-report? 
>
>>> - The profiler output looks like this:
>>>    18050  75%                    -
>>> font-lock-fontify-syntactically-region
>>>    15686  65%                     - treesit-font-lock-fontify-region
>>>     3738  15% treesit--children-covering-range-recurse
>>>      188   0%                        treesit-fontify-with-override
>>> - When running the benchmark for the first time in a buffer (such as
>>> ruby.rb), the variable treesit--font-lock-fast-mode is usually
>>> changed to t. In one Emacs session, after I changed it to nil and
>>> re-ran the benchmark, the variable stayed nil, and the benchmark ran
>>> much faster (like 10s vs 36s).
>>> In the next session, after I restarted Emacs, that didn't happen: it
>>> always stayed at t, even if I reset it to nil between runs. But if I
>>> comment out the block in treesit-font-lock-fontify-region that uses
>>> it
>>>      ;; (when treesit--font-lock-fast-mode
>>>      ;;   (setq nodes (treesit--children-covering-range-recurse
>>>      ;;                (car nodes) start end (* 4 jit-lock-chunk-size))))
>>> and evaluate the defun, the benchmark runs much faster again: 11s.
>>> (But then I brought it all back, and re-ran the tests, and the
>>> variable stayed nil that time around; to sum up: the way it's turned
>>> on is unstable.)
>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>> function, so that it's reset between chunks? Or maybe the condition
>>> for its enabling should be tweaked? E.g. I don't think there are any
>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>> very shallow file.
>
> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
> function that checks the maximum depth of a parse tree and the maximum
> node span, and turn on the fast-mode if the depth is too large or a node
> is too wide. And we do that check once before doing any fontification.
>
> I’ll report back once I add it.

I wrote that function. But I didn’t end up using it. Instead I added a
"grace count", so that the query time has to be longer than the
threshold 5 times before we switch on the fast mode instead of 1.

My main worry is that simply looking at the parse tree would not catch
all the cases where there will be expensive queries.

Could you try the latest commit and see if the fast mode still switches
on when it shouldn’t?

Yuan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Thu, 19 Jan 2023 18:29:01 GMT) Full text and rfc822 format available.

Message #62 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Thu, 19 Jan 2023 20:28:41 +0200
Hi Yuan,

On 18/01/2023 08:50, Yuan Fu wrote:
>>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>>> function, so that it's reset between chunks? Or maybe the condition
>>>> for its enabling should be tweaked? E.g. I don't think there are any
>>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>>> very shallow file.
>>
>> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
>> function that checks the maximum depth of a parse tree and the maximum
>> node span, and turn on the fast-mode if the depth is too large or a node
>> is too wide. And we do that check once before doing any fontification.
>>
>> I’ll report back once I add it.
> 
> I wrote that function. But I didn’t end up using it. Instead I added a
> "grace count", so that the query time has to be longer than the
> threshold 5 times before we switch on the fast mode instead of 1.
> 
> My main worry is that simply looking at the parse tree would not catch
> all the cases where there will be expensive queries.

That might be true, but a criterion that doesn't specify conditions 
exactly can give no guarantee against false positives.

> Could you try the latest commit and see if the fast mode still switches
> on when it shouldn’t?

At first it seemed to help, but then I switched the major mode a couple 
more times, and ran the benchmark twice more, and the "fast mode" 
switched on again.

Which seems to make sense: there is no resetting the counter, right?

So if previously it happened once somehow during a certain scenario, now 
I have to repeat the same scenario 4 times, and the condition is met.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Fri, 20 Jan 2023 22:25:02 GMT) Full text and rfc822 format available.

Message #65 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Fri, 20 Jan 2023 14:24:28 -0800

> On Jan 19, 2023, at 10:28 AM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
> 
> Hi Yuan,
> 
> On 18/01/2023 08:50, Yuan Fu wrote:
>>>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>>>> function, so that it's reset between chunks? Or maybe the condition
>>>>> for its enabling should be tweaked? E.g. I don't think there are any
>>>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>>>> very shallow file.
>>> 
>>> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
>>> function that checks the maximum depth of a parse tree and the maximum
>>> node span, and turn on the fast-mode if the depth is too large or a node
>>> is too wide. And we do that check once before doing any fontification.
>>> 
>>> I’ll report back once I add it.
>> I wrote that function. But I didn’t end up using it. Instead I added a
>> "grace count", so that the query time has to be longer than the
>> threshold 5 times before we switch on the fast mode instead of 1.
>> My main worry is that simply looking at the parse tree would not catch
>> all the cases where there will be expensive queries.
> 
> That might be true, but a criterion that doesn't specify conditions exactly can give no guarantee against false positives.

The condition is “query is (consistently) slow”, that’s why I thought measuring the time is the most direct way.

> 
>> Could you try the latest commit and see if the fast mode still switches
>> on when it shouldn’t?
> 
> At first it seemed to help, but then I switched the major mode a couple more times, and ran the benchmark twice more, and the "fast mode" switched on again.
> 
> Which seems to make sense: there is no resetting the counter, right?
> 
> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.

I was hoping that the scenario only happens once, oh well :-) I’ll change the decision based on analyzing the tree’s dimensions: too deep or too wide activates the fast mode. Let’s see how it works.

Yuan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sun, 22 Jan 2023 02:02:02 GMT) Full text and rfc822 format available.

Message #68 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sun, 22 Jan 2023 04:01:38 +0200
On 21/01/2023 00:24, Yuan Fu wrote:
> 
> 
>> On Jan 19, 2023, at 10:28 AM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>>
>> Hi Yuan,
>>
>> On 18/01/2023 08:50, Yuan Fu wrote:
>>>>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>>>>> function, so that it's reset between chunks? Or maybe the condition
>>>>>> for its enabling should be tweaked? E.g. I don't think there are any
>>>>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>>>>> very shallow file.
>>>>
>>>> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
>>>> function that checks the maximum depth of a parse tree and the maximum
>>>> node span, and turn on the fast-mode if the depth is too large or a node
>>>> is too wide. And we do that check once before doing any fontification.
>>>>
>>>> I’ll report back once I add it.
>>> I wrote that function. But I didn’t end up using it. Instead I added a
>>> "grace count", so that the query time has to be longer than the
>>> threshold 5 times before we switch on the fast mode instead of 1.
>>> My main worry is that simply looking at the parse tree would not catch
>>> all the cases where there will be expensive queries.
>>
>> That might be true, but a criterion that doesn't specify conditions exactly can give no guarantee against false positives.
> 
> The condition is “query is (consistently) slow”, that’s why I thought measuring the time is the most direct way.

The benchmark itself might be artificial, in that it's measuring the 
font-lock of a specific buffer, in whole, for 1000 iterations. But Juri 
must have come up with the original report based on a real usage scenario.

OTOH, the scenario which it might correspond to is a user typing in the
same buffer for a long time (triggering thousands of refontifications, 
possibly partial ones). I don't know if it's feasible to try to 
reproduce it specifically. But, again, anything that can happen once can 
happen 4 more times.

>>> Could you try the latest commit and see if the fast mode still switches
>>> on when it shouldn’t?
>>
>> At first it seemed to help, but then I switched the major mode a couple more times, and ran the benchmark twice more, and the "fast mode" switched on again.
>>
>> Which seems to make sense: there is no resetting the counter, right?
>>
>> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.
> 
> I was hoping that the scenario only happens once, oh well :-) I’ll change the decision based on analyzing the tree’s dimensions: too deep or too wide activates the fast mode. Let’s see how it works.

Thank you, let me know when it's time to test again.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sun, 29 Jan 2023 08:26:02 GMT) Full text and rfc822 format available.

Message #71 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sun, 29 Jan 2023 00:25:20 -0800
Dmitry Gutov <dgutov <at> yandex.ru> writes:

> On 21/01/2023 00:24, Yuan Fu wrote:
>> 
>>> On Jan 19, 2023, at 10:28 AM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>>>
>>> Hi Yuan,
>>>
>>> On 18/01/2023 08:50, Yuan Fu wrote:
>>>>>>> Should treesit--font-lock-fast-mode be locally bound inside that
>>>>>>> function, so that it's reset between chunks? Or maybe the condition
>>>>>>> for its enabling should be tweaked? E.g. I don't think there are any
>>>>>>> particularly large or deep nodes in ruby.rb's parse tree. It's a
>>>>>>> very shallow file.
>>>>>
>>>>> Yeah that is a not-very-clever hack. I’ve got an idea: I can add a C
>>>>> function that checks the maximum depth of a parse tree and the maximum
>>>>> node span, and turn on the fast-mode if the depth is too large or a node
>>>>> is too wide. And we do that check once before doing any fontification.
>>>>>
>>>>> I’ll report back once I add it.
>>>> I wrote that function. But I didn’t end up using it. Instead I added a
>>>> "grace count", so that the query time has to be longer than the
>>>> threshold 5 times before we switch on the fast mode instead of 1.
>>>> My main worry is that simply looking at the parse tree would not catch
>>>> all the cases where there will be expensive queries.
>>>
>>> That might be true, but a criterion that doesn't specify conditions exactly can give no guarantee against false positives.
>> The condition is “query is (consistently) slow”, that’s why I
>> thought measuring the time is the most direct way.
>
> The benchmark itself might be artificial, in that it's measuring the
> font-lock of a specific buffer, in whole, for 1000 iterations. But
> Juri must have come up with the original report based on a real usage
> scenario.
>
> OTOH, the scenario which it might correspond to is a user typing in the
> same buffer for a long time (triggering thousands of refontifications,
> possibly partial ones). I don't know if it's feasible to try to
> reproduce it specifically. But, again, anything that can happen once
> can happen 4 more times.
>
>>>> Could you try the latest commit and see if the fast mode still switches
>>>> on when it shouldn’t?
>>>
>>> At first it seemed to help, but then I switched the major mode a
>>> couple more times, and ran the benchmark twice more, and the "fast
>>> mode" switched on again.
>>>
>>> Which seems to make sense: there is no resetting the counter, right?
>>>
>>> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.
>> I was hoping that the scenario only happens once, oh well :-) I’ll
>> change the decision based on analyzing the tree’s dimensions: too
>> deep or too wide activates the fast mode. Let’s see how it works.
>
> Thank you, let me know when it's time to test again.

Sorry for the delay. Now treesit-font-lock-fontify-region uses
treesit-subtree-stat to determine whether to enable the "fast mode". Now
it should be impossible to activate the fast mode on moderately sized
buffers.
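
For readers who want a picture of what a decision based on the tree's
dimensions looks like, here is a standalone C sketch.  It does not use
the tree-sitter API: toy_node, toy_subtree_stat and toy_want_fast_mode
are invented names, and the limits are parameters because the actual
thresholds are not stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal tree node for the sketch; not a tree-sitter type.  */
struct toy_node
{
  const struct toy_node *const *children;
  size_t n_children;
};

struct toy_stat
{
  size_t max_depth;   /* nodes on the longest root-to-leaf path */
  size_t max_width;   /* largest number of direct children of a node */
  size_t count;       /* total number of nodes visited */
};

/* Depth-first walk that updates ST.  Recursive for brevity, so it is
   only suitable for the small trees used in this illustration.  */
static void
toy_stat_walk (const struct toy_node *node, size_t depth,
               struct toy_stat *st)
{
  st->count++;
  if (depth > st->max_depth)
    st->max_depth = depth;
  if (node->n_children > st->max_width)
    st->max_width = node->n_children;
  for (size_t i = 0; i < node->n_children; i++)
    toy_stat_walk (node->children[i], depth + 1, st);
}

static struct toy_stat
toy_subtree_stat (const struct toy_node *root)
{
  assert (root != NULL);
  struct toy_stat st = { 0, 0, 0 };
  toy_stat_walk (root, 1, &st);
  return st;
}

/* "Too deep or too wide activates the fast mode."  */
static bool
toy_want_fast_mode (struct toy_stat st, size_t depth_limit,
                    size_t width_limit)
{
  return st.max_depth > depth_limit || st.max_width > width_limit;
}
```

Because the result depends only on the shape of the tree, running the
same benchmark repeatedly cannot change the outcome, unlike the
time-based criterion.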

Yuan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sun, 29 Jan 2023 23:08:01 GMT) Full text and rfc822 format available.

Message #74 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Mon, 30 Jan 2023 01:07:10 +0200
Hi Yuan,

On 29/01/2023 10:25, Yuan Fu wrote:

>>>> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.
>>> I was hoping that the scenario only happens once, oh well :-) I’ll
>>> change the decision based on analyzing the tree’s dimensions: too
>>> deep or too wide activates the fast mode. Let’s see how it works.
>>
>> Thank you, let me know when it's time to test again.
> 
> Sorry for the delay. Now treesit-font-lock-fontify-region uses
> treesit-subtree-stat to determine whether to enable the "fast mode". Now
> it should be impossible to activate the fast mode on moderately sized
> buffers.

Thank you, it seems to work just fine in my scenario. And 
treesit-subtree-stat makes sense.

I have a few more questions about the current strategy, though.

IIUC, we only do the treesit--font-lock-fast-mode test once in 
treesit-font-lock-fontify-region, and then use the detected value for 
the whole later life of the buffer. Is that right?

What if the buffer didn't originally have the problematic error nodes we 
are guarding from, and then later the user wrote enough code to have at 
least one of them? If they didn't close Emacs, or revert the buffer, our 
logic still wouldn't use the "fast mode", would it?

Or vice versa: if the buffer started out with error nodes, and 
consequently, "fast mode", but then the user has edited it so that those 
error nodes disappeared, shouldn't the buffer stop using the "fast mode"?

From my measurements, in ruby-mode at least, treesit-subtree-stat is 
20-40x faster than refontifying the whole buffer. So one possible 
strategy would be to repeat the test every time. I'm not sure it's fast 
enough in the "problem" buffers, though, and I don't have any to test.

In those I did test, though, it takes ~1 ms.

But we could repeat the test only once every couple of seconds and/or 
after the buffer has changed again. That would hopefully make it a 
non-bottleneck in all cases.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Sun, 29 Jan 2023 23:25:02 GMT) Full text and rfc822 format available.

Message #77 received at 60691 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 60691 <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Sun, 29 Jan 2023 15:23:34 -0800

> On Jan 29, 2023, at 3:07 PM, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
> 
> Hi Yuan,
> 
> On 29/01/2023 10:25, Yuan Fu wrote:
> 
>>>>> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.
>>>> I was hoping that the scenario only happens once, oh well :-) I’ll
>>>> change the decision based on analyzing the tree’s dimensions: too
>>>> deep or too wide activates the fast mode. Let’s see how it works.
>>> 
>>> Thank you, let me know when it's time to test again.
>> Sorry for the delay. Now treesit-font-lock-fontify-region uses
>> treesit-subtree-stat to determine whether to enable the "fast mode". Now
>> it should be impossible to activate the fast mode on moderately sized
>> buffers.
> 
> Thank you, it seems to work just fine in my scenario. And treesit-subtree-stat makes sense.
> 
> I have a few more questions about the current strategy, though.
> 
> IIUC, we only do the treesit--font-lock-fast-mode test once in treesit-font-lock-fontify-region, and then use the detected value for the whole later life of the buffer. Is that right?
> 
> What if the buffer didn't originally have the problematic error nodes we are guarding from, and then later the user wrote enough code to have at least one of them? If they didn't close Emacs, or revert the buffer, our logic still wouldn't use the "fast node", would it?
> 
> Or vice versa: if the buffer started out with error nodes, and consequently, "fast mode", but then the user has edited it so that those error nodes disappeared, shouldn't the buffer stop using the "fast mode"?
> 
> From my measurements, in ruby-mode, at least treesit-subtree-stat is 20-40x faster than refontifying the whole buffer. So one possible strategy would be to repeat the test every time. I'm not sure it's fast enough in the "problem" buffers, though, and I don't have any to test.
> 
> In those I did test, though, it takes ~1 ms.
> 
> But we could repeat the test only once every couple of seconds and/or after the buffer has changed again. That would hopefully make it a non-bottleneck in all cases.

I should mention this in the comments, but the fast mode is only for very rare cases, where the file is mechanically generated and has some peculiarities that cause tree-sitter to work poorly. If the file is hand-written and “normal”, even huge files like xdisp.c are well below the bar. Therefore I don’t think “crossing the line” will realistically happen when editing source files.

Here are the stats of two “problematic files”, named packet and dec_mask, compared to xdisp.c:

;;           max-depth max-width count
;; cut-off   100       4000
;; packet   (98159     46581 1895137)
;; dec mask (3         64301 283995)
;; xdisp.c  (29        985   218971)

I’d say that any regular source file, even mechanically generated, wouldn’t go beyond ~50 levels in depth, and hand-written files should never have a node with 4000+ direct children in the parse tree.
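
For illustration, the cut-off comparison described above can be sketched in Emacs Lisp. This is a minimal sketch under stated assumptions, not the code in treesit.el: the function and constant names prefixed with `my/` are hypothetical, the strict "greater than" comparison is an assumption, and the only inputs are the stat lists and cut-off values quoted in the table above.

```elisp
;; Sketch only.  The `my/...' names are hypothetical and do not exist in
;; Emacs; the cut-off values are the ones quoted in the table above.
(defconst my/treesit-max-depth-cutoff 100
  "Assumed depth cut-off, taken from the table above.")

(defconst my/treesit-max-width-cutoff 4000
  "Assumed width cut-off, taken from the table above.")

(defun my/treesit-stat-too-big-p (stat)
  "Return non-nil if STAT exceeds either cut-off.
STAT is a list (MAX-DEPTH MAX-WIDTH COUNT), the shape used in the
table above.  Whether the comparison should be strict is an assumption."
  (let ((max-depth (nth 0 stat))
        (max-width (nth 1 stat)))
    (or (> max-depth my/treesit-max-depth-cutoff)
        (> max-width my/treesit-max-width-cutoff))))

;; The three rows from the table above:
(my/treesit-stat-too-big-p '(98159 46581 1895137)) ; packet: too deep and too wide
(my/treesit-stat-too-big-p '(3 64301 283995))      ; dec mask: too wide only
(my/treesit-stat-too-big-p '(29 985 218971))       ; xdisp.c: under both cut-offs
```

In a buffer with a tree-sitter parser, a stat list of this shape could presumably be obtained with `(treesit-subtree-stat (treesit-buffer-root-node))`.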

Yuan



Reply sent to Dmitry Gutov <dgutov <at> yandex.ru>:
You have taken responsibility. (Mon, 30 Jan 2023 00:16:02 GMT) Full text and rfc822 format available.

Notification sent to Juri Linkov <juri <at> linkov.net>:
bug acknowledged by developer. (Mon, 30 Jan 2023 00:16:02 GMT) Full text and rfc822 format available.

Message #82 received at 60691-done <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: juri <at> linkov.net, 60691-done <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Mon, 30 Jan 2023 02:15:44 +0200

On 30/01/2023 01:23, Yuan Fu wrote:
> 
>> On Jan 29, 2023, at 3:07 PM, Dmitry Gutov<dgutov <at> yandex.ru>  wrote:
>>
>> Hi Yuan,
>>
>> On 29/01/2023 10:25, Yuan Fu wrote:
>>
>>>>>> So if previously it happened once somehow during a certain scenario, now I have to repeat the same scenario 4 times, and the condition is met.
>>>>> I was hoping that the scenario only happen once, oh well 😄 I’ll
>>>>> change the decision based on analyzing the tree’s dimension: too
>>>>> deep or too wide activates the fast mode. Let’s see how it works.
>>>> Thank you, let me know when it's time to test again.
>>> Sorry for the delay. Now treesit-font-lock-fontify-region uses
>>> treesit-subtree-stat to determine whether to enable the "fast mode". Now
>>> it should be impossible to activate the fast mode on moderately sized
>>> buffers.
>> Thank you, it seems to work just fine in my scenario. And treesit-subtree-stat makes sense.
>>
>> I have a few more questions about the current strategy, though.
>>
>> IIUC, we only do the treesit--font-lock-fast-mode test once in treesit-font-lock-fontify-region, and then use the detected value for the whole later life of the buffer. Is that right?
>>
>> What if the buffer didn't originally have the problematic error nodes we are guarding from, and then later the user wrote enough code to have at least one of them? If they didn't close Emacs, or revert the buffer, our logic still wouldn't use the "fast node", would it?
>>
>> Or vice versa: if the buffer started out with error nodes, and consequently, "fast mode", but then the user has edited it so that those error nodes disappeared, shouldn't the buffer stop using the "fast mode"?
>>
>>  From my measurements, in ruby-mode, at least treesit-subtree-stat is 20-40x faster than refontifying the whole buffer. So one possible strategy would be to repeat the test every time. I'm not sure it's fast enough in the "problem" buffers, though, and I don't have any to test.
>>
>> In those I did test, though, it takes ~1 ms.
>>
>> But we could repeat the test only once every couple of seconds and/or after the buffer has changed again. That would hopefully make it a non-bottleneck in all cases.
> I should mention this in the comments, but the fast mode is only for very rare cases, where the file is mechanically generated and has some peculiarities that causes tree-sitter to work poorly. If the file is hand-written and “normal”, even huge files like xdisp.c is well below the bar. Therefore I don’t think “crossing the line” will realistically happen when editing source files.
> 
> Here is the stats of two “problematic files”, named packet and dec_mask, comparing to xdisp.c:
> 
> ;;           max-depth max-width count
> ;; cut-off   100       4000
> ;; packet   (98159     46581 1895137)
> ;; dec mask (3         64301 283995)
> ;; xdisp.c  (29        985   218971)
> 
> I’d say that any regular source file, even mechanically generated, wouldn’t go beyond ~50 levels in depth, and hand-written files should never has a node that has 4000+ direct children in the parse tree.

Oh, thanks for the explanation. Then the current strategy makes sense.

Is xdisp.c absolutely the largest C file in your experience?

According to the above numbers, a file that's only 4x as large could hit 
our current cutoff.

Though, TBH, maybe some extreme files do, and they have font-lock 
performance reduced somewhat. That's not the end of the world, and it 
shouldn't make a difference for the original scenario (diff-syntax 
fontification).

Either way, I'm closing this report. Thank you for your help.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Wed, 01 Feb 2023 06:55:02 GMT) Full text and rfc822 format available.

Message #85 received at 60691-done <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: juri <at> linkov.net, 60691-done <at> debbugs.gnu.org
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Tue, 31 Jan 2023 21:26:11 -0800

>> I should mention this in the comments, but the fast mode is only for very rare cases, where the file is mechanically generated and has some peculiarities that causes tree-sitter to work poorly. If the file is hand-written and “normal”, even huge files like xdisp.c is well below the bar. Therefore I don’t think “crossing the line” will realistically happen when editing source files.
>> Here is the stats of two “problematic files”, named packet and dec_mask, comparing to xdisp.c:
>> ;;           max-depth max-width count
>> ;; cut-off   100       4000
>> ;; packet   (98159     46581 1895137)
>> ;; dec mask (3         64301 283995)
>> ;; xdisp.c  (29        985   218971)
>> I’d say that any regular source file, even mechanically generated, wouldn’t go beyond ~50 levels in depth, and hand-written files should never has a node that has 4000+ direct children in the parse tree.
> 
> Oh, thanks for the explanation. Then the current strategy makes sense.
> 
> Is xdisp.c absolutely the largest C file in your experience?
> 
> According to the above numbers, a file that's only 4x as large could hit our current cutoff.

I don’t think these stats increase linearly as the file size increases. Even if there is a file that has a node with 3999 direct children, and the developer adds another one, I’d say it’s better not to turn on “fast mode” immediately.

Yuan



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#60691; Package emacs. (Wed, 01 Feb 2023 15:12:01 GMT) Full text and rfc822 format available.

Message #88 received at 60691-done <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Yuan Fu <casouri <at> gmail.com>
Cc: 60691-done <at> debbugs.gnu.org, juri <at> linkov.net
Subject: Re: bug#60691: 29.0.60; Slow tree-sitter font-lock in ruby-ts-mode
Date: Wed, 1 Feb 2023 17:11:16 +0200

On 01/02/2023 07:26, Yuan Fu wrote:
>>> I should mention this in the comments, but the fast mode is only for very rare cases, where the file is mechanically generated and has some peculiarities that causes tree-sitter to work poorly. If the file is hand-written and “normal”, even huge files like xdisp.c is well below the bar. Therefore I don’t think “crossing the line” will realistically happen when editing source files.
>>> Here is the stats of two “problematic files”, named packet and dec_mask, comparing to xdisp.c:
>>> ;;           max-depth max-width count
>>> ;; cut-off   100       4000
>>> ;; packet   (98159     46581 1895137)
>>> ;; dec mask (3         64301 283995)
>>> ;; xdisp.c  (29        985   218971)
>>> I’d say that any regular source file, even mechanically generated, wouldn’t go beyond ~50 levels in depth, and hand-written files should never has a node that has 4000+ direct children in the parse tree.
>> Oh, thanks for the explanation. Then the current strategy makes sense.
>>
>> Is xdisp.c absolutely the largest C file in your experience?
>>
>> According to the above numbers, a file that's only 4x as large could hit our current cutoff.
> I don’t think these stats increase linearly as the file size increases. Even if there is a file that has a node with 3999 direct children, and the developer adds another one, I’d say it’s better not to turn on “fast mode” immediately.

I see your point.

In the previous message I was talking about a different scenario: when a 
project has a file 4x the size of xdisp.c, and the user just opens it. I 
suspect it's not great to have "fast mode" enabled in that case? That 
would be a false positive.

Anyway, this is a very theoretical concern on my part.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 02 Mar 2023 12:24:08 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 46 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.