GNU bug report logs - #63622
lisp/progmodes/python.el: performance regression introduced by multiline font-lock

Package: emacs;

Reported by: Tom Gillespie <tgbugs <at> gmail.com>

Date: Sun, 21 May 2023 03:15:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63622 in the body.
You can then email your comments to 63622 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 03:15:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Tom Gillespie <tgbugs <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 21 May 2023 03:15:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Tom Gillespie <tgbugs <at> gmail.com>
To: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>
Subject: lisp/progmodes/python.el: performance regression introduced by
 multiline font-lock
Date: Sat, 20 May 2023 20:14:19 -0700

The changes in 4915ca5dd4245a909c046e6691e8d4a1919890c8 have
introduced a significant performance regression when editing python
code that contains dictionary structures across multiple lines.

The current behavior makes Emacs unusable when editing python
dictionaries with more than 20 or so lines. I would suggest reverting
the commit until the performance issue can be addressed.

If I had to guess, this is probably being caused by a double
zero-or-more pattern (possibly implicit) in the new regexps that were
added/changed.

The literal dictionary below is sufficient to demonstrate the issue
and if you bisect and compare the behavior to the immediately prior
commit 31e32212670f5774a6dbc0debac8854fa01d8f92 the difference is
clear. Open the file and hit enter a couple of times and the lag is
obvious (if you can't detect the issue try doubling the number of
lines at the deepest nesting level from 25 to 50).

By profiling and varying the number of repeated lines (e.g. by
doubling them) it appears that the issue is some lurking quadratic
behavior in syntax-ppss as a result of the changes in 4914ca. In my
testing 25, 50, and 100 lines take 100ms, 800ms, and 5 seconds
respectively to insert a new line while the cursor is inside the outer
most paren.

Collapsing all the structures into one line hides the issue. The
longer each individual line the more rapid the slowdown.

The example below is not syntactically correct python, however it
highlights the issue in a way that is clearer than it would be
otherwise.

Examples that trigger the issue (repeat the 2nd line 50 or 100 times
to see the effect). Any combination of paren types will cause the
issue.  The closing paren does not have to be present and does not
prevent the issue.

#+begin_src python
[''
'' []
#+end_src
#+begin_src python
[''
[] ''
#+end_src
#+begin_src python
[''
'' {}
#+end_src
#+begin_src python
{''
'' ()
#+end_src
#+begin_src python
[""
'' []
#+end_src

Examples that are do not cause the issue.
#+begin_src python
[a
'' []
#+end_src
#+begin_src python
[''
'' a
#+end_src
#+begin_src python
[''
'' ''
#+end_src
#+begin_src python
[[]
[] []
#+end_src
#+begin_src python
[[]
[] ''
#+end_src

Example of syntactically correct python that causes the issue.
#+begin_src python
{'':
 {
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
  '': {'': ''},
 },
 '': ['']}
#+end_src

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 04:55:02 GMT) Full text and rfc822 format available.

Message #8 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Tom Gillespie <tgbugs <at> gmail.com>
To: 63622 <at> debbugs.gnu.org
Subject: source of problem identified to be python-font-lock-extend-region
Date: Sat, 20 May 2023 21:53:49 -0700

Despite my previous speculation about the regexp,
the issue is in python-font-lock-extend-region. When
replaced with a no-op the performance returns to normal.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 06:09:02 GMT) Full text and rfc822 format available.

Message #11 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Tom Gillespie <tgbugs <at> gmail.com>, kobarity <kobarity <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Sun, 21 May 2023 09:08:50 +0300

> From: Tom Gillespie <tgbugs <at> gmail.com>
> Date: Sat, 20 May 2023 20:14:19 -0700
> 
> The changes in 4915ca5dd4245a909c046e6691e8d4a1919890c8 have
> introduced a significant performance regression when editing python
> code that contains dictionary structures across multiple lines.
> 
> The current behavior makes Emacs unusable when editing python
> dictionaries with more than 20 or so lines. I would suggest reverting
> the commit until the performance issue can be addressed.

If the problem is so severe, I wonder how come this comes up only now,
9 months after those changes were installed.  It probably means these
cases are quite rare in practice.  Nevertheless, it would be good to
solve them, of course.

FWIW, python-ts-mode doesn't show performance issues in the examples
you posted.

> The literal dictionary below is sufficient to demonstrate the issue
> and if you bisect and compare the behavior to the immediately prior
> commit 31e32212670f5774a6dbc0debac8854fa01d8f92 the difference is
> clear. Open the file and hit enter a couple of times and the lag is
> obvious (if you can't detect the issue try doubling the number of
> lines at the deepest nesting level from 25 to 50).
> 
> By profiling and varying the number of repeated lines (e.g. by
> doubling them) it appears that the issue is some lurking quadratic
> behavior in syntax-ppss as a result of the changes in 4914ca. In my
> testing 25, 50, and 100 lines take 100ms, 800ms, and 5 seconds
> respectively to insert a new line while the cursor is inside the outer
> most paren.
> 
> Collapsing all the structures into one line hides the issue. The
> longer each individual line the more rapid the slowdown.
> 
> The example below is not syntactically correct python, however it
> highlights the issue in a way that is clearer than it would be
> otherwise.

Any chance of your posting some real-life Python code where the issue
rears its head?  I mean, real-life code that makes sense, not just
syntactically correct code invented to make a point?

> Despite my previous speculation about the regexp,
> the issue is in python-font-lock-extend-region. When
> replaced with a no-op the performance returns to normal.

kobarity, could you please look into this ASAP?  Emacs 29.1 is in late
stages of pretest, and I'd like to have this resolved, one way or
another, soon enough.  TIA.

P.S.  Tom, please don't change the Subject when posting followups,
please use the same Subject for all your messages that discuss this
bug.

Also, for the record, please state which Emacs version are you using.
You didn't use "M-x report-emacs-bug" to submit the bug report, so
this and other important information is missing from your OP.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 07:14:02 GMT) Full text and rfc822 format available.

Message #14 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Tom Gillespie <tgbugs <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: kobarity <kobarity <at> gmail.com>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Sun, 21 May 2023 00:13:36 -0700

> If the problem is so severe, I wonder how come this comes up only now,
> 9 months after those changes were installed.  It probably means these
> cases are quite rare in practice.  Nevertheless, it would be good to
> solve them, of course.

I suspect it is because there are 3 factors that have to be just right
to notice.

1 the opening paren and the quote must be immediately adjacent to get
exceptionally bad behavior, there is still some performance degradation
when there is separation, but it would be harder to notice.

2 a user would have to directly edit a dictionary literal with enough lines
to notice the slowdown.

3 assigning the dictionary to a variable mitigates the issue, so only a
dict that is not assigned results in the full slowdown.

> FWIW, python-ts-mode doesn't show performance issues in the examples
> you posted.

I would imagine so. I've continued trying to hunt down the source of the
issue, and it is triggered by setting python-font-lock-extend-region as
the font-lock-extend-after-change-region-function function for python
this is true in old versions of Emacs (e.g. 28.2) as well.

As far as I can tell the existing implementation for python font locking
has some quadratic behavior that is revealed when a region is extended
inside a nested dictionary with multiple lines.

> Any chance of your posting some real-life Python code where the issue
> rears its head?  I mean, real-life code that makes sense, not just
> syntactically correct code invented to make a point?

Yep, basically any nested dictionary literal with more than 15 lines is
affected. With a note that the issue is masked if there is an equal sign
(=) before the opening paren, which is the common case.

An example of the particular file that caused me to spot the issue:
https://github.com/tgbugs/pyontutils/blob/master/pyontutils/auth-config.py

> P.S.  Tom, please don't change the Subject when posting followups,
> please use the same Subject for all your messages that discuss this
> bug.

Ack, apologies. I will keep it the same in the future.

> Also, for the record, please state which Emacs version are you using.
> You didn't use "M-x report-emacs-bug" to submit the bug report, so
> this and other important information is missing from your OP.

Ok, I wasn't sure how to handle it in this case since I was able to
reproduce the issue in multiple versions.

For the record:
The version I spotted it on was the emacs-29 branch at
3bc5efb87e5ac9b7068e71307466b2d0220e92fb but everything
on emacs-29 after 4915ca5dd4245a909c046e6691e8d4a1919890c8
is affected (according to git bisect results). So 29.0.90 and 29.0.91
should be affected as well.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 09:33:02 GMT) Full text and rfc822 format available.

Message #17 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Tom Gillespie <tgbugs <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Sun, 21 May 2023 18:31:50 +0900

[Message part 1 (text/plain, inline)]

Tom Gillespie wrote:
> The changes in 4915ca5dd4245a909c046e6691e8d4a1919890c8 have
> introduced a significant performance regression when editing python
> code that contains dictionary structures across multiple lines.

Hi Tom and Eli,

Thanks for bringing this issue to my attention.

> As far as I can tell the existing implementation for python font locking
> has some quadratic behavior that is revealed when a region is extended
> inside a nested dictionary with multiple lines.

I agree.  It seems to me that it is not python-font-lock-extend-region
itself that is slow, but rather font-lock's processing of the area
extended by it.  So one workaround would be to limit the number of
lines to be extended, as in the attached patch.  If this limit is
exceeded, the area or the entire buffer must be font-locked manually
later.  What do you think of this idea?

Even if we adopt this idea, there remain several points to consider:

- How many lines are appropriate for the limit?
- Is it better to make the limit customizable?
- python-ts-mode should be excluded for this limit?

[0001-Workaround-performance-degradation-when-editing-mult.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 15:18:01 GMT) Full text and rfc822 format available.

Message #20 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: kobarity <kobarity <at> gmail.com>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Sun, 21 May 2023 11:16:13 -0400

> I agree.  It seems to me that it is not python-font-lock-extend-region
> itself that is slow, but rather font-lock's processing of the area
> extended by it.  So one workaround would be to limit the number of
> lines to be extended, as in the attached patch.  If this limit is
> exceeded, the area or the entire buffer must be font-locked manually
> later.  What do you think of this idea?

FWIW, I recommend against using
`font-lock-extend-after-change-region-function`.

E.g. in a case like `python-font-lock-assignment-statement-multiline-1`,
the current code may misfontify code because jit-lock may decide to first
call font-lock on a chunk that goes until:

    [
        a,
        b

and will call again font-lock later to fontify the rest:

    ] # (
        1,
        2
    )

and this can happen with no buffer modification at all (e.g. on the
initial fontification of a buffer).

You can use `font-lock-extend-region-functions` instead (which performs
the region-extension right before fontifying a chunk) to avoid this problem.
[ It won't help with the current performance issue, tho.  ]

`font-lock-extend-after-change-region-function` can also be costly when
a command makes many changes (since
`font-lock-extend-after-change-region-function` is called for every such
change rather than once at the end).
`font-lock-extend-region-functions` tends to be better behaved in this
respect (it's called once per chunk, and there's usually only
a single chunk (re)fontified per command, even after a command that makes
many changes),

One more thing: Tom mentioned a suspicion that the performance issue may
have to do with interaction with `syntax-ppss`.  This is odd, because
`syntax-ppss` and `syntax-propertize` should not be affected by
font-lock.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Sun, 21 May 2023 15:45:02 GMT) Full text and rfc822 format available.

Message #23 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Tom Gillespie <tgbugs <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Mon, 22 May 2023 00:44:03 +0900

[Message part 1 (text/plain, inline)]

I wrote:
> I agree.  It seems to me that it is not python-font-lock-extend-region
> itself that is slow, but rather font-lock's processing of the area
> extended by it.  So one workaround would be to limit the number of
> lines to be extended, as in the attached patch.  If this limit is
> exceeded, the area or the entire buffer must be font-locked manually
> later.  What do you think of this idea?

The cause of the slowdown seems to be in python-info-docstring-p.  So
another option would be to improve it.  Attached is a patch that
determines that if point is in parens except for "(", it is not a
docstring.  I can't think of a case where a docstring is in parens
except for "(".  However, usually more tests should be done.

[0001-Optimize-python-info-docstring-p.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Mon, 22 May 2023 14:59:01 GMT) Full text and rfc822 format available.

Message #26 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Mon, 22 May 2023 23:58:37 +0900

[Message part 1 (text/plain, inline)]

Stefan Monnier wrote:
> FWIW, I recommend against using
> `font-lock-extend-after-change-region-function`.
> 
> You can use `font-lock-extend-region-functions` instead (which performs
> the region-extension right before fontifying a chunk) to avoid this problem.
> [ It won't help with the current performance issue, tho.  ]

Thank you for your advice.  Does the attached patch seem reasonable?

[0001-Use-font-lock-extend-region-functions-in-python-mode.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Mon, 22 May 2023 15:09:01 GMT) Full text and rfc822 format available.

Message #29 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: kobarity <kobarity <at> gmail.com>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Mon, 22 May 2023 11:08:35 -0400

> @@ -869,18 +869,20 @@ python-font-lock-keywords
>  Which one will be chosen depends on the value of
>  `font-lock-maximum-decoration'.")
>  
> -(defun python-font-lock-extend-region (beg end _old-len)
> -  "Extend font-lock region given by BEG and END to statement boundaries."
> -  (save-excursion
> -    (save-match-data
> -      (goto-char beg)
> -      (python-nav-beginning-of-statement)
> -      (setq beg (point))
> -      (goto-char end)
> -      (python-nav-end-of-statement)
> -      (setq end (point))
> -      (cons beg end))))
> -
> +(defvar font-lock-beg)
> +(defvar font-lock-end)
> +(defun python-font-lock-extend-region ()
> +  "Extend font-lock region to statement boundaries."
> +  (let ((beg font-lock-beg)
> +        (end font-lock-end))
> +    (goto-char beg)
> +    (python-nav-beginning-of-statement)
> +    (setq font-lock-beg (point))
> +    (goto-char end)
> +    (python-nav-end-of-statement)
> +    (when (not (eobp)) (forward-char))
> +    (setq font-lock-end (point))
> +    (or (/= beg font-lock-beg) (/= end font-lock-end))))
>  
>  (defconst python-syntax-propertize-function
>    (syntax-propertize-rules

Looks fine to me (I assume you've checked that it behaves about as well
as the previous code).

> @@ -6708,8 +6710,8 @@ python-mode
>                  nil nil nil nil
>                  (font-lock-syntactic-face-function
>                   . python-font-lock-syntactic-face-function)
> -                (font-lock-extend-after-change-region-function
> -                 . python-font-lock-extend-region)))
> +                (font-lock-extend-region-functions
> +                 . (python-font-lock-extend-region))))
>    (setq-local syntax-propertize-function
>                python-syntax-propertize-function)
>    (setq-local imenu-create-index-function

This is bound to break in some cases.  Please use `add-hook` instead.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Tue, 23 May 2023 15:46:01 GMT) Full text and rfc822 format available.

Message #32 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Wed, 24 May 2023 00:45:27 +0900

[Message part 1 (text/plain, inline)]

Stefan Monnier wrote:
> > @@ -6708,8 +6710,8 @@ python-mode
> >                  nil nil nil nil
> >                  (font-lock-syntactic-face-function
> >                   . python-font-lock-syntactic-face-function)
> > -                (font-lock-extend-after-change-region-function
> > -                 . python-font-lock-extend-region)))
> > +                (font-lock-extend-region-functions
> > +                 . (python-font-lock-extend-region))))
> >    (setq-local syntax-propertize-function
> >                python-syntax-propertize-function)
> >    (setq-local imenu-create-index-function
> 
> This is bound to break in some cases.  Please use `add-hook` instead.

Thanks.  I also revised python-font-lock-extend-region.  The attached
0001-Use-font-lock-extend-region-functions-in-python-mode.patch is the
revised patch.

As for the performance degradation, I am almost certain that it is a
bug in python-info-docstring-p and/or python-rx string-delimiter.
python-info-docstring-p uses the following regex as one of the
conditions to determine if it is a docstring.

          (re (concat "[uU]?[rR]?"
                      (python-rx string-delimiter))))

One of the problems is that string-delimiter matches even if there is
a single character preceding the string.  For example,
string-delimiter matches both x" and 1".  It might be reasonable to
match r" or b", for example.  Because they are correct string starter
in Python.  However, there is no reason to match x" or 1".  Besides,
python-info-docstring-p explicitly states "[uU]?[rR]?" as above.  So
(python-rx string-delimiter) in the above code should only match
string delimiter without prefixes.  Current python-info-docstring-p
incorrectly considers a code like [""] or {""} to be a docstring.

The performance problem in the example shown by Tom can be resolved by
modifying the above code as follows:

          (re (concat "[uU]?[rR]?"
                      (rx (or "\"\"\"" "\"" "'''" "'")))))

It breaks some ERTs, but I think we should fix the ERTs.  However,
there seems to be another problem in python-info-docstring-p.  It
intentionally considers contiguous strings as docstring as in the ERT
python-info-docstring-p-1:

'''
Module Docstring Django style.
'''
u'''Additional module docstring.'''
'''Not a module docstring.'''

However, as far as I have tried with Python 3 and Python 2.7, this is
not correct.  Therefore, I feel it is better to mark the ERTs as
expected failures than to modify it at this stage.  The attached
0001-Fix-python-info-docstring-p.patch is the patch to do this.

[0001-Use-font-lock-extend-region-functions-in-python-mode.patch (application/octet-stream, attachment)]

[0001-Fix-python-info-docstring-p.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Tue, 23 May 2023 17:09:02 GMT) Full text and rfc822 format available.

Message #35 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: kobarity <kobarity <at> gmail.com>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Tue, 23 May 2023 13:08:25 -0400

> Thanks.  I also revised python-font-lock-extend-region.  The attached
> 0001-Use-font-lock-extend-region-functions-in-python-mode.patch is the
> revised patch.

LGTM, thank you.

I'm not sufficiently familiar with Python to have an opinion on the
rest, tho your analysis sounds convincing.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Tue, 23 May 2023 19:05:01 GMT) Full text and rfc822 format available.

Message #38 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Tom Gillespie <tgbugs <at> gmail.com>
To: kobarity <kobarity <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Tue, 23 May 2023 12:04:16 -0700

I have tested the two patches and the performance is dramatically improved.

> One of the problems is that string-delimiter matches even if there is
> a single character preceding the string.

I agree with the assessment about docstring syntax. The fix in Fix-python-
is ok and fixes the font-locking performance in general. The general issue
with the definition of string-delimiter is something that might need to be
considered in the future if another perf issue crops up, but for now I think it
is safe to leave it alone since docstring detection was the only place where
it seems to be causing issues.

While we're here I think [fF]? should probably be included in the prefix since
format strings are also valid docstrings.

Thank you very much for digging into this!
Tom

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Tue, 23 May 2023 23:22:02 GMT) Full text and rfc822 format available.

Message #41 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Tom Gillespie <tgbugs <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Wed, 24 May 2023 08:21:11 +0900

Hi Stefan and Tom,

Thank you very much for your confirmation.

Tom Gillespie wrote:
> While we're here I think [fF]? should probably be included in the prefix since
> format strings are also valid docstrings.

Is that so?  As far as I have tried, f-string does not seem to be a
docstring.

Python 3.11.3 (main, May  2 2023, 21:12:31) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def f1():
...     "docstring"
...
>>> f1.__doc__
'docstring'
>>> def f2():
...     f"not docstring"
...
>>> f2.__doc__
>>> a = 1
>>> def f3():
...     f"not docstring {a}"
...
>>> f3.__doc__
>>>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Tue, 23 May 2023 23:43:02 GMT) Full text and rfc822 format available.

Message #44 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Ruijie Yu <ruijie <at> netyu.xyz>
To: kobarity <kobarity <at> gmail.com>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Wed, 24 May 2023 07:41:52 +0800

On May 23, 2023, at 23:46, kobarity <kobarity <at> gmail.com> wrote:
> 
> The performance problem in the example shown by Tom can be resolved by
> modifying the above code as follows:
> 
>          (re (concat "[uU]?[rR]?"
>                      (rx (or "\"\"\"" "\"" "'''" "'")))))

I didn’t read the context for this snippet, but isn’t it sufficient to match for only one single-quote and double-quote, instead of also matching for the triple (multiline) counterparts?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Wed, 24 May 2023 15:06:02 GMT) Full text and rfc822 format available.

Message #47 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Ruijie Yu <ruijie <at> netyu.xyz>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Thu, 25 May 2023 00:05:09 +0900

[Message part 1 (text/plain, inline)]

Ruijie Yu wrote:
> On May 23, 2023, at 23:46, kobarity <kobarity <at> gmail.com> wrote:
> > The performance problem in the example shown by Tom can be resolved by
> > modifying the above code as follows:
> > 
> >          (re (concat "[uU]?[rR]?"
> >                      (rx (or "\"\"\"" "\"" "'''" "'")))))
> 
> I didn’t read the context for this snippet, but isn’t it sufficient to match for only one single-quote and double-quote, instead of also matching for the triple (multiline) counterparts? 

You are right.  I copied the above code from the definition of
python-rx string-delimiter.  However, it was inside of the group
construct.  As group capturing is not needed in
python-info-docstring-p, the regex can be simplified to:

          (re "[uU]?[rR]?[\"']"))

The same regex was used in another place in python-info-docstring-p.
So I fixed it too.  Also I added a simple ERT to identify this fix.

I wrote:
> It breaks some ERTs, but I think we should fix the ERTs.  However,
> there seems to be another problem in python-info-docstring-p.  It
> intentionally considers contiguous strings as docstring as in the ERT
> python-info-docstring-p-1:
> 
> '''
> Module Docstring Django style.
> '''
> u'''Additional module docstring.'''
> '''Not a module docstring.'''
> 
> However, as far as I have tried with Python 3 and Python 2.7, this is
> not correct.

This was my misunderstanding.  PEP-257
(https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring)
clearly states:

#+begin_quote
String literals occurring elsewhere in Python code may also act as
documentation. They are not recognized by the Python bytecode compiler
and are not accessible as runtime object attributes (i.e. not assigned
to __doc__), but two types of extra docstrings may be extracted by
software tools:

1. String literals occurring immediately after a simple assignment at
    the top level of a module, class, or __init__ method are called
    “attribute docstrings”.
2. String literals occurring immediately after another docstring are
    called “additional docstrings”.
#+end_quote

However, there still seems to be a bug in python-info-docstring-p.
Therefore, I would like to keep failing ERTs as expected fail at this
time.  Maybe f-string can also be a docstring.

Attached is a series of patches that replace the previous patches.

[0001-Fix-python-info-docstring-p.patch (application/octet-stream, attachment)]

[0002-Use-font-lock-extend-region-functions-in-python-mode.patch (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63622; Package emacs. (Wed, 24 May 2023 18:54:01 GMT) Full text and rfc822 format available.

Message #50 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: Tom Gillespie <tgbugs <at> gmail.com>
To: kobarity <kobarity <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Wed, 24 May 2023 11:53:00 -0700

> Is that so?  As far as I have tried, f-string does not seem to be a
> docstring.

Oops. Indeed you are correct. I remember this being an issue in one of my
files as I had to assign to __doc__ manually since of course docstrings do
not require the file to be run for the purposes of extracting documentation.

Best!
Tom

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Fri, 26 May 2023 10:02:02 GMT) Full text and rfc822 format available.

Notification sent to Tom Gillespie <tgbugs <at> gmail.com>:
bug acknowledged by developer. (Fri, 26 May 2023 10:02:02 GMT) Full text and rfc822 format available.

Message #55 received at 63622-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: kobarity <kobarity <at> gmail.com>
Cc: ruijie <at> netyu.xyz, tgbugs <at> gmail.com, 63622-done <at> debbugs.gnu.org,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Fri, 26 May 2023 13:01:33 +0300

> Date: Thu, 25 May 2023 00:05:09 +0900
> From: kobarity <kobarity <at> gmail.com>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>,
> 	Tom Gillespie <tgbugs <at> gmail.com>,
> 	Eli Zaretskii <eliz <at> gnu.org>,
> 	63622 <at> debbugs.gnu.org
> 
> Attached is a series of patches that replace the previous patches.

Thanks, installed on the emacs-29 branch, and closing the bug.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Jun 2023 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 13 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #63622 lisp/progmodes/python.el: performance regression introduced by multiline font-lock

GNU bug report logs - #63622
lisp/progmodes/python.el: performance regression introduced by multiline font-lock