GNU bug report logs - #53749
29.0.50; [PATCH] Xref backend for TeX buffers

Package: emacs;

Reported by: David Fussner <dfussner <at> googlemail.com>

Date: Thu, 3 Feb 2022 15:10:02 UTC

Severity: normal

Tags: patch

Found in version 29.0.50

Fixed in version 31.1

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 53749 in the body.
You can then email your comments to 53749 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 03 Feb 2022 15:10:02 GMT)

Acknowledgement sent to David Fussner <dfussner <at> googlemail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 03 Feb 2022 15:10:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 3 Feb 2022 15:09:22 +0000

I've recently been trying to use xref commands with a tags table in a
TeX repository, and many of the results are sub-optimal.  This is a
known issue -- within living memory there have been at least two
discussions related to it on help-gnu-emacs:

https://lists.gnu.org/archive/html/help-gnu-emacs/2018-06/msg00126.html
https://lists.gnu.org/archive/html/help-gnu-emacs/2021-07/msg00436.html

Neither discussion resulted in any code, at least not that I can find,
and the issues mentioned there remain.  For example,
xref-find-definitions on, say, '\mycommand' returns

No definitions found for: mycommand.

(The absence of the escape char in the search string makes the search
fail, as the tag name in the table will be '\mycommand'.)

Similarly, any xref command on 'my:citekey' will only search by default
for the half of the symbol under point, stopping at the colon.
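Both failures come down to which characters are treated as part of the identifier at point. The C sketch below is only an illustration. It contrasts a letters-and-digits rule with a TeX-aware rule that also accepts ':' and '@' and keeps a directly preceding backslash. The names 'identifier_at', 'is_word_char' and 'is_tex_char', and the exact character set, are assumptions for this example. They are not taken from the patch or from etags.el.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Word-only rule: letters and digits.  It stops at ':' and '@'. */
static int is_word_char(char c)
{
    return isalnum((unsigned char) c);
}

/* Hypothetical TeX-aware rule: also accept ':' and '@', which occur in
   citation keys such as my:citekey and in internal macro names. */
static int is_tex_char(char c)
{
    return isalnum((unsigned char) c) || c == ':' || c == '@';
}

/* Copy the identifier around POS in TEXT into OUT (of size OUTSZ).
   If KEEP_ESCAPE is nonzero and the identifier is directly preceded by
   a backslash, the backslash is kept, so \mycommand stays \mycommand.
   Returns the length copied, or 0 if POS is not on an identifier or
   the result does not fit. */
static size_t identifier_at(const char *text, size_t pos,
                            int (*is_constituent)(char), int keep_escape,
                            char *out, size_t outsz)
{
    size_t len = strlen(text);
    size_t start = pos, end = pos;

    if (pos >= len || !is_constituent(text[pos]))
        return 0;
    while (start > 0 && is_constituent(text[start - 1]))
        start--;                      /* walk back to the first constituent */
    while (end < len && is_constituent(text[end]))
        end++;                        /* walk forward past the last one */
    if (keep_escape && start > 0 && text[start - 1] == '\\')
        start--;                      /* include the TeX escape char */
    if (end - start >= outsz)
        return 0;
    memcpy(out, text + start, end - start);
    out[end - start] = '\0';
    return end - start;
}
```

Take the text 'see \cite{my:citekey} and \mycommand'. A position inside 'citekey' gives 'citekey' under the word rule but 'my:citekey' under the TeX-aware rule. A position inside 'mycommand' gives '\mycommand', which is the form the tag name takes in the tags table.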

There are many other behaviors that are suboptimal, as well, so in the
end I wrote a new xref backend for TeX buffers (cloning large portions
of the default etags backend), and wondered whether it might be welcome
in GNU Emacs.

A few remarks:

1. The code should work as it stands both in the AUCTeX and the in-tree
modes.  The AUCTeX hooks I've included in the patch are provisional, as
I would want to discuss with the AUCTeX maintainers how they would want
to handle it, should the patch be accepted in some form.

2. Along the way I found some issues with how etags parses TeX files,
issues which affect the usefulness of the xref commands, so I've made
changes in etags.c as well.  When running the test suite for etags the
only diffs occurred in the TeX-related sections of the resulting tags
file, and location information in those sections was good.

3. The patch as it stands enables all the changes by default to give
what I judge to be the best out-of-the-box experience, but wiser heads
may well have other ideas.

4. If it looks like the patch will make it into Emacs in some form, I'm
going to need to assign copyright, so I'd appreciate help with getting
that started.

Thanks,

David.

[0001-Provide-an-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 02:12:01 GMT)

Message #8 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>, 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 21 Feb 2022 04:11:33 +0200

Hi!

Let us first discuss whether we could make do without an additional Xref 
backend. Just to make sure.

On 03.02.2022 17:09, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> Similarly, any xref command on 'my:citekey' will only search by default
> for the half of the symbol under point, stopping at the colon.

etags's implementation of 'xref-backend-identifier-at-point' calls 
'find-tag--default', which consults 'find-tag-default-function' and
(get major-mode 'find-tag-default-function).

So if your main goal was to alter which string gets searched for (based 
on text around point), you can define a function which returns the 
necessary string (as you did in the patch) and then either set 
'find-tag-default-function' to that function, or put it on the 
'find-tag-default-function' property for the respective major mode 
functions.

> There are many other behaviors that are suboptimal, as well, so in the
> end I wrote a new xref backend for TeX buffers (cloning large portions
> of the default etags backend), and wondered whether it might be welcome
> in GNU Emacs.

Could you point out the other changes which were required?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 09:49:02 GMT)

Message #11 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 21 Feb 2022 09:48:30 +0000

(Resending to include the mailing list -- sorry!)

Hi Dmitry,

Many thanks for looking into this.

>
> So if your main goal was to alter which string gets searched for (based
> on text around point), you can define a function which returns the
> necessary string (as you did in the patch) and then either set
> 'find-tag-default-function' to that function, or put it on the
> 'find-tag-default-function' property for the respective major mode
> functions.
>
> > There are many other behaviors that are suboptimal, as well, so in the
> > end I wrote a new xref backend for TeX buffers (cloning large portions
> > of the default etags backend), and wondered whether it might be welcome
> > in GNU Emacs.
>
> Could you point out the other changes which were required?

As you've noticed, I tried at first to get by without a new backend,
but I ran into a few issues that I couldn't solve that way, hence the
current patch.  A couple of examples:

1. TeX is very generous with the characters it includes in its
symbols, so what looks like a standard symbol to it can look like a
regexp either to grep or to emacs, so I needed to change things in
xref-find-apropos and in xref-find-references to take this into
account.  (See tex-xref-apropos-regexp and
tex-xref-references-in-directory.)  Sometimes using a search string
that had been put through regexp-quote was wrong, as when a user
provided their own regexp in the minibuffer, so in both those cases I
provided fallbacks to a different search in case the default search
came up empty.  I couldn't see how to do this without a new backend.

2.  A package like biblatex creates what amounts to a separate
namespace using the \newbibmacro mechanism, so pretty much every
biblatex style has both a \cite command and a cite bibmacro, and I
wanted to allow emacs to differentiate between them when using
xref-find-definitions.  Because users of the etoolbox package (like
biblatex) may well mix commands with and without the escape char "\",
I also provided a variable to allow users to find when a \command is
called using \csuse{command} instead.  Again, this required a fallback
search (see xref-backend-definitions) which I couldn't see how to
provide without a new backend.
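The fallback described in item 2 can be sketched in C, assuming a toy in-memory tags table. The struct and function names here are hypothetical and are not the patch's code. The name is first looked up exactly. Only if that fails is it retried with the escape char added, or removed; handling the second direction is an extra assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A toy tags table entry: the tag name as written in the TAGS file and
   the line of its definition.  \def-style names keep the backslash. */
struct tag {
    const char *name;
    int line;
};

static const struct tag *find_exact(const struct tag *tags, size_t n,
                                    const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tags[i].name, name) == 0)
            return &tags[i];
    return NULL;
}

/* Look NAME up exactly.  Only if that fails, and FALLBACK is set, retry
   with the escape char added (or removed), so a name taken from
   \csuse{NAME} can still reach a \def\NAME definition. */
static const struct tag *find_definition(const struct tag *tags, size_t n,
                                         const char *name, int fallback)
{
    char alt[256];
    const struct tag *t = find_exact(tags, n, name);
    int w;

    if (t != NULL || !fallback)
        return t;
    if (name[0] == '\\')
        w = snprintf(alt, sizeof alt, "%s", name + 1);
    else
        w = snprintf(alt, sizeof alt, "\\%s", name);
    if (w < 0 || (size_t) w >= sizeof alt)
        return NULL;                  /* name too long for this sketch */
    return find_exact(tags, n, alt);
}
```

Because the exact lookup runs first, 'cite' still finds the cite bibmacro and '\cite' finds the command, so the two namespaces stay apart. Only a miss triggers the second lookup. An example is a name taken from a \csuse{...} call whose definition was written with \def.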

Does this make any sense?  I can give more specific examples if you
like -- try running xref-find-references on a TeX command with "@" in
it.  (If memory serves, that behaved badly here on an unpatched emacs,
but maybe I'm misremembering.)

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 12:38:02 GMT)

Message #14 received at submit <at> debbugs.gnu.org:

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of
 text editors" <bug-gnu-emacs <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 21 Feb 2022 13:35:52 +0100

David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs <at> gnu.org> writes:

> diff --git a/lib-src/etags.c b/lib-src/etags.c
> index aa5bc8839d..e5269aa456 100644
> --- a/lib-src/etags.c
> +++ b/lib-src/etags.c
> [...]
>  /* Default set of control sequences to put into TEX_toktab.
> -   The value of environment var TEXTAGS is prepended to this.  */
> +   The value of environment var TEXTAGS is prepended to this.
> +   (2021) Add variants of '\def', some additional LaTeX commands,
> +   and common variants from the 'etoolbox' package.  Also, add
> +   starred variants of the commands if they exist.  Starred
> +   variants need to appear before their unstarred versions. */
>  static const char *TEX_defenv = "\
> -:chapter:section:subsection:subsubsection:eqno:label:ref:cite:bibitem\
> -:part:appendix:entry:index:def\
> -:newcommand:renewcommand:newenvironment:renewenvironment";
> +:chapter*:section*:subsection*:subsubsection*:part*:label:ref\
> +:chapter:section:subsection:subsubsection:eqno:cite:bibitem\
> +:part:appendix:entry:index:def:edef:gdef:xdef:newcommand*:newcommand\
> +:renewcommand*:renewcommand:newenvironment*:newenvironment\
> +:renewenvironment*:renewenvironment:DeclareRobustCommand*\
> +:DeclareRobustCommand:renewrobustcmd*:renewrobustcmd:newrobustcmd*\
> +:newrobustcmd:let:csdef:csedef:csgdef:csxdef:csletcs:cslet";

Hi David,

thanks for looking into this.  While you're at it, can you also please
add support for the former xparse \newcommand variants which are now
(as of October 2020) part of the LaTeX kernel, namely:

\NewDocumentCommand
\RenewDocumentCommand
\ProvideDocumentCommand
\DeclareDocumentCommand
\NewDocumentEnvironment
\RenewDocumentEnvironment
\ProvideDocumentEnvironment
\DeclareDocumentEnvironment
\NewExpandableDocumentCommand
\RenewExpandableDocumentCommand
\ProvideExpandableDocumentCommand
\DeclareExpandableDocumentCommand

TIA.  Best, Arash
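The ordering rule stated in the etags.c comment quoted above ('Starred variants need to appear before their unstarred versions') can be checked mechanically. For illustration, the C sketch below assumes that the scanner takes the first entry in the colon-separated list that is a prefix of the text after the backslash. The helpers 'first_prefix_match' and 'starred_order_ok' are hypothetical and are not etags.c functions. The same check applies unchanged to a list extended with the xparse-style names requested above.

```c
#include <stddef.h>
#include <string.h>

/* Copy into OUT the first entry of the colon-separated list DEFENV that
   is a prefix of CMD (the text right after a backslash).  Assumption for
   this sketch: the first prefix match in list order wins.
   Returns 1 on a match, 0 otherwise. */
static int first_prefix_match(const char *defenv, const char *cmd,
                              char *out, size_t outsz)
{
    const char *p = defenv;

    while (*p != '\0') {
        size_t len;

        if (*p == ':') {
            p++;
            continue;
        }
        len = strcspn(p, ":");
        if (strncmp(p, cmd, len) == 0) {
            if (len >= outsz)
                return 0;
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        p += len;
    }
    return 0;
}

/* Return 1 if no entry is followed later in the list by a longer entry
   that it is a prefix of (e.g. "section" listed before "section*"). */
static int starred_order_ok(const char *defenv)
{
    const char *a = defenv;

    while (*a != '\0') {
        size_t alen;
        const char *b;

        if (*a == ':') {
            a++;
            continue;
        }
        alen = strcspn(a, ":");
        for (b = a + alen; *b != '\0'; ) {
            size_t blen;

            if (*b == ':') {
                b++;
                continue;
            }
            blen = strcspn(b, ":");
            if (alen < blen && strncmp(a, b, alen) == 0)
                return 0;             /* shorter prefix listed first */
            b += blen;
        }
        a += alen;
    }
    return 1;
}
```

With the patch's list, '\section*{Intro}' matches 'section*' and '\section{Intro}' matches 'section', and 'starred_order_ok' accepts the list. If the order were reversed (':section:section*'), plain 'section' would match the starred command instead, which is what the ordering rule guards against. None of the twelve xparse-style names is a prefix of another entry, so appending them keeps the rule intact.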

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 14:05:02 GMT)

Message #20 received at submit <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, "David Fussner via Bug reports for GNU Emacs,
 the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 21 Feb 2022 14:03:59 +0000

Hi Arash,

Thank you for the list!  I had fully intended to add the new LaTeX 3
commands but managed somehow to forget.  If you see anything else I've
omitted please let me know.

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 17:29:02 GMT)

Message #26 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 21 Feb 2022 17:28:32 +0000

Hi Dmitry,

I found a bit of time to test, and the problem with "@" in command
names appears when a search string for xref-find-references ends with
"@". The results returned will miss out valid hits, depending on what
follows the "@" in the actual command name in the TeX file.

Hope this might help,

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 23:56:02 GMT)

Message #29 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 22 Feb 2022 01:55:17 +0200

On 21.02.2022 11:48, David Fussner wrote:
> Sometimes using a search string
> that had been put through regexp-quote was wrong, as when a user
> provided their own regexp in the minibuffer, so in both those cases I
> provided fallbacks to a different search in case the default search
> came up empty.  I couldn't see how to do this without a new backend.

One way to deal with that is to treat all user inputs as regexps there. 
Perhaps some will have to be more verbose than ideal, but as long as the
user is familiar with the regexp syntax, the behavior will be both 
powerful and predictable.
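For comparison with this suggestion, the literal-first search with a regexp fallback described in item 1 above can be sketched in C. Here POSIX <regex.h> stands in for grep and the Emacs regexp engine. The names 'quote_ere', 'count_matches' and 'search_with_fallback' are hypothetical. This sketches the idea only and is not the patch's implementation.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Backslash-escape the POSIX ERE metacharacters in IN, writing to OUT.
   Returns 0 on success, -1 if OUT is too small. */
static int quote_ere(const char *in, char *out, size_t outsz)
{
    size_t j = 0;

    for (; *in != '\0'; in++) {
        if (strchr("\\^$.|?*+()[{", *in) != NULL) {
            if (j + 1 >= outsz)
                return -1;
            out[j++] = '\\';
        }
        if (j + 1 >= outsz)
            return -1;
        out[j++] = *in;
    }
    out[j] = '\0';
    return 0;
}

/* Number of LINES matching the ERE PATTERN, or -1 if it does not compile. */
static int count_matches(const char *pattern, const char **lines, size_t n)
{
    regex_t re;
    int hits = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            hits++;
    regfree(&re);
    return hits;
}

/* First treat INPUT as a literal string (as for an accepted default);
   if that finds nothing, retry it as a user-supplied regexp.
   Returns the number of matching lines, 0 if none. */
static int search_with_fallback(const char *input, const char **lines,
                                size_t n)
{
    char quoted[256];
    int hits = -1;

    if (quote_ere(input, quoted, sizeof quoted) == 0)
        hits = count_matches(quoted, lines, n);
    if (hits <= 0) {
        int retry = count_matches(input, lines, n);
        if (retry > 0)
            hits = retry;
    }
    return hits < 0 ? 0 : hits;
}
```

An input such as '\section*' matches the starred command literally on the first pass. By contrast, 'sec.*Intro' finds nothing as a literal string and is then retried as a regexp. The trade-off is that the same input is interpreted in two different ways, depending on whether the first pass finds anything.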

> 2.  A package like biblatex creates what amounts to a separate
> namespace using the \newbibmacro mechanism, so pretty much every
> biblatex style has both a \cite command and a cite bibmacro, and I
> wanted to allow emacs to differentiate between them when using
> xref-find-definitions.  Because users of the etoolbox package (like
> biblatex) may well mix commands with and without the escape char "\",
> I also provided a variable to allow users to find when a \command is
> called using \csuse{command} instead.  Again, this required a fallback
> search (see xref-backend-definitions) which I couldn't see how to
> provide without a new backend.

Could those be disambiguated when the tags are scanned, instead? Then
the user will tailor their input to find the one or the other.

Or if we want fuzzier matching, perhaps creating mode-specific
values of etags-xref-find-definitions-tag-order could help.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 21 Feb 2022 23:57:02 GMT)

Message #32 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 22 Feb 2022 01:56:07 +0200

On 21.02.2022 19:28, David Fussner wrote:
> Hi Dmitry,
> 
> I found a bit of time to test, and the problem with "@" in command
> names appears when a search string for xref-find-references ends with
> "@". The results returned will miss out valid hits, depending on what
> follows the "@" in the actual command name in the TeX file.

Sorry, I have very little familiarity with TeX.

Do you have a step-by-step scenario? Perhaps using one of the .texi 
manuals already existing in the repo?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 22 Feb 2022 15:20:01 GMT)

Message #35 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 22 Feb 2022 15:19:29 +0000

Hi Dmitry,

> Do you have a step-by-step scenario? Perhaps using one of the .texi
> manuals already existing in the repo?

I can't find a good example in the emacs repo, but I'll try to talk through
what happens with a code snippet from biblatex.sty, which I hope will
explain some of the issues we're discussing, even if it is a little
artificial.

\DeclareBiblatexOption{global,type}[string]{uniquename}[true]{%
  \ifcsdef{blx@opt@uniquename@#1}
    {\letcs\blx@uniquename{blx@opt@uniquename@#1}}
    {\blx@err@invopt{uniquename=#1}{}}}
\def\blx@opt@uniquename@false{false}
\def\blx@opt@uniquename@init{init}
\def\blx@opt@uniquename@true{full}
\def\blx@opt@uniquename@full{full}
\def\blx@opt@uniquename@allinit{allinit}
\def\blx@opt@uniquename@allfull{allfull}
\def\blx@opt@uniquename@mininit{mininit}
\def\blx@opt@uniquename@minfull{minfull}

If you do M-? on \ifcsdef{blx@opt@uniquename@#1} using the default backend,
the default search string is blx@opt@uniquename@, and you'll get two hits,
that line and the following one.  Stepping through
xref-references-in-directory shows that the semantic-symref search (using
grep) only finds those two using the :searchtype 'symbol, and they're
returned.  If you change 'symbol to 'regexp, grep finds all the matches in
that code snippet, but then xref--convert-hits uses (format "\\_<%s\\_>"),
which again loses all but the first two hits when it scans the list
provided by grep.  Either grep or emacs here will miss out on valid hits
unless you change both the semantic-symref instantiation and the format
specification.
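A small C sketch over the snippet above reproduces the effect of the symbol-boundary wrapping. The two-hit result suggests that letters, digits and '@' count as symbol constituents while '#', '{' and '\' do not, and the sketch assumes exactly that. Plain 'strstr' stands in for grep and Emacs regexps, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed symbol constituents: letters, digits, '@' and '_'.
   '#', '{', '}' and the backslash are treated as non-constituents. */
static int is_symbol_char(char c)
{
    return isalnum((unsigned char) c) || c == '@' || c == '_';
}

/* Does LINE contain NEEDLE starting at a symbol boundary?  With
   REQUIRE_END set, the match must also end at a symbol boundary, which
   is what wrapping the string in \_< ... \_> asks for. */
static int line_matches(const char *line, const char *needle, int require_end)
{
    size_t nlen = strlen(needle);

    if (nlen == 0)
        return 0;
    for (const char *p = strstr(line, needle); p != NULL;
         p = strstr(p + 1, needle)) {
        int start_ok = (p == line) || !is_symbol_char(p[-1]);
        int end_ok = !is_symbol_char(p[nlen]);   /* '\0' counts as a boundary */

        if (start_ok && (!require_end || end_ok))
            return 1;
    }
    return 0;
}

/* Number of LINES that contain at least one acceptable match. */
static int count_lines(const char **lines, size_t n, const char *needle,
                       int require_end)
{
    int hits = 0;

    for (size_t i = 0; i < n; i++)
        hits += line_matches(lines[i], needle, require_end);
    return hits;
}
```

Requiring a boundary at both ends keeps only the '\ifcsdef' and '\letcs' lines, because '#' follows the final '@' there. In the eight '\def' lines a letter follows it instead. Dropping the end-boundary test brings back all ten lines. Under the same assumption, shortening the string to 'blx@opt@uniquename' does not help when the end boundary is required, since the next character is always '@'.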

> One way to deal with that is to treat all user inputs as regexps there.
> Perhaps some will have to be more verbose than ideal, but as long as
> the user is familiar with the regexp syntax, the behavior will be both
> powerful and predictable.

If I understand you right, I think that's what I'm trying to do, but
allowing for users who perhaps aren't too familiar with emacs regexps and
who might typically just accept the default search string offered by xref.

> Could those be disambiguated when the tags are scanned, instead? Then
> the user will tailor their input to find the one or the other.

If I understand you correctly, that's also what I try to do -- each tagged
command in the tags file is searched by the name of the tag, which in these
cases will either start with the escape char or not.  Looking at the
biblatex snippet, if you come across \csuse{blx@opt@uniquename@false}
somewhere in a file, and you want to see what the definition is, you can't
know a priori how it was defined, with \def or with \csdef.  This snippet
above mixes both styles, and I hoped that a user would be allowed to choose
whether to search for both styles without necessarily having to try both
forms of the string in separate searches.  In fact, as the code stands, it
only does the second search if the first one fails, so it still more or
less keeps the two command-naming styles separate.

The simplest fix is to remove the escape char from all tag names, which I
suggest to users of ctags in some commented-out code in etags.c. This does
lose the ability to differentiate \def'ed commands and \csdef'd ones,
especially as in some circumstances they can have the same name.  I'm not
sure how great a loss that is, on the other hand.  Is that what you had in
mind?

> Or if we want fuzzier matching, perhaps creating mode-specific
> values of etags-xref-find-definitions-tag-order could help.

Yeah, you're right, I'm pretty sure I could use a buffer-local value of
that variable to get xref-find-definitions to do the fuzzy matching I'm
after.  Does the discussion above at all help to convince you that there
are other issues that might still require a new backend?

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 23 Feb 2022 02:22:01 GMT)

Message #38 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 23 Feb 2022 04:21:17 +0200

Hi David,

On 22.02.2022 17:19, David Fussner wrote:

>  > Do you have a step-by-step scenario? Perhaps using one of the .texi
>  > manuals already existing in the repo?
> 
> I can't find a good example in the emacs repo, but I'll try to talk 
> through what happens with a code snippet from biblatex.sty, which I hope 
> will explain some of the issues we're discussing, even if it is a little 
> artificial.

Thank you.

> \DeclareBiblatexOption{global,type}[string]{uniquename}[true]{%
>    \ifcsdef{blx@opt@uniquename@#1}
>      {\letcs\blx@uniquename{blx@opt@uniquename@#1}}
>      {\blx@err@invopt{uniquename=#1}{}}}
> \def\blx@opt@uniquename@false{false}
> \def\blx@opt@uniquename@init{init}
> \def\blx@opt@uniquename@true{full}
> \def\blx@opt@uniquename@full{full}
> \def\blx@opt@uniquename@allinit{allinit}
> \def\blx@opt@uniquename@allfull{allfull}
> \def\blx@opt@uniquename@mininit{mininit}
> \def\blx@opt@uniquename@minfull{minfull}
>
> If you do M-? on \ifcsdef{blx@opt@uniquename@#1} using the default
> backend, the default search string is blx@opt@uniquename@, and you'll
> get two hits, that line and the following one.  Stepping through 
> xref-references-in-directory shows that the semantic-symref search 
> (using grep) only finds those two using the :searchtype 'symbol, and 
> they're returned.  If you change 'symbol to 'regexp, grep finds all the 
> matches in that code snippet, but then xref--convert-hits uses (format 
> "\\_<%s\\_>"), which again loses all but the first two hits when it 
> scans the list provided by grep.  Either grep or emacs here will miss 
> out on valid hits unless you change both the semantic-symref 
> instantiation and the format specification.

That might call for a different implementation of 'references' indeed.

But could you make 'blx@opt@uniquename' the default search string in
that example? Does that make sense?

And if not, all in all, I wouldn't worry too much about 
xref-find-references, since TeX is more of a text format (IMHO) than a 
program with well-defined identifiers. Perhaps using project-find-regexp 
most of the time will save you a lot of the trouble?

> > One way to deal with that is to treat all user inputs as regexps
> > there. Perhaps some will have to be more verbose than ideal, but as
> > long as the user is familiar with the regexp syntax, the behavior
> > will be both powerful and predictable.
> 
> If I understand you right, I think that's what I'm trying to do, but 
> allowing for users who perhaps aren't too familiar with emacs regexps 
> and who might typically just accept the default search string offered by 
> xref.

I'm not sure how I feel about the extra "fuzziness" in the behavior 
which comes with this approach.

> > Could those be disambiguated when the tags are scanned, instead?
> > Then the user will tailor their input to find the one or the other.
> 
> If I understand you correctly, that's also what I try to do -- each 
> tagged command in the tags file is searched by the name of the tag, 
> which in these cases will either start with the escape char or not.  
> Looking at the biblatex snippet, if you come across 
> \csuse{blx@opt@uniquename@false} somewhere in a file, and you want to
> see what the definition is, you can't know a priori how it was defined,
> with \def or with \csdef.  This snippet above mixes both styles, and I 
> hoped that a user would be allowed to choose whether to search for both 
> styles without necessarily having to try both forms of the string in 
> separate searches.  In fact, as the code stands, it only does the second 
> search if the first one fails, so it still more or less keeps the two 
> command-naming styles separate.

The parser could create both qualified (with \def or \csdef) and 
unqualified entries for the same definition. Maybe make it optional 
(with -Q argument to etags). Then the user could search using any of 
these formats.
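One possible reading of this suggestion can be sketched in C with a hypothetical in-memory tags list. When an optional dual mode is on, each definition is recorded under both its escaped and its unescaped spelling, so a search for either spelling finds it. The names and the fixed-size storage are illustrative assumptions, not etags.c code. The command-line switch that would enable the mode is left open.

```c
#include <string.h>

#define MAX_TAGS 16
#define MAX_NAME 64

/* Hypothetical in-memory tags list used only for this sketch. */
struct tags {
    char names[MAX_TAGS][MAX_NAME];
    int count;
};

static int add_name(struct tags *t, const char *name)
{
    if (t->count >= MAX_TAGS || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(t->names[t->count++], name);
    return 0;
}

/* Record one definition.  With DUAL set, also record the other spelling:
   "\foo" (as from \def\foo) gets "foo", and "foo" (as from \csdef{foo})
   gets "\foo", so a search for either spelling finds it. */
static int add_definition(struct tags *t, const char *name, int dual)
{
    char alt[MAX_NAME];

    if (add_name(t, name) != 0)
        return -1;
    if (!dual)
        return 0;
    if (name[0] == '\\')
        return add_name(t, name + 1);
    if (strlen(name) + 1 >= MAX_NAME)
        return -1;
    alt[0] = '\\';
    strcpy(alt + 1, name);
    return add_name(t, alt);
}

static int has_tag(const struct tags *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return 1;
    return 0;
}
```

The cost is the one raised earlier in the thread. With dual entries, a search for 'cite' would find both a \cite command and a cite bibmacro, so the two are no longer kept apart.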

> > Or if we want fuzzier matching, perhaps creating mode-specific
> > values of etags-xref-find-definitions-tag-order could help.
> 
> Yeah, you're right, I'm pretty sure I could use a buffer-local value of 
> that variable to get xref-find-definitions to do the fuzzy matching I'm 
> after. Does the discussion above at all help to convince you that there 
> are other issues that might still require a new backend?

The suggestion about a buffer-local value of that var was made in the 
context of trying to make it work with the current etags backend. At 
least, in the first patch. If only because I don't really like to see 
duplicated code.

If we find another place where we really want to diverge, we could also 
try adding some behavior-altering variable first.

After that, we might as well add a new backend (I'm not really against 
it, just prefer to exhaust other options first), but hopefully someone 
else (more familiar with tex-mode) could take over this discussion at 
that point, and the subsequent responsibility for the added code. That 
person could be yourself too, under the right conditions.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 23 Feb 2022 10:46:02 GMT) Full text and rfc822 format available.

Message #41 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 23 Feb 2022 10:45:28 +0000

Hi Dmitry,

Thanks again for looking at all this, and for your patience.

On Wed, 23 Feb 2022 at 02:21, Dmitry Gutov <dgutov <at> yandex.ru> wrote:

>
> That might call for a different implementation of 'references' indeed.
>
> But could you make 'blx@opt@uniquename' the default search string in
> that example? Does that make sense?
>

I guess it might be possible to come up with a regexp to suppress the
@ in some positions in the string, but the bad news is that if you M-?
with that search string you get no results at all with the default
backend. Grep finds the same two as before, but the default format
specification eliminates even those.  So you're left looking at a
string in your buffer and xref is telling you it isn't there.

> And if not, all in all, I wouldn't worry too much about
> xref-find-references, since TeX is more of a text format (IMHO) than a
> program with well-defined identifiers. Perhaps using project-find-regexp
> most of the time will save you a lot of the trouble?
>

You're quite right that C-x p g works well in this instance, and I
tried to improve how thing-at-point finds search strings in TeX
buffers for this command.  I guess TeX is a little bit of a bad fit
both for text modes and for prog modes, but I confess I'm still uneasy
at the thought of M-? returning such misleading results.  What would
you think about putting project-find-regexp on M-? in TeX buffers?
That is, assuming I don't find reasonably common TeX constructs that
defeat it?

> > If I understand you right, I think that's what I'm trying to do, but
> > allowing for users who perhaps aren't too familiar with Emacs regexps
> > and who might typically just accept the default search string offered by
> > xref.
>
> I'm not sure how I feel about the extra "fuzziness" in the behavior
> which comes with this approach.

I see your point here.

>
> The parser could create both qualified (with \def or \csdef) and
> unqualified entries for the same definition. Maybe make it optional
> (with -Q argument to etags). Then the user could search using any of
> these formats.
>

I guess we could make etags do some of the work, perhaps adding also a
distinction between tagged commands that require this duplication
(\def & \csdef) and those that don't (\chapter).  Aside from making
tags files a lot bigger, and possibly adding another option to a
program already overloaded with them -- neither of which is a
showstopper -- I suspect it could work pretty well for
xref-find-definitions.

>
> The suggestion about a buffer-local value of that var was made in the
> context of trying to make it work with the current etags backend. At
> least, in the first patch. If only because I don't really like to see
> duplicated code.
>
> If we find another place where we really want to diverge, we could also
> try adding some behavior-altering variable first.
>
> After that, we might as well add a new backend (I'm not really against
> it, just prefer to exhaust other options first), but hopefully someone
> else (more familiar with tex-mode) could take over this discussion at
> that point, and the subsequent responsibility for the added code. That
> person could be yourself too, under the right conditions.

I certainly concur about duplicated code, and I really did try hard to
get by without a new backend, but I won't pretend that I exhausted all
or even nearly all of the possibilities. If I'm understanding you
correctly, you'd prefer a few, small changes to the backend code in
etags.el (and xref.el), should that be necessary, to a whole new
backend which limits changes to tex-mode.el.  If this understanding is
reasonably accurate, I can have another look at earlier iterations of
the code to see what I missed, and perhaps come up with something that
works right without so much duplication. It may well take me some
time, so apologies in advance for being slow.

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 24 Feb 2022 02:24:02 GMT) Full text and rfc822 format available.

Message #44 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 24 Feb 2022 04:23:48 +0200

Hi David,

On 23.02.2022 12:45, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:

> I guess it might be possible to come up with a regexp to suppress the
> @ in some positions in the string, but the bad news is that if you M-?
> with that search string you get no results at all with the default
> backend. Grep finds the same two as before, but the default format
> specification eliminates even those.  So you're left looking at a
> string in your buffer and xref is telling you it isn't there.

That's odd. I've tried searching for 'blx@opt@uniquename' inside \...@,
and 'grep -w' successfully finds it. Post-processing fails, apparently, 
but that depends on the contents of the syntax table. So one solution 
might be to update tex-mode's syntax table.
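The following is a simplified Python model of the mismatch described above, not xref's actual post-processing. It assumes the filtering step rejects a match whose neighbouring characters count as identifier constituents. `GREP_WORD` follows grep's documented definition of a word character for -w (letters, digits, underscore). `TEX_WORD` is a hypothetical syntax table in which @ also counts as a constituent.

```python
import re

LINE = r"\csuse{blx@opt@uniquename@false}"
NAME = "blx@opt@uniquename"

# Characters grep -w treats as word constituents: letters, digits, underscore.
GREP_WORD = "A-Za-z0-9_"
# Hypothetical syntax table in which '@' is also an identifier constituent.
TEX_WORD = "A-Za-z0-9_@"


def bounded_match(line: str, name: str, word_chars: str) -> bool:
    """True if NAME occurs in LINE with no word_chars character on either side."""
    pattern = rf"(?<![{word_chars}]){re.escape(name)}(?![{word_chars}])"
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    # grep -w: '{' precedes the name and '@' follows it, both non-word chars.
    print(bounded_match(LINE, NAME, GREP_WORD))  # True
    # With '@' as a constituent, the trailing '@false' extends the identifier.
    print(bounded_match(LINE, NAME, TEX_WORD))   # False
```

Under these assumptions, the same hit survives grep but is dropped by a boundary check that treats @ as part of the name. This is why the outcome would depend on how the TeX syntax table classifies @.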

>> And if not, all in all, I wouldn't worry too much about
>> xref-find-references, since TeX is more of a text format (IMHO) than a
>> program with well-defined identifiers. Perhaps using project-find-regexp
>> most of the time will save you a lot of the trouble?
>>
> 
> You're quite right that C-x p g works well in this instance, and I
> tried to improve how thing-at-point finds search strings in TeX
> buffers for this command.  I guess TeX is a little bit of a bad fit
> both for text modes and for prog modes, but I confess I'm still uneasy
> at the thought of M-? returning such misleading results.  What would
> you think about putting project-find-regexp on M-? in TeX buffers?
> That is, assuming I don't find reasonably common TeX constructs that
> defeat it?

On the face of it, the suggestion seems odd (those commands' features
and user expectations are different), but it wouldn't be out of the 
question to circle back to it later.

>> The parser could create both qualified (with \def or \csdef) and
>> unqualified entries for the same definition. Maybe make it optional
>> (with -Q argument to etags). Then the user could search using any of
>> these formats.
>>
> 
> I guess we could make etags do some of the work, perhaps adding also a
> distinction between tagged commands that require this duplication
> (\def & \csdef) and those that don't (\chapter).  Aside from making
> tags files a lot bigger, and possibly adding another option to a
> program already overloaded with them -- neither of which is a
> showstopper -- I suspect it could work pretty well for
> xref-find-definitions.

IIUC tag files for LaTeX aren't going to be particularly big anyway 
(book projects are almost always smaller than even a mid-sized software 
project), so the size might never be a problem.

But then again, I could be very wrong about that.

>> The suggestion about a buffer-local value of that var was made in the
>> context of trying to make it work with the current etags backend. At
>> least, in the first patch. If only because I don't really like to see
>> duplicated code.
>>
>> If we find another place where we really want to diverge, we could also
>> try adding some behavior-altering variable first.
>>
>> After that, we might as well add a new backend (I'm not really against
>> it, just prefer to exhaust other options first), but hopefully someone
>> else (more familiar with tex-mode) could take over this discussion at
>> that point, and the subsequent responsibility for the added code. That
>> person could be yourself too, under the right conditions.
> 
> I certainly concur about duplicated code, and I really did try hard to
> get by without a new backend, but I won't pretend that I exhausted all
> or even nearly all of the possibilities. If I'm understanding you
> correctly, you'd prefer a few, small changes to the backend code in
> etags.el (and xref.el), should that be necessary, to a whole new
> backend which limits changes to tex-mode.el.  If this understanding is
> reasonably accurate, I can have another look at earlier iterations of
> the code to see what I missed, and perhaps come up with something that
> works right without so much duplication. It may well take me some
> time, so apologies in advance for being slow.

Yes, please.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 24 Feb 2022 13:17:01 GMT) Full text and rfc822 format available.

Message #47 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 24 Feb 2022 13:15:41 +0000

Thanks Dmitry.  I'll post back here when I've got something.

David.

On Thu, 24 Feb 2022 at 02:23, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>
> Hi David,
>
> On 23.02.2022 12:45, David Fussner via Bug reports for GNU Emacs, the
> Swiss army knife of text editors wrote:
>
> > I guess it might be possible to come up with a regexp to suppress the
> > @ in some positions in the string, but the bad news is that if you M-?
> > with that search string you get no results at all with the default
> > backend. Grep finds the same two as before, but the default format
> > specification eliminates even those.  So you're left looking at a
> > string in your buffer and xref is telling you it isn't there.
>
> That's odd. I've tried searching for 'blx@opt@uniquename' inside \...@,
> and 'grep -w' successfully finds it. Post-processing fails, apparently,
> but that depends on the contents of the syntax table. So one solution
> might be to update tex-mode's syntax table.
>
> >> And if not, all in all, I wouldn't worry too much about
> >> xref-find-references, since TeX is more of a text format (IMHO) than a
> >> program with well-defined identifiers. Perhaps using project-find-regexp
> >> most of the time will save you a lot of the trouble?
> >>
> >
> > You're quite right that C-x p g works well in this instance, and I
> > tried to improve how thing-at-point finds search strings in TeX
> > buffers for this command.  I guess TeX is a little bit of a bad fit
> > both for text modes and for prog modes, but I confess I'm still uneasy
> > at the thought of M-? returning such misleading results.  What would
> > you think about putting project-find-regexp on M-? in TeX buffers?
> > That is, assuming I don't find reasonably common TeX constructs that
> > defeat it?
>
> On the face of it, the suggestion seems odd (those commands' features
> and user expectations are different), but it wouldn't be out of the
> question to circle back to it later.
>
> >> The parser could create both qualified (with \def or \csdef) and
> >> unqualified entries for the same definition. Maybe make it optional
> >> (with -Q argument to etags). Then the user could search using any of
> >> these formats.
> >>
> >
> > I guess we could make etags do some of the work, perhaps adding also a
> > distinction between tagged commands that require this duplication
> > (\def & \csdef) and those that don't (\chapter).  Aside from making
> > tags files a lot bigger, and possibly adding another option to a
> > program already overloaded with them -- neither of which is a
> > showstopper -- I suspect it could work pretty well for
> > xref-find-definitions.
>
> IIUC tag files for LaTeX aren't going to be particularly big anyway
> (book projects are almost always smaller than even a mid-sized software
> project), so the size might never be a problem.
>
> But then again, I could be very wrong about that.
>
> >> The suggestion about a buffer-local value of that var was made in the
> >> context of trying to make it work with the current etags backend. At
> >> least, in the first patch. If only because I don't really like to see
> >> duplicated code.
> >>
> >> If we find another place where we really want to diverge, we could also
> >> try adding some behavior-altering variable first.
> >>
> >> After that, we might as well add a new backend (I'm not really against
> >> it, just prefer to exhaust other options first), but hopefully someone
> >> else (more familiar with tex-mode) could take over this discussion at
> >> that point, and the subsequent responsibility for the added code. That
> >> person could be yourself too, under the right conditions.
> >
> > I certainly concur about duplicated code, and I really did try hard to
> > get by without a new backend, but I won't pretend that I exhausted all
> > or even nearly all of the possibilities. If I'm understanding you
> > correctly, you'd prefer a few, small changes to the backend code in
> > etags.el (and xref.el), should that be necessary, to a whole new
> > backend which limits changes to tex-mode.el.  If this understanding is
> > reasonably accurate, I can have another look at earlier iterations of
> > the code to see what I missed, and perhaps come up with something that
> > works right without so much duplication. It may well take me some
> > time, so apologies in advance for being slow.
>
> Yes, please.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 25 Feb 2022 20:17:01 GMT) Full text and rfc822 format available.

Message #50 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of
 text editors" <bug-gnu-emacs <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 25 Feb 2022 21:16:16 +0100

Hi David,

I took a superficial look at this thread, and this seems very nice.

I was wondering why you want to be able to find the definition of macros
with @ in their name.  Those are "private" macros that the user
shouldn't have occasion to use.  Is it for a TeX programmer mode?

Let me also mention a library I wrote for analyzing TeX code (accessible
to Emacs via LSP):

    https://github.com/astoff/digestif

It's written in Lua (can run on the LuaTeX interpreter) and uses PEGs
for flexible parsing.  If you want to be very ambitious about what you
are able to parse, I think regexps are not sufficient.

Digestif can handle \cite{messed up reference} just fine, for example.

On Thu,  3 Feb 2022 at 15:09, David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org> wrote:

> I've recently been trying to use xref commands with a tags table in a
> TeX repository, and many of the results are sub-optimal.  This is a
> known issue -- within living memory there have been at least two
> discussions related to it on help-gnu-emacs:
>
> https://lists.gnu.org/archive/html/help-gnu-emacs/2018-06/msg00126.html
> https://lists.gnu.org/archive/html/help-gnu-emacs/2021-07/msg00436.html
>
> Neither discussion resulted in any code, at least not that I can find,
> and the issues mentioned there remain.  For example,
> xref-find-definitions on, say, '\mycommand' returns
>
> No definitions found for: mycommand.
>
> (The absence of the escape char in the search string makes the search
> fail, as the tag name in the table will be '\mycommand'.)
>
> Similarly, any xref command on 'my:citekey' will only search by default
> for the half of the symbol under point, stopping at the colon.
>
> There are many other behaviors that are suboptimal, as well, so in the
> end I wrote a new xref backend for TeX buffers (cloning large portions
> of the default etags backend), and wondered whether it might be welcome
> in GNU Emacs.
>
> A few remarks:
>
> 1. The code should work as it stands both in the AUCTeX and the in-tree
> modes.  The AUCTeX hooks I've included in the patch are provisional, as
> I would want to discuss with them how they would want to handle it,
> should the patch be accepted in some form.
>
> 2. Along the way I found some issues with how etags parses TeX files,
> issues which affect the usefulness of the xref commands, so I've made
> changes in etags.c as well.  When running the test suite for etags the
> only diffs occurred in the TeX-related sections of the resulting tags
> file, and location information in those sections was good.
>
> 3. The patch as it stands enables all the changes by default to give
> what I judge to be the best out-of-the-box experience, but wiser heads
> may well have other ideas.
>
> 4. If it looks like the patch will make it into Emacs in some form, I'm
> going to need to assign copyright, so I'd appreciate help with getting
> that started.
>
> Thanks,
>
> David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 25 Feb 2022 20:17:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 26 Feb 2022 09:30:01 GMT) Full text and rfc822 format available.

Message #56 received at submit <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, "David Fussner via Bug reports for GNU Emacs,
 the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 26 Feb 2022 09:29:06 +0000

Hi Augusto,

On Fri, 25 Feb 2022 at 20:16, Augusto Stoffel <arstoffel <at> gmail.com> wrote:
>
> Hi David,

>
> I took a superficial look at this thread, and this seems very nice.

Thanks!

>
> I was wondering why you want to be able to find the definition of macros
> with @ in their name.  Those are "private" macros that the user
> shouldn't have occasion to use.  Is it for a TeX programmer mode?

I confess that TeX developers are indeed one of the main targets for
the feature as I envisioned it.  For creating and following \labels,
\refs, and \cites (of all sorts) I find RefTeX very handy, as well as
for jumping around \chapters and \sections and the like.  What I miss
when developing are the code-navigation features of something like
xref, which are (from the user point of view) both simple and
powerful.  My modest goal was to make Emacs' extensive infrastructure
work a little better out of the box for TeX documents, especially for
styles and other collections of macros.

>
> Let me also mention a library I wrote for analyzing TeX code (accessible
> to Emacs via LSP):
>
>     https://github.com/astoff/digestif
>
> It's written in Lua (can run on the LuaTeX interpreter) and uses PEGs
> for flexible parsing.  If you want to be very ambitious about what you
> are able to parse, I think regexps are not sufficient.
>
> Digestif can handle \cite{messed up reference} just fine, for example.
>

This looks very nice indeed, and if I'm reading it right provides a
replacement both for RefTeX and for the code-navigation features I'm
trying to implement.  I figure I'll continue trying to get improved
out-of-the-box features into core, and if I manage to satisfy Dmitry
we'll then have a choice, but in any case I'm going to have a longer
look at digestif when I get some time.

Thanks for the hint!

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 26 Feb 2022 09:30:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 26 Feb 2022 10:57:02 GMT) Full text and rfc822 format available.

Message #62 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, "David Fussner via Bug reports for GNU Emacs,
 the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 26 Feb 2022 11:56:50 +0100

On Sat, 26 Feb 2022 at 09:29, David Fussner <dfussner <at> googlemail.com> wrote:

> Hi Augusto,
>
> On Fri, 25 Feb 2022 at 20:16, Augusto Stoffel <arstoffel <at> gmail.com> wrote:
>>
>> Hi David,
>
>>
>> I took a superficial look at this thread, and this seems very nice.
>
> Thanks!
>
>>
>> I was wondering why you want to be able to find the definition of macros
>> with @ in their name.  Those are "private" macros that the user
>> shouldn't have occasion to use.  Is it for a TeX programmer mode?
>
> I confess that TeX developers are indeed one of the main targets for
> the feature as I envisioned it.  For creating and following \labels,
> \refs, and \cites (of all sorts) I find RefTeX very handy, as well as
> for jumping around \chapters and \sections and the like.  What I miss
> when developing are the code-navigation features of something like
> xref, which are (from the user point of view) both simple and
> powerful.  My modest goal was to make Emacs' extensive infrastructure
> work a little better out of the box for TeX documents, especially for
> styles and other collections of macros.

Sorry for going off on a tangent, but here's one more thing I dislike about
RefTeX you might want to consider.  If you type \label{something}, as
opposed to using the RefTeX command to add a label (or if you edit the
label by hand), then RefTeX will not reparse the document and will get out
of sync.  Or at least that was the case when I still used RefTeX.  So it
might be worth considering some cache invalidation scheme there.
(Digestif has caching for multifile documents, but parsing a single file
is fast enough that this is not a problem I need to worry about :-).)

>>
>> Let me also mention a library I wrote for analyzing TeX code (accessible
>> to Emacs via LSP):
>>
>>     https://github.com/astoff/digestif
>>
>> It's written in Lua (can run on the LuaTeX interpreter) and uses PEGs
>> for flexible parsing.  If you want to be very ambitious about what you
>> are able to parse, I think regexps are not sufficient.
>>
>> Digestif can handle \cite{messed up reference} just fine, for example.
>>
>
> This looks very nice indeed, and if I'm reading it right provides a
> replacement both for RefTeX and for the code-navigation features I'm
> trying to implement.

That's right.  Also command completion (including snippets, if that's
your thing) and Eldoc.

>  I figure I'll continue trying to get improved
> out-of-the-box features into core, and if I manage to satisfy Dmitry
> we'll then have a choice, but in any case I'm going to have a longer
> look at digestif when I get some time.

Let me mention one last thing, since you seem interested in a TeX
programming mode.

Digestif will not work great out of the box for programming because it
correctly considers @ to have catcode "other" (so it can't be part of
the name of a command).  But this is trivial to change and, in fact,
Digestif already has a "latex-prog" mode that simulates the correct
catcodes.  It would be easy to include a "latex-expl3" mode as well.

The problem is that there's no way for Emacs to communicate that one of
these programming modes is to be used.  This could be fixed in two ways:

A. by creating latex-prog and latex-expl3 derived modes in Emacs, or

B. adding heuristics to Digestif to decide if a given file is "document"
   or "code".

Do you have any thoughts about A?  Would there be any other benefits in
Emacs to justify the latex-prog and latex-expl3 major modes?  It seems
that (at least in AUCTeX) @ is always considered a letter, which may be
innocuous but is kinda wrong.

>
> Thanks for the hint!
>
> David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 26 Feb 2022 10:58:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 27 Feb 2022 18:44:02 GMT) Full text and rfc822 format available.

Message #68 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 27 Feb 2022 19:42:55 +0100

Augusto Stoffel <arstoffel <at> gmail.com> writes:

> If you type \label{something}, as opposed to using the RefTeX command
> to add a label (or if you edit the label by hand) then RefTeX will not
> reparse the document and get out of sync.

If you know the labels known to RefTeX are out of sync, you can issue
`C-c )' with a prefix argument:

,----[ C-h f reftex-reference RET ]
| reftex-reference is an interactive native compiled Lisp function in
| ‘reftex-ref.el’.
| 
| (reftex-reference &optional TYPE NO-INSERT CUT)
| 
| Make a LaTeX reference.  Look only for labels of a certain TYPE.
| With prefix arg, force to rescan buffer for labels.  This should only be
| necessary if you have recently entered labels yourself without using
| reftex-label.  Rescanning of the buffer can also be requested from the
| label selection menu.
| The function returns the selected label or nil.
| If NO-INSERT is non-nil, do not insert \ref command, just return label.
| When called with 2 C-u prefix args, disable magic word recognition.
| 
|   Probably introduced at or before Emacs version 20.1.
| 
`----

Or in the labels *RefTeX select* buffer, you have these choices:

 r / C-u r  Reparse document / Reparse entire document.

I usually hit r when I don't find the label I'm looking for.

> Or at least that was the case when I still used RefTeX.  So it might
> be worth considering some cache invalidation scheme there.

The question is if it's worth the effort where a remedy is already in
place.

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 28 Feb 2022 09:10:02 GMT) Full text and rfc822 format available.

Message #71 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Augusto Stoffel <arstoffel <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 28 Feb 2022 09:09:36 +0000

Hi Augusto,

For what it's worth, I've always just done what Arash suggests when
RefTeX gets out of sync, and haven't had any issues with it that I can
remember.  (To be fair, my use cases haven't exactly been exotic.)

> The problem is that there's no way for Emacs to communicate that one of
> these programming modes is to be used.  This could be fixed in two ways:
>
> A. by creating latex-prog and latex-expl3 derived modes in Emacs, or
>
> B. adding heuristics to Digestif to decide if a given file is "document"
>    or "code".
>
> Do you have any thoughts about A?  Would there be any other benefits in
> Emacs to justify the latex-prog and latex-expl3 major modes?  It seems
> that (at least in AUCTeX) @ is always considered a letter, which may be
> innocuous but is kinda wrong.

The only thought I have is that it sounds like a new major mode would
be overkill for what you need here.  I would think that a variable or
defcustom might do the trick, or at most maybe a minor mode?  When
navigating code I really want to be able to follow the commands to
their source no matter whether the command is internal or for users,
though I can see how in a code-completion setting you might want to be
able to separate the two more cleanly.  Obviously, I'm not the person
you need to convince about all of this -- that would be Arash and the
Emacs maintainers themselves.

Best,

David.

On Sun, 27 Feb 2022 at 18:43, Arash Esbati <arash <at> gnu.org> wrote:
>
> Augusto Stoffel <arstoffel <at> gmail.com> writes:
>
> > If you type \label{something}, as opposed to using the RefTeX command
> > to add a label (or if you edit the label by hand) then RefTeX will not
> > reparse the document and get out of sync.
>
> If you know the labels known to RefTeX are out of sync, you can issue
> `C-c )' with a prefix argument:
>
> ,----[ C-h f reftex-reference RET ]
> | reftex-reference is an interactive native compiled Lisp function in
> | ‘reftex-ref.el’.
> |
> | (reftex-reference &optional TYPE NO-INSERT CUT)
> |
> | Make a LaTeX reference.  Look only for labels of a certain TYPE.
> | With prefix arg, force to rescan buffer for labels.  This should only be
> | necessary if you have recently entered labels yourself without using
> | reftex-label.  Rescanning of the buffer can also be requested from the
> | label selection menu.
> | The function returns the selected label or nil.
> | If NO-INSERT is non-nil, do not insert \ref command, just return label.
> | When called with 2 C-u prefix args, disable magic word recognition.
> |
> |   Probably introduced at or before Emacs version 20.1.
> |
> `----
>
> Or in the labels *RefTeX select* buffer, you have these choices:
>
>  r / C-u r  Reparse document / Reparse entire document.
>
> I usually hit r when I don't find the label I'm looking for.
>
> > Or at least that was the case when I still used RefTeX.  So it might
> > be worth considering some cache invalidation scheme there.
>
> The question is if it's worth the effort where a remedy is already in
> place.
>
> Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 28 Feb 2022 11:56:02 GMT) Full text and rfc822 format available.

Message #74 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Augusto Stoffel <arstoffel <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 28 Feb 2022 12:54:52 +0100

David Fussner <dfussner <at> googlemail.com> writes:

>> The problem is that there's no way for Emacs to communicate that one of
>> these programming modes is to be used.  This could be fixed in two ways:
>>
>> A. by creating latex-prog and latex-expl3 derived modes in Emacs, or
>>
>> B. adding heuristics to Digestif to decide if a given file is "document"
>>    or "code".
>>
>> Do you have any thoughts about A?  Would there be any other benefits in
>> Emacs to justify the latex-prog and latex-expl3 major modes?  It seems
>> that (at least in AUCTeX) @ is always considered a letter, which may be
>> innocuous but is kinda wrong.
>
> The only thought I have is that it sounds like a new major mode would
> be overkill for what you need here.  I would think that a variable or
> defcustom might do the trick, or at most maybe a minor mode?

Sorry if I'm missing something here, I wasn't tracking this thread.  But
does doctex-mode (or docTeX in AUCTeX) fit the bill here?

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 28 Feb 2022 13:06:02 GMT) Full text and rfc822 format available.

Message #77 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Arash Esbati <arash <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 28 Feb 2022 14:05:38 +0100

On Mon, 28 Feb 2022 at 09:09, David Fussner <dfussner <at> googlemail.com> wrote:

> For what it's worth, I've always just done what Arash suggests when
> RefTeX gets out of sync, and haven't had any issues with it that I can
> remember.  (To be fair, my use cases haven't exactly been exotic.)

Sure, I'm aware you have to do the manual resync when using RefTeX.  I
just think it's a totally unnecessary hurdle in this day and age.  I
used to advise the RefTeX commands so they would reparse the document
every time, and this worked just fine.  (Granted, I never worked on
anything over 100 pages or so, but it should also be possible to reparse
individual files of a multifile project so that the file sizes never
become an issue.)

>> The problem is that there's no way for Emacs to communicate that one of
>> these programming modes is to be used.  This could be fixed in two ways:
>>
>> A. by creating latex-prog and latex-expl3 derived modes in Emacs, or
>>
>> B. adding heuristics to Digestif to decide if a given file is "document"
>>    or "code".
>>
>> Do you have any thoughts about A?  Would there be any other benefits in
>> Emacs to justify the latex-prog and latex-expl3 major modes?  It seems
>> that (at least in AUCTeX) @ is always considered a letter, which may be
>> innocuous but is kinda wrong.
>
> The only thought I have is that it sounds like a new major mode would
> be overkill for what you need here.  I would think that a variable or
> defcustom might do the trick, or at most maybe a minor mode?  When
> navigating code I really want to be able to follow the commands to
> their source no matter whether the command is internal or for users,
> though I can see how in a code-completion setting you might want to be
> able to separate the two more cleanly.  Obviously, I'm not the person
> you need to convince about all of this -- that would be Arash and the
> emacs maintainers, themselves.

Okay, thanks for your insight.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 28 Feb 2022 13:12:02 GMT) Full text and rfc822 format available.

Message #80 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Augusto Stoffel <arstoffel <at> gmail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 28 Feb 2022 14:11:05 +0100

On Mon, 28 Feb 2022 at 12:54, Arash Esbati <arash <at> gnu.org> wrote:

> Sorry if I'm missing something here, I wasn't tracking this thread.  But
> does doctex-mode (or docTeX in AUCTeX) fit the bill here?

Ah, I forgot about that one.  I mean basically that, but for files like
plain.tex or tikz.code.tex; or also when writing a .sty file directly
for personal purposes only.

But since tex-mode and derived ones always pretend @ is a letter, I
guess there's no real need for a dedicated TeX programming mode.

Now, how about expl3 code, where _ and : are letters too?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 28 Feb 2022 19:05:02 GMT) Full text and rfc822 format available.

Message #83 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: Augusto Stoffel <arstoffel <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 28 Feb 2022 20:04:02 +0100

Augusto Stoffel <arstoffel <at> gmail.com> writes:

> Now, how about expl3 code, where _ and : are letters too?

AUCTeX has a style file expl3.el[1] which changes the syntax for "_" and
":".  Can't tell about the builtin tex/latex-mode.

Best, Arash

Footnotes:
[1]  http://git.savannah.gnu.org/cgit/auctex.git/tree/style/expl3.el
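For the built-in modes, a buffer-local version of that syntax change could be sketched as follows. This assumes nothing beyond the standard syntax-table primitives. `my-expl3-word-syntax` is a hypothetical name, and the code is not taken from AUCTeX's expl3.el.

```elisp
;; Hypothetical helper, not part of Emacs or AUCTeX: give `_' and `:'
;; word syntax in the current buffer only, as expl3 code expects.
(defun my-expl3-word-syntax ()
  "Treat `_' and `:' as word constituents in the current buffer."
  ;; Copy the table first so the mode's shared syntax table is untouched.
  (set-syntax-table (copy-syntax-table (syntax-table)))
  ;; Without a TABLE argument these edit the buffer's (copied) table.
  (modify-syntax-entry ?_ "w")
  (modify-syntax-entry ?: "w"))
```

Copying the table first matters: `modify-syntax-entry` without a TABLE argument edits the current buffer's syntax table, which is otherwise shared by every buffer in the same major mode.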

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 01 Mar 2022 08:47:01 GMT) Full text and rfc822 format available.

Message #86 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Augusto Stoffel <arstoffel <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 1 Mar 2022 08:46:23 +0000

Unless I'm missing something, the in-tree code hasn't (yet) made any
syntax changes for expl3.

Best,

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 08 Sep 2022 13:26:02 GMT) Full text and rfc822 format available.

Message #89 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, David Fussner <dfussner <at> googlemail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 08 Sep 2022 15:25:00 +0200

Dmitry Gutov <dgutov <at> yandex.ru> writes:

> Let us first discuss whether we could make do without an additional
> Xref backend. Just to make sure.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

I've only skimmed this bug report, so I might well have missed
something.  Was there a conclusion here as to what should be done?  It
looks like useful functionality to me (but it's been years since I've
written tex-y stuff).

In any case, if this is to be applied, we'd need to have a copyright
assignment to the FSF on file.  David, would you be willing to sign
that?

Added tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 08 Sep 2022 13:26:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 08 Sep 2022 13:36:02 GMT) Full text and rfc822 format available.

Message #94 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 53749 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 8 Sep 2022 14:34:47 +0100

Hi Lars,

The conclusion at the time was that the patch needed reworking before
Dmitry was happy with it, and I've not yet found enough time to do so,
though I'm still fully intending to make the necessary changes. Please
leave the bug open so I can restart the conversation when I have a
better patch. (Oh, and I'm more than happy to sign the copyright
assignment whenever Dmitry judges the patch to be ready.)

Thanks for the reminder.

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 08 Sep 2022 13:40:02 GMT) Full text and rfc822 format available.

Message #97 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 08 Sep 2022 15:39:02 +0200

David Fussner <dfussner <at> googlemail.com> writes:

> The conclusion at the time was that the patch needed reworking before
> Dmitry was happy with it, and I've not yet found enough time to do so,
> though I'm still fully intending to make the necessary changes. Please
> leave the bug open so I can restart the conversation when I have a
> better patch.

Of course.

> (Oh, and I'm more than happy to sign the copyright
> assignment whenever Dmitry judges the patch to be ready.)

Here's the form to get started:

Please email the following information to assign <at> gnu.org, and we
will send you the assignment form for your past and future changes.

Please use your full legal name (in ASCII characters) as the subject
line of the message.
----------------------------------------------------------------------
REQUEST: SEND FORM FOR PAST AND FUTURE CHANGES

[What is the name of the program or package you're contributing to?]
Emacs

[Did you copy any files or text written by someone else in these changes?
Even if that material is free software, we need to know about it.]

[Do you have an employer who might have a basis to claim to own
your changes?  Do you attend a school which might make such a claim?]

[For the copyright registration, what country are you a citizen of?]

[What year were you born?]

[Please write your email address here.]

[Please write your postal address here.]

[Which files have you changed so far, and which new files have you written
so far?]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 08 Sep 2022 15:52:02 GMT) Full text and rfc822 format available.

Message #100 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 53749 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 8 Sep 2022 16:50:55 +0100

Thanks Lars, will do.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 03 Sep 2023 09:09:01 GMT) Full text and rfc822 format available.

Message #103 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 3 Sep 2023 02:08:27 -0700

David Fussner <dfussner <at> googlemail.com> writes:

> Thanks Lars, will do.
>
> On Thu, 8 Sept 2022 at 14:39, Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
>>
>> David Fussner <dfussner <at> googlemail.com> writes:
>>
>> > The conclusion at the time was that the patch needed reworking before
>> > Dmitry was happy with it, and I've not yet found enough time to do so,
>> > though I'm still fully intending to make the necessary changes. Please
>> > leave the bug open so I can restart the conversation when I have a
>> > better patch.
>>
>> Of course.

That was a year ago.  Have you made any progress here?

Removed tag(s) moreinfo. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 03 Sep 2023 09:09:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 03 Sep 2023 10:04:02 GMT) Full text and rfc822 format available.

Message #108 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 3 Sep 2023 11:03:16 +0100

[Message part 1 (text/plain, inline)]

Hi Stefan

Thanks for the nudge. I do in fact have a patch that I'm just about finding
time to test, so I'll try to get it to the list within a week or two.

Thanks, and best,

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 03 Sep 2023 10:47:02 GMT) Full text and rfc822 format available.

Message #111 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 3 Sep 2023 03:46:01 -0700

David Fussner <dfussner <at> googlemail.com> writes:

> Thanks for the nudge. I do in fact have a patch that I'm just about finding
> time to test, so I'll try to get it to the list within a week or two.

Sounds good, and thank you.

Added tag(s) pending. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 07 Sep 2023 18:28:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 11:12:02 GMT) Full text and rfc822 format available.

Message #116 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 12:10:36 +0100

[Message part 1 (text/plain, inline)]

Hi Dmitry,

I've belatedly found some time to get the xref commands working better
in TeX buffers, this time using the default etags backend, as you
requested last year.  The basic strategy remains the same -- create a
new thing-at-point argument "texsymbol" which replaces "symbol" in a
definable set of major modes, then pass the resulting search term to
xref.  Changes in etags.c ensure that the various TeX modes and the
tags tables are cooperating with each other, and I added a new option
to etags (--tex-alt-forms) to handle some of the complexities of the
TeX escape character (as you suggested).  I also manipulate some
variables buffer-locally to make things like project-find-regexp and
isearch-forward-thing-at-point work better in such buffers.

I attach a patch against current master. There is another patch which
contains changes to the test suite in test/manual/etags, but I'll
leave that one in case the changes I've made to etags.c need further
work.

I've sent patches to AUCTeX trying to fix a couple of issues there
with xref-find-references. There's more work to be done on related
issues in tex-mode.el, too, but this patch is a start.

Thanks,

David.

P.S. I'm also starting the copyright assignment process, in case these
changes prove acceptable.

[0001-Fix-behavior-of-xref-commands-in-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 13:43:01 GMT) Full text and rfc822 format available.

Message #119 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 06:42:27 -0700

David Fussner <dfussner <at> googlemail.com> writes:

> P.S. I'm also starting the copyright assignment process, in case these
> changes prove acceptable.

That's great, thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 15:24:01 GMT) Full text and rfc822 format available.

Message #122 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>,
 Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 18:23:13 +0300

Hi David!

Thanks for the new patch.

I'm skipping over the etags parser changes (others might comment, I'm 
just assuming they are good).

And "thing at point" code is, I think, at your discretion (if the result 
is useful, then that seems good). I would probably not call the function 
the same way given that we don't install this "thing" globally, just 
using it from several major modes in a particular way. Anyway, that 
is a minor affair.

I'd like to suggest two simplifications for the xref-related stuff, if 
those work for you.

On 13/09/2023 14:10, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:

> <...> I also manipulate some
> variables buffer-locally to make things like project-find-regexp and
> isearch-forward-thing-at-point work better in such buffers.

These won't be affected either way, right? Because project-find-regexp 
defaults its input to (thing-at-point 'symbol t), and isearch... 
probably also uses "symbol" if you ask it to.

So... why not just make tex-thingatpt-include-escape a boolean? What 
commands need to be distinguished that way? I think 'find-tag' (it's 
obsolete but still used sometimes) would need to obey this var as well.

And the second thing: you're putting the symbol on major modes.

+(dolist (texmode tex-thingatpt-modes-list)
+  (put texmode 'find-tag-default-function 'tex--thing-at-point))

Why not set the variable find-tag-default-function instead? That seems 
easier and more appropriate to do inside a major mode function.
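Setting the variable instead of a symbol property could be sketched like this. It assumes the patch's `tex--thing-at-point` (not in stock Emacs), and `my-tex-set-find-tag-default` is a hypothetical name. `find-tag-default-function` itself is the etags.el variable consulted, before any `find-tag-default-function` property on the major mode, when etags and its Xref backend pick the default identifier.

```elisp
(require 'etags)   ; defines `find-tag-default-function'

;; Hypothetical setup function; `tex--thing-at-point' comes from the
;; patch under discussion and is not part of stock Emacs.
(defun my-tex-set-find-tag-default ()
  "Make etags/Xref pick the default tag with `tex--thing-at-point'."
  (setq-local find-tag-default-function #'tex--thing-at-point))

;; Run once per TeX buffer, e.g. from the mode hook (or, inside
;; tex-mode.el itself, from the shared initialization function).
(add-hook 'tex-mode-hook #'my-tex-set-find-tag-default)
```

Because `latex-mode` and `plain-tex-mode` derive from `tex-mode`, `tex-mode-hook` also runs in those buffers.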

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 17:03:02 GMT) Full text and rfc822 format available.

Message #125 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 18:01:41 +0100

Hi Dmitry,

Thanks for the feedback!

> These won't be affected either way, right? Because project-find-regexp
> defaults its input to (thing-at-point 'symbol t), and isearch...
> probably also uses "symbol" if you ask it to.
>
> So... why not just make tex-thingatpt-include-escape a boolean? What
> commands need to be distinguished that way? I think 'find-tag' (it's
> obsolete but still used sometimes) would need to obey this var as well.

xref-find-apropos and xref-find-references don't work well (or at all)
with the escape char included in the search string, so I was keeping
that char away from them. (The buffer-local variables I manipulate for
project-find-regexp and isearch-forward-thing-at-point have to do with
ensuring they use the texsymbol thing in the first place -- see
tex--symbol-or-texsymbol.) Does that make sense?

I'll look at find-tag, too; thanks for pointing that out.

> Why not set the variable find-tag-default-function instead? That seems
> easier and more appropriate to do inside a major mode function.

I settled on putting the symbol on the modes because I thought it was
simpler than setting the variable buffer-locally in all the in-tree
and AUCTeX modes, but I'll revisit this and see whether I can come up
with something better.

Thanks again.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 19:17:02 GMT) Full text and rfc822 format available.

Message #128 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, larsi <at> gnus.org, stefankangas <at> gmail.com,
 dfussner <at> googlemail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 22:16:21 +0300

> Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Wed, 13 Sep 2023 18:23:13 +0300
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> I'm skipping over the etags parser changes (others might comment, I'm 
> just assuming they are good).

They look OK to me at first glance, but we need to make sure the etags
tests still succeed after this change, and the new option should be
documented in the man page.  Bonus points for adding to the etags test
suite a test where this option is activated.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 13 Sep 2023 20:26:02 GMT) Full text and rfc822 format available.

Message #131 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 13 Sep 2023 21:25:28 +0100

[Message part 1 (text/plain, inline)]

Thanks Eli.

I'll have a look at the man page, and also at an additional test for the
suite. I did run the test suite, and all the diffs were where they should
be; I can send a patch that I have if you'd like, but if I'm going to add
tests maybe you'd prefer to wait?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 14 Sep 2023 00:00:02 GMT) Full text and rfc822 format available.

Message #134 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 14 Sep 2023 02:59:33 +0300

On 13/09/2023 20:01, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:

>> These won't be affected either way, right? Because project-find-regexp
>> defaults its input to (thing-at-point 'symbol t), and isearch...
>> probably also uses "symbol" if you ask it to.
>>
>> So... why not just make tex-thingatpt-include-escape a boolean? What
>> commands need to be distinguished that way? I think 'find-tag' (it's
>> obsolete but still used sometimes) would need to obey this var as well.
> 
> xref-find-apropos and xref-find-references don't work well (or at all)
> with the escape char included in the search string, so I was keeping
> that char away from them. (The buffer-local variables I manipulate for
> project-find-regexp and isearch-forward-thing-at-point have to do with
> ensuring they use the texsymbol thing in the first place -- see
> tex--symbol-or-texsymbol.) Does that make sense?

Hmm, I suppose I skipped over that part of the patch too quickly.

Here's a potential problem with replacing the notion of "symbol": some 
other existing code (also working with TeX/LaTeX) might disagree, as it 
might have some existing notion of what a "symbol" in those modes is (as 
defined by the syntax table).

In general, we change the notion of a symbol by either changing the 
mode's syntax table, or by augmenting its effect using 
syntax-propertize-function (which, for example, could propertize the 
backslashes inside the buffer as "symbol constituent"). The latter might 
actually be a change that would affect how 'M-x xref-find-references' 
works (it will likely start to consider those \tags as symbol 
occurrences together with the backslash). But like other changes of what 
is considered to be a "symbol" in a major mode, it could conflict with 
existing code.
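The syntax-propertize-function route mentioned above might look roughly like this. It is only a sketch: it assumes a control word is a backslash followed by letters or `@`, and `my-tex-propertize-backslashes` is a hypothetical name. It marks only the backslash itself as a symbol constituent.

```elisp
;; Hypothetical: give the backslash that starts a control word
;; symbol-constituent syntax, so \mycommand reads as one symbol.
(defun my-tex-propertize-backslashes ()
  (setq-local syntax-propertize-function
              (syntax-propertize-rules
               ;; Group 1 is the backslash; a letter or @ must follow.
               ("\\(\\\\\\)[a-zA-Z@]" (1 "_")))))
```

This is exactly the kind of change that can surprise other code relying on the mode's syntax table, as noted above.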

Anyway, I'm not saying you have to change the approach, but that's 
something to be aware of.

And to look at it from another direction: if the default implementation 
of xref-find-references (and etags uses the very generic one) doesn't 
work for you, perhaps it would be worth it to define a TeX-specific Xref 
backend. That would perhaps take 20-30 lines of code total, most of them 
delegating to the etags backend, or the default impl. But while 
delegating, you can modify the passed argument - e.g. if it included a 
backslash, you could forward it to the default impl for "find 
references" without a backslash. Or - alternatively - call 
(project-find-regexp "...") with a more complex regexp of your choice. 
The first alternative could look like this:

  (cl-defmethod xref-backend-references ((_backend (eql 'tex-etags))
                                         identifier)
    (xref-backend-references 'etags
                             (string-remove-prefix "\\" identifier)))
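Expanding that one-method idea into a fuller sketch: the hypothetical tex-etags backend below forwards identifiers unchanged for definitions (so a TAGS entry recorded as \mycommand can still match) and strips the escape before delegating references and apropos, which, per the discussion above, cope poorly with the backslash. The names tex-etags, tex-etags--xref-backend and tex-etags--strip-escape are illustrative assumptions, not part of any released Emacs; xref-backend-definitions, xref-backend-references, xref-backend-apropos and xref-backend-identifier-completion-table are the standard Xref generic functions.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'subr-x)  ; string-remove-prefix
(require 'xref)
(require 'etags)   ; methods for the `etags' backend

(defun tex-etags--xref-backend ()
  "Hypothetical entry for `xref-backend-functions' in TeX buffers."
  'tex-etags)

(defun tex-etags--strip-escape (identifier)
  "Return IDENTIFIER without one leading TeX escape character."
  (string-remove-prefix "\\" identifier))

;; Definitions: forward the identifier unchanged, so a tag recorded
;; as "\mycommand" can still match.  The identifier itself comes from
;; `xref-backend-identifier-at-point', which this sketch leaves at its
;; default (the symbol at point).
(cl-defmethod xref-backend-definitions ((_backend (eql 'tex-etags)) identifier)
  (xref-backend-definitions 'etags identifier))

;; References and apropos: drop the escape before delegating.
(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
  (xref-backend-references 'etags (tex-etags--strip-escape identifier)))

(cl-defmethod xref-backend-apropos ((_backend (eql 'tex-etags)) pattern)
  (xref-backend-apropos 'etags (tex-etags--strip-escape pattern)))

;; Completion in the M-. prompt: reuse the TAGS-based table.
(cl-defmethod xref-backend-identifier-completion-table
  ((_backend (eql 'tex-etags)))
  (xref-backend-identifier-completion-table 'etags))
```

Activating it would then be a matter of adding tex-etags--xref-backend buffer-locally to xref-backend-functions from a TeX mode hook, e.g. (add-hook 'xref-backend-functions #'tex-etags--xref-backend nil t).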

> I'll look at find-tag, too; thanks for pointing that out.

Doing the above choice on the level of Xref backend's methods 
would/should automatically make it work for all commands appropriately.

>> Why not set the variable find-tag-default-function instead? That seems
>> easier and more appropriate to do inside a major mode function.
> 
> I settled on putting the symbol on the modes because I thought it was
> simpler than setting the variable buffer-locally in all the in-tree
> and AUCTeX modes, but I'll revisit this and see whether I can come up
> with something better.

Do AUCTeX modes inherit from tex-mode? Or all call 
tex-common-initialization? Then you could set that variable locally 
inside that function once.

All in all, it might not be wise to modify the behavior of third-party 
packages from inside Emacs this way (they might have other expectations, 
or there's going to appear a new major mode that needs the same 
treatment anyway).

Setting a variable to be used through mode inheritance or delegation is 
fine, but if that doesn't help, I would probably stop at defining a 
helper function or two and documenting how it should be used. And then 
maybe work with AUCTeX people to get the remaining necessary changes in 
from their side (or just leaving that up to the user, depending on how 
functional the default config ends up being).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 14 Sep 2023 05:15:01 GMT)

Message #137 received at 53749 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, larsi <at> gnus.org, stefankangas <at> gmail.com,
 dgutov <at> yandex.ru
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 14 Sep 2023 08:14:21 +0300

> From: David Fussner <dfussner <at> googlemail.com>
> Date: Wed, 13 Sep 2023 21:25:28 +0100
> Cc: Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Kangas <stefankangas <at> gmail.com>, 53749 <at> debbugs.gnu.org, 
> 	Lars Ingebrigtsen <larsi <at> gnus.org>
> 
> I'll have a look at the man page, and also at an additional test for the suite. I did run the test suite, and
> all the diffs were where they should be; I can send a patch that I have if you'd like, but if I'm going to
> add tests maybe you'd prefer to wait?

Sure, we will wait.  There's no rush.  Let's have a complete patch
that covers all the aspects of this, and install it in one go.
Meanwhile you will also have time to work on the other review
comments.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 14 Sep 2023 06:11:01 GMT)

Message #140 received at 53749 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, larsi <at> gnus.org, stefankangas <at> gmail.com,
 dfussner <at> googlemail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 14 Sep 2023 09:10:06 +0300

> Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
>  Stefan Kangas <stefankangas <at> gmail.com>
> Date: Thu, 14 Sep 2023 02:59:33 +0300
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> Do AUCTeX modes inherit from tex-mode? Or all call 
> tex-common-initialization? Then you could set that variable locally 
> inside that function once.
> 
> All in all, it might not be wise to modify the behavior of third-party 
> packages from inside Emacs this way (they might have other expectations, 
> or there's going to appear a new major mode that needs the same 
> treatment anyway).
> 
> Setting a variable to be used through mode inheritance or delegation is 
> fine, but if that doesn't help, I would probably stop at defining a 
> helper function or two and documenting how it should be used. And then 
> maybe work with AUCTeX people to get the remaining necessary changes in 
> from their side (or just leaving that up to the user, depending on how 
> functional the default config ends up being).

I think we should add Stefan and Tassilo (CCed) to this discussion, as
they might have valuable comments about this.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 14 Sep 2023 16:12:02 GMT)

Message #143 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 14 Sep 2023 17:11:13 +0100

Hi Dmitry,

Once again, many thanks for the feedback. I'm still not certain I
agree about the risks involved in creating a new "thing" type, as it
really only appears in a small number of commands and then only in TeX
buffers, and generally I tried to design the code to keep out of the
way of anything outside of such buffers, but needless to say you see
further and more clearly than I can. I've been reviewing your comments
and my code, and have a few ideas and questions about how to go
forward. Though I haven't coded it yet, it's possible that the
simplest (and least intrusive) approach to follow would do something
like this:

1. Get rid of the new texsymbol "thing" and just use a buffer-local
value of find-tag-default-function and a rather more thoroughly
modified syntax table to control what "symbol" means, but _only_ in
the context of commands that use find-tag-default-function. I think
I'd lose the ability to change the behavior of
isearch-forward-thing-at-point and project-find-regexp, as I can't see
how to temporarily modify the syntax table there, though perhaps I'm
missing something.

2. Simply eliminate the TeX escape character entirely, both from tag
names in a TAGS file and from any thing-at-point in a TeX buffer. I
think this would eliminate the need to distinguish among the various
xref commands in terms of whether they can or can't handle the escape
character. It would also eliminate the need for the new user option in
etags.c, as there would no longer be any code to cope with the escape
character when finding a (thing-at-point 'symbol). This is slightly
less powerful than the default I proposed, but there are probably many
use cases where it won't matter at all (though it would for my own,
possibly eccentric, use case).

Does this sound to you like a plausible way forward?

I've tried to reach out to the AUCTeX developers to see what they
might want to do about setting the value of local variables there, and
anything they come up with should be doable.

 Thanks again.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 14 Sep 2023 23:56:02 GMT)

Message #146 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 15 Sep 2023 02:55:08 +0300

Hi David,

On 14/09/2023 19:11, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:

> Once again, many thanks for the feedback. I'm still not certain I
> agree about the risks involved in creating a new "thing" type, as it
> really only appears in a small number of commands and then only in TeX
> buffers, and generally I tried to design the code to keep out of the
> way of anything outside of such buffers, but needless to say you see
> further and more clearly than I can. I've been reviewing your comments
> and my code, and have a few ideas and questions about how to go
> forward. Though I haven't coded it yet, it's possible that the
> simplest (and least intrusive) approach to follow would do something
> like this:

I agree that the risks are probably low, and my review stems from the
general principle that global modifications to the environment can
lead to problems. It might or might not happen in your case. If anything
does happen, though, the same modifications tend to make it harder to
investigate, e.g. to find where a particular bit of behavior comes from.
So, generally, the more local an implementation of a feature can be, the
better.

But I'm no maintainer of tex-mode, and whatever choices are made here 
won't have effect outside of TeX, so if somebody wants to disagree with 
me, they're more than welcome to.

> 1. Get rid of the new texsymbol "thing" and just use a buffer-local
> value of find-tag-default-function and a rather more thoroughly
> modified syntax table to control what "symbol" means, but _only_ in
> the context of commands that use find-tag-default-function. I think
> I'd lose the ability to change the behavior of
> isearch-forward-thing-at-point and project-find-regexp, as I can't see
> how to temporarily modify the syntax table there, though perhaps I'm
> missing something.

I'm suggesting this approach together with defining a "new" backend for
TeX. I put "new" in quotes because, while it's going to have its own name,
it will mostly forward to an existing backend (etags).

This should make it practical for you to treat identifiers in
xref-backend-definitions differently from those in
xref-backend-references and xref-backend-apropos.

If you define find-tag-default-function, you don't have to change the 
syntax table too: it might be easier to search around with a regexp.
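As an illustration of the regexp route: a find-tag-default-function that picks up the control word around point, backslash included, without touching the syntax table. my-tex-find-tag-default and my-tex-setup-find-tag-default are hypothetical names; thing-at-point-looking-at (thingatpt.el) and find-tag-default-function (etags.el) are existing Emacs facilities. The sketch only recognizes control words made of letters and @, not control symbols such as \$.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'thingatpt)  ; thing-at-point-looking-at
(require 'etags)      ; find-tag-default-function

(defun my-tex-find-tag-default ()
  "Hypothetical `find-tag-default-function' for TeX buffers.
Return the control word around point, with its leading backslash
if there is one, or nil if point is not on such a word."
  ;; An optional backslash followed by letters or @; look at most
  ;; 100 characters either way so large buffers stay cheap.
  (when (thing-at-point-looking-at "\\\\?[[:alpha:]@]+" 100)
    (match-string-no-properties 0)))

;; Hypothetical wiring: set the variable locally in every TeX buffer.
(defun my-tex-setup-find-tag-default ()
  (setq-local find-tag-default-function #'my-tex-find-tag-default))
(add-hook 'tex-mode-hook #'my-tex-setup-find-tag-default)
```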

But for the new backend, you can also define the method 
xref-backend-identifier-at-point, where you would invoke the necessary 
bounds-of-thing logic. Then you won't need a change in 
find-tag-default-function.

Either way, though, the major modes will need to set up 
xref-backend-functions instead (with add-hook). This could also be done 
in a minor mode, which you'd enable in any TeX-related major modes that 
you use.
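For the identifier-at-point route, here is a sketch of that method plus a buffer-local minor mode that registers the backend through add-hook, as suggested above. The backend name tex-etags and the functions tex-etags--xref-backend, tex-etags--bounds-of-control-sequence and tex-etags-xref-mode are hypothetical; xref-backend-identifier-at-point and xref-backend-functions are the real Xref extension points. The remaining methods (definitions, references, apropos) would still have to be defined or delegated to etags.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'xref)

(defun tex-etags--xref-backend ()
  "Hypothetical entry for `xref-backend-functions' in TeX buffers."
  'tex-etags)

(defun tex-etags--bounds-of-control-sequence ()
  "Return (BEG . END) of the control word at point, or nil.
Only letters and @ are recognized, plus an optional leading backslash."
  (save-excursion
    (skip-chars-backward "[:alpha:]@")
    (when (eq (char-before) ?\\)
      (backward-char))
    (when (looking-at "\\\\?[[:alpha:]@]+")
      (cons (match-beginning 0) (match-end 0)))))

;; The identifier keeps its backslash; other methods can strip it.
(cl-defmethod xref-backend-identifier-at-point ((_backend (eql 'tex-etags)))
  (let ((bounds (tex-etags--bounds-of-control-sequence)))
    (and bounds
         (buffer-substring-no-properties (car bounds) (cdr bounds)))))

(define-minor-mode tex-etags-xref-mode
  "Hypothetical minor mode using the `tex-etags' Xref backend locally."
  :lighter nil
  (if tex-etags-xref-mode
      (add-hook 'xref-backend-functions #'tex-etags--xref-backend nil t)
    (remove-hook 'xref-backend-functions #'tex-etags--xref-backend t)))
```

A TeX major mode, or a user, could then enable it with (add-hook 'tex-mode-hook #'tex-etags-xref-mode); keeping the setup in a minor mode means AUCTeX could opt in with a single hook entry of its own.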

> 2. Simply eliminate the TeX escape character entirely, both from tag
> names in a TAGS file and from any thing-at-point in a TeX buffer. I
> think this would eliminate the need to distinguish among the various
> xref commands in terms of whether they can or can't handle the escape
> character. It would also eliminate the need for the new user option in
> etags.c, as there would no longer be any code to cope with the escape
> character when finding a (thing-at-point 'symbol). This is slightly
> less powerful than the default I proposed, but there are probably many
> use cases where it won't matter at all (though it would for my own,
> possibly eccentric, use case).

I wanted to ask whether including the backslash is important enough (it 
should not matter too much for disambiguation), but I figured it must 
be, otherwise you wouldn't go to all this effort.

If not, it would simplify things a lot, though.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 15 Sep 2023 06:48:02 GMT)

Message #149 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Lars Ingebrigtsen <larsi <at> gnus.org>, Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 15 Sep 2023 07:47:29 +0100

Thanks Dmitry,

I'll make another stab at a "new" backend, as suggested. I'll have a look
at the escape char thing, too, and see how I feel about dropping it. It
shouldn't take 18 months this time!

Best,

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 15 Sep 2023 19:13:02 GMT)

Message #152 received at 53749 <at> debbugs.gnu.org:

From: Tassilo Horn <tsdh <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 dfussner <at> googlemail.com, stefankangas <at> gmail.com,
 Dmitry Gutov <dgutov <at> yandex.ru>, larsi <at> gnus.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 15 Sep 2023 20:45:45 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

Hi Eli & all, thanks for inviting me to the discussion.  I'm adding
Keita, too, because he's currently by far the most active AUCTeX
developer.

>> Do AUCTeX modes inherit from tex-mode?

Not currently but in Keita's feature/fix-mode-names-overlap branch which
will probably become AUCTeX 14, I guess.

>> Or all call tex-common-initialization? Then you could set that
>> variable locally inside that function once.

Again, not right now but probably in the future.

>> All in all, it might not be wise to modify the behavior of
>> third-party packages from inside Emacs this way (they might have
>> other expectations, or there's going to appear a new major mode that
>> needs the same treatment anyway).
>> 
>> Setting a variable to be used through mode inheritance or delegation
>> is fine, but if that doesn't help, I would probably stop at defining
>> a helper function or two and documenting how it should be used. And
>> then maybe work with AUCTeX people to get the remaining necessary
>> changes in from their side (or just leaving that up to the user,
>> depending on how functional the default config ends up being).

That sounds reasonable.

Bye,
Tassilo

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 16 Sep 2023 05:54:02 GMT)

Message #155 received at 53749 <at> debbugs.gnu.org:

From: Ikumi Keita <ikumi <at> ikumi.que.jp>
To: Tassilo Horn <tsdh <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, dfussner <at> googlemail.com,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dgutov <at> yandex.ru>,
 larsi <at> gnus.org, Eli Zaretskii <eliz <at> gnu.org>, stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 16 Sep 2023 14:53:16 +0900

Hi all,

>>>>> Tassilo Horn <tsdh <at> gnu.org> writes:
>>> Do AUCTeX modes inherit from tex-mode?

> Not currently but in Keita's feature/fix-mode-names-overlap branch

Currently, no. In the feature/fix-mode-names-overlap branch, the major mode
inheritance relations are:

text-mode      --+-- TeX-mode
                 +-- Texinfo-mode

TeX-mode       --+-- plain-TeX-mode
                 +-- LaTeX-mode
                 +-- ConTeXt-mode

plain-TeX-mode --+-- AmSTeX-mode
                 +-- japanese-plain-TeX-mode

LaTeX-mode     --+-- docTeX-mode
                 +-- japanese-LaTeX-mode

(There are ConTeXt-en-mode and ConTeXt-nl-mode as well, but my current
personal plan is to delete them.)

I don't think it's a good idea to inherit from tex-mode; it isn't
difficult to replace text-mode with tex-mode as the "top" mode, but in
that case LaTeX-mode can't have both built-in latex-mode and TeX-mode as
its parent mode.
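Why the inheritance question matters for this patch: a derived mode also runs its parent's mode hook, so one setup function on a shared parent's hook reaches every child mode. A minimal sketch, with the hypothetical modes my-tex-parent-mode and my-latex-child-mode and the hypothetical variable my-xref-setup-done standing in for real TeX modes and real setup:

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Hypothetical stand-ins for a parent mode and one derived mode.
(define-derived-mode my-tex-parent-mode text-mode "Parent")
(define-derived-mode my-latex-child-mode my-tex-parent-mode "Child")

(defvar-local my-xref-setup-done nil
  "Set by the parent-mode hook; used to show that it ran.")

;; One function on the parent's hook...
(add-hook 'my-tex-parent-mode-hook
          (lambda () (setq my-xref-setup-done t)))

;; ...also runs when the derived mode is enabled.
(with-temp-buffer
  (my-latex-child-mode)
  (message "setup ran: %S, derived from parent: %S"
           my-xref-setup-done
           (and (derived-mode-p 'my-tex-parent-mode) t)))
```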

(Maybe an exception is Texinfo-mode. It would make sense to have
built-in texinfo-mode as parent of Texinfo-mode. If there is a good
reason to do so, I won't object strongly.)

> which will probably become AUCTeX 14, I guess.

I hope so. :-)

>>> Or all call tex-common-initialization? Then you could set that
>>> variable locally inside that function once.

> Again, not right now but probably in the future.

Currently, they don't call tex-common-initialization, but we could do so
in TeX-mode. (But I haven't considered its pros and cons deeply yet.)

Best regards,
Ikumi Keita
#StandWithUkraine #StopWarInUkraine

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 17 Sep 2023 08:51:02 GMT)

Message #158 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Ikumi Keita <ikumi <at> ikumi.que.jp>
Cc: 53749 <at> debbugs.gnu.org, Tassilo Horn <tsdh <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dgutov <at> yandex.ru>,
 larsi <at> gnus.org, Eli Zaretskii <eliz <at> gnu.org>, stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 17 Sep 2023 09:49:46 +0100

Hi Tassilo and Keita,

Thanks for the clarifications. If you look at the current patch to
tex-mode.el, there's one function call added to TeX-mode-hook, mainly
for my own testing purposes, but no matter what the final patch looks
like it should only similarly require a single function call in an
AUCTeX hook to activate the new xref code there, along with one in
tex-common-initialization for the in-tree modes. If and when all
parties are satisfied by the patch I'll certainly be in touch with you
to find out how you'd prefer to handle activating it (or not) in
AUCTeX. The current state of affairs is a convenience for me and for
anyone else who cares to test the code.

Thanks again,

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 13:07:02 GMT)

Message #161 received at 53749 <at> debbugs.gnu.org:

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 15:06:06 +0200

David Fussner <dfussner <at> googlemail.com> writes:

> Thanks for the clarifications. If you look at the current patch to
> tex-mode.el, there's one function call added to TeX-mode-hook, mainly
> for my own testing purposes, but no matter what the final patch looks
> like it should only similarly require a single function call in an
> AUCTeX hook to activate the new xref code there, along with one in
> tex-common-initialization for the in-tree modes. If and when all
> parties are satisfied by the patch I'll certainly be in touch with you
> to find out how you'd prefer to handle activating it (or not) in
> AUCTeX. The current state of affairs is a convenience for me and for
> anyone else who cares to test the code.

Hi David,

I just wanted to come back on this report and ask if there is any
progress?  It would be nice to get Xref working within TeX buffers.

TIA.  Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 14:57:03 GMT) Full text and rfc822 format available.

Message #164 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 15:56:34 +0100

Hi Arash,

Thanks for the nudge. I am in fact in the final stages of preparing a
new patch to get xref working in TeX buffers. As usual, the main
complexities are in xref-find-references, and while I have you here I
wonder whether I could ask your thoughts about addressing one part of
this complexity.

The semantic/symref backend used by xref-find-references greps in
files matching the major-mode of the buffer where the user calls the
command. It looks in semantic-symref-filepattern-alist for
file-extensions matching the major-mode, and if that fails it looks in
auto-mode-alist. When both fail to produce any file extensions it
tells the user to customize semantic-symref-filepattern-alist. Also,
if it finds things in s-s-f-a, it doesn't go on to auto-mode-alist, so
s-s-f-a has to be complete in order to be useful. In effect, we need
s-s-f-a to hold all the extensions for all the modes that can appear
as values of major-mode, and I notice that AUCTeX has started to
populate that alist, though incompletely. I'm also aware that many
packages add their own extensions to files which are basically TeX or
LaTeX files, and I wonder whether we can really keep up with the whole
of CTAN in terms of providing complete lists of extensions for
s-s-f-a.
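
The lookup order described above can be modeled in a few lines. This is a hypothetical C sketch, not the Elisp in semantic/symref: the two tables stand in for semantic-symref-filepattern-alist and auto-mode-alist, and auto-mode-alist's regexps are simplified to ready-made glob patterns. What it shows is the early return: once the explicit table has any entry for a mode, the auto-mode table is never consulted, so an incomplete explicit entry hides everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for semantic-symref-filepattern-alist and
   auto-mode-alist: each entry maps a major mode to one file pattern
   (the real auto-mode-alist holds regexps, simplified here to globs). */
struct mode_pattern {
    const char *mode;
    const char *pattern;
};

/* Collect the patterns for MODE into OUT (capacity CAP).  The explicit
   table is searched first; only if it has no entry at all for MODE is
   the auto-mode table consulted.  Returns the number of patterns; 0 is
   the case where the user is asked to customize the explicit table. */
size_t derive_patterns(const char *mode,
                       const struct mode_pattern *explicit_tab, size_t n_explicit,
                       const struct mode_pattern *auto_tab, size_t n_auto,
                       const char **out, size_t cap)
{
    size_t found = 0;
    assert(mode != NULL && out != NULL);
    for (size_t i = 0; i < n_explicit && found < cap; i++)
        if (strcmp(explicit_tab[i].mode, mode) == 0)
            out[found++] = explicit_tab[i].pattern;
    if (found > 0)
        return found;           /* the auto-mode table is never reached */
    for (size_t i = 0; i < n_auto && found < cap; i++)
        if (strcmp(auto_tab[i].mode, mode) == 0)
            out[found++] = auto_tab[i].pattern;
    return found;
}
```

For instance, if the explicit table lists only `*.tex` for latex-mode, an auto-mode entry mapping `*.ltx` to latex-mode is never seen.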

As an example of where we are, if you open a plain-tex-mode (or
plain-TeX-mode) file and M-? with point on some standard word you'll
currently get the message to customize s-s-f-a, because
auto-mode-alist has only tex-mode and s-s-f-a doesn't cover them,
either.

I ask you Arash, therefore, as an AUCTeX and emacs developer, and I
ask any other developers also, whether you'd prefer me just to put
together as complete a list as possible for addition to s-s-f-a --
with patches for AUCTeX for all the new modes there -- or, and this is
what I'm finishing up now, whether you'd consider it overkill to have
code that constructs (or modifies) entries in s-s-f-a by searching in
auto-mode-alist and in the buffer-list for all the file extensions
emacs knows about that relate to the current major-mode. Changes in
s-s-f-a wouldn't be persistent across sessions, but they would allow a
user to open a file with any file extension, run latex-mode, and M-?
would work in that buffer, and search that buffer from another buffer
with a related major-mode, all without needing any user intervention.
It would also allow customizations in auto-mode-alist to appear in
s-s-f-a automatically, which seems convenient to me.
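
The alternative proposed above, accumulating every extension Emacs knows about for a family of related modes, might look roughly like this. It is again a hypothetical C model with made-up names (`ext_mode`, `collect_related_exts`), not code from the patch: it takes extensions from an auto-mode-style table and from the file names of open buffers whose mode is in the related set, dropping duplicates. Including buffers is what would let a file with an unusual extension, put into latex-mode by hand or by file-local variables, be searched from a related buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a file extension (without the dot) with the
   major mode it is associated with, either through an auto-mode-style
   table or because an open buffer visiting such a file is in that mode. */
struct ext_mode {
    const char *ext;
    const char *mode;
};

static int in_list(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], s) == 0)
            return 1;
    return 0;
}

/* Append EXT to OUT unless it is already there; return the new count. */
static size_t add_unique(const char *ext, const char **out, size_t n, size_t cap)
{
    if (n < cap && !in_list(ext, out, n))
        out[n++] = ext;
    return n;
}

/* Gather the extensions of every entry whose mode is in RELATED. */
size_t collect_related_exts(const char *const *related, size_t n_related,
                            const struct ext_mode *auto_tab, size_t n_auto,
                            const struct ext_mode *buffers, size_t n_buffers,
                            const char **out, size_t cap)
{
    size_t n = 0;
    assert(related != NULL && out != NULL);
    for (size_t i = 0; i < n_auto; i++)
        if (in_list(auto_tab[i].mode, related, n_related))
            n = add_unique(auto_tab[i].ext, out, n, cap);
    /* A buffer may have got its mode from file-local variables, so its
       extension counts as long as the buffer's mode is related. */
    for (size_t i = 0; i < n_buffers; i++)
        if (in_list(buffers[i].mode, related, n_related))
            n = add_unique(buffers[i].ext, out, n, cap);
    return n;
}
```

For example, with tex-mode, latex-mode and plain-tex-mode as the related set, a latex-mode buffer visiting `foo.cls` contributes `cls` even though no auto-mode entry maps that extension.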

If your answer is "show me the code", I'll do that shortly, but I
wondered whether anyone had any preliminary thoughts on the matter.

Best, and sorry for the long question,

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 16:16:02 GMT) Full text and rfc822 format available.

Message #167 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 12:15:09 -0400

> auto-mode-alist. When both fail to produce any file extensions it
> tells the user to customize semantic-symref-filepattern-alist.

Yes, this is not ideal.

I think ideally we'd build a regexp from `auto-mode-alist` and
`major-mode-remap-alist/defaults`, tho it may require additional info.

E.g. we may need to complement that with additional "related modes"
(e.g. html modes may want to mention `php-mode` as "related").


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 16:38:03 GMT) Full text and rfc822 format available.

Message #170 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 17:37:18 +0100

Thank you, Stefan -- I didn't know about
major-mode-remap-alist/defaults. Do you think TeX and friends are
handled by emacs distinctively enough to warrant keeping some
specialist extension-handling code in tex-mode.el, or do you think
some changes should be more generally available, in grep.el, say? (I'm
wondering whether it might be useful, for example, for
semantic-symref-derive-find-filepatterns to add extensions from
auto-mode-alist even when some extensions are found in
semantic-symref-filepattern-alist.)

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 17:17:02 GMT) Full text and rfc822 format available.

Message #173 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 13:16:28 -0400

> (I'm wondering whether it might be useful, for example, for
> semantic-symref-derive-find-filepatterns to add extensions from
> auto-mode-alist even when some extensions are found in
> semantic-symref-filepattern-alist.)

Assuming we can get good enough results from `auto-mode-alist` and
friends, I think we'd want to mark `semantic-symref-filepattern-alist`
as obsolete.
But before that, we need to check the assumption.

In the short term, for AUCTeX the only workable option seems to be to
add entries to `semantic-symref-filepattern-alist`.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 22 Apr 2024 17:27:01 GMT) Full text and rfc822 format available.

Message #176 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 22 Apr 2024 18:25:50 +0100

[Message part 1 (text/plain, inline)]

Thank you. I hope one or two others might join in, but I'll have some code
to look over in a few days, in any case.

David.


[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 23 Apr 2024 12:06:08 GMT) Full text and rfc822 format available.

Message #179 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 23 Apr 2024 14:04:39 +0200

David Fussner <dfussner <at> googlemail.com> writes:

> Thanks for the nudge. I am in fact in the final stages of preparing a
> new patch to get xref working in TeX buffers.

Thanks for the update.

> The semantic/symref backend used by xref-find-references greps in
> files matching the major-mode of the buffer where the user calls the
> command. It looks in semantic-symref-filepattern-alist for
> file-extensions matching the major-mode, and if that fails it looks in
> auto-mode-alist. When both fail to produce any file extensions it
> tells the user to customize semantic-symref-filepattern-alist. Also,
> if it finds things in s-s-f-a, it doesn't go on to auto-mode-alist, so
> s-s-f-a has to be complete in order to be useful. In effect, we need
> s-s-f-a to hold all the extensions for all the modes that can appear
> as values of major-mode, and I notice that AUCTeX has started to
> populate that alist, though incompletely.

I'm not familiar with the way xref works, but reading the above, xref
doesn't care about modes set by file-local variables; is this correct?

> I'm also aware that many packages add their own extensions to files
> which are basically TeX or LaTeX files, and I wonder whether we can
> really keep up with the whole of CTAN in terms of providing complete
> lists of extensions for s-s-f-a.

I think this is almost impossible.  Besides the effort, take for example
the .cnf extension which is used by other programs as well, so
associating it with LaTeX-mode wouldn't make sense, IMO.  Finally, I
think many packages are written in .dtx format and the ones with many
files with different extensions (.def, .enc, .fd, ...) usually extract
them from the .dtx via an .ins file, so the edited source is inside the
.dtx, and we don't need to care about these extensions.

> As an example of where we are, if you open a plain-tex-mode (or
> plain-TeX-mode) file and M-? with point on some standard word you'll
> currently get the message to customize s-s-f-a, because
> auto-mode-alist has only tex-mode and s-s-f-a doesn't cover them,
> either.

This is possibly the next mess since .tex can be plain-TeX, ConTeXt,
LaTeX ...  So in general, I second what Stefan M. wrote in his other
message, but respecting/using file local variables could help here (if
it doesn't work ATM, see above), e.g.:

--8<---------------cut here---------------start------------->8---
\beginsection 1. Introduction.
This is the start of the introduction.
\bye

%%% Local Variables:
%%% mode: plain-TeX
%%% TeX-master: t
%%% End:
--8<---------------cut here---------------end--------------->8---
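
How a block like this yields a mode can be sketched as follows. This is a simplified, hypothetical C model (`local_vars_mode` is a made-up name), not Emacs's own file-local-variable code: it applies only the rule that every line of the block repeats the prefix found before "Local Variables:" on its first line (here "%%% "), stops at "End:", and turns `mode: plain-TeX` into plain-TeX-mode. Real Emacs also handles suffixes, looks only near the end of the file, and does much more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified reader for a "Local Variables:" block.
   Writes "<value>-mode" into OUT and returns 1 if a mode: line is
   found before "End:"; returns 0 otherwise. */
int local_vars_mode(const char *text, char *out, size_t cap)
{
    const char *start = strstr(text, "Local Variables:");
    if (start == NULL)
        return 0;
    /* The prefix runs from the beginning of that line up to the marker. */
    const char *bol = start;
    while (bol > text && bol[-1] != '\n')
        bol--;
    size_t plen = (size_t)(start - bol);
    const char *line = strchr(start, '\n');
    while (line != NULL) {
        line++;                                 /* skip the newline */
        if (strncmp(line, bol, plen) != 0)
            return 0;                           /* prefix mismatch */
        const char *p = line + plen;
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, "End:", 4) == 0)
            return 0;
        if (strncmp(p, "mode:", 5) == 0) {
            p += 5;
            while (*p == ' ' || *p == '\t')
                p++;
            size_t vlen = strcspn(p, " \t\n");
            /* "mode: plain-TeX" names the major mode plain-TeX-mode. */
            int w = snprintf(out, cap, "%.*s-mode", (int)vlen, p);
            return w > 0 && (size_t)w < cap;
        }
        line = strchr(line, '\n');
    }
    return 0;
}
```

Applied to the example above, the sketch reports plain-TeX-mode; a block without a `mode:` line reports nothing.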

HTH.  Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 23 Apr 2024 13:23:11 GMT) Full text and rfc822 format available.

Message #182 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 23 Apr 2024 14:21:52 +0100

Thanks for the reply, Arash.

> I'm not familiar with the way xref works, but reading the above, xref
> doesn't care about modes set per file variables, is this correct?

As far as I know, the default xref-find-references deals strictly in
file extensions.

> I think this is almost impossible.  Besides the effort, take for example
> the .cnf extension which is used by other programs as well, so
> associating it with LaTeX-mode wouldn't make sense, IMO.

Agreed -- this may be an argument against my current approach. I hope,
however, that the way xref-find-references searches by directory or by
project should limit spurious searching when a more common extension
appears on a TeX file.

> This is possibly the next mess since .tex can be plain-TeX, ConTeXt,
> LaTeX ...

I guess currently I'm thinking that this is sort of a feature, as
searching for symbols in files/buffers from many closely-related modes
may produce useful matches. The code I'm finishing up tends to search
more files rather than fewer, but it should be possible to prune this
if it's deemed too messy.

> So in general, I second what Stefan M. wrote in his other
> message, but respecting/using file local variables could help here.

Currently, the code takes into account file-local variables only by
including in the search list extensions of TeX-related buffers, which
may well only have become TeX-related due to such variables.

I'll post a patch as soon as I solve an outstanding issue or two, and
we'll see where we are.

Thank you indeed for your help,

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 24 Apr 2024 00:10:04 GMT) Full text and rfc822 format available.

Message #185 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>,
 David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, stefankangas <at> gmail.com,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 24 Apr 2024 03:09:03 +0300

On 22/04/2024 20:16, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>> (I'm wondering whether it might be useful, for example, for
>> semantic-symref-derive-find-filepatterns to add extensions from
>> auto-mode-alist even when some extensions are found in
>> semantic-symref-filepattern-alist.)
> Assuming we can get good enough results from `auto-mode-alist and
> friends, I think we'd want to mark `semantic-symref-filepattern-alist`
> as obsolete.
> But before that, we need to check the assumption.

Last I checked, semantic-symref-filepattern-alist had explicit entries 
only for languages whose auto-mode-alist entries were deemed too complex 
to parse out the matching extensions from the corresponding regexps.

Or had other difficulties like the c-or-c++-mode dispatcher.
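
The difficulty with complex regexps can be made concrete with a deliberately narrow extractor. The hypothetical C function below (`simple_regexp_ext` is a made-up name) accepts only the trivial shape `\.EXT\'`, written `"\\.tex\\'"` in Lisp source, and refuses anything with groups, alternation or brackets instead of guessing. A real derivation from auto-mode-alist would have to handle, or give up on, many more regexp forms.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extractor: copy EXT into OUT only when RE is exactly
   \.EXT\' with an alphanumeric EXT; return 1 on success, 0 otherwise. */
int simple_regexp_ext(const char *re, char *out, size_t cap)
{
    assert(re != NULL && out != NULL);
    size_t len = strlen(re);
    /* Need at least the "\." prefix, one character, and the "\'" suffix. */
    if (len < 5 || strncmp(re, "\\.", 2) != 0
        || strcmp(re + len - 2, "\\'") != 0)
        return 0;
    size_t elen = len - 4;
    if (elen + 1 > cap)
        return 0;
    for (size_t i = 0; i < elen; i++) {
        unsigned char c = (unsigned char)re[2 + i];
        if (!isalnum(c))
            return 0;           /* groups, alternation, brackets: give up */
        out[i] = (char)c;
    }
    out[elen] = '\0';
    return 1;
}
```

The refusal case is where an explicit entry in semantic-symref-filepattern-alist is still needed.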

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 24 Apr 2024 09:03:08 GMT) Full text and rfc822 format available.

Message #188 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, stefankangas <at> gmail.com,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 24 Apr 2024 10:02:29 +0100

Thanks, Dmitry.

> Last I checked, semantic-symref-filepattern-alist had explicit entries
> only for languages whose auto-mode-alist entries were deemed too complex
> to parse out the matching extensions from the corresponding regexps.
>
> Or had other difficulties like the c-or-c++-mode dispatcher.

That makes sense, and clarifies a few things for me. I guess TeX has
the "plain-tex or latex or context or ams-tex" dispatcher and also
in-tree vs. AUCTeX mode names, both of which at least for the moment
make semantic-symref-filepattern-alist seem a better fit.

Best,

David.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 29 Apr 2024 14:17:02 GMT) Full text and rfc822 format available.

Message #191 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 29 Apr 2024 15:15:41 +0100

[Message part 1 (text/plain, inline)]

Hi Dmitry and Arash,

Here's my third attempt at a working xref backend for TeX. I'll try
quickly to summarize what's in it:

1. I've modified etags so that it creates findable tags for as many
different sorts of TeX construct as possible, including those written
in the new expl3 syntax. I've now removed the escape character from
the tag names, as this simplifies code all around.

2. 4 of the 6 xref backend functions just call the etags backend.

3. xref-backend-identifier-at-point is modified to provide new regexps
for delineating TeX symbols, and there's also code to cope with expl3
constructs slightly differently in M-? than in the other two main xref
commands.

4. xref-backend-references is a wrapper for the standard backend, the
wrapper doing two things: first, it tries to accumulate as many file
extensions for the current major-mode as emacs knows about, and second
it creates a bespoke syntax-propertize-function for strings that
aren't entirely composed of symbol or word characters. It applies this
function to file-visiting buffers and lets xref apply it in the
*xref-temp* buffer, though I had to add a one-liner in xref.el to fix
what I believe is a minor bug there preventing syntax-propertize from
doing its work when the temp buffer holds text from a new file. (I can
provide a recipe for this if you want.)

5. Slightly unrelatedly, I've added new syntax-propertize-rules to
latex-mode so that expl3 constructs with the underscore aren't
fontified as subscripts, which makes such code unreadable. I'm happy
to split this off as another patch.
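
Items 1 and 3 both come down to deciding where a TeX control sequence begins and ends, and keeping its name without the escape character. The C sketch below only illustrates that idea; it is not the patch's xref-backend-identifier-at-point, and `cs_name_at` and `cs_letter` are hypothetical names. It relies on the TeX rule that a control word is a backslash followed by letters, and assumes that expl3 code, where `_` and `:` also count as letters, is selected by a flag.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Characters treated as letters in a control word.  Under expl3
   syntax '_' and ':' are letters too; EXPL3 selects that behavior. */
static int cs_letter(unsigned char c, int expl3)
{
    return isalpha(c) || (expl3 && (c == '_' || c == ':'));
}

/* Find the control word containing index POINT of TEXT and copy its
   name, without the leading backslash, into OUT.  Returns 1 on success.
   Illustrative only: it ignores other catcode changes and control
   symbols such as \\ or \%. */
int cs_name_at(const char *text, size_t point, int expl3,
               char *out, size_t cap)
{
    size_t len = strlen(text);
    assert(out != NULL);
    if (point >= len)
        return 0;
    size_t beg = point;
    /* Move left over letters until reaching the escape character. */
    while (beg > 0 && cs_letter((unsigned char)text[beg], expl3))
        beg--;
    if (text[beg] != '\\')
        return 0;
    size_t end = beg + 1;
    while (end < len && cs_letter((unsigned char)text[end], expl3))
        end++;
    size_t n = end - beg - 1;
    if (n == 0 || n + 1 > cap)
        return 0;
    memcpy(out, text + beg + 1, n);
    out[n] = '\0';
    return 1;
}
```

With the flag set, point anywhere inside `\__hook_debug:n` yields `__hook_debug:n`; without it, the same position is not recognized as part of a control word.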

All comments gratefully received, and thanks,

David.


[0001-Provide-a-modified-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 02 May 2024 00:44:02 GMT) Full text and rfc822 format available.

Message #194 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 2 May 2024 03:43:12 +0300

On 29/04/2024 17:15, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> though I had to add a one-liner in xref.el to fix
> what I believe is a minor bug there preventing syntax-propertize from
> doing its work when the temp buffer holds text from a new file. (I can
> provide a recipe for this if you want.)

Yes, could you please expand on it separately?

The rest of the patch description just makes sense to me, and I'd be 
happy to leave (or not) the detailed review to whoever reviews TeX 
contributions around here, but this is something I'll need to pay 
special attention to.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 02 May 2024 06:49:01 GMT) Full text and rfc822 format available.

Message #197 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 02 May 2024 08:47:29 +0200

David Fussner <dfussner <at> googlemail.com> writes:

> Here's my third attempt at a working xref backend for TeX. I'll try
> quickly to summarize what's in it:
>
> 1. I've modified etags so that it creates findable tags for as many
> different sorts of TeX construct as possible, including those written
> in the new expl3 syntax. I've now removed the escape character from
> the tag names, as this simplifies code all around.

Hi David,

Thanks.  I trust your code works, so I have 2 minor comments.

> 5. Slightly unrelatedly, I've added new syntax-propertize-rules to
> latex-mode so that expl3 constructs with the underscore aren't
> fontified as subscripts, which makes such code unreadable. I'm happy
> to split this off as another patch.

I think this makes sense.  AFAIK, Stefan M. looks after tex-mode.el, so
he can review it.

> @@ -5736,11 +5752,25 @@ Scheme_functions (FILE *inf)
>  static linebuffer *TEX_toktab = NULL; /* Table with tag tokens */
>  
>  /* Default set of control sequences to put into TEX_toktab.
> -   The value of environment var TEXTAGS is prepended to this.  */
> +   The value of environment var TEXTAGS is prepended to this.
> +   (2024) Add variants of '\def', some additional LaTeX (and
> +   former xparse) commands, common variants from the
> +   'etoolbox' package, and the main expl3 commands. */
>  static const char *TEX_defenv = "\
> -:chapter:section:subsection:subsubsection:eqno:label:ref:cite:bibitem\
> -:part:appendix:entry:index:def\
> -:newcommand:renewcommand:newenvironment:renewenvironment";
> +:label:ref:chapter:section:subsection:subsubsection:eqno:cite:bibitem\

I suggest adding 'Ref' and 'footref' as well, which are part of the LaTeX
kernel.
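
The etags comment quoted above describes the token table as a colon-separated list with the value of TEXTAGS prepended. The hypothetical C sketch below models that membership check; the default list is a shortened excerpt, not the real TEX_defenv, and `tex_tag_token_p` is a made-up name. It shows that adding 'Ref' and 'footref' is a matter of extending the string, and how, going by that comment, a user could supply them through TEXTAGS meanwhile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, shortened default list; not the real TEX_defenv. */
static const char *default_list = ":label:ref:chapter:section:cite";

/* Return 1 if NAME is one of the colon-separated tokens in TEXTAGS
   (may be NULL) followed by the default list, 0 otherwise. */
int tex_tag_token_p(const char *textags, const char *name)
{
    char combined[512];
    assert(name != NULL);
    int w = snprintf(combined, sizeof combined, "%s%s",
                     textags ? textags : "", default_list);
    if (w < 0 || (size_t)w >= sizeof combined)
        return 0;
    /* strtok skips empty fields, so leading or doubled colons are harmless. */
    for (char *tok = strtok(combined, ":"); tok; tok = strtok(NULL, ":"))
        if (strcmp(tok, name) == 0)
            return 1;
    return 0;
}
```

A caller would pass `getenv("TEXTAGS")` as the first argument; the comparison is case-sensitive, so 'Ref' and 'ref' are distinct tokens.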

> +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})

(defvar tex-esc-and-group-chars '(?\\ ?\{ ?\})

> +  "The current TeX escape and grouping characters.

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 02 May 2024 13:33:02 GMT) Full text and rfc822 format available.

Message #200 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 2 May 2024 14:32:37 +0100

[Message part 1 (text/plain, inline)]

Hi Dmitry,

Thanks for looking over the patch. Here's the recipe for the purported
bug in xref.el:

1. Please apply my patch to tex-mode.el (and xref.el).

2. I've attached xref-bug.zip, which contains a directory with 4
identical LaTeX files and one LaTeX file with a single additional
character. Please extract it.

3. emacs -Q

4. C-x C-f xref-bug/mwea.ltx, and please don't visit the other 4
files.

5. Put point on \__hook_debug:n in line 6.

6. M-?, RTN, ... RTN, RTN.

The xref buffer should offer 5 hits, one from each file in the
directory.

7. Comment out the line I added to xref--collect-matches,
byte-compile and load the file.

8. With point in the same place, M-?, RTN, ... RTN, RTN.

The xref buffer should offer 3 hits. The first is from the
file-visiting buffer (where I also set syntax-propertize--done to 0,
because in my testing there could be some issues here, too). The
second hit is from the first file opened in *xref-temp*. Here,
syntax-propertize runs to line-end, and all is well. The next two
files are missed, because syntax-propertize--done is set to line-end
and they have exactly the same line length as file two, and therefore
syntax-propertize thinks that's good enough and doesn't actually
change anything. The fifth file has an additional character in line 6,
so syntax-propertize decides it needs to work on this line because
line-end > syntax-propertize--done.

You can put point on, say, \documentclass, and you'll get all 5 hits,
because this string doesn't begin or end with a non-word, non-symbol
character, and syntax-propertize doesn't need to run. You can make the
search string "\documentclass" and you'll get 2 hits, as line 1 has
the same length in all 5 files. (It's worth trying "\usepackage" as
the search string, too.)
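The symbol-boundary point can be illustrated with a simplified model of the (format "\\_<%s\\_>" ...) match that xref applies to each candidate hit. This is a hypothetical sketch, not Emacs code. A 256-entry table stands in for the buffer's syntax table plus any `syntax-table' text properties, and by default only ASCII letters and digits count as word constituents. `syntax_model`, `syntax_model_init` and `symbol_match_at` are illustrative names:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* sym[c] is true if byte C is a word or symbol constituent.  */
typedef struct { bool sym[256]; } syntax_model;

/* Letters and digits, plus every byte in EXTRA, get symbol syntax.  */
static void
syntax_model_init (syntax_model *st, const char *extra)
{
  for (int c = 0; c < 256; c++)
    st->sym[c] = isalnum (c) != 0;
  for (const char *p = extra; *p; p++)
    st->sym[(unsigned char) *p] = true;
}

static bool
is_sym (const syntax_model *st, char c)
{
  return st->sym[(unsigned char) c];
}

/* True if IDENT occurs at POS in TEXT and satisfies both \_< (a
   symbol starts at POS) and \_> (a symbol ends right after it).  */
static bool
symbol_match_at (const syntax_model *st, const char *text, size_t pos,
                 const char *ident)
{
  size_t tlen = strlen (text), ilen = strlen (ident);
  assert (pos <= tlen);
  if (ilen == 0 || pos + ilen > tlen
      || strncmp (text + pos, ident, ilen) != 0)
    return false;
  size_t end = pos + ilen;
  bool starts = is_sym (st, text[pos])
                && (pos == 0 || !is_sym (st, text[pos - 1]));
  bool ends = is_sym (st, text[end - 1])
              && (end == tlen || !is_sym (st, text[end]));
  return starts && ends;
}
```

In this model \__hook_debug:n is found only once `_' and `:' are given symbol syntax, \documentclass (searched without its backslash) is found either way, and a search string that itself starts with a backslash never satisfies \_< unless the backslash is given symbol syntax too.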

That's my diagnosis anyway. Does it make sense?

Thanks,

David.

On Thu, 2 May 2024 at 01:43, Dmitry Gutov <dgutov <at> yandex.ru> wrote:
>
> On 29/04/2024 17:15, David Fussner via Bug reports for GNU Emacs, the
> Swiss army knife of text editors wrote:
> > though I had to add a one-liner in xref.el to fix
> > what I believe is a minor bug there preventing syntax-propertize from
> > doing its work when the temp buffer holds text from a new file. (I can
> > provide a recipe for this if you want.)
>
> Yes, could you please expand on it separately?
>
> The rest of the patch description just makes sense to me, and I'd be
> happy to leave (or not) the detailed review to whoever reviews TeX
> contributions around here, but this is something I'll need to pay
> special attention to.

[xref-bug.zip (application/zip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 02 May 2024 13:36:01 GMT)

Message #203 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 2 May 2024 14:34:45 +0100

Thanks for the review, Arash, and I'll make those changes.

Best, David.

On Thu, 2 May 2024 at 07:47, Arash Esbati <arash <at> gnu.org> wrote:
>
> David Fussner <dfussner <at> googlemail.com> writes:
>
> > Here's my third attempt at a working xref backend for TeX. I'll try
> > quickly to summarize what's in it:
> >
> > 1. I've modified etags so that it creates findable tags for as many
> > different sorts of TeX construct as possible, including those written
> > in the new expl3 syntax. I've now removed the escape character from
> > the tag names, as this simplifies code all around.
>
> Hi David,
>
> Thanks.  I trust your code works, so I have 2 minor comments.
>
> > 5. Slightly unrelatedly, I've added new syntax-propertize-rules to
> > latex-mode so that expl3 constructs with the underscore aren't
> > fontified as subscripts, which makes such code unreadable. I'm happy
> > to split this off as another patch.
>
> I think this makes sense.  AFAIK, Stefan M. looks after tex-mode.el, so
> he can review it.
>
> > @@ -5736,11 +5752,25 @@ Scheme_functions (FILE *inf)
> >  static linebuffer *TEX_toktab = NULL; /* Table with tag tokens */
> >
> >  /* Default set of control sequences to put into TEX_toktab.
> > -   The value of environment var TEXTAGS is prepended to this.  */
> > +   The value of environment var TEXTAGS is prepended to this.
> > +   (2024) Add variants of '\def', some additional LaTeX (and
> > +   former xparse) commands, common variants from the
> > +   'etoolbox' package, and the main expl3 commands. */
> >  static const char *TEX_defenv = "\
> > -:chapter:section:subsection:subsubsection:eqno:label:ref:cite:bibitem\
> > -:part:appendix:entry:index:def\
> > -:newcommand:renewcommand:newenvironment:renewenvironment";
> > +:label:ref:chapter:section:subsection:subsubsection:eqno:cite:bibitem\
>
> I suggest to add 'Ref' and 'footref' as well which are part of LaTeX
> kernel.
>
> > +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
>
> (defvar tex-esc-and-group-chars '(?\\ ?\{ ?\})
>
> > +  "The current TeX escape and grouping characters.
>
> Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 03 May 2024 13:44:01 GMT)

Message #206 received at 53749 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 03 May 2024 09:42:42 -0400

> Thanks for looking over the patch. Here's the recipe for the purported
> bug in xref.el:

The problem stems from xref.el's constant abuse of
`inhibit-modification-hooks`.  Binding this var to t should be done only
in exceptional circumstances and should ideally be accompanied by a
comment explaining why it's necessary.
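The stale marker in David's recipe and the binding of inhibit-modification-hooks interact in a way that can be condensed into a toy model. This is a hypothetical sketch, not Emacs code. `done` stands in for syntax-propertize--done. Resetting it when a new file is loaded stands in for syntax-ppss-flush-cache, which syntax-propertize installs on before-change-functions and which therefore does not run while inhibit-modification-hooks is bound to t. The model also simplifies by treating any propertize run as covering the whole line:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct
{
  size_t done;       /* like syntax-propertize--done */
  bool propertized;  /* do the current contents carry syntax props?  */
} temp_buffer;

/* Replace the contents with a new file.  Text properties vanish with
   the old text; DONE is reset only if the change hook may run.  */
static void
load_file (temp_buffer *buf, bool inhibit_modification_hooks)
{
  buf->propertized = false;
  if (!inhibit_modification_hooks)
    buf->done = 0;
}

/* Like (syntax-propertize POS): do nothing if DONE already covers POS.  */
static void
syntax_propertize (temp_buffer *buf, size_t pos)
{
  if (pos > buf->done)
    {
      buf->propertized = true;
      buf->done = pos;
    }
}

/* Load N files in turn, propertizing up to the end of the line that
   contains the match (LINE_ENDS[i]); count files whose match is found,
   i.e. whose syntax properties are actually present.  */
static int
count_hits (const size_t *line_ends, size_t n, bool inhibit)
{
  temp_buffer buf = { 0, false };
  int hits = 0;
  for (size_t i = 0; i < n; i++)
    {
      load_file (&buf, inhibit);
      syntax_propertize (&buf, line_ends[i]);
      if (buf.propertized)
        hits++;
    }
  return hits;
}
```

Fed the line-6 end positions of three identical files followed by one that is a character longer, the model reports 2 temp-buffer hits with the hooks inhibited and 4 without. Adding the hit from the file-visiting buffer gives the 3 and 5 hits in the recipe.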


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Fri, 03 May 2024 14:12:01 GMT)

Message #209 received at 53749 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Tassilo Horn <tsdh <at> gnu.org>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Dmitry Gutov <dgutov <at> yandex.ru>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 03 May 2024 10:10:58 -0400

Hi,

Apparently I'm the `tex-mode.el` guy, so I tried to take a look.

> diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> index 97c950267c6..d990a2dbfa9 100644
> --- a/lisp/textmodes/tex-mode.el
> +++ b/lisp/textmodes/tex-mode.el
> @@ -695,7 +696,25 @@ tex-verbatim-environments
>       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
>        (1 (ignore
>            (tex-env-mark (match-beginning 0)
> -                        (match-beginning 1) (match-end 1))))))))
> +                        (match-beginning 1) (match-end 1)))))
> +     ;; The next two rules change the syntax of `:' and `_' in expl3
> +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
> +     ;; more accurately.
> +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
> +              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
> +              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
> +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")

Can you add in the comment some URL pointing to some relevant expl3
documentation which "explains" why the above regexp makes sense?
Also I don't clearly see how the above regexp distinguishes expl3 code
from "normal" LaTeX code, so the comment should say something about it.

Side note: I'd avoid [:space:], whose exact meaning is rarely quite what
we need.
Side note: backslash doesn't need to be backslashed in [...].

> +      (1 (ignore
> +          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
> +                                                       (match-end 1)))
> +                 (list (seq-positions expr ?_)))
> +            (dolist (pos list)
> +              (put-text-property (+ pos (match-beginning 1))
> +                                 (1+ (+ pos (match-beginning 1)))
> +                                 'syntax-table (string-to-syntax "_"))))))
> +      (2 "_"))
> +     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
> +      (1 "_")))))

Currently we "skip" inappropriate underscores via
`tex-font-lock-match-suscript` and/or by adding a particular `face` text
property rather than via `syntax-table/propertize`.

For algorithmic reasons, it's better to minimize the work done in
`syntax-propertize-function` as much as possible (font-lock is lazier
than `syntax-propertize`), so I recommend you try moving the above
to font-lock rules.

> +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
> +  "The current TeX escape and grouping characters.

I recommend you backslash escape the { and } above (although it's not
indispensable, `emacs-lisp-mode` will parse the code better).
More importantly, the docstring doesn't explain what this list
means/does.  E.g. does the order matter?  Can it be longer than 3 elements?

From the current docstring I can't guess what would be the consequence
of adding/removing elements to/from this list.

> +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> +;; AUCTeX is doing the same for its modes.
> +(defvar semantic-symref-filepattern-alist)
> +(with-eval-after-load 'semantic/symref/grep
> +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> +                     "*.bbl" "*.drv" "*.hva")
> +        semantic-symref-filepattern-alist)
> +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> +        semantic-symref-filepattern-alist)
> +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))

We know `semantic-symref-filepattern-alist` will exist when
`semantic/symref/grep` is loaded, but not before, so I'd put the
`defvar` inside the `with-eval-after-load`.
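The entries pushed onto semantic-symref-filepattern-alist in the hunk quoted above are shell-style globs. Here is a quick sanity check of what the latex-mode list covers, using POSIX fnmatch(3). It assumes the grep backend interprets the patterns with ordinary glob semantics, and the helper name `matches_any` is hypothetical:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Patterns copied from the latex-mode entry in the hunk.  */
static const char *const latex_patterns[] = {
  "*.[tT]e[xX]", "*.ltx", "*.sty", "*.cl[so]", "*.bbl", "*.drv", "*.hva"
};

/* True if FILE matches at least one of the NPAT glob patterns.  */
static bool
matches_any (const char *file, const char *const *pats, size_t npat)
{
  assert (file != NULL);
  for (size_t i = 0; i < npat; i++)
    if (fnmatch (pats[i], file, 0) == 0)
      return true;
  return false;
}
```

.dtx files are not matched by this list; the hunk handles them through the separate doctex-mode entry.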

> +;; Setup AUCTeX modes (for testing purposes only).
> +
> +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> +
> +(defun tex-set-auctex-xref-backend ()
> +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))

I assume this will be sent to AUCTeX and is not meant to be in
`tex-mode.el`, right?

> +;; `xref-find-references' currently may need this when called from a
> +;; latex-mode buffer in order to search files or buffers with a .tex
> +;; suffix (including the buffer from which it has been called).  We
> +;; append it to `auto-mode-alist' so as not to interfere with the usual
> +;; mode-setting apparatus.  Changes here and in AUCTeX should soon
> +;; render it unnecessary.
> +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)

Maybe I have not followed the whole discussion closely enough, but at
least to me the above "soon" is very unclear.
I'll assume that this code will be removed before we install the patch.
If not, please explain in the comment why this specific hack is needed
and how it works.

> +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
> +  "Find references of IDENTIFIER in TeX buffers and files."
> +  (require 'semantic/symref/grep)
> +  (let (bufs texbufs
> +             (mode major-mode))
> +    (dolist (buf (buffer-list))
> +      (if (eq (buffer-local-value 'major-mode buf) mode)
> +          (push buf bufs)
> +        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
> +          (push buf texbufs))))
> +    (unless (seq-set-equal-p tex--buffers-list bufs)
> +      (let* ((amalist (tex--collect-file-extensions))
> +	     (extlist (alist-get mode semantic-symref-filepattern-alist))
> +	     (extlist-new (seq-uniq
> +                           (seq-union amalist extlist #'string-match-p))))

After sinking the `defvar` above, you'll need to add a new `defvar` for
`semantic-symref-filepattern-alist` just after the `require`.

> +                (setq-local syntax-propertize-function
> +                            (eval
> +                             `(tex-xref-syntax-function
> +                               ,identifier ,beg ,end)))

Why do we need to change `syntax-propertize-function` and why do we need
`eval`?

> +                (setq syntax-propertize--done 0)

This is not sufficient.  You want to `syntax-ppss-flush-cache`.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 04 May 2024 08:27:01 GMT)

Message #212 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Tassilo Horn <tsdh <at> gnu.org>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Dmitry Gutov <dgutov <at> yandex.ru>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 4 May 2024 09:26:12 +0100

Thank you very much, Stefan, for taking the time to review the patch.
In short, it plainly needs some work, but I'm rather short of time
this weekend, so I will respond properly, and I hope more coherently,
on Monday or Tuesday.

Best, David.

On Fri, 3 May 2024 at 15:11, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
> [...]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 04 May 2024 14:34:01 GMT)

Message #215 received at 53749 <at> debbugs.gnu.org:

From: Arash Esbati <arash <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, David Fussner <dfussner <at> googlemail.com>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 04 May 2024 16:32:25 +0200

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
>> index 97c950267c6..d990a2dbfa9 100644
>> --- a/lisp/textmodes/tex-mode.el
>> +++ b/lisp/textmodes/tex-mode.el
>> @@ -695,7 +696,25 @@ tex-verbatim-environments
>>       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
>>        (1 (ignore
>>            (tex-env-mark (match-beginning 0)
>> -                        (match-beginning 1) (match-end 1))))))))
>> +                        (match-beginning 1) (match-end 1)))))
>> +     ;; The next two rules change the syntax of `:' and `_' in expl3
>> +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
>> +     ;; more accurately.
>> +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
>> +              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
>> +              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
>> +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
>
> Can you add in the comment some URL pointing to some relevant expl3
> documentation which "explains" why the above regexp makes sense?
> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about
> it.

FWIW, I'm not sure there is a URL for that, but in interface3.pdf,
chapter 1, you'll find:

    1.1 Naming functions and variables

    LaTeX3 does not use @ as a "letter" for defining internal macros.
    Instead, the symbols _ and : are used in internal macro names to provide
    structure. The name of each function is divided into logical units using
    _, while : separates the name of the function from the argument
    specifier ("arg-spec"). This describes the arguments expected by the
    function. In most cases, each argument is represented by a single
    letter. The complete list of arg-spec letters for a function is referred
    to as the signature of the function.

So expect things like this:

    \tl_set:Nn \l_mya_tl { A }
    \tl_set:Nn \l_myb_tl { B }
    \tl_set:Nf \l_mya_tl { \l_mya_tl \l_myb_tl }
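In the excerpt above, `:` separates the function name from its argument specifier. That convention is mechanical enough to sketch. The following hypothetical helper (`expl3_split` is an illustrative name, not part of any TeX or Emacs code) splits a control-sequence name at the colon. Variables such as \l_mya_tl carry no signature, so they are rejected:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Split an expl3 control-sequence NAME, with or without its leading
   backslash, at the ':' into BASE (the "_"-structured name) and SIG
   (the arg-spec letters).  Return false if NAME has no ':' (as for
   variables like \l_mya_tl) or if a buffer is too small.  */
static bool
expl3_split (const char *name, char *base, size_t base_size,
             char *sig, size_t sig_size)
{
  assert (name != NULL && base != NULL && sig != NULL);
  if (*name == '\\')
    name++;
  const char *colon = strchr (name, ':');
  if (colon == NULL)
    return false;
  size_t blen = (size_t) (colon - name);
  size_t slen = strlen (colon + 1);
  if (blen >= base_size || slen >= sig_size)
    return false;
  memcpy (base, name, blen);
  base[blen] = '\0';
  memcpy (sig, colon + 1, slen + 1);   /* copies the terminating NUL */
  return true;
}
```

The split is purely lexical. It cannot tell whether the surrounding code is expl3 at all, so \tl_set:Nn in an ordinary document would be split the same way.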

>> +;; Setup AUCTeX modes (for testing purposes only).
>> +
>> +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
>> +
>> +(defun tex-set-auctex-xref-backend ()
>> +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?

That would have been a question from my side, but I saw that "testing
purposes only" and skipped it for this round.

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 04 May 2024 14:56:02 GMT)

Message #218 received at 53749 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, David Fussner <dfussner <at> googlemail.com>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 04 May 2024 10:54:46 -0400

> FWIW, I'm not sure if there is an URL for that, but in interface3.pdf,
> chap.1, you'll find:
[...]
> So expect things like this:
>
>     \tl_set:Nn \l_mya_tl { A }
>     \tl_set:Nn \l_myb_tl { B }
>     \tl_set:Nf \l_mya_tl { \l_mya_tl \l_myb_tl }

But that is *also* valid LaTeX, with a different meaning (i.e. where
`_` has its subscript meaning).  So we need some other info in order to
know which of the two we're dealing with.

Maybe that info is simply "assume LaTeX3 if the _ is followed by several
letters" or some such heuristic, but the comment should say so.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 04 May 2024 21:17:02 GMT)

Message #221 received at 53749 <at> debbugs.gnu.org:

From: Arash Esbati <arash <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, David Fussner <dfussner <at> googlemail.com>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 04 May 2024 23:15:55 +0200

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> But that is *also* valid LaTeX, with a different meaning (i.e. where
> `_` has its subscript meaning).  So we need some other info in order to
> know which of the two we're dealing with.

That's true.  AFAIK, one has to deal with:

  • \_ in ordinary text like foo\_bar
  • _ in math mode like $a_b$
  • expl3 macros like \tl_set:Nn
  • expl3 macros like \__kernel_kern:n

> Maybe that info is simply "assume LaTeX3 if the _ is followed by several
> letters" or some such heuristic, but the comment should say so.

Last time I looked at this, my conclusion was: Deal with \_ and _ in
usual .tex files and expect expl3 macros in .dtx files only.
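One way to express that heuristic over the four cases listed above is a small classifier. This is a hypothetical sketch (`classify_underscore`, `has_suffix` and the enum names are illustrative) that looks only at the file name and the preceding character. It ignores comments, verbatim text and \ExplSyntaxOn regions. In .dtx files it treats every underscore, including those in the documentation parts, as part of an expl3 name:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
  US_ESCAPED,     /* \_ in ordinary text, as in foo\_bar */
  US_EXPL3_NAME,  /* part of an expl3 name such as \tl_set:Nn */
  US_SUBSCRIPT    /* math subscript, as in $a_b$ */
} underscore_kind;

static bool
has_suffix (const char *s, const char *suffix)
{
  size_t n = strlen (s), m = strlen (suffix);
  return n >= m && strcmp (s + n - m, suffix) == 0;
}

/* Classify the '_' at index POS of TEXT, which comes from FILENAME.  */
static underscore_kind
classify_underscore (const char *filename, const char *text, size_t pos)
{
  assert (text[pos] == '_');
  if (has_suffix (filename, ".dtx"))
    return US_EXPL3_NAME;
  if (pos > 0 && text[pos - 1] == '\\')
    return US_ESCAPED;
  return US_SUBSCRIPT;
}
```

The .dtx test comes first so that \__kernel_kern:n, whose first underscore follows a backslash, is not mistaken for an escaped underscore.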

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 07 May 2024 02:08:01 GMT)

Message #224 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 7 May 2024 05:06:36 +0300

On 02/05/2024 16:32, David Fussner via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> [...]

Thank you, David, for the thorough scenario. I see the problem now.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 07 May 2024 02:28:02 GMT)

Message #227 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>,
 David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, stefankangas <at> gmail.com,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 7 May 2024 05:27:14 +0300

On 03/05/2024 16:42, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>> Thanks for looking over the patch. Here's the recipe for the purported
>> bug in xref.el:
> The problem stems from xref.el's constant abuse of
> `inhibit-modification-hooks`.  Binding this var to t should be done only
> in exceptional circumstances and should ideally be accompanied by a
> comment explaining why it's necessary.

Well, the reason is performance: I've tried to wring the most out of
it, given that we have to parse the buffer for syntax in Elisp, and
that'll always have a certain overhead.

The difference between inhibiting and not could be up to 20% of runtime.

David's fix makes things slower (just due to having us do the necessary 
work), but still has an edge over having no inhibit-modification-hooks.

That remaining improvement is around 4-7% in my testing, though, so 
maybe it's the point where we should prioritize simplicity.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 07 May 2024 13:16:01 GMT)

Message #230 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Tassilo Horn <tsdh <at> gnu.org>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Dmitry Gutov <dgutov <at> yandex.ru>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 7 May 2024 14:15:33 +0100

Hi Stefan,

I apologize in advance if my reply gets lengthy.

> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.

You are quite right that the regexp can (and does) match "normal"
LaTeX code, and I can see that this isn't acceptable as it stands. The
only way to be sure about ":" and "_" is to determine whether they're
inside a matched pair of \ExplSyntaxOn and \ExplSyntaxOff macros (or in a
file which does something like \ProvidesExpl[File|Class]). The file
case can, I think, be sorted by modifying the syntax table as part of
setting up the relevant major modes. The temporary toggling of
ExplSyntax is trickier, but I have some proof-of-concept code that
adds a function to the `syntax-propertize-extend-region-functions'
hook that creates a list (`tex-expl-region-list`) of the start and end
of all such regions in a buffer and updates it whenever
`syntax-propertize` runs. In the `syntax-propertize-function` we test
whether hits are inside one of these regions and only change the
syntax when they are. (A very lightly-tested and incomplete patch on
top of my earlier patch is attached. It only applies to the "_" now,
but would need extending to the other sub-matches, too.)
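The region test described above can be sketched independently of the Emacs machinery. This hypothetical version is not David's tex-expl-region-list code; `expl_regions_scan`, `in_expl_region` and the fixed MAX_REGIONS capacity are illustrative. It rescans a whole string instead of updating a list incrementally from syntax-propertize-extend-region-functions, and it ignores comments and verbatim text:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_REGIONS 64

/* [start, end) spans of text between \ExplSyntaxOn and \ExplSyntaxOff.  */
typedef struct
{
  size_t start[MAX_REGIONS], end[MAX_REGIONS];
  size_t count;
} expl_regions;

/* Record every span in TEXT; an unterminated \ExplSyntaxOn runs to
   the end of TEXT.  Plain substring scanning, no TeX parsing.  */
static void
expl_regions_scan (const char *text, expl_regions *r)
{
  static const char on[] = "\\ExplSyntaxOn", off[] = "\\ExplSyntaxOff";
  assert (text != NULL && r != NULL);
  size_t len = strlen (text);
  const char *p = text;
  r->count = 0;
  while (r->count < MAX_REGIONS && (p = strstr (p, on)) != NULL)
    {
      size_t s = (size_t) (p - text) + strlen (on);
      const char *q = strstr (text + s, off);
      r->start[r->count] = s;
      r->end[r->count] = q ? (size_t) (q - text) : len;
      r->count++;
      if (q == NULL)
        break;
      p = q + strlen (off);
    }
}

/* True if buffer offset POS lies inside one of the recorded spans.  */
static bool
in_expl_region (const expl_regions *r, size_t pos)
{
  for (size_t i = 0; i < r->count; i++)
    if (pos >= r->start[i] && pos < r->end[i])
      return true;
  return false;
}
```

Treating an unterminated \ExplSyntaxOn as running to the end of the text seems the safer reading for a buffer that is still being edited.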

> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.

The reason I've been using `syntax-propertize` rather than `font-lock`
is that the former may confer advantages when using
`xref-find-references`, but that in turn depends on how we decide to
define that function in the `tex-etags` backend. Please see below. In
any case, I think I can easily use `tex-expl-region-list` in a test
for how to fontify "_", so if you don't object to the addition of
`tex-expl-region-set` to the
`syntax-propertize-extend-region-functions` hook, then we should be
able to get pretty close to a rigorous demarcation between "normal"
LaTeX and expl3 code in this context.

> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.

We only need `eval` because I'm confused about the handling of macros
-- I have some code in progress to fix this. As for why we need to
change `syntax-propertize-function`, that's the core of the issue with
`xref-find-references`. In the current patch, the wrapper for the
standard `xref-backend-references` gathers file extensions and also
tests whether the search string begins and/or ends with a non-word,
non-symbol character. In `xref-references-in-directory` the only hits
offered to the user match (format "\\_<%s\\_>" ...), so I create a
bespoke `syntax-propertize-function` to ensure that the search string
matches that format. (Actually, I would need to improve that to cope
with searches for "\command" in code that looks like
"\let\command\othercommand" -- even when the "\" has the right syntax
the search fails because the "t" in "let" doesn't.)
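The failure can be reproduced with a hedged sketch that builds a
throwaway syntax table (not the real TeX one) and runs the same
"\\_<%s\\_>" pattern; `my-symbol-search-p` is a hypothetical helper.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-symbol-search-p (identifier text backslash-syntax)
  "Return non-nil if IDENTIFIER occurs as a whole symbol in TEXT.
BACKSLASH-SYNTAX is the syntax descriptor given to the backslash for
the search, e.g. \".\" (punctuation) or \"_\" (symbol constituent)."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?\\ backslash-syntax table)
      (set-syntax-table table)
      (insert text)
      (goto-char (point-min))
      ;; Same shape of regexp as the hits offered by
      ;; `xref-references-in-directory'.
      (re-search-forward (format "\\_<%s\\_>" (regexp-quote identifier))
                         nil t))))
```

With the backslash as punctuation only "command" is found. With it as a
symbol constituent, "\command" is found when it stands alone (e.g.
"\command{x}") but not in "\let\command\othercommand", because the
preceding "t" joins everything into a single symbol.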

My mental model for `syntax-propertize` was/is insufficient, as you
point out, so your improvement ensures that buffers return to the
status quo ante after the search is complete, but it's still an open
question whether we want to do this at all. I see at least 3 options:

1. The maximalist approach, which tries to ensure that any TeX symbol
may be searched for successfully, even if the syntax of its components
is inconvenient. My patch is a (faulty) attempt at this.

2. The minimalist approach, providing no bespoke
`syntax-propertize-function`, and accepting failure when searching
some strings, especially strings which aren't offered up by
default. (In the above example, "command" would be the default offered
up, so manual intervention is needed to search for "\command".) In
this case I'd be very keen to have the expl3 "_" and ":" code actually
in `syntax-propertize`, because then searches for expl3 constructs
(without the "\") would work. (I'd also be very keen on having
something similar in AUCTeX, though their current method works fine in
most files.)

3. The non-standard approach, the `tex-etags` backend calling a
variant of `project-find-regexp` instead of `xref-backend-references`
when someone presses M-?. We could supply file extensions to search,
as now, and allow the choice of projects and/or directories, as now,
but the output will always be very non-standard, more like
`xref-backend-apropos` than `xref-backend-references`. The syntax of
the search string won't matter, and the problem will be "too many
hits" rather than "too few or none".
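Options 2 and 3 trade "too few" hits for "too many". A hedged
illustration in a buffer with the standard syntax table, where the
backslash has escape syntax; `my-count-hits` is hypothetical, and this
is not what `project-find-regexp` does internally.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-count-hits (regexp text)
  "Return the number of matches for REGEXP in TEXT."
  (with-temp-buffer
    ;; A fresh buffer uses the standard syntax table, in which the
    ;; backslash has escape syntax, not word or symbol syntax.
    (insert text)
    (goto-char (point-min))
    (let ((count 0))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(defconst my-sample-text "\\let\\command\\othercommand \\commandx"
  "Sample TeX with one real use of the macro and one unrelated macro.")
```

The plain search, (regexp-quote "\\command"), reports two hits, one of
which is the unrelated `\commandx`; the symbol-bounded search reports
none, because `\_<` cannot match before a character that is neither a
word nor a symbol constituent.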

If you have any thoughts on the matter I'd be all ears.

> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.

As soon as AUCTeX has "*.[tT]e[xX]" in its contributions to
`semantic-symref-filepattern-alist`, this will be redundant. As it
stands, not searching *.tex files for symbols in LaTeX-mode buffers
is kind of terrible.
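For illustration only, an entry of the kind meant here, written the way
the review suggests (with the `defvar` inside `with-eval-after-load`).
The `LaTeX-mode` key and the pattern list are assumptions; AUCTeX's
actual contribution may differ.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical AUCTeX-side entry; patterns use find(1) -name syntax.
(with-eval-after-load 'semantic/symref/grep
  (defvar semantic-symref-filepattern-alist)
  (push '(LaTeX-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]")
        semantic-symref-filepattern-alist))
```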

> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?

Yes.

Please assume I agree with all of your other corrections and
clarifications, and that I'll modify the patch accordingly. Once
again, thank you for the careful review, and my apologies for
occupying too much of your time.

Best, David.

On Fri, 3 May 2024 at 15:11, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
>
> Hi,
>
> Apparently I'm the `tex-mode.el` guy, so I tried to take a look.
>
> > diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> > index 97c950267c6..d990a2dbfa9 100644
> > --- a/lisp/textmodes/tex-mode.el
> > +++ b/lisp/textmodes/tex-mode.el
> > @@ -695,7 +696,25 @@ tex-verbatim-environments
> >       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
> >        (1 (ignore
> >            (tex-env-mark (match-beginning 0)
> > -                        (match-beginning 1) (match-end 1))))))))
> > +                        (match-beginning 1) (match-end 1)))))
> > +     ;; The next two rules change the syntax of `:' and `_' in expl3
> > +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
> > +     ;; more accurately.
> > +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
> > +              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
> > +              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
> > +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
>
> Can you add in the comment some URL pointing to some relevant expl3
> documentation which "explains" why the above regexp makes sense?
> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.
>
> Side note: I'd avoid [:space:] whose exact meaning is rarely quite what
> we need.
> Side note: backslash doesn't need to be backslashed in [...].
>
> > +      (1 (ignore
> > +          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
> > +                                                       (match-end 1)))
> > +                 (list (seq-positions expr ?_)))
> > +            (dolist (pos list)
> > +              (put-text-property (+ pos (match-beginning 1))
> > +                                 (1+ (+ pos (match-beginning 1)))
> > +                                 'syntax-table (string-to-syntax "_"))))))
> > +      (2 "_"))
> > +     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
> > +      (1 "_")))))
>
> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.
>
> > +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
> > +  "The current TeX escape and grouping characters.
>
> I recommend you backslash escape the { and } above (although it's not
> indispensable, `emacs-lisp-mode` will parse the code better).
> More importantly, the docstring doesn't explain what this list
> means/does.  E.g. does the order matter?  Can it be longer than 3 elements?
>
> From the current docstring I can't guess what would be the consequence
> of adding/removing elements to/from this list.
>
> > +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> > +;; AUCTeX is doing the same for its modes.
> > +(defvar semantic-symref-filepattern-alist)
> > +(with-eval-after-load 'semantic/symref/grep
> > +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> > +                     "*.bbl" "*.drv" "*.hva")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>
> We know `semantic-symref-filepattern-alist` will exist when
> `semantic/symref/grep` is loaded, but not before, so I'd put the
> `defvar` inside the `with-eval-after-load`.
>
> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?
>
> > +;; `xref-find-references' currently may need this when called from a
> > +;; latex-mode buffer in order to search files or buffers with a .tex
> > +;; suffix (including the buffer from which it has been called).  We
> > +;; append it to `auto-mode-alist' so as not to interfere with the usual
> > +;; mode-setting apparatus.  Changes here and in AUCTeX should soon
> > +;; render it unnecessary.
> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.
>
> > +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
> > +  "Find references of IDENTIFIER in TeX buffers and files."
> > +  (require 'semantic/symref/grep)
> > +  (let (bufs texbufs
> > +             (mode major-mode))
> > +    (dolist (buf (buffer-list))
> > +      (if (eq (buffer-local-value 'major-mode buf) mode)
> > +          (push buf bufs)
> > +        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
> > +          (push buf texbufs))))
> > +    (unless (seq-set-equal-p tex--buffers-list bufs)
> > +      (let* ((amalist (tex--collect-file-extensions))
> > +          (extlist (alist-get mode semantic-symref-filepattern-alist))
> > +          (extlist-new (seq-uniq
> > +                           (seq-union amalist extlist #'string-match-p))))
>
> After sinking the `defvar` above, you'll need to add a new `defvar` for
> `semantic-symref-filepattern-alist` just after the `require`.
>
> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.
>
>
>         Stefan
>

[0001-expl-region.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 09 May 2024 03:01:02 GMT)

Message #233 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>,
 David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, stefankangas <at> gmail.com,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 9 May 2024 06:00:02 +0300

On 07/05/2024 05:27, Dmitry Gutov wrote:
> On 03/05/2024 16:42, Stefan Monnier via Bug reports for GNU Emacs, the 
> Swiss army knife of text editors wrote:
>>> Thanks for looking over the patch. Here's the recipe for the purported
>>> bug in xref.el:
>> The problem stems from xref.el's constant abuse of
>> `inhibit-modification-hooks`.  Binding this var to t should be done only
>> in exceptional circumstances and should ideally be accompanied by a
>> comment explaining why it's necessary.
> 
> Well, the reason is performance: I've tried to wring out the most out of 
> it, given that we have to parse the buffer for syntax in Elisp, and 
> that'll always have a certain overhead.
> 
> The difference between inhibiting and not could be up to 20% of runtime.
> 
> David's fix makes things slower (just due to having us do the necessary 
> work), but still has an edge over having no inhibit-modification-hooks.
> 
> That remaining improvement is around 4-7% in my testing, though, so 
> maybe it's the point where we should prioritize simplicity.

For now, I've pushed a fix in 86187d43e2d which seems to handle David's 
scenario and address your review comment as well.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 09 May 2024 06:40:01 GMT)

Message #236 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, Stefan Kangas <stefankangas <at> gmail.com>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 9 May 2024 07:38:22 +0100

[Message part 1 (text/plain, inline)]

Thank you, Dmitry! I'll run tests later today and get back to you.

David.

[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 09 May 2024 10:51:02 GMT)

Message #239 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Arash Esbati <arash <at> gnu.org>, stefankangas <at> gmail.com,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 9 May 2024 11:49:37 +0100

Hi Dmitry,

All of my tests work well now, thank you.

Best, David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 13 May 2024 20:55:02 GMT)

Message #242 received at 53749 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 13 May 2024 16:54:32 -0400

> For now, I've pushed a fix in 86187d43e2d which seems to handle David's
> scenario and address your review comment as well.

The let-binding is done outside of `with-current-buffer`, so it relies
on the fact (itself problematic) that `inhibit-modification-hooks` is
not buffer-local.
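To illustrate the concern with a hypothetical variable `my-flag`
(standing in for a variable that does have a buffer-local value): a
`let` placed outside `with-current-buffer` binds the value seen in the
calling buffer, so it has no effect in a target buffer that has its own
local binding.

```elisp
;;; -*- lexical-binding: t -*-
(defvar my-flag 'default
  "Hypothetical variable standing in for a possibly buffer-local one.")

(defun my-flag-seen-in-target (bind-inside)
  "Return the value of `my-flag' seen in a buffer with a local binding.
If BIND-INSIDE is non-nil, let-bind `my-flag' inside
`with-current-buffer'; otherwise let-bind it in the calling buffer."
  (let ((target (generate-new-buffer " *target*")))
    (unwind-protect
        (progn
          (with-current-buffer target
            (setq-local my-flag 'local))
          (if bind-inside
              (with-current-buffer target
                (let ((my-flag 'bound))
                  my-flag))
            ;; Binds the default value, which TARGET's local value shadows.
            (let ((my-flag 'bound))
              (with-current-buffer target
                my-flag))))
      (kill-buffer target))))
```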

Would it be OK to use a patch like the one below?

IIUC, in the `syntax-needed` case, the let-binding of
`inhibit-modification-hooks` is just not very useful (4-7% is not worth
the trouble), so its purpose is to speed up the other case.
Did I understand it right?

Also, what about the other two bindings of `inhibit-modification-hooks`?


        Stefan


diff --git a/lisp/progmodes/xref.el b/lisp/progmodes/xref.el
index f9faec1b474..214e9cb6c09 100644
--- a/lisp/progmodes/xref.el
+++ b/lisp/progmodes/xref.el
@@ -1282,7 +1282,7 @@ xref--show-common-initialize
     (erase-buffer)
     (setq overlay-arrow-position nil)
     (xref--insert-xrefs xref-alist)
-    (add-hook 'post-command-hook 'xref--apply-truncation nil t)
+    (add-hook 'post-command-hook #'xref--apply-truncation nil t)
     (goto-char (point-min))
     (setq xref--original-window (assoc-default 'window alist)
           xref--original-window-intent (assoc-default 'display-action alist))
@@ -2112,10 +2112,7 @@ xref--convert-hits
 (defun xref--collect-matches (hit regexp tmp-buffer syntax-needed)
   (pcase-let* ((`(,line ,file ,text) hit)
                (file (and file (concat xref--hits-remote-id file)))
-               (buf (xref--find-file-buffer file))
-               ;; This is fairly dangerouns, but improves performance
-               ;; for large lists, see https://debbugs.gnu.org/53749#227
-               (inhibit-modification-hooks t))
+               (buf (xref--find-file-buffer file)))
     (if buf
         (with-current-buffer buf
           (save-excursion
@@ -2130,6 +2130,9 @@
       ;; Using the temporary buffer is both a performance and a buffer
       ;; management optimization.
       (with-current-buffer tmp-buffer
+        ;; This let is fairly dangerous, but improves performance
+        ;; for large lists, see https://debbugs.gnu.org/53749#227
+        (let ((inhibit-modification-hooks t))
         (erase-buffer)
         (when (and syntax-needed
                    (not (equal file xref--temp-buffer-file-name)))
@@ -2144,7 +2147,7 @@
           (setq-local xref--temp-buffer-file-name file)
           (setq-local inhibit-read-only t)
           (erase-buffer))
-        (insert text)
+          (insert text))
         (goto-char (point-min))
         (when syntax-needed
           (syntax-ppss-flush-cache (point)))
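The explicit `syntax-ppss-flush-cache` call matters because
`syntax-ppss` normally flushes its cache from `before-change-functions`,
and binding `inhibit-modification-hooks` to t suppresses all change
hooks. A minimal sketch (the helper name is hypothetical) that counts how
often a buffer-local `after-change-functions` entry runs:

```elisp
;;; -*- lexical-binding: t -*-
(defun my-count-change-calls (inhibit)
  "Insert twice into a temp buffer and count `after-change-functions' calls.
INHIBIT is the value let-bound to `inhibit-modification-hooks'."
  (with-temp-buffer
    (let ((calls 0))
      (add-hook 'after-change-functions
                (lambda (&rest _) (setq calls (1+ calls)))
                nil t)
      ;; Bound while the buffer being modified is current.
      (let ((inhibit-modification-hooks inhibit))
        (insert "a")
        (insert "b"))
      calls)))
```

With the hooks inhibited, nothing runs, so any cache that depends on
change hooks (such as the one `syntax-ppss` keeps) has to be flushed by
hand.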

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 14 May 2024 21:26:01 GMT)

Message #245 received at 53749 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 15 May 2024 00:24:24 +0300

On 13/05/2024 23:54, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>> For now, I've pushed a fix in 86187d43e2d which seems to handle David's
>> scenario and address your review comment as well.
> 
> The let-binding is done outside of `with-current-buffer`, so it relies
> on the fact (itself problematic) that `inhibit-modification-hooks` is
> not buffer-local.

Good point.

> Would it be OK to use a patch like the one below?

Sure, thank you. Pushed.

> IIUC, in the `syntax-needed` case, the let-binding of
> `inhibit-modification-hooks` is just not useful very (4-7% is not worth
> the trouble), so its purpose is to speed up the other case.

4-10% is the improvement for both cases (the "syntax needed" and not).

I'm on the fence about whether it's "not useful" - on the one hand 4%
might not sound like much - on the other we already have this bit of 
improvement which has no known bugs. And when you combine a few of such 
performance hacks, the difference gets more noticeable.

Also, I'm eyeing another performance improvement (simplifying file type 
detection) - the call to set-auto-mode is not fast. Simply commenting 
this call out improves the performance by 4x or so - but we'll need a 
simpler version of it instead, of course.

And with the above change (commenting out the set-auto-mode call), the 
difference that the inhibit-modification-hooks hack makes is amplified: 
it can get up to 20%. Ultimately it'll be somewhere in between, but this 
sounds better, right?
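Purely as an assumption about what a "simpler version" might look like:
a lookup of the file name in `auto-mode-alist` that neither calls the
mode function nor consults file contents, `magic-mode-alist`,
`interpreter-mode-alist` or file-local variables, all of which
`set-auto-mode` handles. `my-guess-mode` is hypothetical.

```elisp
;;; -*- lexical-binding: t -*-
(defun my-guess-mode (file-name)
  "Return the major mode `auto-mode-alist' suggests for FILE-NAME, or nil.
Matching is case-sensitive, and backup or compression suffixes are not
stripped, unlike in `set-auto-mode'."
  (let* ((case-fold-search nil)
         (entry (assoc-default file-name auto-mode-alist #'string-match-p)))
    ;; Entries of the form (REGEXP FUNCTION NON-NIL) have a list as cdr.
    (if (consp entry) (car entry) entry)))
```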

> Also, what about the other two bindings of `inhibit-modification-hooks`?

The other two are used while the contents of the Xref buffer are printed 
(or re-printed), so there's none of the syntax-ppss complications there. 
The performance difference is 8.5% in my last measurement.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 15 May 2024 15:49:01 GMT)

Message #248 received at 53749 <at> debbugs.gnu.org:

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Tassilo Horn <tsdh <at> gnu.org>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Dmitry Gutov <dgutov <at> yandex.ru>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 15 May 2024 16:47:46 +0100

[Message part 1 (text/plain, inline)]

Hi Stefan,

I attach an updated patch for tex-mode and etags, in which I've
attempted to include all of your recommendations.  A few notes:

1. I changed the name as well as the doc string of the variable
holding the TeX escape and grouping chars (now
`tex-thingatpt-exclude-chars`).  I hope this makes it clearer.

2. I removed the regexp for detecting expl3 constructs, and now rely
on the mechanism I outlined in my previous work-in-progress patch:

(a) Test whether the file is an expl3 class or package; if so, set
    `tex-expl-file-p` to t.

(b) If not (a), add `tex-expl-region-set` to the
    `syntax-propertize-extend-region-functions` hook to list all
    regions between \ExplSyntaxOn and \ExplSyntaxOff
    (`tex-expl-region-list`).

(c) In `tex-font-lock-suscript`, test for (a) and then (b), and don't
    treat the text after an underscore as a subscript when either holds.

3. I tried benchmarking `syntax-ppss-flush-cache` and
   `font-lock-flush` before and after the changes.  The former had a
   maximum slowdown of 0.5% (usually less) and the latter a maximum of
   0.2%, but if you want to see my methodology or suggest something to
   try please let me know.

4. I left the bespoke `syntax-propertize-function` in the
   `xref-backend-references` method uncompiled, as simple benchmarking
   suggested no perceptible gain from byte compiling it.  Using
   `syntax-ppss-flush-cache` to restore the status quo ante in each
   file-visiting buffer streamlined the code and made it do what it
   was supposed to do.
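Steps (a) and (c) might look roughly like the sketch below. The
detection regexp (looking for `\ProvidesExplPackage`,
`\ProvidesExplClass` or `\ProvidesExplFile`) is an assumption, and the
`my-tex-...` names are hypothetical stand-ins for `tex-expl-file-p`,
`tex-expl-region-list` and the test added to `tex-font-lock-suscript`;
step (b) is assumed to have filled the region list already.

```elisp
;;; -*- lexical-binding: t -*-
(defvar-local my-tex-expl-file-p nil
  "Non-nil if the whole buffer is expl3 code (step (a)).")

(defvar-local my-tex-expl-regions nil
  "List of (START . END) expl3 regions, as built in step (b).")

(defun my-tex-detect-expl-file ()
  "Set `my-tex-expl-file-p' if the buffer declares an expl3 package, class or file."
  (setq my-tex-expl-file-p
        (save-excursion
          (save-match-data
            (goto-char (point-min))
            (let ((case-fold-search nil))
              (and (re-search-forward
                    "\\\\ProvidesExpl\\(?:Package\\|Class\\|File\\)\\>" nil t)
                   t))))))

(defun my-tex-expl-underscore-p (pos)
  "Return non-nil if an underscore at POS should not start a subscript."
  (or my-tex-expl-file-p
      (let (found)
        (dolist (region my-tex-expl-regions found)
          (when (and (>= pos (car region)) (< pos (cdr region)))
            (setq found t))))))
```

Checking the file-level flag first keeps the common case (a whole expl3
package) to a single variable lookup.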

Thanks again for your advice, and please let me know what still needs
work.

Best, David.

On Fri, 3 May 2024 at 15:11, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
>
> Hi,
>
> Apparently I'm the `tex-mode.el` guy, so I tried to take a look.
>
> > diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> > index 97c950267c6..d990a2dbfa9 100644
> > --- a/lisp/textmodes/tex-mode.el
> > +++ b/lisp/textmodes/tex-mode.el
> > @@ -695,7 +696,25 @@ tex-verbatim-environments
> >       ("\\\\\\(?:end\\|begin\\) *\\({[^\n{}]*}\\)"
> >        (1 (ignore
> >            (tex-env-mark (match-beginning 0)
> > -                        (match-beginning 1) (match-end 1))))))))
> > +                        (match-beginning 1) (match-end 1)))))
> > +     ;; The next two rules change the syntax of `:' and `_' in expl3
> > +     ;; constructs, so that `tex-font-lock-suscript' can fontify them
> > +     ;; more accurately.
> > +     ((concat "\\(\\(?:[\\\\[:space:]{]_\\|"
> > +              "[\\\\{[:space:]][^][_[:space:][:cntrl:][:digit:]\\\\{}()/=]+\\)"
> > +              "\\(?:_+\\(?:[^][[:space:][:cntrl:][:digit:]:\\\\{}()/#_=]+\\|"
> > +              "#+[1-9]\\)\\)+\\)\\([:_]?\\)")
>
> Can you add in the comment some URL pointing to some relevant expl3
> documentation which "explains" why the above regexp makes sense?
> Also I don't clearly see how the above regexp distinguishes expl3 code
> from "normal" LaTeX code, so the comment should say something about it.
>
> Side note: I'd avoid [:space:] whose exact meaning is rarely quite what
> we need.
> Side note: backslash doesn't need to be backslashed in [...].
>
> > +      (1 (ignore
> > +          (let* ((expr (buffer-substring-no-properties (match-beginning 1)
> > +                                                       (match-end 1)))
> > +                 (list (seq-positions expr ?_)))
> > +            (dolist (pos list)
> > +              (put-text-property (+ pos (match-beginning 1))
> > +                                 (1+ (+ pos (match-beginning 1)))
> > +                                 'syntax-table (string-to-syntax "_"))))))
> > +      (2 "_"))
> > +     ("\\\\[[:alpha:]]+\\(:\\)[[:alpha:][:space:]\n]"
> > +      (1 "_")))))
>
> Currently we "skip" inappropriate underscores via
> `tex-font-lock-match-suscript` and/or by adding a particular `face` text
> property rather than via `syntax-table/propertize`.
>
> For algorithmic reasons, it's better to minimize the work done in
> `syntax-propertize-function` as much as possible (font-lock is more lazy
> than `syntax-propertize`), so I recommend you try and moving the above
> to font-lock rules.
>
> > +(defvar tex-esc-and-group-chars '(?\\ ?{ ?})
> > +  "The current TeX escape and grouping characters.
>
> I recommend you backslash escape the { and } above (although it's not
> indispensable, `emacs-lisp-mode` will parse the code better).
> More importantly, the docstring doesn't explain what this list
> means/does.  E.g. does the order matter?  Can it be longer than 3 elements?
>
> From the current docstring I can't guess what would be the consequence
> of adding/removing elements to/from this list.
>
> > +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> > +;; AUCTeX is doing the same for its modes.
> > +(defvar semantic-symref-filepattern-alist)
> > +(with-eval-after-load 'semantic/symref/grep
> > +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> > +                     "*.bbl" "*.drv" "*.hva")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> > +        semantic-symref-filepattern-alist)
> > +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>
> We know `semantic-symref-filepattern-alist` will exist when
> `semantic/symref/grep` is loaded, but not before, so I'd put the
> `defvar` inside the `with-eval-after-load`.
>
> > +;; Setup AUCTeX modes (for testing purposes only).
> > +
> > +(add-hook 'TeX-mode-hook #'tex-set-auctex-xref-backend)
> > +
> > +(defun tex-set-auctex-xref-backend ()
> > +  (add-hook 'xref-backend-functions #'tex--xref-backend nil t))
>
> I assume this will be sent to AUCTeX and is not meant to be in
> `tex-mode.el`, right?
>
> > +;; `xref-find-references' currently may need this when called from a
> > +;; latex-mode buffer in order to search files or buffers with a .tex
> > +;; suffix (including the buffer from which it has been called).  We
> > +;; append it to `auto-mode-alist' so as not to interfere with the usual
> > +;; mode-setting apparatus.  Changes here and in AUCTeX should soon
> > +;; render it unnecessary.
> > +(add-to-list 'auto-mode-alist '("\\.[tT]e[xX]\\'" . latex-mode) t)
>
> Maybe I have not followed the whole discussion closely enough, but at
> least to me the above "soon" is very unclear.
> I'll assume that this code will be removed before we install the patch.
> If not, please explain in the comment why this specific hack is needed
> and how it works.
>
> > +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
> > +  "Find references of IDENTIFIER in TeX buffers and files."
> > +  (require 'semantic/symref/grep)
> > +  (let (bufs texbufs
> > +             (mode major-mode))
> > +    (dolist (buf (buffer-list))
> > +      (if (eq (buffer-local-value 'major-mode buf) mode)
> > +          (push buf bufs)
> > +        (when (string-match-p ".*\\.[tT]e[xX]" (buffer-name buf))
> > +          (push buf texbufs))))
> > +    (unless (seq-set-equal-p tex--buffers-list bufs)
> > +      (let* ((amalist (tex--collect-file-extensions))
> > +          (extlist (alist-get mode semantic-symref-filepattern-alist))
> > +          (extlist-new (seq-uniq
> > +                           (seq-union amalist extlist #'string-match-p))))
>
> After sinking the `defvar` above, you'll need to add a new `defvar` for
> `semantic-symref-filepattern-alist` just after the `require`.
>
> > +                (setq-local syntax-propertize-function
> > +                            (eval
> > +                             `(tex-xref-syntax-function
> > +                               ,identifier ,beg ,end)))
>
> Why do we need to change `syntax-propertize-function` and why do we need
> `eval`?
>
> > +                (setq syntax-propertize--done 0)
>
> This is not sufficient.  You want to `syntax-ppss-flush-cache`.
>
>
>         Stefan
>
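A minimal sketch of the last two suggestions, using only standard Emacs functions (the helper `my-tex-install-propertizer` is hypothetical and not part of the patch). The new `syntax-propertize-function` is stored as an ordinary function value, so no `eval` is needed. `syntax-ppss-flush-cache` then resets both the `syntax-ppss` cache and the position up to which `syntax-propertize` has already run; setting `syntax-propertize--done` by hand only does the latter.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-tex-install-propertizer (fn)
  "Use FN as this buffer's `syntax-propertize-function'.
Hypothetical helper, for illustration only."
  ;; FN is an ordinary function value (a symbol or a closure), so
  ;; there is no need to build a form and `eval' it.
  (setq-local syntax-propertize-function fn)
  ;; Discard the `syntax-ppss' cache and the `syntax-propertize'
  ;; progress marker from the start of the buffer onward, so FN is
  ;; applied again the next time `syntax-propertize' runs there.
  (syntax-ppss-flush-cache (point-min)))
```

Passing a closure (with `lexical-binding` enabled) lets the function capture values such as the identifier being searched for, which is one way to avoid constructing a form for `eval`.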

[0002-Provide-a-modified-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 16 May 2024 07:57:01 GMT) Full text and rfc822 format available.

Message #251 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 16 May 2024 09:53:59 +0200

David Fussner <dfussner <at> googlemail.com> writes:
  
> +(defun tex-expl-buffer-parse ()
> +  "Identify buffers where expl3 syntax is always active."
> +  (save-excursion
> +    (goto-char (point-min))
> +    (when (tex-search-noncomment
> +	   (re-search-forward
> +	    "\\(?:\\\\\\(?:ExplFile\\|ProvidesExpl\\|__xparse_file\\)\\)"

Is the outer grouping necessary?  Why not just:

"\\\\\\(?:ExplFile\\|ProvidesExpl\\|__xparse_file\\)"

> +	    nil t))
> +      (setq tex-expl-buffer-p t))))
> +
> +(defun tex-expl-region-set (_beg _end)
> +  "Create a list of regions where expl3 syntax is active.
> +This function updates the list whenever `syntax-propertize' runs, and
> +stores it in the buffer-local variable `tex-expl-region-list'.  The
> +list will always be nil when the buffer visits an expl3 file, e.g., an
> +expl3 class or package, where expl3 syntax is always active."
> +  (unless syntax-ppss--updated-cache;; Stop forward search running twice.
> +    (setq tex-expl-region-list nil)
> +    ;; Leaving this test here allows users to set `tex-expl-buffer-p'
> +    ;; independently of the mode's automatic detection of an expl3 file.
> +    (unless tex-expl-buffer-p
> +      (goto-char (point-min))
> +      (while (tex-search-noncomment
> +              (re-search-forward "\\ExplSyntaxOn" nil t))

This looks wrong, I think you want `search-forward'.

> +        (let ((new-beg (point))
> +              (new-end (or (tex-search-noncomment
> +                            (re-search-forward "\\ExplSyntaxOff" nil t))

Same here.

> +                           (point-max))))
> +          (push (cons new-beg new-end) tex-expl-region-list))))))

Best, Arash
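Arash's points could be applied roughly as in this sketch. The names `my-expl-file-regexp` and `my-expl-regions` are hypothetical, and unlike the patch (which wraps each search in `tex-search-noncomment`), the sketch does not skip markers that occur inside comments. `search-forward` matches the text `\ExplSyntaxOn` literally, whereas passing the same string to `re-search-forward` makes the backslash part of a regexp construct.

```elisp
;;; -*- lexical-binding: t; -*-

;; Arash's simpler pattern: the outer shy group is redundant.  In the
;; Lisp string, "\\\\" is a regexp matching one literal backslash.
(defconst my-expl-file-regexp
  "\\\\\\(?:ExplFile\\|ProvidesExpl\\|__xparse_file\\)"
  "Hypothetical name, for illustration only.")

(defun my-expl-regions ()
  "Return a list of (BEG . END) pairs of expl3 regions in this buffer.
Hypothetical helper: unlike the patch, it does not skip markers that
occur inside comments."
  (let (regions)
    (save-excursion
      (goto-char (point-min))
      ;; Literal, non-regexp search for the opening marker.
      (while (search-forward "\\ExplSyntaxOn" nil t)
        (let ((beg (point)))
          (push (cons beg (if (search-forward "\\ExplSyntaxOff" nil t)
                              (point)
                            ;; No closing marker: the region runs to
                            ;; the end of the buffer.
                            (goto-char (point-max))))
                regions))))
    (nreverse regions)))
```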

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 16 May 2024 12:58:02 GMT) Full text and rfc822 format available.

Message #254 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Arash Esbati <arash <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Tassilo Horn <tsdh <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 16 May 2024 13:56:56 +0100

[Message part 1 (text/plain, inline)]

Thanks, Arash. Agreed, on all counts. Revised patch attached.

Best, David.

[0003-Provide-a-modified-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Thu, 16 May 2024 18:20:01 GMT) Full text and rfc822 format available.

Message #257 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Thu, 16 May 2024 14:18:54 -0400

>> IIUC, in the `syntax-needed` case, the let-binding of
>> `inhibit-modification-hooks` is just not very useful (4-7% is not worth
>> the trouble), so its purpose is to speed up the other case.
> 4-10% is the improvement for both cases (the "syntax needed" and not).

Hmm... not sure it's worth the trouble, then.
Also, it might be worth trying to see where those 4-10% are spent: this
is done in a temp buffer where there should presumably be very little
need for before/after-change-functions, so maybe we can get rid of the
specific offenders rather than inhibit all modification hooks.

> Also, I'm eyeing another performance improvement (simplifying file type
> detection) - the call to set-auto-mode is not fast. Simply commenting this
> call out improves the performance by 4x or so - but we'll need a simpler
> version of it instead, of course.
>
> And with the above change (commenting out the set-auto-mode call), the
> difference that the inhibit-modification-hooks hack makes is amplified: it
> can get up to 20%.

I wonder what we do during those 20% of the time if the buffer is left
in fundamental-mode.

>> Also, what about the other two bindings of `inhibit-modification-hooks`?
> The other two are used while the contents of the Xref buffer are printed (or
> re-printed), so there's none of the syntax-ppss complications there. The
> performance difference is 8.5% in my last measurement.

Is this 8.5% of a function that's fast anyway, or 8.5% of a function
which takes a fair bit of time?  Again, I'm not sure it's worth
the trouble.  But as a start, every such binding should have a comment
mentioning that it's there only to gain a few percents of performance.
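A binding annotated the way Stefan asks might look like the following sketch; `my-xref-insert-lines` is a hypothetical stand-in, not the actual xref.el code.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-xref-insert-lines (lines)
  "Insert each string in LINES followed by a newline.
Hypothetical helper, for illustration only."
  ;; Performance only: binding `inhibit-modification-hooks' skips
  ;; running the change hooks for every `insert', which gains a few
  ;; percent when there are many matches.  Correctness does not
  ;; depend on this binding.
  (let ((inhibit-modification-hooks t))
    (dolist (line lines)
      (insert line "\n"))))
```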


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 20 May 2024 00:23:02 GMT) Full text and rfc822 format available.

Message #260 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 20 May 2024 03:21:33 +0300

On 16/05/2024 21:18, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>>> IIUC, in the `syntax-needed` case, the let-binding of
>>> `inhibit-modification-hooks` is just not very useful (4-7% is not worth
>>> the trouble), so its purpose is to speed up the other case.
>> 4-10% is the improvement for both cases (the "syntax needed" and not).
> 
> Hmm... not sure it's worth the trouble, then.
> Also, it might be worth trying to see where those 4-10% are spent: this
> is done in a temp buffer where there should presumably be very little
> need for before/after-change-functions, so maybe we can get rid of the
> specific offenders rather than inhibit all modification hooks.

Given the relatively low percentages, it might be difficult to glean 
from a profiler report. I was assuming the time was mostly spent in 
syntax-ppss-flush-cache, but the function is pretty simple.

>> Also, I'm eyeing another performance improvement (simplifying file type
>> detection) - the call to set-auto-mode is not fast. Simply commenting this
>> call out improves the performance by 4x or so - but we'll need a simpler
>> version of it instead, of course.
>>
>> And with the above change (commenting out the set-auto-mode call), the
>> difference that the inhibit-modification-hooks hack makes is amplified: it
>> can get up to 20%.
> 
> I wonder what we do during those 20% of the time if the buffer is left
> in fundamental-mode.

Good question.

>>> Also, what about the other two bindings of `inhibit-modification-hooks`?
>> The other two are used while the contents of the Xref buffer are printed (or
>> re-printed), so there's none of the syntax-ppss complications there. The
>> performance difference is 8.5% in my last measurement.
> 
> Is this 8.5% of a function that's fast anyway, or 8.5% of a function
> which takes a fair bit of time?

When there are a lot of matches, it can take some time. Note that 100% 
in this case is the whole list-files-do-search-print-results pipeline, 
not just the printing phase. So printing is sped up by more than 8% (my 
last test says it's by 27%).

> But as a start, every such binding should have a comment
> mentioning that it's there only to gain a few percents of performance.

Sure.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 20 May 2024 02:40:01 GMT) Full text and rfc822 format available.

Message #263 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 19 May 2024 22:38:45 -0400

>> Hmm... not sure it's worth the trouble, then.
>> Also, it might be worth trying to see where those 4-10% are spent: this
>> is done in a temp buffer where there should presumably be very little
>> need for before/after-change-functions, so maybe we can get rid of the
>> specific offenders rather than inhibit all modification hooks.
> Given the relatively low percentages, it might be difficult to glean from
> a profiler report. I was assuming the time was mostly spent in
> syntax-ppss-flush-cache, but the function is pretty simple.

Rather than a profiler report, maybe a better approach would be to
remove things from the non-inhibited-modification-hooks paths and see
how/if they change the performance.
E.g. replace the `inhibit-modification-hooks` binding by one that binds
`before/after-change-functions` to nil.
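That experiment could be run with a rough harness like the one below. It is a sketch, not the measurement Dmitry reported: `my-fill-temp-buffer` is hypothetical, and a plain fundamental-mode temporary buffer stands in for xref's own buffers, so absolute timings will differ.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'benchmark)

(defun my-fill-temp-buffer (n binding)
  "Insert N short lines into a temporary buffer; return the line count.
BINDING selects how modification hooks are suppressed: `inhibit' binds
`inhibit-modification-hooks', `hooks' binds the two change-hook
variables to nil, and anything else runs the hooks normally.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (let ((body (lambda () (dotimes (_ n) (insert "match\n")))))
      (pcase binding
        ('inhibit (let ((inhibit-modification-hooks t))
                    (funcall body)))
        ('hooks (let ((before-change-functions nil)
                      (after-change-functions nil))
                  (funcall body)))
        (_ (funcall body))))
    (count-lines (point-min) (point-max))))

;; Each result is (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
(mapcar (lambda (b) (benchmark-run 20 (my-fill-temp-buffer 10000 b)))
        '(none inhibit hooks))
```

Comparing the first element of each result shows whether binding the two hook variables recovers the same gain as binding `inhibit-modification-hooks`.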

>> I wonder what we do during those 20% of the time if the buffer is left
>> in fundamental-mode.
> Good question.

It's probably the better case to investigate since it might be easier to
see the effects.

>>>> Also, what about the other two bindings of `inhibit-modification-hooks`?
>>> The other two are used while the contents of the Xref buffer are printed (or
>>> re-printed), so there's none of the syntax-ppss complications there. The
>>> performance difference is 8.5% in my last measurement.
>> Is this 8.5% of a function that's fast anyway, or 8.5% of a function
>> which takes a fair bit of time?
> When there are a lot of matches, it can take some time. Note that 100% in
> this case is the whole list-files-do-search-print-results pipeline, not just
> the printing phase. So printing is sped up by more than 8% (my last test
> says it's by 27%).

I guess during printing if it's done in many small steps we may indeed
run modification hooks many times, so that could explain the
higher percentage.

It still seems hard to justify 27% since those modification hooks should
usually do nothing, AFAICT.  Maybe there's something silly going on.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 25 May 2024 07:58:01 GMT) Full text and rfc822 format available.

Message #266 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, tsdh <at> gnu.org,
 dfussner <at> googlemail.com, arash <at> gnu.org, stefankangas <at> gmail.com,
 dgutov <at> yandex.ru
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 25 May 2024 10:57:22 +0300

How should we proceed about this bug report?  Is David's last
changeset acceptable or isn't it?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 25 May 2024 23:03:01 GMT) Full text and rfc822 format available.

Message #269 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 David Fussner <dfussner <at> googlemail.com>, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 26 May 2024 02:01:28 +0300

On 20/05/2024 05:38, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>>> Hmm... not sure it's worth the trouble, then.
>>> Also, it might be worth trying to see where those 4-10% are spent: this
>>> is done in a temp buffer where there should presumably be very little
>>> need for before/after-change-functions, so maybe we can get rid of the
>>> specific offenders rather than inhibit all modification hooks.
>> Given the relatively low percentages, it might be difficult to glean from
>> a profiler report. I was assuming the time was mostly spent in
>> syntax-ppss-flush-cache, but the function is pretty simple.
> 
> Rather than a profiler report, maybe a better approach would be to
> remove things from the non-inhibited-modification-hooks paths and see
> how/if they change the performance.
> E.g. replace the `inhibit-modification-hooks` binding by one that binds
> `before/after-change-functions` to nil.
> 
>>> I wonder what we do during those 20% of the time if the buffer is left
>>> in fundamental-mode.
>> Good question.
> 
> It's probably the better case to investigate since it might be easier to
> see the effects.

Revisiting this, I haven't been able to reproduce the 20% number. :-(

The effect of that specific inhibit-modification-hooks binding seems to 
stay around 4-8%, and it's actually on the higher end when the 
set-auto-mode call is present (probably due to text manipulation inside it).

When binding before/after-change-functions to nil instead, both hooks
have an impact, one more than the other, roughly 60/40. Maybe it's just
funcall overhead.

>>>>> Also, what about the other two bindings of `inhibit-modification-hooks`?
>>>> The other two are used while the contents of the Xref buffer are printed (or
>>>> re-printed), so there's none of the syntax-ppss complications there. The
>>>> performance difference is 8.5% in my last measurement.
>>> Is this 8.5% of a function that's fast anyway, or 8.5% of a function
>>> which takes a fair bit of time?
>> When there are a lot of matches, it can take some time. Note that 100% in
>> this case is the whole list-files-do-search-print-results pipeline, not just
>> the printing phase. So printing is sped up by more than 8% (my last test
>> says it's by 27%).
> 
> I guess during printing if it's done in many small steps we may indeed
> run modification hooks many times, so that could explain the
> higher percentage.
> 
> It still seems hard to justify 27% since those modification hooks should
> usually do nothing, AFAICT.  Maybe there's something silly going on.

On this step (xref--show-common-initialize) the numbers still hold,
however. What's different is that replacing the
inhibit-modification-hooks binding with two bindings
(before-change-functions and after-change-functions, both to nil)
doesn't have a similar effect. That makes sense: the buffer is almost
in fundamental-mode, where both hooks are nil.

Binding create-lockfiles or select-active-regions to nil doesn't have 
any impact. And replacing the use of all of the above with 
combine-change-calls makes performance worse.

If we're going to continue this subthread, it's probably better to move 
it somewhere else (separate bug, or emacs-devel).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Wed, 05 Jun 2024 09:48:02 GMT) Full text and rfc822 format available.

Message #272 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, tsdh <at> gnu.org, arash <at> gnu.org,
 stefankangas <at> gmail.com, dgutov <at> yandex.ru,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Wed, 5 Jun 2024 10:46:10 +0100

[Message part 1 (text/plain, inline)]

Hi Eli, Stefan, and Dmitry,

In case the changeset might prove acceptable for version 30, I attach
the latest patch, which clears out the code I was using to simplify
testing of the AUCTeX modes. I can if requested send a patch for the
manual etags tests, also, in case that might prove helpful down the
line.

Best, David.

[0004-Provide-a-modified-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 08 Jun 2024 12:39:01 GMT) Full text and rfc822 format available.

Message #275 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, tsdh <at> gnu.org, arash <at> gnu.org,
 stefankangas <at> gmail.com, dgutov <at> yandex.ru, monnier <at> iro.umontreal.ca
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 08 Jun 2024 15:38:17 +0300

> From: David Fussner <dfussner <at> googlemail.com>
> Date: Wed, 5 Jun 2024 10:46:10 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, dgutov <at> yandex.ru, 53749 <at> debbugs.gnu.org, 
> 	ikumi <at> ikumi.que.jp, arash <at> gnu.org, stefankangas <at> gmail.com, tsdh <at> gnu.org
> 
> Hi Eli, Stefan, and Dmitry,
> 
> In case the changeset might prove acceptable for version 30, I attach
> the latest patch, which clears out the code I was using to simplify
> testing of the AUCTeX modes. I can if requested send a patch for the
> manual etags tests, also, in case that might prove helpful down the
> line.

Thanks, I'm still waiting for answers to my question:

> On Sat, 25 May 2024 at 12:01, Eli Zaretskii <eliz <at> gnu.org> wrote:
> >
> > How should we proceed about this bug report?  Is David's last
> > changeset acceptable or isn't it?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 08 Jun 2024 21:56:02 GMT) Full text and rfc822 format available.

Message #278 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>, David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, arash <at> gnu.org,
 stefankangas <at> gmail.com, tsdh <at> gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 8 Jun 2024 23:54:30 +0300

On 08/06/2024 15:38, Eli Zaretskii wrote:
>> From: David Fussner <dfussner <at> googlemail.com>
>> Date: Wed, 5 Jun 2024 10:46:10 +0100
>> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, dgutov <at> yandex.ru, 53749 <at> debbugs.gnu.org,
>> 	ikumi <at> ikumi.que.jp, arash <at> gnu.org, stefankangas <at> gmail.com, tsdh <at> gnu.org
>>
>> Hi Eli, Stefan, and Dmitry,
>>
>> In case the changeset might prove acceptable for version 30, I attach
>> the latest patch, which clears out the code I was using to simplify
>> testing of the AUCTeX modes. I can if requested send a patch for the
>> manual etags tests, also, in case that might prove helpful down the
>> line.
> Thanks, I'm still waiting for answers to my question:
> 
>> On Sat, 25 May 2024 at 12:01, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>> How should we proceed about this bug report?  Is David's last
>>> changeset acceptable or isn't it?

To the extent that I can evaluate the code, it looks pretty good.

And it's an additive change, so I don't see blockers to installing it.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 11:20:02 GMT) Full text and rfc822 format available.

Message #281 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, dgutov <at> yandex.ru, arash <at> gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 9 Jun 2024 04:10:42 -0700

David Fussner <dfussner <at> googlemail.com> writes:

> I can if requested send a patch for the manual etags tests, also, in
> case that might prove helpful down the line.

I believe that such tests would help, yes.  Thanks in advance.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 12:41:03 GMT) Full text and rfc822 format available.

Message #284 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, dgutov <at> yandex.ru, arash <at> gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 9 Jun 2024 07:36:50 -0400

David Fussner <dfussner <at> googlemail.com> writes:

> In case the changeset might prove acceptable for version 30, I attach
> the latest patch, which clears out the code I was using to simplify
> testing of the AUCTeX modes.

I have some comments and questions:

- Does this need a NEWS entry?

- I see the brief text you added to tex-mode.el explaining more about
  expl3, but perhaps there should be a clear explanation in the commit
  message too.

- [Optional: In most places you use spaces for indentation, but here and
  there, there is a single tab followed by one or more spaces.  Consider
  using only spaces.]

> From: David Fussner <dfussner <at> googlemail.com>
> Date: Wed, 5 Jun 2024 10:26:18 +0100
> Subject: [PATCH] Provide a modified xref backend for TeX buffers

[Don't forget to add the bug number to the ChangeLog.]

> diff --git a/doc/emacs/maintaining.texi b/doc/emacs/maintaining.texi
> index 579098c81b1..a064103aa25 100644
> --- a/doc/emacs/maintaining.texi
> +++ b/doc/emacs/maintaining.texi
> @@ -2529,6 +2529,15 @@ Identifier Search
>  referenced.  The XREF mode commands are available in this buffer, see
>  @ref{Xref Commands}.
>
> +When invoked in a buffer whose major mode uses the @code{etags} backend,
> +@kbd{M-?} searches files and buffers whose major mode matches that of
> +the original buffer.  It guesses that mode from file extensions, so if
> +@kbd{M-?} seems to be skipping relevant buffers or files, try
> +customizing either the variable @code{semantic-symref-filepattern-alist}

Why does this speak of Semantic?  Does `xref-find-references` depend on
it somehow?

> diff --git a/lib-src/etags.c b/lib-src/etags.c
> index 03bc55de03d..6bc734e7df0 100644
> --- a/lib-src/etags.c
> +++ b/lib-src/etags.c
> @@ -5740,11 +5756,25 @@ Scheme_functions (FILE *inf)
>  static linebuffer *TEX_toktab = NULL; /* Table with tag tokens */
>
>  /* Default set of control sequences to put into TEX_toktab.
> -   The value of environment var TEXTAGS is prepended to this.  */
> +   The value of environment var TEXTAGS is prepended to this.
> +   (2024) Add variants of '\def', some additional LaTeX (and
> +   former xparse) commands, common variants from the
> +   'etoolbox' package, and the main expl3 commands. */

Do we really need this comment?  Isn't the git log enough?
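The comment under discussion says that the value of TEXTAGS is prepended to the default set of control sequences. The Emacs manual describes TEXTAGS as a colon-separated list of command names. Below is a minimal C sketch of that prepend-then-split behavior. It assumes the default string begins with a colon, so that plain concatenation yields a well-formed list. The helper names and the abbreviated default list are hypothetical; this is not the code in etags.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, abbreviated default list; the real one in etags.c is
   longer.  The leading colon lets a user value be prepended directly.  */
static const char default_tex_cmds[] = ":chapter:section:label:def";

/* Return a freshly allocated list: USER_CMDS (the TEXTAGS value, or
   NULL if unset) followed by the defaults.  Caller must free it.  */
char *tex_effective_cmds(const char *user_cmds)
{
    if (user_cmds == NULL)
        user_cmds = "";
    size_t len = strlen(user_cmds) + strlen(default_tex_cmds) + 1;
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    strcpy(out, user_cmds);
    strcat(out, default_tex_cmds);
    return out;
}

/* Count the non-empty, colon-separated command names in LIST.  */
int count_tex_cmds(const char *list)
{
    assert(list != NULL);
    int count = 0;
    const char *p = list;
    while (*p != '\0') {
        while (*p == ':')              /* skip separators */
            p++;
        if (*p == '\0')
            break;
        count++;                       /* at the start of a name */
        while (*p != '\0' && *p != ':')
            p++;
    }
    return count;
}
```

In a real program the argument would come from getenv("TEXTAGS"). For example, TEXTAGS="mysection:myotherpart" yields six names in this sketch, with the two user-supplied names first.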

> diff --git a/lisp/textmodes/tex-mode.el b/lisp/textmodes/tex-mode.el
> index 97c950267c6..fbf08840699 100644
> --- a/lisp/textmodes/tex-mode.el
> +++ b/lisp/textmodes/tex-mode.el
> @@ -636,6 +636,14 @@ tex-font-lock-keywords-2
>  	      3 '(tex-font-lock-append-prop 'bold) 'append)))))
>     "Gaudy expressions to highlight in TeX modes.")
>
> +(defvar-local tex-expl-region-list nil
> +  "List of region boundaries where expl3 syntax is active.
> +It will be nil in buffers where expl3 syntax is always active, e.g.,

Please prefer "for example" to "e.g.".

> +(defvar-local tex-expl-buffer-p nil
> +  "Non-nil in buffers where expl3 syntax is always active.")

What does "always active" mean as compared to just "active"?
Does this need to be elaborated?

> +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
> +;; AUCTeX is doing the same for its modes.
> +(with-eval-after-load 'semantic/symref/grep
> +  (defvar semantic-symref-filepattern-alist)
> +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
> +                     "*.bbl" "*.drv" "*.hva")
> +        semantic-symref-filepattern-alist)
> +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
> +        semantic-symref-filepattern-alist)
> +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))

Doesn't this stuff rather belong in semantic itself?

> +(cl-defmethod xref-backend-references ((_backend (eql 'tex-etags)) identifier)
> +  "Find references of IDENTIFIER in TeX buffers and files."
> +  (require 'semantic/symref/grep)

Are we sure that we want to make this depend on semantic?

Is there any way around that?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 18:47:01 GMT) Full text and rfc822 format available.

Message #287 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Kangas <stefankangas <at> gmail.com>,
 David Fussner <dfussner <at> googlemail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, arash <at> gnu.org, tsdh <at> gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, ikumi <at> ikumi.que.jp
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 9 Jun 2024 21:45:40 +0300

On 09/06/2024 14:36, Stefan Kangas wrote:
>> diff --git a/doc/emacs/maintaining.texi b/doc/emacs/maintaining.texi
>> index 579098c81b1..a064103aa25 100644
>> --- a/doc/emacs/maintaining.texi
>> +++ b/doc/emacs/maintaining.texi
>> @@ -2529,6 +2529,15 @@ Identifier Search
>>   referenced.  The XREF mode commands are available in this buffer, see
>>   @ref{Xref Commands}.
>>
>> +When invoked in a buffer whose major mode uses the @code{etags} backend,
>> +@kbd{M-?} searches files and buffers whose major mode matches that of
>> +the original buffer.  It guesses that mode from file extensions, so if
>> +@kbd{M-?} seems to be skipping relevant buffers or files, try
>> +customizing either the variable @code{semantic-symref-filepattern-alist}
> Why does this speak of Semantic?  Does `xref-find-references` depend on
> it somehow?

xref-backend-references's default implementation calls 
semantic-symref-perform-search under the cover. It's just the "symref" 
package, not the parser or the rest.

David's addition also uses it, and that's fine.

>> +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
>> +;; AUCTeX is doing the same for its modes.
>> +(with-eval-after-load 'semantic/symref/grep
>> +  (defvar semantic-symref-filepattern-alist)
>> +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
>> +                     "*.bbl" "*.drv" "*.hva")
>> +        semantic-symref-filepattern-alist)
>> +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
>> +        semantic-symref-filepattern-alist)
>> +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
> Doesn't this stuff rather belong in semantic itself?

Good point.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 18:53:02 GMT) Full text and rfc822 format available.

Message #290 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Tassilo Horn <tsdh <at> gnu.org>, Arash Esbati <arash <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dgutov <at> yandex.ru>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 9 Jun 2024 19:42:29 +0100

[Message part 1 (text/plain, inline)]

Hi Stefan,

Thanks very much for the review. I'll try to address most of it in a
revised patch tomorrow, but I did want to explain now that the default
implementation of xref-find-references in xref.el uses semantic symref
functionality to search files. The TeX backend puts a wrapper around it,
but doesn't, following Dmitry's advice, try to reinvent that wheel. As for
where we set the filepattern variable, I don't mind, but AUCTeX is setting
it internally for their modes, so it seemed OK in tex-mode, too.

More tomorrow, and thanks again.

David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 21:41:01 GMT) Full text and rfc822 format available.

Message #293 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp,
 David Fussner <dfussner <at> googlemail.com>, arash <at> gnu.org,
 Stefan Kangas <stefankangas <at> gmail.com>, tsdh <at> gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sun, 09 Jun 2024 17:03:43 -0400

>>> +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
>>> +;; AUCTeX is doing the same for its modes.
>>> +(with-eval-after-load 'semantic/symref/grep
>>> +  (defvar semantic-symref-filepattern-alist)
>>> +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
>>> +                     "*.bbl" "*.drv" "*.hva")
>>> +        semantic-symref-filepattern-alist)
>>> +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
>>> +        semantic-symref-filepattern-alist)
>>> +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>> Doesn't this stuff rather belong in semantic itself?
> Good point.

FWIW, I think the responsibility of `symref.el` is to provide hooks like
the `semantic-symref-filepattern-alist` var along with the code that
uses them, but the mode-specific settings, such as knowledge about which
glob patterns should be used for `latex-mode` belong to the
corresponding mode.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 09 Jun 2024 22:15:01 GMT) Full text and rfc822 format available.

Message #296 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp,
 David Fussner <dfussner <at> googlemail.com>, arash <at> gnu.org,
 Stefan Kangas <stefankangas <at> gmail.com>, tsdh <at> gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 10 Jun 2024 01:13:43 +0300

On 10/06/2024 00:03, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
>>>> +;; Populate `semantic-symref-filepattern-alist' for the in-tree modes;
>>>> +;; AUCTeX is doing the same for its modes.
>>>> +(with-eval-after-load 'semantic/symref/grep
>>>> +  (defvar semantic-symref-filepattern-alist)
>>>> +  (push '(latex-mode "*.[tT]e[xX]" "*.ltx" "*.sty" "*.cl[so]"
>>>> +                     "*.bbl" "*.drv" "*.hva")
>>>> +        semantic-symref-filepattern-alist)
>>>> +  (push '(plain-tex-mode "*.[tT]e[xX]" "*.ins")
>>>> +        semantic-symref-filepattern-alist)
>>>> +  (push '(doctex-mode "*.dtx") semantic-symref-filepattern-alist))
>>> Doesn't this stuff rather belong in semantic itself?
>> Good point.
> FWIW, I think the responsibility of `symref.el` is to provide hooks like
> the `semantic-symref-filepattern-alist` var along with the code that
> uses them, but the mode-specific settings, such as knowledge about which
> glob patterns should be used for `latex-mode` belong to the
> corresponding mode.

I've been looking at semantic-symref-filepattern-alist as a workaround
for the fact that the major modes don't provide enough relevant
information in auto-mode-alist (or, to look at it differently, don't
provide such info in some other variable from which the auto-mode-alist
entry would be computed).

From that perspective, storing the missing association inside the 
semantic-symref package seems suitable. But a more "proper" place for it 
would be better, of course.
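Roughly speaking, each entry in semantic-symref-filepattern-alist pairs a major mode with shell-style glob patterns, and M-? searches files whose names match one of them. The C sketch below uses POSIX fnmatch(3) to show which file names the latex-mode patterns from the patch would admit, assuming those patterns follow ordinary glob rules. The array and function names are hypothetical. In "*.[tT]e[xX]" only the first and last letters of the extension are bracketed, so a name ending in ".TEX" is not matched.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Glob patterns from the latex-mode entry proposed in the patch.  */
static const char *const latex_patterns[] = {
    "*.[tT]e[xX]", "*.ltx", "*.sty", "*.cl[so]",
    "*.bbl", "*.drv", "*.hva", NULL
};

/* Return 1 if FILENAME matches any pattern in the NULL-terminated
   array PATTERNS, else 0.  */
int matches_any(const char *filename, const char *const *patterns)
{
    assert(filename != NULL && patterns != NULL);
    for (size_t i = 0; patterns[i] != NULL; i++)
        if (fnmatch(patterns[i], filename, 0) == 0)
            return 1;
    return 0;
}
```

For instance, "article.cls" and "size10.clo" both match "*.cl[so]", while "source.dtx" is left to the separate doctex-mode entry.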

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 10 Jun 2024 13:31:03 GMT) Full text and rfc822 format available.

Message #299 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, arash <at> gnu.org,
 Stefan Kangas <stefankangas <at> gmail.com>, tsdh <at> gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 10 Jun 2024 14:29:51 +0100

[Message part 1 (text/plain, inline)]

Hi Dmitry, Stefan M and Stefan K,

Here's the latest patch, with most of the modifications requested by
Stefan K. The code populating `semantic-symref-filepattern-alist` I've
left in tex-mode.el, because I wasn't sure how to adjudicate the
differing opinions on where it should go. Moving it won't take any
time, should that be the verdict. I've had a stab at a NEWS entry, and
included the patches to the manual etags test files. The diffs for the
test files are large, since we remove the TeX escape char from any tag
names where it occurs, as discussed elsewhere on this thread.

Please let me know if more changes are needed.

Best, David.

[0005-Provide-a-modified-xref-backend-for-TeX-buffers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 27 Jul 2024 15:40:02 GMT) Full text and rfc822 format available.

Message #302 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Arash Esbati <arash <at> gnu.org>
To: David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of
 text editors" <bug-gnu-emacs <at> gnu.org>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, Dmitry Gutov <dgutov <at> yandex.ru>,
 David Fussner <dfussner <at> googlemail.com>,
 Stefan Kangas <stefankangas <at> gmail.com>, tsdh <at> gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 27 Jul 2024 17:39:01 +0200

David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org> writes:

> Hi Dmitry, Stefan M and Stefan K,
>
> Here's the latest patch, with most of the modifications requested by
> Stefan K. The code populating `semantic-symref-filepattern-alist` I've
> left in tex-mode.el, because I wasn't sure how to adjudicate the
> differing opinions on where it should go. Moving it won't take any
> time, should that be the verdict. I've had a stab at a NEWS entry, and
> included the patches to the manual etags test files. The diffs for the
> test files are large, since we remove the TeX escape char from any tag
> names where it occurs, as discussed elsewhere on this thread.
>
> Please let me know if more changes are needed.

Ping!  Gents, is it possible to give David your comments?  There was
some effort behind this, maybe it's time to bring this to an end.  TIA.

Best, Arash

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 27 Jul 2024 15:40:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sun, 28 Jul 2024 23:59:02 GMT) Full text and rfc822 format available.

Message #308 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Arash Esbati <arash <at> gnu.org>, 53749 <at> debbugs.gnu.org
Cc: ikumi <at> ikumi.que.jp, dfussner <at> googlemail.com, monnier <at> iro.umontreal.ca,
 tsdh <at> gnu.org, eliz <at> gnu.org, stefankangas <at> gmail.com
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 29 Jul 2024 02:57:47 +0300

On 27/07/2024 18:39, Arash Esbati wrote:
> David Fussner via "Bug reports for GNU Emacs, the Swiss army knife of text editors"<bug-gnu-emacs <at> gnu.org>  writes:
> 
>> Hi Dmitry, Stefan M and Stefan K,
>>
>> Here's the latest patch, with most of the modifications requested by
>> Stefan K. The code populating `semantic-symref-filepattern-alist` I've
>> left in tex-mode.el, because I wasn't sure how to adjudicate the
>> differing opinions on where it should go. Moving it won't take any
>> time, should that be the verdict. I've had a stab at a NEWS entry, and
>> included the patches to the manual etags test files. The diffs for the
>> test files are large, since we remove the TeX escape char from any tag
>> names where it occurs, as discussed elsewhere on this thread.
>>
>> Please let me know if more changes are needed.
> Ping!  Gents, is it possible to give David your comments?  There was
> some effort behind this, maybe it's time to bring this to an end.  TIA.

It's good enough from my side, but I hope someone else could comment as 
well.

Additionally though, it might or might not be too big a change for Emacs 
30 (I'm not sure), and if we're talking about 31 only, then it could 
wait a little more for the review.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 29 Jul 2024 10:33:01 GMT) Full text and rfc822 format available.

Message #311 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, Arash Esbati <arash <at> gnu.org>,
 stefankangas <at> gmail.com, tsdh <at> gnu.org, eliz <at> gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 29 Jul 2024 11:31:21 +0100

Hi Dmitry and Arash,

> Additionally though, it might or might not be too big a change for Emacs
> 30 (I'm not sure), and if we're talking about 31 only, then it could
> wait a little more for the review.

Good point -- I'm OK either way. Getting it tested, even lightly, on
master for a while might make sense for a patch of its size, but
others will have a better sense of it.

Best, David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 14 Sep 2024 13:45:02 GMT) Full text and rfc822 format available.

Message #314 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 53749 <at> debbugs.gnu.org
Cc: ikumi <at> ikumi.que.jp, eliz <at> gnu.org, monnier <at> iro.umontreal.ca,
 dfussner <at> googlemail.com, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 06:43:01 -0700

Dmitry Gutov <dgutov <at> yandex.ru> writes:

> It's good enough from my side, but I hope someone else could comment as
> well.
>
> Additionally though, it might or might not be too big a change for Emacs
> 30 (I'm not sure), and if we're talking about 31 only, then it could
> wait a little more for the review.

Are we happy to install this now, or are we still waiting for more
comments?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 14 Sep 2024 14:07:01 GMT) Full text and rfc822 format available.

Message #317 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, dgutov <at> yandex.ru,
 dfussner <at> googlemail.com, arash <at> gnu.org, monnier <at> iro.umontreal.ca, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 17:06:22 +0300

> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Sat, 14 Sep 2024 06:43:01 -0700
> Cc: ikumi <at> ikumi.que.jp, dfussner <at> googlemail.com, tsdh <at> gnu.org, eliz <at> gnu.org, 
> 	monnier <at> iro.umontreal.ca
> 
> Dmitry Gutov <dgutov <at> yandex.ru> writes:
> 
> > It's good enough from my side, but I hope someone else could comment as
> > well.
> >
> > Additionally though, it might or might not be too big a change for Emacs
> > 30 (I'm not sure), and if we're talking about 31 only, then it could
> > wait a little more for the review.
> 
> Are we happy to install this now, or are we still waiting for more
> comments?

We've waited long enough, I think.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 14 Sep 2024 15:04:01 GMT) Full text and rfc822 format available.

Message #320 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>, Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, dfussner <at> googlemail.com,
 arash <at> gnu.org, monnier <at> iro.umontreal.ca, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 18:02:59 +0300

On 14/09/2024 17:06, Eli Zaretskii wrote:
>> From: Stefan Kangas<stefankangas <at> gmail.com>
>> Date: Sat, 14 Sep 2024 06:43:01 -0700
>> Cc:ikumi <at> ikumi.que.jp,dfussner <at> googlemail.com,tsdh <at> gnu.org,eliz <at> gnu.org,
>> 	monnier <at> iro.umontreal.ca
>>
>> Dmitry Gutov<dgutov <at> yandex.ru>  writes:
>>
>>> It's good enough from my side, but I hope someone else could comment as
>>> well.
>>>
>>> Additionally though, it might or might not be too big a change for Emacs
>>> 30 (I'm not sure), and if we're talking about 31 only, then it could
>>> wait a little more for the review.
>> Are we happy to install this now, or are we still waiting for more
>> comments?
> We've waited long enough, I think.

I also agree.

Reply sent to Stefan Kangas <stefankangas <at> gmail.com>:
You have taken responsibility. (Sat, 14 Sep 2024 15:10:01 GMT) Full text and rfc822 format available.

Notification sent to David Fussner <dfussner <at> googlemail.com>:
bug acknowledged by developer. (Sat, 14 Sep 2024 15:10:02 GMT) Full text and rfc822 format available.

Message #325 received at 53749-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 53749-done <at> debbugs.gnu.org, ikumi <at> ikumi.que.jp, dgutov <at> yandex.ru,
 dfussner <at> googlemail.com, arash <at> gnu.org, monnier <at> iro.umontreal.ca, tsdh <at> gnu.org
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 08:08:08 -0700

Version: 31.1

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Stefan Kangas <stefankangas <at> gmail.com>
>> Date: Sat, 14 Sep 2024 06:43:01 -0700
>> Cc: ikumi <at> ikumi.que.jp, dfussner <at> googlemail.com, tsdh <at> gnu.org, eliz <at> gnu.org,
>> 	monnier <at> iro.umontreal.ca
>>
>> Dmitry Gutov <dgutov <at> yandex.ru> writes:
>>
>> > It's good enough from my side, but I hope someone else could comment as
>> > well.
>>
>> Are we happy to install this now, or are we still waiting for more
>> comments?
>
> We've waited long enough, I think.

Thanks, so I've now installed the patch on master.  Note that I broke
out the etags/ctags changes into a separate patch (in David's name) to
make the original patch a bit less unwieldy.

    3090b2304e7 Update the etags/ctags test files
    b44c00669ac Provide a modified xref backend for TeX buffers

And with that, I'm closing this bug report.  Congratulations to David
for landing his first contribution to Emacs, and thanks again.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 14 Sep 2024 15:30:02 GMT) Full text and rfc822 format available.

Message #328 received at 53749-done <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749-done <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dgutov <at> yandex.ru>, Arash Esbati <arash <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 16:28:03 +0100

[Message part 1 (text/plain, inline)]

Many thanks to you all for your help and advice. I'm not subscribed to this
list, though I am a frequent lurker, so if I can be of any help fixing
issues with the patches, please ping me if I'm being oblivious.

Best, David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 14 Sep 2024 17:28:02 GMT) Full text and rfc822 format available.

Message #331 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 14 Sep 2024 19:26:07 +0200

Thanks for the contribution, David! My electronic servant complained about some regexps:

  (re-search-backward (concat "[]["
                              (mapconcat #'regexp-quote
                                         (mapcar #'char-to-string
                                                 tex-thingatpt-exclude-chars))
                              "\"*`'#=&()%,|$[:cntrl:][:blank:]]"))

This is not a correct way to build a regexp; `regexp-quote` can only be used to quote strings that should match literally, not characters inside [...], where backslashes have no escaping power.

There are various ways of doing this properly. I would suggest something like

  (rx-to-string `(or (in "\"#$%&'()*,=[]`|" cntrl blank)
                     ,@tex-thingatpt-exclude-chars)
                t))

but it also depends on what you want that `tex-thingatpt-exclude-chars` variable to be. Should it be a user option (defcustom)? The variable's doc string is a wall of text that basically says that it can be set to whatever you want but things will stop working so you'd better not.
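A minimal Emacs Lisp sketch of the problem described above. The helpers `my-naive-charset` and `my-rx-charset` are hypothetical and are not part of tex-mode.el. The first mirrors the `mapconcat`/`regexp-quote` pattern quoted above. The second hands the quoting to `rx-to-string`, which is the approach suggested in the message.

```elisp
(require 'rx)

;; Hypothetical helper (not in tex-mode.el): the flawed approach.
(defun my-naive-charset (chars)
  "Build \"[...]\" from CHARS by running each one through `regexp-quote'."
  ;; Inside [...] a backslash is an ordinary character, so any backslash
  ;; that `regexp-quote' adds becomes an extra member of the set, while
  ;; `-' and `]' pass through unquoted and can change the class itself.
  (concat "[" (mapconcat #'regexp-quote (mapcar #'char-to-string chars) "") "]"))

;; Hypothetical helper: let rx decide how each character must be written.
(defun my-rx-charset (chars)
  "Return a regexp matching exactly one character from CHARS."
  (rx-to-string `(any ,@chars) t))

;; Excluding "." the naive way silently excludes "\" as well:
(string-match-p (my-naive-charset '(?.)) "\\")  ; => 0, unintended match
(string-match-p (my-rx-charset '(?.)) "\\")     ; => nil
;; With `-' and `]' the naive string becomes "[-]]", i.e. "-" followed by "]":
(string-match-p (my-naive-charset '(?- ?\])) "]")  ; => nil, "]" is missed
(string-match-p (my-rx-charset '(?- ?\])) "]")     ; => 0
```

The `regexp-opt-charset` function is another standard way to build a correct class from a list of characters. The rx form has the advantage that it composes directly with the named classes (`cntrl`, `blank`) used in the original regexp.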

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 16 Sep 2024 08:36:02 GMT) Full text and rfc822 format available.

Message #334 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 16 Sep 2024 09:33:48 +0100

Hi Mattias,

Thanks for this, and of course you are quite right on both counts:

> but it also depends on what you want that `tex-thingatpt-exclude-chars` variable to be

I believe it's time I abandoned this piece of idiocy, already flagged
by several commentators before you. The basic motivation was to allow
using the new xref code in TeX files that used non-standard escape and
grouping characters. Both etags and AUCTeX address this possibility,
but the latter does it right by having three separate variables:
TeX-esc, TeX-grop, and TeX-grcl, so that these can be part of the
whole regular expression apparatus for syntax highlighting and
everything else. I had in the back of my mind to implement something
similar in tex-mode, but the half-measure of
`tex-thingatpt-exclude-chars` is the wrong way to go.

I propose, therefore, to eliminate this variable from tex-mode and
just hard-code the standard escape and grouping characters for the
time being, as in the rest of tex-mode.el, then to submit later a
patch to implement something similar to what AUCTeX does, and one
interoperable with AUCTeX, also.

> This is not a correct way to build a regexp;

Given the above changes, would you and your electronic servant be OK
with code like the following?

(re-search-forward "[][\\{}\"*`'#=&()%,|$[:cntrl:][:blank:]]")

(I still, I'm ashamed to confess, find the traditional regexp syntax
easier to read than rx notation, but could overcome this if you think
rx is a better fit here.)
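To show that the choice here is purely one of notation, the following sketch checks the proposed hard-coded class against an rx spelling of the same set, one character at a time. The names `my-tex-delim-regexp`, `my-tex-delim-rx`, and `my-same-class-p` are hypothetical, and the sample string is arbitrary. The sketch relies only on the standard `rx` macro.

```elisp
(require 'rx)

;; Hypothetical constant: the hard-coded class proposed above.
(defconst my-tex-delim-regexp
  "[][\\{}\"*`'#=&()%,|$[:cntrl:][:blank:]]")

;; Hypothetical constant: the same set in rx notation.  Note that there is
;; no `-' in the string, since rx would read "a-b" as a range.
(defconst my-tex-delim-rx
  (rx (any "][\\{}\"*`'#=&()%,|$" cntrl blank)))

(defun my-same-class-p (sample)
  "Return t if both regexps classify every character of SAMPLE the same way."
  (let ((ok t))
    (dolist (c (string-to-list sample) ok)
      (let ((s (string c)))
        (unless (eq (null (string-match-p my-tex-delim-regexp s))
                    (null (string-match-p my-tex-delim-rx s)))
          (setq ok nil))))))

(my-same-class-p "\\{}[]\"*`'#=&()%,|$ \t\nabcXYZ019.:;!?@^~/+-<>_")  ; => t
```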

What do you think?

Best,  David.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Mon, 16 Sep 2024 13:32:01 GMT) Full text and rfc822 format available.

Message #337 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Mon, 16 Sep 2024 14:30:20 +0100

[Message part 1 (text/plain, inline)]

Hi Mattias, Arash,

Here's a patch to fix the regexps and delete the unnecessary variable.
Any thoughts?

I'll look at adding new vars for the TeX escape and grouping chars,
creating a new bug number when I have a working patch.

Thanks, David.

[0001-Fix-regexps-for-TeX-xref-backend-Bug-53749.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Tue, 17 Sep 2024 12:37:01 GMT) Full text and rfc822 format available.

Message #340 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Kangas <stefankangas <at> gmail.com>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Tue, 17 Sep 2024 14:35:00 +0200

On 16 Sep 2024, at 10:33, David Fussner <dfussner <at> googlemail.com> wrote:

> (re-search-forward "[][\\{}\"*`'#=&()%,|$[:cntrl:][:blank:]]")

Fine as far as I'm concerned.

> (I still, I'm ashamed to confess, find the traditional regexp syntax
> easier to read than rx notation, but could overcome this if you think
> rx is a better fit here.)

There is not a big difference in this case so stick to what you prefer.

Otherwise, I encourage you to try using rx from time to time – it's habit-forming!

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 21 Sep 2024 02:10:01 GMT) Full text and rfc822 format available.

Message #343 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: David Fussner <dfussner <at> googlemail.com>, 
 Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Fri, 20 Sep 2024 19:07:47 -0700

David Fussner <dfussner <at> googlemail.com> writes:

> Hi Mattias, Arash,
>
> Here's a patch to fix the regexps and delete the unnecessary variable.
> Any thoughts?

Thanks, installed on master.

> I'll look at adding new vars for the TeX escape and grouping chars,
> creating a new bug number when I have a working patch.

Thanks in advance for this also.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53749; Package emacs. (Sat, 21 Sep 2024 16:47:01 GMT) Full text and rfc822 format available.

Message #346 received at 53749 <at> debbugs.gnu.org (full text, mbox):

From: David Fussner <dfussner <at> googlemail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 53749 <at> debbugs.gnu.org, Ikumi Keita <ikumi <at> ikumi.que.jp>,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Arash Esbati <arash <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Tassilo Horn <tsdh <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#53749: 29.0.50; [PATCH] Xref backend for TeX buffers
Date: Sat, 21 Sep 2024 17:44:31 +0100

[Message part 1 (text/plain, inline)]

Thanks for installing it, Stefan.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 20 Oct 2024 11:24:11 GMT) Full text and rfc822 format available.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.