GNU bug report logs - #23501
Non-regex-based syntax highlighting

Package: emacs;

Reported by: Nir Friedman <quicknir <at> gmail.com>

Date: Tue, 10 May 2016 03:29:02 UTC

Severity: wishlist

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23501 in the body.
You can then email your comments to 23501 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 03:29:02 GMT)

Acknowledgement sent to Nir Friedman <quicknir <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 10 May 2016 03:29:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Nir Friedman <quicknir <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Non-regex-based syntax highlighting
Date: Mon, 9 May 2016 23:12:47 -0400
I'm considering using Emacs as a platform for C++ development. One thing
that seems to lag behind in Emacs at the moment is that all of the syntax
highlighting for C++ is (as far as I can tell) regex-based. This severely
limits the accuracy and discrimination that the syntax highlighter can
achieve. There are now some packages for Emacs that use clang-based
backends to get actual AST information. Perhaps it would be possible to
write some kind of hooks or template for major modes that would make it
easier for package authors to change how syntax highlighting is performed
in major modes?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 15:58:01 GMT)

Message #8 received at 23501 <at> debbugs.gnu.org:

From: Richard Stallman <rms <at> gnu.org>
To: Nir Friedman <quicknir <at> gmail.com>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 10 May 2016 11:57:25 -0400
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

We develop GCC as well as Emacs.  To adopt a competitor to GCC
as a "solution" would be self-defeating.

A proper solution is to extend GCC so that it does the necessary job.

-- 
Dr Richard Stallman
President, Free Software Foundation (gnu.org, fsf.org)
Internet Hall-of-Famer (internethalloffame.org)
Skype: No way! See stallman.org/skype.html.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 16:00:03 GMT)

Message #11 received at 23501 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nir Friedman <quicknir <at> gmail.com>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 10 May 2016 18:59:22 +0300
> From: Nir Friedman <quicknir <at> gmail.com>
> Date: Mon, 9 May 2016 23:12:47 -0400
> 
> I'm considering using Emacs as a platform for C++ development. One thing that seems to lag behind in
> Emacs at the moment is that all of the syntax highlighting for C++ is (as far as I can tell) regex-based. This
> severely limits the accuracy and discrimination that the syntax highlighter can achieve. There are now some
> packages for Emacs that use clang-based backends to get actual AST information. Perhaps it would be
> possible to write some kind of hooks or template for major modes that would make it easier for package
> authors to change how syntax highlighting is performed in major modes?

Sorry, I don't think I really understand what complaint or issue
you are raising here, or what solution you would like to suggest for
it.  Could you perhaps elaborate?  A specific example where
the current code doesn't work would be a good starting point.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 18:57:02 GMT)

Message #14 received at 23501 <at> debbugs.gnu.org:

From: Nir Friedman <quicknir <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 10 May 2016 14:55:41 -0400
For instance, suppose I write some C++ that looks like this:

using MyType = Something::OtherType;

There's no way to determine locally whether Something is a namespace or
itself a type, so a regex-based syntax highlighter cannot consistently
color namespaces and classes differently. To take one example, Eclipse will
perform this determination and will consistently color namespaces and
classes any color you like. It can do this because it parses the code and
uses the AST. It makes many more useful distinctions which cannot be made
locally; for example, when calling a function foo from a member function bar
of a class, there is no easy way to tell whether foo is also a member
of the same class as bar, or whether foo is just a free function in the
same namespace. One has privileged access and the other probably doesn't,
so it's a genuinely useful distinction.
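Both ambiguities can be reproduced in a few lines of C++. This is a minimal sketch; the names `A`, `B`, `app`, `Widget`, `Gadget`, and `helper` are hypothetical and chosen only for illustration. `A::OtherType` and `B::OtherType` are spelled identically even though `A` is a namespace and `B` is a class, and the same unqualified call `helper(2)` resolves to a member function in one class and to a free function in the other. Only name lookup over the whole translation unit, which a parser-based tool performs, can tell these cases apart.

```cpp
#include <type_traits>

// Hypothetical names, for illustration only.
namespace A {              // 'A' is a namespace here...
struct OtherType {};
}

struct B {                 // ...while 'B' is a class.
    using OtherType = int;
};

// At the token level both aliases are "Identifier::Identifier";
// a regex cannot tell which qualifier is a namespace.
using T1 = A::OtherType;
using T2 = B::OtherType;

namespace app {
constexpr int helper(int x) { return x + 1; }  // free function in the namespace

struct Widget {
    constexpr int helper(int x) const { return x * 10; }  // member, same name
    // Unqualified lookup finds the member first, hiding app::helper.
    // Nothing on this line says which function is called.
    constexpr int bar() const { return helper(2); }
};

struct Gadget {
    // No member named helper, so lookup reaches app::helper.
    constexpr int bar() const { return helper(2); }
};
}  // namespace app

static_assert(std::is_class<T1>::value, "A::OtherType names a class");
static_assert(std::is_same<T2, int>::value, "B::OtherType names int");
```

The `static_assert`s only check what the compiler's name lookup decided; a highlighter that sees tokens without performing that lookup has no access to the same information.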

I guess I'm a bit less clear on the solution, because I don't have a good
sense of who the owner of the C++ major mode is, and how the code is
structured. My thinking was that perhaps hooks could be added to make it
easier for plugin writers to modify the syntax coloring of the major mode,
as opposed to plugin writers needing to rewrite the C++ major mode from
scratch just to change the syntax coloring.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 19:22:02 GMT)

Message #17 received at 23501 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nir Friedman <quicknir <at> gmail.com>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 10 May 2016 22:21:35 +0300
> From: Nir Friedman <quicknir <at> gmail.com>
> Date: Tue, 10 May 2016 14:55:41 -0400
> Cc: 23501 <at> debbugs.gnu.org
> 
> I guess I'm a bit less clear on the solution, because I don't have a good sense of who the owner of the C++
> major mode is, and how the code is structured. My thinking was that perhaps hooks could be added to make
> it easier for plugin writers to modify the syntax coloring of the major mode, as opposed to plugin writers
> needing to rewrite the C++ major mode from scratch just to change the syntax coloring.

Colors are added at display time, so hooks will not help here.  Or at
least it isn't immediately clear to me how they could help.

I suggest studying how syntax highlighting works in Emacs, including
the JIT font-lock feature and its relation to the display engine.
Until you have a good understanding of how this stuff works, I don't
think you will be able to come up with a design for hooks which external
tools could use for this purpose.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Tue, 10 May 2016 20:17:01 GMT)

Message #20 received at 23501 <at> debbugs.gnu.org:

From: Nir Friedman <quicknir <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 10 May 2016 16:16:03 -0400
My idea for a hook was basically to make it possible to provide a callback
function to the major mode. If this callback function is provided, then
when a new file is loaded or an existing one saved with modifications, the
callback function is called with the full path to the file. The callback
function must return something that basically tells the major mode how to
color everything. A simple way would just be to return a list of the colors
for every single non-whitespace character taken sequentially. A single very
fast pass through this list would then be able to color every character.
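The proposed callback can be sketched as follows. This is C++ for illustration only, not Emacs Lisp, and `Style`, `Analyzer`, `apply_styles`, and `highlight_file` are hypothetical names. The analyzer receives a file path and returns one entry per non-whitespace character; a single pass then spreads those entries over the text, leaving whitespace unstyled. The sketch returns style categories rather than literal colors, in line with Eli's point in his reply that such a hook should return faces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical token categories; in Emacs these would correspond to faces.
enum class Style { Plain, Keyword, Namespace, Type, Function };

// The proposed callback: given a file path, return one Style per
// non-whitespace character of that file, in order.
using Analyzer = std::function<std::vector<Style>(const std::string& path)>;

// Single pass: expand the per-non-whitespace list into a per-character
// list for the whole text; whitespace gets Style::Plain.
// Returns an empty vector if the list length does not match the text.
std::vector<Style> apply_styles(const std::string& text,
                                const std::vector<Style>& styles) {
    std::vector<Style> out(text.size(), Style::Plain);
    std::size_t next = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (std::isspace(static_cast<unsigned char>(text[i]))) continue;
        if (next >= styles.size()) return {};  // analyzer returned too few
        out[i] = styles[next++];
    }
    if (next != styles.size()) return {};      // analyzer returned too many
    return out;
}

// Would run "when a file is loaded or saved with modifications".
std::vector<Style> highlight_file(const std::string& path,
                                  const std::string& text,
                                  const Analyzer& analyzer) {
    assert(analyzer && "no analyzer registered");
    return apply_styles(text, analyzer(path));
}
```

Because the mapping is positional, inserting or deleting a single character shifts every later entry, so the list is only valid until the next edit. That is the weakness Eli raises about keeping highlighting current in a modified, unsaved buffer.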

Is there a reason why that would not be workable? Also, can you point me to
where exactly (e.g. via link to the emacs github mirror) the major modes
are stored?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Wed, 11 May 2016 07:50:02 GMT)

Message #23 received at 23501 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nir Friedman <quicknir <at> gmail.com>
Cc: 23501 <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Wed, 11 May 2016 10:49:34 +0300
> From: Nir Friedman <quicknir <at> gmail.com>
> Date: Tue, 10 May 2016 16:16:03 -0400
> Cc: 23501 <at> debbugs.gnu.org
> 
> My idea for a hook was basically to make it possible to provide a callback function to the major mode. If this
> callback function is provided, then when a new file is loaded or an existing one saved with modifications, the
> callback function is called with the full path to the file.

The syntax highlighting should also change when you modify the buffer,
not only when you save it.  How will that work with your proposed hook?

> The callback function must return something that
> basically tells the major mode how to color everything. A simple way would just be to return a list of the colors
> for every single non-whitespace character taken sequentially. A single very fast pass through this list would
> then be able to color every character.

The hook cannot return a color, because the colors are defined via
faces.  It should return faces instead.

> Is there a reason why that would not be workable?

Maybe it is workable, but you are missing too many details of how
syntax highlighting works in Emacs.  As I wrote previously, I encourage
you to study how that works, in order for the proposal to be workable
and practical.

> Also, can you point me to where exactly (e.g. via link to the
> emacs github mirror) the major modes are stored?

It's not the major mode that you need to look at, it's the font-lock
machinery.  Major modes just use the font-lock features by setting the
font-lock faces on portions of the buffer.  Then at display time, the
visible portions of the buffer are displayed as specified by those
faces.  You will see that each major mode simply sets the font-lock
faces, and leaves the rest to the core features.

See font-lock.el and font-core.el for the font-lock features, and
jit-lock.el for the JIT coloring of the visible portions of the
buffer.
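The division of labor described above can be made concrete with a toy model, written in C++ purely for illustration. It is not Emacs's implementation, and `Buffer`, `Fontifier`, `replace_char`, and `redisplay` are hypothetical names. It captures three points from the message: a major mode only assigns face names to regions; fontification happens lazily, only for text that is about to be displayed and is not yet fontified (in real Emacs, the display engine calls `fontification-functions` for text whose `fontified` text property is nil, and jit-lock hooks in there); and faces are turned into colors only at display time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only; not how Emacs is implemented. All names are hypothetical.
struct Buffer {
    std::string text;
    std::vector<std::string> face;  // face name per character ("" = default)
    std::vector<bool> fontified;    // loosely mirrors the `fontified` property
    explicit Buffer(std::string t)
        : text(std::move(t)), face(text.size()), fontified(text.size(), false) {}
};

// What a "major mode" supplies: put face names on [start, end).
using Fontifier = std::function<void(Buffer&, std::size_t, std::size_t)>;

// An edit only marks the changed text as needing fontification again.
// (Emacs also refontifies some surrounding context; ignored here.)
void replace_char(Buffer& b, std::size_t pos, char c) {
    assert(pos < b.text.size());
    b.text[pos] = c;
    b.fontified[pos] = false;
}

// Display of [start, end): fontify only the unfontified runs about to be
// shown, then resolve face names to colors using the current theme.
std::vector<std::string> redisplay(Buffer& b, std::size_t start, std::size_t end,
                                   const Fontifier& fontify,
                                   const std::map<std::string, std::string>& theme) {
    std::size_t i = start;
    while (i < end) {
        if (b.fontified[i]) { ++i; continue; }
        std::size_t j = i;
        while (j < end && !b.fontified[j]) ++j;  // maximal unfontified run
        fontify(b, i, j);
        for (std::size_t k = i; k < j; ++k) b.fontified[k] = true;
        i = j;
    }
    std::vector<std::string> colors;
    for (std::size_t k = start; k < end; ++k) {
        auto it = theme.find(b.face[k]);
        colors.push_back(it != theme.end() ? it->second : "default");
    }
    return colors;
}
```

Because the theme lookup happens in `redisplay`, changing a face's color requires no refontification, while editing the text does. This is also why a hook that runs only when a file is loaded or saved would leave edited text stale.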




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#23501; Package emacs. (Wed, 11 May 2016 17:57:01 GMT)

Message #26 received at 23501 <at> debbugs.gnu.org:

From: John Mastro <john.b.mastro <at> gmail.com>
To: 23501 <at> debbugs.gnu.org
Cc: Nir Friedman <quicknir <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Wed, 11 May 2016 10:56:32 -0700
Nir Friedman <quicknir <at> gmail.com> wrote:

> Is there a reason why that would not be workable? Also, can you point me to
> where exactly (e.g. via link to the emacs github mirror) the major modes are
> stored?

To find a particular major mode (or other library), you can use
`find-library'. For instance, try `M-x find-library RET cc-mode RET' and
`M-x find-library RET font-lock RET'.

-- 
john




Reply sent to Stefan Kangas <stefan <at> marxist.se>:
You have taken responsibility. (Wed, 12 Aug 2020 02:32:02 GMT)

Notification sent to Nir Friedman <quicknir <at> gmail.com>:
bug acknowledged by developer. (Wed, 12 Aug 2020 02:32:03 GMT)

Message #31 received at 23501-done <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Nir Friedman <quicknir <at> gmail.com>, 23501-done <at> debbugs.gnu.org
Subject: Re: bug#23501: Non-regex-based syntax highlighting
Date: Tue, 11 Aug 2020 19:31:07 -0700
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Nir Friedman <quicknir <at> gmail.com>
>> Date: Tue, 10 May 2016 16:16:03 -0400
>> Cc: 23501 <at> debbugs.gnu.org
>>
>> My idea for a hook was basically to make it possible to provide a callback function to the major mode. If this
>> callback function is provided, then when a new file is loaded or an existing one saved with modifications, the
>> callback function is called with the full path to the file.
>
> The syntax highlighting should also change when you modify the buffer,
> not only when you save it.  How will that work with your proposed hook?
>
>> The callback function must return something that
>> basically tells the major mode how to color everything. A simple way would just be to return a list of the colors
>> for every single non-whitespace character taken sequentially. A single very fast pass through this list would
>> then be able to color every character.
>
> The hook cannot return a color, because the colors are defined via
> faces.  It should return faces instead.
>
>> Is there a reason why that would not be workable?
>
> Maybe it is workable, but you are missing too many details of how
> syntax highlighting works in Emacs.  As I wrote previously, I encourage
> you to study how that works, in order for the proposal to be workable
> and practical.
>
>> Also, can you point me to where exactly (e.g. via link to the
>> emacs github mirror) the major modes are stored?
>
> It's not the major mode that you need to look at, it's the font-lock
> machinery.  Major modes just use the font-lock features by setting the
> font-lock faces on portions of the buffer.  Then at display time, the
> visible portions of the buffer are displayed as specified by those
> faces.  You will see that each major mode simply sets the font-lock
> faces, and leaves the rest to the core features.
>
> See font-lock.el and font-core.el for the font-lock features, and
> jit-lock.el for the JIT coloring of the visible portions of the
> buffer.

It seems like there was a proposal here 4 years ago, that Eli said was
unworkable.  There were also some outstanding questions regarding said
proposal.  But there were no further updates after that.

I'm therefore closing this bug report now.  If anyone wants to continue
working on something along these lines, feel free to reopen this bug
report or file a new one.

Thanks.

Best regards,
Stefan Kangas




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 09 Sep 2020 11:24:04 GMT)

This bug report was last modified 3 years and 230 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.