GNU bug report logs - #40844
html mode sometimes fooled by apostrophe

Package: emacs

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sat, 25 Apr 2020 11:27:02 UTC

Severity: minor

Tags: confirmed, patch

Merged with 43941, 46312

Found in versions 26.3, 27.0.91

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 40844 in the body.
You can then email your comments to 40844 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Sat, 25 Apr 2020 11:27:02 GMT)

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 25 Apr 2020 11:27:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-gnu-emacs <at> gnu.org
Subject: html mode sometimes fooled by apostrophe
Date: Sat, 25 Apr 2020 19:26:04 +0800

$ wget https://www.jidanni.org/lang/pinyin/older.html
$ emacs -q older.html

You will see that by mid-file, emacs has already got confused by
U+0027 APOSTROPHE
causing it to turn on and off font-lock-string-face!
So there is something in this file that causes it to screw up.

emacs-version "26.3"




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Sat, 25 Apr 2020 14:10:01 GMT)

Message #8 received at 40844 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> gmail.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 40844 <at> debbugs.gnu.org
Subject: Re: bug#40844: html mode sometimes fooled by apostrophe
Date: Sat, 25 Apr 2020 10:09:06 -0400

tags 40844 + confirmed
found 40844 27.0.91
quit

積丹尼 Dan Jacobson <jidanni <at> jidanni.org> writes:

> $ wget https://www.jidanni.org/lang/pinyin/older.html
> $ emacs -q older.html
>
> You will see that by mid-file, emacs has already got confused by
> U+0027 APOSTROPHE
> causing it to turn on and off font-lock-string-face!
> So there is something in this file that causes it to screw up.

Seems to be related to the paren character, here's a smaller
reproducer:

    <!DOCTYPE html>
    <html>
    <body>
     (and counties' names
    </body>
    </html>
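
One way to check the reduced case without relying on the eye is to ask Emacs which face it put on the apostrophe. The following is only a sketch: the helper name `my/apostrophe-face-in-reproducer` is hypothetical and not part of Emacs or of this thread, and it assumes `font-lock-ensure` is available (Emacs 25 or later). On the affected versions named in this report the result is expected to be `font-lock-string-face`; on a fixed Emacs it should be something else, typically nil.

```elisp
;; Hypothetical helper (not from the thread): report the face Emacs puts on
;; the apostrophe in the reduced reproducer above.
(require 'sgml-mode)

(defun my/apostrophe-face-in-reproducer ()
  "Return the `face' property on the apostrophe of the reduced test case."
  (with-temp-buffer
    (insert "<!DOCTYPE html>\n"
            "<html>\n"
            "<body>\n"
            " (and counties' names\n"
            "</body>\n"
            "</html>\n")
    (html-mode)
    ;; Fontify explicitly: font lock normally works lazily as text is
    ;; displayed, and this temporary buffer is never displayed.
    (font-lock-ensure)
    (goto-char (point-min))
    (search-forward "'")
    ;; `match-beginning' is the position of the apostrophe itself.
    (get-text-property (match-beginning 0) 'face)))
```

It can be evaluated with `M-: (my/apostrophe-face-in-reproducer)` or run from `emacs -Q --batch`.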





Added tag(s) confirmed. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 25 Apr 2020 14:10:02 GMT)

bug Marked as found in versions 27.0.91. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 25 Apr 2020 14:10:02 GMT)

bug Marked as found in versions 26.3. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 01 May 2020 15:47:03 GMT)

Forcibly Merged 40844 43941. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 05 Feb 2021 09:42:03 GMT)

Forcibly Merged 40844 43941 46312. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 05 Feb 2021 09:43:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Sun, 13 Jun 2021 12:22:01 GMT)

Message #21 received at 40844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stephen Berman <stephen.berman <at> gmx.net>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 40844 <at> debbugs.gnu.org, 43941 <at> debbugs.gnu.org,
 jidanni <at> jidanni.org
Subject: Re: bug#40844: html mode sometimes fooled by apostrophe
Date: Sun, 13 Jun 2021 14:21:36 +0200

Stephen Berman <stephen.berman <at> gmx.net> writes:

> I made a silly mistake (it was late and I was tired).  Here is a
> corrected version:

I can confirm that this patch solves the test cases here.

> With this patch, when any of the paired-bracket characters is followed
> by `'' in html-mode, there is indeed no string face fontification on the
> latter (and following characters).  The following function demonstrates
> this:

[...]

> I wanted to turn this function into a test, and that's what the
> commented out lines are supposed to do.  But when I uncomment these
> lines and call this function with the unpatched (i.e. current) version
> of sgml-mode-syntax-table, it still shows default face for `'' with all
> the paired-bracket characters.  Yet when I step through the function
> with Ediff, I do see some cases with font-lock-string-face.  I don't
> understand what's going on here.

Might be a timing issue, perhaps?

In any case, the patch is an improvement, so perhaps that should be
pushed anyway?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) patch. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 13 Jun 2021 12:22:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Sun, 13 Jun 2021 18:15:02 GMT)

Message #26 received at 40844 <at> debbugs.gnu.org:

From: Stephen Berman <stephen.berman <at> gmx.net>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 40844 <at> debbugs.gnu.org, 43941 <at> debbugs.gnu.org,
 jidanni <at> jidanni.org
Subject: Re: bug#40844: html mode sometimes fooled by apostrophe
Date: Sun, 13 Jun 2021 20:14:08 +0200

On Sun, 13 Jun 2021 14:21:36 +0200 Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Stephen Berman <stephen.berman <at> gmx.net> writes:
>
>> I made a silly mistake (it was late and I was tired).  Here is a
>> corrected version:
>
> I can confirm that this patch solves the test cases here.

Thanks for checking.

>> With this patch, when any of the paired-bracket characters is followed
>> by `'' in html-mode, there is indeed no string face fontification on the
>> latter (and following characters).  The following function demonstrates
>> this:
>
> [...]
>
>> I wanted to turn this function into a test, and that's what the
>> commented out lines are supposed to do.  But when I uncomment these
>> lines and call this function with the unpatched (i.e. current) version
>> of sgml-mode-syntax-table, it still shows default face for `'' with all
>> the paired-bracket characters.  Yet when I step through the function
>> with Ediff, I do see some cases with font-lock-string-face.  I don't
>> understand what's going on here.
>
> Might be a timing issue, perhaps?

I tried adding sit-for at different points but it made no difference.

> In any case, the patch is an improvement, so perhaps that should be
> pushed anyway?

Upthread Eli said "some SGML/HTML expert should say if that is TRT".
I'm no such expert so I can't make that decision.  FWIW, I rewrote the
test using ert, and the result is as above: it passes with the patch, as
expected, but also without the patch, even though in the latter case the
test buffer clearly contains characters fontified with
font-lock-string-face.  And just as I wrote above, when stepping through
the ert-deftest using the unpatched sgml-tag-syntax-table, the test does
fail as expected.  Here's the test, in case someone else wants to see if
they can figure it out; I haven't succeeded:

(ert-deftest sgml-test-brackets ()
  "Test fontification of apostrophe preceded by paired-bracket character."
  (let ((buf (get-buffer-create "*sgml-test*"))
	brackets results)
    (map-char-table
     (lambda (key value)
       (setq brackets (cons (list
			     (if (consp key)
				 (list (car key) (cdr key))
			       key)
			     value)
			    brackets)))
     (unicode-property-table-internal 'paired-bracket))
    (setq brackets (delete-dups (flatten-tree brackets)))
    (setq brackets (append brackets (list ?$ ?% ?& ?* ?+ ?/)))
    (with-current-buffer buf
      (erase-buffer)
      (fundamental-mode)
      (while brackets
	(let ((char (string (pop brackets))))
	  (insert (concat "<p>" char "'s</p>\n"))))
      (html-mode)
      (goto-char (point-min))
      (while (not (eobp))
	(goto-char (next-single-char-property-change (point) 'face))
	(let ((val (get-text-property (point) 'face)))
	  (when val
	    (should-not (eq val 'font-lock-string-face))))))))

Steve Berman




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Mon, 14 Jun 2021 13:01:01 GMT)

Message #29 received at 40844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stephen Berman <stephen.berman <at> gmx.net>
Cc: jidanni <at> jidanni.org, Eli Zaretskii <eliz <at> gnu.org>, 40844 <at> debbugs.gnu.org,
 43941 <at> debbugs.gnu.org, 46312 <at> debbugs.gnu.org
Subject: Re: bug#46312: HTML+ mode vs. quotes
Date: Mon, 14 Jun 2021 15:00:16 +0200

Stephen Berman <stephen.berman <at> gmx.net> writes:

> Upthread Eli said "some SGML/HTML expert should say if that is TRT".
> I'm no such expert so I can't make that decision.

I'm not either, but at this point I'd rather apply the patch and then we
can see whether some SGML expert pipes up...

> Here's the test, in case someone else wants to see if
> they can figure it out; I haven't succeeded:

Just needs a `font-lock-ensure' after `html-mode'.  :-)  Then the test
fails without your patch, and passes with your patch.

So I went ahead and pushed your patch (and the test) to Emacs 28 (with
some minor changes to the test).
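
For readers following along, here is a sketch of the test from message #26 with `font-lock-ensure` added after `html-mode`, as described above. It is not the version that was pushed (which had further minor changes). Three things here are assumptions of this sketch rather than anything stated in the thread: the `-sketch` suffix on the test name, the use of `with-temp-buffer` instead of the `*sgml-test*` buffer, and the `dolist` loop. It also omits the unused `results` binding, and it needs Emacs 27 or later for `flatten-tree`.

```elisp
(require 'ert)
(require 'sgml-mode)

;; Sketch only: the -sketch suffix keeps this from clashing with the test
;; that was actually pushed, which may differ in its details.
(ert-deftest sgml-test-brackets-sketch ()
  "Test fontification of apostrophe preceded by paired-bracket character."
  (let (brackets)
    ;; Collect the characters that take part in a Unicode bracket pair.
    ;; For range keys only the two endpoints are kept, as in the original.
    (map-char-table
     (lambda (key value)
       (setq brackets (cons (list
                             (if (consp key)
                                 (list (car key) (cdr key))
                               key)
                             value)
                            brackets)))
     (unicode-property-table-internal 'paired-bracket))
    (setq brackets (delete-dups (flatten-tree brackets)))
    (setq brackets (append brackets (list ?$ ?% ?& ?* ?+ ?/)))
    (with-temp-buffer
      ;; One line per character: <p>X's</p>
      (dolist (char brackets)
        (insert "<p>" (string char) "'s</p>\n"))
      (html-mode)
      ;; The missing step: force fontification before inspecting faces.
      (font-lock-ensure)
      (goto-char (point-min))
      ;; Visit every face change and check none starts string face.
      (while (not (eobp))
        (goto-char (next-single-char-property-change (point) 'face))
        (let ((val (get-text-property (point) 'face)))
          (when val
            (should-not (eq val 'font-lock-string-face))))))))
```

After evaluating the form, `M-x ert RET sgml-test-brackets-sketch RET` runs it interactively.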

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 28.1, send any further explanations to 46312 <at> debbugs.gnu.org and 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 14 Jun 2021 13:01:03 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Mon, 14 Jun 2021 13:03:01 GMT)

Message #34 received at 40844 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: jidanni <at> jidanni.org, 40844 <at> debbugs.gnu.org, stephen.berman <at> gmx.net,
 43941 <at> debbugs.gnu.org, 46312 <at> debbugs.gnu.org
Subject: Re: bug#46312: HTML+ mode vs. quotes
Date: Mon, 14 Jun 2021 16:02:02 +0300

> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 46312 <at> debbugs.gnu.org,  Eli Zaretskii <eliz <at> gnu.org>,
>   40844 <at> debbugs.gnu.org,  43941 <at> debbugs.gnu.org,  jidanni <at> jidanni.org
> Date: Mon, 14 Jun 2021 15:00:16 +0200
> 
> Stephen Berman <stephen.berman <at> gmx.net> writes:
> 
> > Upthread Eli said "some SGML/HTML expert should say if that is TRT".
> > I'm no such expert so I can't make that decision.
> 
> I'm not either, but at this point I'd rather apply the patch and then we
> can see whether some SGML expert pipes up...

Fine with me.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Mon, 14 Jun 2021 13:53:01 GMT)

Message #37 received at 40844 <at> debbugs.gnu.org:

From: Stephen Berman <stephen.berman <at> gmx.net>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: jidanni <at> jidanni.org, Eli Zaretskii <eliz <at> gnu.org>, 40844 <at> debbugs.gnu.org,
 43941 <at> debbugs.gnu.org, 46312 <at> debbugs.gnu.org
Subject: Re: bug#46312: HTML+ mode vs. quotes
Date: Mon, 14 Jun 2021 15:52:40 +0200

On Mon, 14 Jun 2021 15:00:16 +0200 Lars Ingebrigtsen <larsi <at> gnus.org> wrote:

> Stephen Berman <stephen.berman <at> gmx.net> writes:
>
>> Upthread Eli said "some SGML/HTML expert should say if that is TRT".
>> I'm no such expert so I can't make that decision.
>
> I'm not either, but at this point I'd rather apply the patch and then we
> can see whether some SGML expert pipes up...

Sounds good.

>> Here's the test, in case someone else wants to see if
>> they can figure it out; I haven't succeeded:
>
> Just needs a `font-lock-ensure' after `html-mode'.  :-)  Then the test
> fails without your patch, and passes with your patch.

Ah, thanks.  The ways of font lock are unfathomable to me.

> So I went ahead and pushed your patch (and the test) to Emacs 28 (with
> some minor changes to the test).

Thanks.  I noticed, unfortunately only just now, that the ert test
includes the unused let-bound variable `results' which I inadvertently
left behind from the previous version, and which will probably make the
byte-compiler complain.

Steve Berman




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40844; Package emacs. (Mon, 14 Jun 2021 13:59:02 GMT)

Message #40 received at 40844 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stephen Berman <stephen.berman <at> gmx.net>
Cc: jidanni <at> jidanni.org, Eli Zaretskii <eliz <at> gnu.org>, 40844 <at> debbugs.gnu.org,
 43941 <at> debbugs.gnu.org, 46312 <at> debbugs.gnu.org
Subject: Re: bug#46312: HTML+ mode vs. quotes
Date: Mon, 14 Jun 2021 15:58:14 +0200

Stephen Berman <stephen.berman <at> gmx.net> writes:

> Thanks.  I noticed, unfortunately only just now, that the ert test
> includes the unused let-bound variable `results' which I inadvertently
> left behind from the previous version, and which will probably make the
> byte-compiler complain.

Yup; now removed.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 13 Jul 2021 11:24:04 GMT)