GNU bug report logs -
#33887
26.1; Emacs hangs for several seconds when going to the end of an XML file in nXML mode
Reported by: Vincent Lefevre <vincent <at> vinc17.net>
Date: Thu, 27 Dec 2018 10:14:02 UTC
Severity: normal
Tags: fixed
Merged with 25176
Found in versions 26.0.50, 26.1
Fixed in version 27.1
Done: Noam Postavsky <npostavs <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33887 in the body.
You can then email your comments to 33887 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 10:14:02 GMT)
Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 27 Dec 2018 10:14:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
When I open a large XML file and immediately go to the end of the
file with '<ESC> >', Emacs hangs for several seconds. For instance,
on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
(a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
takes 15 seconds.
This is a regression: Emacs 25 did not hang at all.
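To put a number on the delay described above, a minimal timing sketch follows. It assumes Emacs 26 or later, where nXML mode relies on `syntax-propertize`, and it measures only the syntax-propertize pass over the whole buffer, not redisplay or fontification. The function name `my/time-syntax-propertize` is hypothetical and not part of Emacs.

```elisp
(require 'benchmark)
(require 'nxml-mode)

(defun my/time-syntax-propertize (file)
  "Return the seconds needed to syntax-propertize all of FILE in `nxml-mode'.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert-file-contents file)
    (nxml-mode)
    ;; `benchmark-run' returns (ELAPSED GC-COUNT GC-ELAPSED); keep ELAPSED.
    (car (benchmark-run 1
           (syntax-propertize (point-max))))))
```

Evaluating `(my/time-syntax-propertize "/usr/share/xml/iso-codes/iso_639-3.xml")` on an affected build and on a fixed one should make the difference visible, although the absolute figures depend on the machine.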
In GNU Emacs 26.1 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.2)
of 2018-12-26, modified by Debian built on x86-ubc-01
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description: Debian GNU/Linux buster/sid
Recent messages:
Loading /etc/emacs/site-start.d/50latex-cjk-common.el (source)...done
Loading /etc/emacs/site-start.d/50latex-cjk-thai.el (source)...done
Loading /etc/emacs/site-start.d/50maxima-emacs.el (source)...done
Loading /etc/emacs/site-start.d/50psvn.el (source)...done
Loading /etc/emacs/site-start.d/50python-docutils.el (source)...done
Loading /etc/emacs/site-start.d/50texlive-lang-english.el (source)...done
Loading /etc/emacs/site-start.d/50why3.el (source)...done
Loading /home/vinc17/share/emacs/site-lisp/mutteditor.el (source)...done
Loading time...done
For information about GNU Emacs and the GNU system, type C-h C-a.
Configured using:
'configure --build x86_64-linux-gnu --prefix=/usr
--sharedstatedir=/var/lib --libexecdir=/usr/lib
--localstatedir=/var/lib --infodir=/usr/share/info
--mandir=/usr/share/man --enable-libsystemd --with-pop=yes
--enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
--with-sound=alsa --without-gconf --with-mailutils --build
x86_64-linux-gnu --prefix=/usr --sharedstatedir=/var/lib
--libexecdir=/usr/lib --localstatedir=/var/lib
--infodir=/usr/share/info --mandir=/usr/share/man --enable-libsystemd
--with-pop=yes
--enable-locallisppath=/etc/emacs:/usr/local/share/emacs/26.1/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/26.1/site-lisp:/usr/share/emacs/site-lisp
--with-sound=alsa --without-gconf --with-mailutils --with-x=yes
--with-x-toolkit=gtk3 --with-toolkit-scroll-bars 'CFLAGS=-g -O2
-fdebug-prefix-map=/build/emacs-3ThesY/emacs-26.1+1=.
-fstack-protector-strong -Wformat -Werror=format-security -Wall'
'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' LDFLAGS=-Wl,-z,relro'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS NOTIFY
ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 THREADS LIBSYSTEMD LCMS2
Important settings:
value of $LC_COLLATE: POSIX
value of $LC_CTYPE: en_US.UTF-8
value of $LC_TIME: en_DK
value of $LANG: POSIX
locale-coding-system: utf-8-unix
Major mode: Lisp Interaction
Minor modes in effect:
display-time-mode: t
show-paren-mode: t
tooltip-mode: t
global-eldoc-mode: t
eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
column-number-mode: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.6/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.6/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.6/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.7/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.7/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.7/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.8/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.8/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.8/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-3.9/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-3.9/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-3.9/emacs
/usr/share/emacs/site-lisp/llvm-3.5/tablegen-mode hides /usr/share/emacs/site-lisp/llvm-4.0/tablegen-mode
/usr/share/emacs/site-lisp/llvm-3.5/llvm-mode hides /usr/share/emacs/site-lisp/llvm-4.0/llvm-mode
/usr/share/emacs/site-lisp/llvm-3.5/emacs hides /usr/share/emacs/site-lisp/llvm-4.0/emacs
/usr/share/emacs/site-lisp/rst hides /usr/share/emacs/26.1/lisp/textmodes/rst
/usr/share/emacs/site-lisp/latex-cjk-thai/thai-word hides /usr/share/emacs/26.1/lisp/language/thai-word
Features:
(shadow sort mail-extr warnings emacsbug message rmc puny seq byte-opt
gv bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822
mml easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums
mm-util mail-prsvr mail-utils elec-pair time cus-start cus-load paren
cc-styles cc-align cc-engine cc-vars cc-defs edmacro kmacro cl-loaddefs
cl-lib time-date mule-util tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 118562 10618)
(symbols 48 23199 1)
(miscs 40 54 133)
(strings 32 34944 2101)
(string-bytes 1 946046)
(vectors 16 15937)
(vector-slots 8 510844 4784)
(floats 8 56 97)
(intervals 56 279 0)
(buffers 992 12))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 16:03:01 GMT)
Message #8 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> From: Vincent Lefevre <vincent <at> vinc17.net>
> Date: Thu, 27 Dec 2018 11:13:06 +0100
>
> When I open a large XML file and immediately go to the end of the
> file with '<ESC> >', Emacs hangs for several seconds. For instance,
> on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
> (a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
> takes 15 seconds.
>
> This is a regression: Emacs 25 did not hang at all.
Confirmed, thanks.
The profile (see below) blames syntax-ppss called by
sgml-syntax-propertize, so I suspect commit 0055190, which added
sgml-syntax-propertize-inside to sgml-syntax-propertize.
CC'ing Stefan who made those changes.
Here's the profile:
- command-execute 532 77%
- call-interactively 532 77%
- funcall-interactively 522 75%
- end-of-buffer 500 72%
- recenter 496 71%
- jit-lock-function 496 71%
- jit-lock-fontify-now 496 71%
- jit-lock--run-functions 496 71%
- run-hook-wrapped 496 71%
- #<compiled 0x200000000b3a7fd0> 496 71%
- font-lock-fontify-region 496 71%
- font-lock-default-fontify-region 496 71%
- nxml-extend-region 496 71%
- skip-syntax-forward 496 71%
- internal--syntax-propertize 496 71%
- syntax-propertize 496 71%
- sgml-syntax-propertize 490 71%
syntax-ppss 445 64%
push-mark 1 0%
- find-file 20 2%
- find-file-noselect 20 2%
- find-file-noselect-1 19 2%
- after-find-file 17 2%
- normal-mode 17 2%
- set-auto-mode 17 2%
- set-auto-mode-0 17 2%
- xml-mode 17 2%
- byte-code 14 2%
- require 12 1%
- byte-code 11 1%
- require 10 1%
- byte-code 9 1%
- require 6 0%
- byte-code 6 0%
- cl-generic-define-method 4 0%
- cl--generic-make-function 4 0%
- cl--generic-make-next-function 4 0%
- cl--generic-get-dispatcher 4 0%
- byte-compile 3 0%
byte-code 1 0%
- #<compiled 0x200000000b325048> 1 0%
byte-compile-top-level 1 0%
- custom-declare-variable 1 0%
- custom-initialize-reset 1 0%
- eval 1 0%
- funcall 1 0%
- #<compiled 0x200000000b3c88b8> 1 0%
- executable-find 1 0%
locate-file 1 0%
file-truename 1 0%
- rng-nxml-mode-init 2 0%
- rng-validate-mode 2 0%
- rng-auto-set-schema 2 0%
- rng-locate-schema-file 2 0%
- rng-locate-schema-file-using 2 0%
- rng-get-parsed-schema-locating-file 2 0%
- rng-parse-schema-locating-file 1 0%
- rng-parse-validate-file 1 0%
- nxml-parse-instance 1 0%
nxml-parse-instance-1 1 0%
- file-truename 1 0%
- file-truename 1 0%
- file-truename 1 0%
file-truename 1 0%
- insert-file-contents 1 0%
xml-find-file-coding-system 1 0%
- execute-extended-command 1 0%
- sit-for 1 0%
redisplay 1 0%
- minibuffer-complete 1 0%
- completion-in-region 1 0%
- completion--in-region 1 0%
- #<compiled 0x2000000001b04c20> 1 0%
- apply 1 0%
- #<compiled 0x20000000013baac8> 1 0%
- completion--in-region-1 1 0%
- completion--do-completion 1 0%
- completion-try-completion 1 0%
- completion--nth-completion 1 0%
- completion--some 1 0%
- #<compiled 0x2000000001b0bd20> 1 0%
- completion-basic-try-completion 1 0%
- try-completion 1 0%
completion-file-name-table 1 0%
- byte-code 10 1%
- read-extended-command 9 1%
- completing-read 9 1%
- completing-read-default 9 1%
read-from-minibuffer 9 1%
- find-file-read-args 1 0%
- read-file-name 1 0%
- read-file-name-default 1 0%
- completing-read 1 0%
- completing-read-default 1 0%
- read-from-minibuffer 1 0%
- redisplay_internal (C function) 1 0%
find-image 1 0%
- ... 158 22%
Automatic GC 156 22%
- macroexp--all-forms 1 0%
- macroexp--expand-all 1 0%
- #<compiled 0x2000000001375130> 1 0%
- macroexp--all-forms 1 0%
- macroexp--expand-all 1 0%
- macroexp--all-forms 1 0%
- macroexp--expand-all 1 0%
- #<compiled 0x2000000001375130> 1 0%
- macroexp--all-forms 1 0%
- macroexp--expand-all 1 0%
- #<compiled 0x2000000001375068> 1 0%
- macroexp--all-forms 1 0%
- macroexp--expand-all 1 0%
- macroexp-macroexpand 1 0%
- macroexpand 1 0%
#<compiled 0x20000000013f0600> 1 0%
- rng-compute-start-tag-open-deriv 1 0%
- rng-element-get-child 1 0%
- rng-compile 1 0%
- apply 1 0%
- rng-compile-group 1 0%
- mapcar 1 0%
- rng-compile 1 0%
- apply 1 0%
- rng-compile-attribute 1 0%
- rng-compile 1 0%
- apply 1 0%
- rng-compile-ref 1 0%
- rng-compile 1 0%
- apply 1 0%
- rng-compile-data 1 0%
rng-compile-dt 1 0%
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 16:40:02 GMT)
Message #11 received at 33887 <at> debbugs.gnu.org (full text, mbox):
>> When I open a large XML file and immediately go to the end of the
>> file with '<ESC> >', Emacs hangs for several seconds. For instance,
>> on /usr/share/xml/iso-codes/iso_639-3.xml from iso-codes in Debian
>> (a 1-MB file), it takes 5 seconds. On a 4-MB personal XML file, it
>> takes 15 seconds.
>>
>> This is a regression: Emacs 25 did not hang at all.
>
> Confirmed, thanks.
>
> The profile (see below) blames syntax-ppss called by
> sgml-syntax-propertize, so I suspect commit 0055190, which added
> sgml-syntax-propertize-inside to sgml-syntax-propertize.
Sounds right, but I'm not sure what to do about this.
I do wonder why so much time is spent in syntax-ppss, which is
generally expected to be relatively fast.
Maybe sgml-syntax-propertize is called too often (I see it's mostly
called from skip-syntax-forward; maybe we should call syntax-propertize
explicitly beforehand with a more distant position so
sgml-syntax-propertize is called just once).
Stefan
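The suggestion above, propertizing up to a more distant position before skipping, can be sketched as follows. This is only an illustration of the idea under the assumption that the caller knows a sensible limit; it is not what `nxml-extend-region` does, and the name `my/skip-syntax-forward-propertized` is hypothetical.

```elisp
(defun my/skip-syntax-forward-propertized (syntax limit)
  "Like `skip-syntax-forward', but propertize up to LIMIT first.
SYNTAX is a string of syntax code characters and LIMIT a buffer
position.  Hypothetical helper for illustration only."
  ;; One explicit call covers the whole range, so the implicit
  ;; propertization triggered by the skip has nothing left to do.
  (syntax-propertize limit)
  (skip-syntax-forward syntax limit))
```

Like `skip-syntax-forward`, the helper returns the distance travelled and leaves point where the skip stopped.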
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 16:44:02 GMT)
Message #14 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: Vincent Lefevre <vincent <at> vinc17.net>, 33887 <at> debbugs.gnu.org
> Date: Thu, 27 Dec 2018 11:39:06 -0500
>
> > The profile (see below) blames syntax-ppss called by
> > sgml-syntax-propertize, so I suspect commit 0055190, which added
> > sgml-syntax-propertize-inside to sgml-syntax-propertize.
>
> Sounds right, but I'm not sure what to do about this.
> I do wonder why so much time is spent in syntax-ppss, which is
> generally expected to be relatively fast.
Why was sgml-syntax-propertize-inside added? Is its effect an
absolute must, or merely a nice-to-have feature? If the latter,
perhaps a defcustom that could disable that call will be an okay
solution, at least as a stopgap?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 17:33:02 GMT)
Message #17 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> Why was sgml-syntax-propertize-inside added? Is its effect an
> absolute must, or merely a nice-to-have feature?
It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>
> If the latter, perhaps a defcustom that could disable that call will
> be an okay solution, at least as a stopgap?
I don't think it should be terribly expensive, so I'd rather first try
to better understand the performance issue.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 17:48:01 GMT)
Message #20 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: vincent <at> vinc17.net, 33887 <at> debbugs.gnu.org
> Date: Thu, 27 Dec 2018 12:32:21 -0500
>
> > If the latter, perhaps a defcustom that could disable that call will
> > be an okay solution, at least as a stopgap?
>
> I don't think it should be terribly expensive, so I'd rather first try
> and better understand the performance issue,
Sure. I thought you already did ;-)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 27 Dec 2018 18:44:01 GMT)
Message #23 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2018-12-27 12:32:21 -0500, Stefan Monnier wrote:
> > Why was sgml-syntax-propertize-inside added? Is its effect an
> > absolute must, or merely a nice-to-have feature?
>
> It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>
I use both in some of my XML files and I have never found any issue
with them. Or perhaps this is just for particular cases?
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Fri, 28 Dec 2018 17:19:02 GMT)
Message #26 received at 33887 <at> debbugs.gnu.org (full text, mbox):
>> > Why was sgml-syntax-propertize-inside added? Is its effect an
>> > absolute must, or merely a nice-to-have feature?
>> It's needed for correctness in the presence of <?...?> or <![CDATA[...]]>
> I use both in some of my XML files and I have never found any issue
> with them. Or perhaps this is just for particular cases?
Yes, it only makes a real difference when the content of those things
ends up confusing the parser (e.g. it looks like an unclosed tag, or
things along these lines).
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Tue, 08 Jan 2019 22:16:01 GMT)
Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi everyone, this is my first email to bug-gnu-emacs, so please let me
know if I am making some mistake.
For no special reason, I took this bug in order to start to know emacs'
code.
Following and confirming the details of the bug, I found that indeed the
performance issue is introduced at commit 0055190174, but not because of
the introduction of `sgml-syntax-propertize-inside`.
The problem is with the last rule:
```
("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
               (goto-char (match-end 0)))
             (string-to-syntax "."))))
```
I can't see the real effect of this rule, I tested xml parsing without
this rule and it works fine, marking double quotes inside tags as
expected without this performance issue.
Do we need to target double quotes outside tags explicitly?
--
Fernando Jascovich
developer
m: +54 9 3548 63 9833
github: https://github.com/fernando-jascovich/
linkedin: https://www.linkedin.com/in/fernandojascovich/
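The condition in the rule quoted above, `(zerop (car (syntax-ppss ...)))`, tests the nesting depth at the quote. A minimal sketch of that test follows, assuming `sgml-mode`, where `<` and `>` have paired-delimiter syntax so that a position inside a tag has depth 1. The helper name `my/sgml-inside-tag-p` is hypothetical.

```elisp
(require 'sgml-mode)

(defun my/sgml-inside-tag-p (pos)
  "Return non-nil if POS is inside a tag in the current `sgml-mode' buffer.
Hypothetical helper for illustration only."
  ;; `syntax-ppss' moves point to POS, hence the `save-excursion'.
  (save-excursion
    ;; Element 0 of the parser state is the nesting depth; in
    ;; `sgml-mode' it is positive between a `<' and its `>'.
    (> (car (syntax-ppss pos)) 0)))
```

In a buffer containing `<a href="x">text</a>`, this should report non-nil for a position within `href` and nil for a position within `text`. Each such call parses from a cached earlier state up to POS, which is the per-quote cost discussed in this thread.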
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 10 Jan 2019 15:10:01 GMT)
Message #32 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> From: Fernando Jascovich <fernando.ej <at> gmail.com>
> Date: Tue, 08 Jan 2019 19:11:02 -0300
>
> Hi everyone, this is my first email to bug-gnu-emacs, so please let me
> know if I am making some mistake.
> For no special reason, I took this bug in order to start to know emacs'
> code.
> Following and confirming the details of the bug, I found that indeed the
> performance issue is introduced at commit 0055190174, but not because of
> the introduction of `sgml-syntax-propertize-inside`.
> The problem is with the last rule:
> ```
> ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
>                (goto-char (match-end 0)))
>              (string-to-syntax "."))))
> ```
> I can't see the real effect of this rule, I tested xml parsing without
> this rule and it works fine, marking double quotes inside tags as
> expected without this performance issue.
> Do we need to target double quotes outside tags explicitly?
Stefan, any comments?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 17 Jan 2019 22:58:02 GMT)
Message #35 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> The profile (see below) blames syntax-ppss called by
> sgml-syntax-propertize, so I suspect commit 0055190, which added
> sgml-syntax-propertize-inside to sgml-syntax-propertize.
Hmm... actually, the syntax-ppss calls that take time are made directly
from within sgml-syntax-propertize rather than from within
sgml-syntax-propertize-inside, which doesn't even appear in your profile.
In my profile I get 8099 units of time in sgml-syntax-propertize, of
which 7611 are in syntax-ppss and only 77 in sgml-syntax-propertize-inside.
The problem seems to come from the following syntax propertize rule:
    ;; Double quotes outside of tags should not introduce strings.
    ;; Be careful to call `syntax-ppss' on a position before the one we're
    ;; going to change, so as not to need to flush the data we just computed.
    ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
                   (goto-char (match-end 0)))
                 (string-to-syntax "."))))
If I comment it out, the delay is *much* smaller.
The problem is that " is a very common character in XML files, so
the regexp matches often and, since we call syntax-ppss on each match,
we end up calling syntax-ppss very often.
I'm trying to figure out how to avoid calling syntax-ppss for every
" character. I'm thinking of looking at pairs of " chars and only do
extra work if there's a < or > between the two.
Stefan
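The idea of handling quotes in pairs can be sketched with the regexp that appears in the patch later in this thread. The sketch below only classifies matches: a match closed by the same quote character that opened it contains no < or > and could be left alone, while any other match would need the `syntax-ppss` check. It does not apply any syntax properties, and the name `my/count-quote-matches` is hypothetical.

```elisp
(defun my/count-quote-matches (string)
  "Return (FAST . SLOW) counts for the quote regexp applied to STRING.
FAST counts matches closed by the quote character that opened them;
SLOW counts all other matches, which would need a `syntax-ppss' check.
Hypothetical helper for illustration only."
  (let ((fast 0)
        (slow 0))
    (with-temp-buffer
      (insert string)
      (goto-char (point-min))
      ;; Opening quote, then anything except quotes and angle brackets,
      ;; then a terminating quote or angle bracket.
      (while (re-search-forward "\\([\"']\\)[^<>\"']*[<>\"']" nil t)
        (if (eq (char-after (match-beginning 1)) (char-before))
            (setq fast (1+ fast))
          (setq slow (1+ slow)))))
    (cons fast slow)))
```

For typical markup such as `<a href="x" id='y'>`, both attribute values fall in the fast category, which is why skipping them avoids most of the `syntax-ppss` calls. A mixed case such as `<t>"a'</t>` falls in the slow category.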
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 17 Jan 2019 23:27:02 GMT)
Message #38 received at 33887 <at> debbugs.gnu.org (full text, mbox):
>> From: Fernando Jascovich <fernando.ej <at> gmail.com>
>> Date: Tue, 08 Jan 2019 19:11:02 -0300
>>
>> Hi everyone, this is my first email to bug-gnu-emacs, so please let me
>> know if I am making some mistake.
>> For no special reason, I took this bug in order to start to know emacs'
>> code.
>> Following and confirming the details of the bug, I found that indeed the
>> performance issue is introduced at commit 0055190174, but not because of
>> the introduction of `sgml-syntax-propertize-inside`.
>> The problem is with the last rule:
>> ```
>> ("\"" (0 (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
>>                (goto-char (match-end 0)))
>>              (string-to-syntax "."))))
>> ```
>> I can't see the real effect of this rule, I tested xml parsing without
>> this rule and it works fine, marking double quotes inside tags as
>> expected without this performance issue.
>> Do we need to target double quotes outside tags explicitly?
>
> Stefan, any comments?
Yes, he's exactly right.
I just pushed a patch to master which should reduce significantly
this delay.
Stefan
Merged 25176 33887.
Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 17 Apr 2019 23:51:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 15 May 2019 23:54:01 GMT)
Message #43 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Vincent Lefevre <vincent <at> vinc17.net> writes:
> This is a regression: Emacs 25 did not hang at all.
Should we backport Stefan's fix to emacs-26? Or specifically, backport
[1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
loss-of-single-quote-fontification bug (Bug#35381).
[1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
Fix merge of sgml-syntax-propertize-rules
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 16 May 2019 10:55:02 GMT)
Message #46 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Hi,
On 2019-05-15 19:53:08 -0400, Noam Postavsky wrote:
> Vincent Lefevre <vincent <at> vinc17.net> writes:
>
> > This is a regression: Emacs 25 did not hang at all.
>
> Should we backport Stefan's fix to emacs-26? Or specifically, backport
> [1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
> loss-of-single-quote-fontification bug (Bug#35381).
>
> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
> Fix merge of sgml-syntax-propertize-rules
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b
It would be nice if this could be fixed quickly in emacs-26,
hoping that it could be fixed in Debian before the next stable
release.
(I'm still using Emacs 25 because of this bug.)
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 16 May 2019 12:17:02 GMT)
Message #49 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
> Fix merge of sgml-syntax-propertize-rules
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b
Uh, I goofed that one, Stefan fixed it [2: 9a74e5666b]. The corrected patch would be as follows:
[0001-Backport-sgml-syntax-propertize-rules-speedup-Bug-33.patch (text/plain, attachment)]
[2: 9a74e5666b]: 2019-05-15 22:21:36 -0400
* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Fix typo
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9a74e5666b022098c63d0047c0df90c66e1aa64a
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Thu, 16 May 2019 14:02:02 GMT)
Message #52 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> From: Noam Postavsky <npostavs <at> gmail.com>
> Date: Wed, 15 May 2019 19:53:08 -0400
> Cc: 33887 <at> debbugs.gnu.org
>
> Vincent Lefevre <vincent <at> vinc17.net> writes:
>
> > This is a regression: Emacs 25 did not hang at all.
>
> Should we backport Stefan's fix to emacs-26? Or specifically, backport
> [1: e7e92dc5d2], which is Stefan's fix on top of my fix for the
> loss-of-single-quote-fontification bug (Bug#35381).
>
> [1: e7e92dc5d2]: 2019-05-15 19:04:14 -0400
> Fix merge of sgml-syntax-propertize-rules
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e7e92dc5d24ac3bcde69732bab6a6c3c0d9de97b
I'd like to leave this fix on master for a while, so that we could
make sure it has no adverse consequences. Can we revisit this in a
month's time, say?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Fri, 17 May 2019 21:37:01 GMT)
Message #55 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2019-05-16 08:15:58 -0400, Noam Postavsky wrote:
> The corrected patch would be as follows:
[...]
I've tried the combination of
ca14dd1d4628094dd33d5d94694dcf5f29e843b8
7dab3ee7ab54b3c2e7bc24170376054786c01d6f
and this patch against Debian's current source package.
Emacs no longer hangs, but I get incorrect highlighting,
for instance on the following XML file.
<root>
<!-- comment -->
<a>"a'</a>
<!-- comment -->
</root>
Highlighting starts to be wrong at the single-quote character.
I've attached a screenshot obtained with the -Q option.
Did I miss anything?
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
[nxml.png (image/png, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 04:16:02 GMT)
Message #58 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Vincent Lefevre <vincent <at> vinc17.net> writes:
> I've tried the combination of
>
> ca14dd1d4628094dd33d5d94694dcf5f29e843b8
> 7dab3ee7ab54b3c2e7bc24170376054786c01d6f
>
> and this patch against Debian's current source package.
>
> Emacs no longer hangs, but I get incorrect highlighting,
> for instance on the following XML file.
>
> <root>
> <!-- comment -->
> <a>"a'</a>
> <!-- comment -->
> </root>
>
> Highlighting starts to be wrong at the single-quote character.
> I've attached a screenshot obtained with the -Q option.
>
> Did I miss anything?
Ah, I didn't get the mixed quote handling right. Here's the fix for master:
[0001-Fix-Bug-33887-for-mixed-quote-usage.patch (text/x-diff, inline)]
From 4677edd8dd65b5d956732821e78794f35b275418 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs <at> gmail.com>
Date: Sat, 18 May 2019 00:04:01 -0400
Subject: [PATCH] Fix Bug#33887 for mixed quote usage
* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Only
skip syntax-ppss for matched quotes.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Expand test.
---
lisp/textmodes/sgml-mode.el | 4 ++--
test/lisp/textmodes/sgml-mode-tests.el | 17 ++++++++++++-----
2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index 1b064fb825..e3cf56aa0e 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -345,8 +345,8 @@ sgml-font-lock-keywords
;; the resulting number of calls to syntax-ppss made it too slow
;; (bug#33887), so we're now careful to leave alone any pair
;; of quotes that doesn't hold a < or > char, which is the vast majority.
- ("\\(?:\\(?1:\"\\)[^\"<>]*[<>\"]\\|\\(?1:'\\)[^'<>]*[<>']\\)"
- (1 (unless (memq (char-before) '(?\' ?\"))
+ ("\\([\"']\\)[^<>\"']*[<>\"']"
+ (1 (unless (eq (char-after (match-beginning 1)) (char-before))
;; Be careful to call `syntax-ppss' on a position before the one
;; we're going to change, so as not to need to flush the data we
;; just computed.
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index a900e8dcf2..ffcc2cd840 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -161,11 +161,18 @@ sgml-with-content
(should (string= "&&" (buffer-string))))))
(ert-deftest sgml-tests--quotes-syntax ()
- (with-temp-buffer
- (sgml-mode)
- (insert "a\"b <tag>c'd</tag>")
- (should (= 1 (car (syntax-ppss (1- (point-max))))))
- (should (= 0 (car (syntax-ppss (point-max)))))))
+ (dolist (str '("a\"b <t>c'd</t>"
+ "a'b <t>c\"d</t>"
+ "<t>\"a'</t>"
+ "<t>'a\"</t>"
+ "<t>\"a'\"</t>"
+ "<t>'a\"'</t>"))
+ (with-temp-buffer
+ (sgml-mode)
+ (insert str)
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (car (syntax-ppss (1- (point-max))))))
+ (should (= 0 (car (syntax-ppss (point-max))))))))
(provide 'sgml-mode-tests)
;;; sgml-mode-tests.el ends here
--
2.11.0
[Message part 3 (text/plain, inline)]
And the corresponding patch against emacs-26:
[0001-Backport-sgml-syntax-propertize-rules-speedup-Bug-33.patch (text/plain, attachment)]
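The difference between the two quote regexps in the patch above can be checked outside sgml-mode with plain string matching. This is only a sketch: both regexps are copied from the diff, while the `my-' names are hypothetical and not part of Emacs.

```elisp
;; Quote regexp removed by the patch (copied from the "-" line of the diff).
(defconst my-old-quote-re
  "\\(?:\\(?1:\"\\)[^\"<>]*[<>\"]\\|\\(?1:'\\)[^'<>]*[<>']\\)"
  "Quote regexp before the patch.")

;; Quote regexp added by the patch (copied from the "+" line of the diff).
(defconst my-new-quote-re
  "\\([\"']\\)[^<>\"']*[<>\"']"
  "Quote regexp after the patch.")

(defun my-first-quote-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

;; The old regexp runs past the single quote up to the next "<":
(my-first-quote-match my-old-quote-re "<a>\"a'</a>") ; => "\"a'<"
;; The new regexp stops at the single quote, so it is examined next:
(my-first-quote-match my-new-quote-re "<a>\"a'</a>") ; => "\"a'"
```

With the old regexp the single quote is swallowed by the first match and never examined on its own, which is consistent with the highlighting going wrong at that character.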
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 14:49:01 GMT)
Message #61 received at 33887 <at> debbugs.gnu.org (full text, mbox):
There's still an issue. On the following XML file
<root>
<a>text</a>
<!-- ' -->
<a>text</a>
</root>
the part after the comment <!-- ' --> is highlighted as a comment.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 14:56:01 GMT)
Message #64 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2019-05-18 16:47:56 +0200, Vincent Lefevre wrote:
> There's still an issue. On the following XML file
>
> <root>
> <a>text</a>
> <!-- ' -->
> <a>text</a>
> </root>
>
> the part after the comment <!-- ' --> is highlighted as a comment.
And on the following XML file too:
<root>
<!DOCTYPE root [
<!ENTITY f SYSTEM "f.xml">
]>
<a>ab'cd</a>
<a>text</a>
</root>
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 14:58:02 GMT)
Message #67 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2019-05-18 16:55:43 +0200, Vincent Lefevre wrote:
> And on the following XML file too:
>
> <root>
> <!DOCTYPE root [
> <!ENTITY f SYSTEM "f.xml">
> ]>
> <a>ab'cd</a>
> <a>text</a>
> </root>
I actually meant
<!DOCTYPE root [
<!ENTITY f SYSTEM "f.xml">
]>
<root>
<a>ab'cd</a>
<a>text</a>
</root>
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 15:02:01 GMT)
Message #70 received at 33887 <at> debbugs.gnu.org (full text, mbox):
And another one:
<root>
<a>text</a>
<!-- "don't" -->
<a>text</a>
</root>
The second text is highlighted as a comment.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sat, 18 May 2019 18:50:02 GMT)
Message #73 received at 33887 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Vincent Lefevre <vincent <at> vinc17.net> writes:
> There's still an issue. On the following XML file
>
> <root>
> <a>text</a>
> <!-- ' -->
> <a>text</a>
> </root>
>
> the part after the comment <!-- ' --> is highlighted as a comment.
> And another one:
>
> <root>
> <a>text</a>
> <!-- "don't" -->
> <a>text</a>
> </root>
>
> The second text is highlighted as a comment.
Right, this is a collision between the syntax rules. The following
patch fixes it, though perhaps it would be better to just search for the
end of the comment in the ("\\(<\\)!--" (1 "< b")) rule instead?
[0001-Fix-sgml-syntax-handling-of-quotes-in-comments.patch (text/x-diff, inline)]
From a866e4f4b556fb4a346fa68c62296f10966690a1 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs <at> gmail.com>
Date: Sat, 18 May 2019 13:18:19 -0400
Subject: [PATCH] Fix sgml syntax handling of quotes in comments
* lisp/textmodes/sgml-mode.el (sgml-syntax-propertize-rules): Make
sure not to skip over comment ender when searching for quotes.
* test/lisp/textmodes/sgml-mode-tests.el (sgml-tests--quotes-syntax):
Add some more cases.
---
lisp/textmodes/sgml-mode.el | 11 ++++++++---
test/lisp/textmodes/sgml-mode-tests.el | 16 +++++++++-------
2 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/lisp/textmodes/sgml-mode.el b/lisp/textmodes/sgml-mode.el
index e3cf56aa0e..1af1d1eaef 100644
--- a/lisp/textmodes/sgml-mode.el
+++ b/lisp/textmodes/sgml-mode.el
@@ -350,9 +350,14 @@ sgml-font-lock-keywords
;; Be careful to call `syntax-ppss' on a position before the one
;; we're going to change, so as not to need to flush the data we
;; just computed.
- (if (prog1 (zerop (car (syntax-ppss (match-beginning 0))))
- (goto-char (1- (match-end 0))))
- (string-to-syntax ".")))))
+ (let ((ppss (syntax-ppss (match-beginning 0))))
+ (if (prog1 (zerop (car ppss)) ; Outside tag.
+ (goto-char (1- (match-end 0)))
+ ;; If we're in a comment, don't skip over comment
+ ;; ender.
+ (when (nth 4 ppss)
+ (skip-chars-backward "- \t\n")))
+ (string-to-syntax "."))))))
)))
(defun sgml-syntax-propertize (start end)
diff --git a/test/lisp/textmodes/sgml-mode-tests.el b/test/lisp/textmodes/sgml-mode-tests.el
index ffcc2cd840..7e1ddf4047 100644
--- a/test/lisp/textmodes/sgml-mode-tests.el
+++ b/test/lisp/textmodes/sgml-mode-tests.el
@@ -166,13 +166,15 @@ sgml-with-content
"<t>\"a'</t>"
"<t>'a\"</t>"
"<t>\"a'\"</t>"
- "<t>'a\"'</t>"))
- (with-temp-buffer
- (sgml-mode)
- (insert str)
- ;; Check that last tag is parsed as a tag.
- (should (= 1 (car (syntax-ppss (1- (point-max))))))
- (should (= 0 (car (syntax-ppss (point-max))))))))
+ "<t>'a\"'</t>"
+ "<t><!-- ' --></t>"
+ "<t><!-- \" --></t>"))
+ (ert-info (str :prefix "Test string: ")
+ (sgml-with-content
+ str
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (car (syntax-ppss (1- (point-max))))))
+ (should (= 0 (car (syntax-ppss (point-max)))))))))
(provide 'sgml-mode-tests)
;;; sgml-mode-tests.el ends here
--
2.11.0
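The collision between the quote rule and the comment ender can be reproduced in a scratch buffer. The sketch below is an illustration only: the quote regexp is taken from the earlier patch, the comment-ender regexp is an assumption modelled on the sgml-mode rule, and `my-comment-ender-found-p' is a hypothetical helper.

```elisp
(defun my-comment-ender-found-p (text skip-back)
  "Return non-nil if the comment ender in TEXT is still ahead of point.
Point is first placed where the quote rule leaves it.  When SKIP-BACK
is non-nil, also back up over dashes and whitespace, as the patch does."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Quote regexp from the mixed-quote patch.
    (when (re-search-forward "\\([\"']\\)[^<>\"']*[<>\"']" nil t)
      ;; The quote rule leaves point just before the terminating char.
      (goto-char (1- (match-end 0)))
      (when skip-back
        (skip-chars-backward "- \t\n"))
      ;; Assumed shape of the comment-ender rule's regexp.
      (and (re-search-forward "--[ \t\n]*>" nil t) t))))

(my-comment-ender-found-p "<!-- ' -->" nil) ; => nil, the "--" was skipped over
(my-comment-ender-found-p "<!-- ' -->" t)   ; => t, the ender can still match
```

Without the backward skip, point ends up between "--" and ">", so a later search can no longer see the complete comment ender.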
[Message part 3 (text/plain, inline)]
> <!DOCTYPE root [
> <!ENTITY f SYSTEM "f.xml">
> ]>
> <root>
> <a>ab'cd</a>
> <a>text</a>
> </root>
This is a different issue, I think the problem is that
sgml-syntax-propertize-inside doesn't handle nesting in the DTD
definition <! [ <! ... > ]>. The patch below just avoids calling
sgml-syntax-propertize-inside on the prolog in nxml-mode (but the
problem remains in sgml-mode). Though you'll hit Bug#18871/23668 if you
try to edit the DTD.
[0001-Don-t-sgml-syntax-propertize-inside-XML-prolog.patch (text/x-diff, inline)]
From 9a50fc38b537d570f739c428a57c66557152151b Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs <at> gmail.com>
Date: Sat, 18 May 2019 14:37:51 -0400
Subject: [PATCH] Don't sgml-syntax-propertize-inside XML prolog
* lisp/nxml/nxml-mode.el (nxml-syntax-propertize): New function.
(nxml-mode): Use it as the syntax-propertize-function.
* test/lisp/nxml/nxml-mode-tests.el (nxml-mode-doctype-and-quote-syntax):
New test.
---
lisp/nxml/nxml-mode.el | 16 +++++++++++++++-
test/lisp/nxml/nxml-mode-tests.el | 8 ++++++++
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/lisp/nxml/nxml-mode.el b/lisp/nxml/nxml-mode.el
index ab035b927e..7c39c5023c 100644
--- a/lisp/nxml/nxml-mode.el
+++ b/lisp/nxml/nxml-mode.el
@@ -423,6 +423,20 @@ nxml-parent-document-set
(when rng-validate-mode
(rng-validate-while-idle (current-buffer)))))
+(defvar nxml-prolog-end) ;; nxml-rap.el
+(defun nxml-syntax-propertize (start end)
+ "Syntactic keywords for `nxml-mode'."
+ ;; Like `sgml-syntax-propertize', but skip prolog.
+ (setq start (max start nxml-prolog-end))
+ (if (>= start end)
+ (goto-char end)
+ (goto-char start)
+ (sgml-syntax-propertize-inside end)
+ (funcall
+ (syntax-propertize-rules sgml-syntax-propertize-rules)
+ start end)))
+
+
(defvar tildify-space-string)
(defvar tildify-foreach-region-function)
@@ -518,7 +532,7 @@ nxml-mode
(nxml-with-invisible-motion
(nxml-scan-prolog)))))
(setq-local syntax-ppss-table sgml-tag-syntax-table)
- (setq-local syntax-propertize-function #'sgml-syntax-propertize)
+ (setq-local syntax-propertize-function #'nxml-syntax-propertize)
(add-hook 'change-major-mode-hook #'nxml-cleanup nil t)
;; Emacs 23 handles the encoding attribute on the xml declaration
diff --git a/test/lisp/nxml/nxml-mode-tests.el b/test/lisp/nxml/nxml-mode-tests.el
index 92744be619..2bbf92bc96 100644
--- a/test/lisp/nxml/nxml-mode-tests.el
+++ b/test/lisp/nxml/nxml-mode-tests.el
@@ -78,5 +78,13 @@ nxml-mode-tests-correctly-indented-string
(should-not (equal (get-text-property squote-txt-pos 'face)
(get-text-property dquote-att-pos 'face))))))
+(ert-deftest nxml-mode-doctype-and-quote-syntax ()
+ (with-temp-buffer
+ (insert "<!DOCTYPE t [\n<!ENTITY f SYSTEM \"f.xml\">\n]>\n<t>'</t>")
+ (nxml-mode)
+ ;; Check that last tag is parsed as a tag.
+ (should (= 1 (car (syntax-ppss (1- (point-max))))))
+ (should (= 0 (car (syntax-ppss (point-max)))))))
+
(provide 'nxml-mode-tests)
;;; nxml-mode-tests.el ends here
--
2.11.0
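The core of the nxml-mode change above is clamping the region so that it never starts inside the prolog. A minimal sketch of that clamping alone, with a hypothetical helper name and the prolog end passed in explicitly rather than read from `nxml-prolog-end':

```elisp
(defun my-clamp-region-after-prolog (start end prolog-end)
  "Return (START . END) with START moved past PROLOG-END, or nil if empty.
This mirrors only the (max start nxml-prolog-end) step of the patch."
  (let ((start (max start prolog-end)))
    (when (< start end)
      (cons start end))))

(my-clamp-region-after-prolog 1 100 40)  ; => (40 . 100)
(my-clamp-region-after-prolog 1 30 40)   ; => nil, region lies in the prolog
(my-clamp-region-after-prolog 50 100 40) ; => (50 . 100), already past it
```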
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 19 May 2019 00:18:02 GMT)
Message #76 received at 33887 <at> debbugs.gnu.org (full text, mbox):
There's an issue with the following XML file:
<root>
<a>don't</a>
<a>text</a>
<a>></a>
<a>don't</a>
<a>text</a>
</root>
where highlighting becomes wrong starting at the second '.
However, even though > is valid, I normally use &gt; instead.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 19 May 2019 17:44:01 GMT)
Message #79 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Vincent Lefevre <vincent <at> vinc17.net> writes:
> There's an issue with the following XML file:
>
> <root>
> <a>don't</a>
> <a>text</a>
> <a>></a>
> <a>don't</a>
> <a>text</a>
> </root>
>
> where highlighting becomes wrong starting at the second '.
>
> However, even though > is valid, I normally use &gt; instead.
Hmm, I can't see a way to handle this case without making the
syntax propertizing slow again. Stefan, any ideas?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 19 May 2019 18:50:01 GMT)
Message #82 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> Hmm, I can't see a way to handle this case without making the
> syntax propertizing slow again. Stefan, any ideas?
Can you summarize the origin of the problem in his example?
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 19 May 2019 19:04:02 GMT)
Message #85 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> Can you summarize the origin of the problem in his example?
<t>>1</t>
(syntax-ppss) on the location of "1" in the above, gives (-1 ...). And
then (syntax-ppss) on the "/" will give (0 ...). So the syntax
propertize rule for quote use of (zerop (car (syntax-ppss))) no longer
works correctly to see whether it's inside or outside a tag.
">" outside of tags should be set to syntax ".", but I would assume that
adding a syntax-propertize rule which calls syntax-ppss for every ">"
(to check whether it's inside a tag or not) will be very slow, just like
calling it for every quote was.
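The depth values described above can be reproduced without sgml-mode by treating < and > as parentheses in a throwaway syntax table. This is a sketch under that assumption; `my-tag-depth-at' is a hypothetical helper, and it calls `parse-partial-sexp' directly instead of `syntax-ppss'.

```elisp
(defun my-tag-depth-at (text pos)
  "Return the paren depth at POS in TEXT, treating < and > as parens."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)
    (modify-syntax-entry ?> ")<" table)
    (with-temp-buffer
      (insert text)
      (with-syntax-table table
        ;; Element 0 of the parse state is the depth in parens.
        (car (parse-partial-sexp (point-min) pos))))))

(my-tag-depth-at "<t>>1</t>" 5) ; => -1, just before the "1"
(my-tag-depth-at "<t>>1</t>" 7) ; => 0, just before the "/", inside a tag
(my-tag-depth-at "<t>1</t>" 6)  ; => 1, same spot without the lone ">"
```

After the lone ">", a depth of 0 means "inside a tag", so a plain zero test on the depth gives the wrong answer from that point on.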
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 19 May 2019 19:25:02 GMT)
Message #88 received at 33887 <at> debbugs.gnu.org (full text, mbox):
>> Can you summarize the origin of the problem in his example?
>
> <t>>1</t>
>
> (syntax-ppss) on the location of "1" in the above, gives (-1 ...). And
> then (syntax-ppss) on the "/" will give (0 ...). So the syntax
> propertize rule for quote use of (zerop (car (syntax-ppss))) no longer
> works correctly to see whether it's inside or outside a tag.
>
> ">" outside of tags should be set to syntax ".", but I would assume that
> adding a syntax-propertize rule which calls syntax-ppss for every ">"
> (to check whether it's inside a tag or not) will be very slow, just like
> calling it for every quote was.
Oh, damn! Hmm...
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Mon, 20 May 2019 11:48:01 GMT)
Message #91 received at 33887 <at> debbugs.gnu.org (full text, mbox):
There's an issue with the following XML file, which does not have
any special character, except a single quote in the middle of the
text.
<root>
<a>12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
</a>
</root>
Note that the newline character before the </a> is important.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Mon, 20 May 2019 20:48:01 GMT)
Message #94 received at 33887 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
> There's an issue with the following XML file, which does not have
> any special character, except a single quote in the middle of the
> text.
>
> <root>
> <a>12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
> </a>
> </root>
>
> Note that the newline character before the </a> is important.
Right, this is due to chunking by syntax-propertize. Here's the fix:
[0001-Handle-lone-quote-500-characters-away-from-XML-tag-B.patch (text/plain, attachment)]
[Message part 3 (text/plain, inline)]
Note that you have to be sure to recompile sgml-mode.el AND nxml-mode.el
after applying these patches, 'make' isn't smart enough to do it
automatically (yes, I figured this out the hard way).
>> <t>>1</t>
>>
>> (syntax-ppss) on the location of "1" in the above, gives (-1 ...). And
>> then (syntax-ppss) on the "/" will give (0 ...). So the syntax
>> propertize rule for quote use of (zerop (car (syntax-ppss))) no longer
>> works correctly to see whether it's inside or outside a tag.
>>
>> ">" outside of tags should be set to syntax ".", but I would assume that
>> adding a syntax-propertize rule which calls syntax-ppss for every ">"
>> (to check whether it's inside a tag or not) will be very slow, just like
>> calling it for every quote was.
Oh, I figured it out, we can just look at (nth 9 ppss), because the list
of open parens is still okay, regardless of unmatched close parens.
[0002-Handle-outside-SGML-XML-tags-Bug-33887.patch (text/plain, attachment)]
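The (nth 9 ppss) idea can be tried on the same example with a throwaway syntax table in which < and > are parentheses. This is a sketch, not the attached patch: `my-inside-tag-p' is a hypothetical helper and it uses `parse-partial-sexp' rather than `syntax-ppss'.

```elisp
(defun my-inside-tag-p (text pos)
  "Return t if POS in TEXT is inside an unclosed < ... > pair."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)
    (modify-syntax-entry ?> ")<" table)
    (with-temp-buffer
      (insert text)
      (with-syntax-table table
        ;; Element 9 lists the positions of the currently open parens;
        ;; it is unaffected by earlier unmatched close parens.
        (and (nth 9 (parse-partial-sexp (point-min) pos)) t)))))

(my-inside-tag-p "<t>>1</t>" 5)  ; => nil, text after the lone ">"
(my-inside-tag-p "<t>>1</t>" 7)  ; => t, inside the closing tag
(my-inside-tag-p "<t>>1</t>" 10) ; => nil, after the closing tag
```

At the last position the depth is -1 rather than 0, so a zero test on the depth would misreport it as inside a tag, whereas the open-paren list is empty as expected.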
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Tue, 21 May 2019 01:07:02 GMT)
Message #97 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Thanks for the fixes.
Also I don't think that in a text node, the " and ' characters should
be interpreted for highlighting. In particular, ' is generally used
as an apostrophe, not as a quote. For instance, this looks strange:
<a>This "shouldn't" and "can't" be right.</a>
These characters have no special meaning in a text node.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Tue, 21 May 2019 12:28:01 GMT)
Message #100 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Vincent Lefevre <vincent <at> vinc17.net> writes:
> Also I don't think that in a text node, the " and ' characters should
> be interpreted for highlighting. In particular, ' is generally used
> as an apostrophe, not as a quote. For instance, this looks strange:
>
> <a>This "shouldn't" and "can't" be right.</a>
>
> These characters have no special meaning in a text node.
Hmm, right, it should be possible to fix the crossing quotes in the
above case, but even the simpler
<a>"oops" 'oops'</a>
shows the same highlighting. This seems directly due to "we're now
careful to leave alone any pair of quotes that doesn't hold a < or >
char". So uh, Stefan, how was that supposed to work exactly?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 22 May 2019 14:00:03 GMT)
Message #103 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> shows the same highlighting. This seems directly due to "we're now
> careful to leave alone any pair of quotes that doesn't hold a < or >
> char". So uh, Stefan, how was that supposed to work exactly?
Remember: when I wrote this, we only supported "..." and not '...'.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 22 May 2019 15:45:02 GMT)
Message #106 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2019-05-22 09:58:54 -0400, Stefan Monnier wrote:
> > shows the same highlighting. This seems directly due to "we're now
> > careful to leave alone any pair of quotes that doesn't hold a < or >
> > char". So uh, Stefan, how was that supposed to work exactly?
>
> Remember: when I wrote this, we only supported "..." and not '...'.
I'm not sure what you mean by that, but the single quotes are not
the only issue. In general, you don't know the quoting rules in a
text node used by the underlying language (if any), even if you
have only double quotes. For instance, a text node may contain C
or shell code, which can be:
"a string with \"double quotes\"..."
And one does not expect this to be interpreted as two pairs of
double-quoted text ("a string with \" and "..."). In short, you
should leave text nodes with no specific highlighting, as this
was the case with Emacs 25.
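The mis-pairing described above is easy to reproduce with a naive "..." matcher that knows nothing about backslash escapes. This is a sketch for illustration; `my-naive-quoted-spans' is a hypothetical helper, not how sgml-mode is implemented.

```elisp
(defun my-naive-quoted-spans (text)
  "Return the substrings of TEXT matched by a naive \"...\" pairing."
  (let ((start 0)
        (spans nil))
    ;; Each match is at least two characters long, so the loop advances.
    (while (string-match "\"[^\"]*\"" text start)
      (push (match-string 0 text) spans)
      (setq start (match-end 0)))
    (nreverse spans)))

;; The C or shell string from the message, written as a Lisp literal:
(my-naive-quoted-spans "\"a string with \\\"double quotes\\\"...\"")
;; => ("\"a string with \\\"" "\"...\"")
```

The result is the two pairs named in the message, not the single string a C or shell reader would see.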
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 22 May 2019 16:03:02 GMT)
Message #109 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> I'm not sure what you mean by that, but the single quotes are not
> the only issue.
No but it introduces problems a lot more often.
> In general, you don't know the quoting rules in a
> text node used by the underlying language (if any), even if you
> have only double quotes. For instance, a text node may contain C
> or shell code, which can be:
>
> "a string with \"double quotes\"..."
Of course. But to the extent that it doesn't break the rest of the SGML
support, I think it was a pretty good tradeoff (and arguably its effect is
more often beneficial than harmful).
> And one does not expect this to be interpreted as two pairs of
> double-quoted text ("a string with \" and "..."). In short, you
> should leave text nodes with no specific highlighting, as this
> was the case with Emacs 25.
IIRC in Emacs-24 it was yet different. Basically, the focus should be
to handle tags correctly and what happens in the regular text between
tags is not nearly as important.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 22 May 2019 21:45:01 GMT)
Message #112 received at 33887 <at> debbugs.gnu.org (full text, mbox):
>> <t>>1</t>
> Oh, damn! Hmm...
Maybe the best way to detect this is using `parse-partial-sexp` passing
it a `targetdepth` of -1. The trick will be when/where to call it so
it's cheap enough.
Stefan
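A sketch of the `targetdepth' idea, assuming a throwaway syntax table in which < and > are parentheses. `my-lone-closer-position' is a hypothetical helper, and the question raised above (when and where to call this so that it stays cheap) is not addressed here.

```elisp
(defun my-lone-closer-position (text)
  "Return the position just after the first unmatched > in TEXT, or nil."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)
    (modify-syntax-entry ?> ")<" table)
    (with-temp-buffer
      (insert text)
      (with-syntax-table table
        ;; A TARGETDEPTH of -1 stops the scan as soon as the depth
        ;; drops below zero, leaving point right after that char.
        (let ((state (parse-partial-sexp (point-min) (point-max) -1)))
          (when (= (car state) -1)
            (point)))))))

(my-lone-closer-position "<t>>1</t>") ; => 5, just after the lone ">"
(my-lone-closer-position "<t>1</t>")  ; => nil, every ">" is matched
```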
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 22 May 2019 22:38:04 GMT)
Message #115 received at 33887 <at> debbugs.gnu.org (full text, mbox):
> Right, this is due to chunking by syntax-propertize. Here's the fix:
I pushed a patch which should fix the "lone >" problem without
introducing any undue extra cost. It should also fix the "very long
line" case.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Sun, 26 May 2019 22:19:01 GMT)
Message #118 received at 33887 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> I pushed a patch which should fix the "lone >" problem without
> introducing any undue extra cost. It should also fix the "very long
> line" case.
Seems to pass my tests. Not sure if you missed the alternate fix I
proposed in https://debbugs.gnu.org/33887#94 or not. It does have the
disadvantage of leaving (car (syntax-ppss)) unreliable for any other
code which uses it.
Here's a patch against master that should cover the remaining cases
Vincent raised:
[0001-Fix-some-SGML-syntax-edge-cases-Bug-33887.patch (text/plain, attachment)]
[Message part 3 (text/plain, inline)]
And about the highlighting of quoted text outside tags, we can just
disable fontification, while leaving the syntax code untouched:
[0002-Don-t-fontiy-text-outside-of-SGML-XML-tags-Bug-33887.patch (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Mon, 27 May 2019 09:19:01 GMT)
Message #121 received at 33887 <at> debbugs.gnu.org (full text, mbox):
On 2019-05-26 18:17:55 -0400, Noam Postavsky wrote:
> And about the highlighting of quoted text outside tags, we can just
> disable fontification, while leaving the syntax code untouched:
[...]
I've applied it with a minor change against Emacs 26 (context lines
for hunk #1 of sgml-mode.el are different), but the comments are
no longer highlighted as comments.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Mon, 27 May 2019 12:03:01 GMT)
Message #124 received at 33887 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Vincent Lefevre <vincent <at> vinc17.net> writes:
> On 2019-05-26 18:17:55 -0400, Noam Postavsky wrote:
>> And about the highlighting of quoted text outside tags, we can just
>> disable fontification, while leaving the syntax code untouched:
> [...]
>
> I've applied it with a minor change against Emacs 26 (context lines
> for hunk #1 of sgml-mode.el are different), but the comments are
> no longer highlighted as comments.
Ah, I guess reusing the default font-lock-syntactic-face-function
doesn't really make sense after all. So sgml-font-lock-syntactic-face
should be like this:
(defun sgml-font-lock-syntactic-face (state)
"`font-lock-syntactic-face-function' for `sgml-mode'."
;; Don't use string face outside of tags.
(cond ((and (nth 9 state) (nth 3 state)) font-lock-string-face)
((nth 4 state) font-lock-comment-face)))
[0001-Don-t-fontify-text-outside-of-SGML-XML-tags-Bug-3388.patch (text/plain, attachment)]
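The face choice in the function above can be exercised on hand-built parse states. The sketch below uses a hypothetical copy named `my-sgml-syntactic-face' that returns the face symbols directly; the state lists are made up for illustration and only fill in elements 3 (in string), 4 (in comment) and 9 (open paren positions).

```elisp
(defun my-sgml-syntactic-face (state)
  "Return the face for parse STATE, following the function quoted above."
  ;; String face only when inside a tag (element 9 is non-nil).
  (cond ((and (nth 9 state) (nth 3 state)) 'font-lock-string-face)
        ((nth 4 state) 'font-lock-comment-face)))

;; In a string inside a tag, e.g. an attribute value:
(my-sgml-syntactic-face (list 1 nil nil ?\" nil nil 0 nil 12 '(1)))
;; => font-lock-string-face
;; In a "string" in a text node, with no open tag:
(my-sgml-syntactic-face (list 0 nil nil ?\" nil nil 0 nil 4 nil))
;; => nil
;; In a comment:
(my-sgml-syntactic-face (list 0 nil nil nil t nil 0 nil 5 nil))
;; => font-lock-comment-face
```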
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Wed, 29 May 2019 00:31:01 GMT)
Message #127 received at 33887 <at> debbugs.gnu.org (full text, mbox):
Thanks. A last issue: a comment before the root element is not
highlighted. Example: in
<?xml version="1.0" encoding="utf-8"?>
<!-- comment -->
<root>
<!-- comment -->
</root>
<!-- comment -->
the first comment is not highlighted, but the other two comments are.
--
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33887; Package emacs. (Tue, 04 Jun 2019 12:57:02 GMT)
Message #130 received at 33887 <at> debbugs.gnu.org (full text, mbox):
tags 33887 fixed
close 33887 27.1
quit
Vincent Lefevre <vincent <at> vinc17.net> writes:
> Thanks. A last issue: a comment before the root element is not
> highlighted. Example: in
>
> <?xml version="1.0" encoding="utf-8"?>
> <!-- comment -->
> <root>
> <!-- comment -->
> </root>
> <!-- comment -->
>
> the first comment is not highlighted, but the other two comments are.
This was followed up in https://debbugs.gnu.org/32823#45
I'm pushing the current patches to master and closing this bug, as I
think all the issues here are resolved (if not, we can open new bugs).
e04f93e18a 2019-06-04T08:42:50-04:00 "Don't fontify text outside of SGML/XML tags (Bug#33887)"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=e04f93e18a8083d3a4930decc523c4f5d9a97c9e
438e4804d1 2019-06-04T08:42:50-04:00 "Fix some SGML syntax edge cases (Bug#33887)"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=438e4804d107720f526d0c7c367cbd029f264676
Added tag(s) fixed. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 04 Jun 2019 12:57:02 GMT)
bug marked as fixed in version 27.1, send any further explanations to
33887 <at> debbugs.gnu.org and Vincent Lefevre <vincent <at> vinc17.net>
Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 04 Jun 2019 12:57:04 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Jul 2019 11:24:05 GMT)
This bug report was last modified 5 years and 138 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.