GNU bug report logs -
#70076
28.3; xml-escape-string parse issue
To reply to this bug, email your comments to 70076 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Fri, 29 Mar 2024 16:03:04 GMT)
Acknowledgement sent to "D. Schmudde" <d <at> schmud.de>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 29 Mar 2024 16:03:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Starting with `emacs -Q`:
(require 'xml)
(xml-escape-string "And now it\342\200\231s all this")
The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
I expect that the string will parse correctly with these escape
characters. Or is this expectation wrong?
In GNU Emacs 28.3 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33,
cairo version 1.16.0) of 2023-08-25 built on pop-os
Repository revision: dec958258b133b4c21224c594da433919d852800
Repository branch: emacs-28
System Description: Pop!_OS 22.04 LTS
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
--
w: http://schmud.de
e: d <at> schmud.de
t: @dschmudde
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Fri, 29 Mar 2024 18:09:02 GMT)
Message #8 received at 70076 <at> debbugs.gnu.org:
> Cc: Protesilaos Stavrou <public <at> protesilaos.com>
> From: "D. Schmudde" <d <at> schmud.de>
> Date: Fri, 29 Mar 2024 16:44:48 +0100
>
> Starting with `emacs -Q`:
>
> (require 'xml)
> (xml-escape-string "And now it\342\200\231s all this")
>
> The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
>
> I expect that the string will parse correctly with these escape
> characters. Or is this expectation wrong?
Your expectation is wrong, AFAIU: you are inserting a unibyte string
(a string made out of raw bytes) instead of inserting a non-ASCII
multibyte string, which is what XML expects.
Why did you need to insert those bytes, and where did they come from?
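As an editorial illustration (not part of the original message), the unibyte diagnosis and the number in the error message can be checked in `emacs -Q`. The function name `my/inspect-title-bytes` is hypothetical and exists only for this sketch; the index 10 assumes the title string exactly as given in the report.

```elisp
;; Hypothetical helper, for illustration only.
(defun my/inspect-title-bytes ()
  "Return facts about the string literal from the bug report."
  (let* ((raw "And now it\342\200\231s all this")
         ;; "And now it" is 10 characters, so index 10 holds the byte \342.
         (byte (aref raw 10)))
    (list
     ;; Octal escapes in an otherwise ASCII literal give a unibyte string.
     (multibyte-string-p raw)
     ;; The raw byte itself: octal 342 is decimal 226.
     byte
     ;; The same byte as an "eight-bit" character in multibyte text,
     ;; which is the code the error message reports.
     (unibyte-char-to-multibyte byte))))

(my/inspect-title-bytes)   ; => (nil 226 4194274)
```

The second number in the error, 11, is consistent with a 1-based buffer position of that byte (index 10 in the string).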
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Sun, 31 Mar 2024 11:44:02 GMT)
Message #11 received at 70076 <at> debbugs.gnu.org:
Okay, good to know. Thanks for taking a look.
Here is some additional context. It occurs when using Elfeed's
~elfeed-export-opml~ on my list of RSS feeds. It seems the library
relies on ~xml-escape-string~ to parse each element. It's worth
noting that this happens on several feeds, not just the feed for
leancrew.com listed below.
I can file a bug with the package maintainers but I wasn't sure if
the XML parser was a better place to start. Here is the specific
backtrace, if it's useful:
Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
  signal(xml-invalid-character (4194274 11))
  xml-escape-string("And now it\342\200\231s all this")
  xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) " ")
  ...
/David
Eli Zaretskii <eliz <at> gnu.org> writes:
> Your expectation is wrong, AFAIU: you are inserting a unibyte string
> (a string made out of raw bytes) instead of inserting a non-ASCII
> multibyte string, which is what XML expects.
>
> Why did you need to insert those bytes, and where did they come from?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Sun, 31 Mar 2024 13:22:03 GMT)
Message #14 received at 70076 <at> debbugs.gnu.org:
> From: "D. Schmudde" <d <at> schmud.de>
> Cc: 70076 <at> debbugs.gnu.org, public <at> protesilaos.com
> Date: Sun, 31 Mar 2024 13:15:29 +0200
>
> Okay, good to know. Thanks for taking a look.
>
> Here is some additional context. It occurs when using Elfeed's
> ~elfeed-export-opml~ on my list of RSS feeds. It seems the library
> relies on ~xml-escape-string~ to parse each element. It's worth
> noting that this happens on several feeds, not just the feed for
> leancrew.com listed below.
OK, but still: how did you get to that point? Where did the
problematic string originate from? Was it something that you typed or
copy/pasted, or something else?
> I can file a bug with the package maintainers but I wasn't sure if
> the XML parser was a better place to start.
Yes, I think it is best to start by reporting this to package
maintainers.
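Editorial sketch (not from the thread): the thread does not establish where the raw bytes come from, so the proper fix may belong wherever the title is first stored. Purely as an illustration of the unibyte/multibyte point, a caller could decode a unibyte string before escaping it. The helper name `my/xml-escape-decoding-bytes` is hypothetical, and treating the bytes as UTF-8 is an assumption that happens to fit this title.

```elisp
(require 'xml)

;; Hypothetical helper; not part of xml.el or Elfeed.
(defun my/xml-escape-decoding-bytes (string)
  "Escape STRING for XML, decoding it as UTF-8 first if it is unibyte.
Assumes any raw bytes in STRING are UTF-8 encoded text."
  (xml-escape-string
   (if (multibyte-string-p string)
       string
     ;; Turn raw bytes into characters; pure ASCII passes through unchanged.
     (decode-coding-string string 'utf-8))))

;; The bytes \342\200\231 decode to U+2019 (RIGHT SINGLE QUOTATION MARK),
;; so this call no longer signals `xml-invalid-character'.
(my/xml-escape-decoding-bytes "And now it\342\200\231s all this")
```

A string that is already multibyte is passed to `xml-escape-string` untouched, so the guard only changes behavior for raw-byte input.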
This bug report was last modified 34 days ago.