GNU bug report logs - #70076
28.3; xml-escape-string parse issue
Reported by: "D. Schmudde" <d <at> schmud.de>
Date: Fri, 29 Mar 2024 16:03:04 UTC
Severity: normal
Tags: notabug
Found in version 28.3
Done: Stefan Kangas <stefankangas <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 70076 in the body.
You can then email your comments to 70076 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Fri, 29 Mar 2024 16:03:04 GMT)
Acknowledgement sent to "D. Schmudde" <d <at> schmud.de>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 29 Mar 2024 16:03:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Starting with `emacs -Q`:
(require 'xml)
(xml-escape-string "And now it\342\200\231s all this")
The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
I expect that the string will parse correctly with these escape
characters. Or is this expectation wrong?
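For anyone reading the error message: the two numbers are the data carried by the `xml-invalid-character` signal, which appear to be the offending character code and its position in the string. The sketch below is one way to inspect them; `my-xml-escape-error-data` is a hypothetical helper name used only for illustration, not part of xml.el.

```elisp
(require 'xml)

;; Hypothetical helper: return the data of the `xml-invalid-character'
;; error signalled for STRING, or nil if escaping succeeds.
(defun my-xml-escape-error-data (string)
  (condition-case err
      (progn (xml-escape-string string) nil)
    (xml-invalid-character (cdr err))))

(let* ((data (my-xml-escape-error-data "And now it\342\200\231s all this"))
       (char (car data)))
  (list data
        ;; Map the raw-byte character back to the byte it stands for.
        (multibyte-char-to-unibyte char)
        ;; The same byte in octal, as written in the string literal.
        (format "%o" (multibyte-char-to-unibyte char))))
```

4194274 is #x3FFFE2, the code Emacs uses internally for the raw byte #xE2 (octal 342), which is the first of the three bytes written as \342\200\231. Position 11 is where that byte sits, right after the ten characters of "And now it".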
In GNU Emacs 28.3 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33,
cairo version 1.16.0) of 2023-08-25 built on pop-os
Repository revision: dec958258b133b4c21224c594da433919d852800
Repository branch: emacs-28
System Description: Pop!_OS 22.04 LTS
Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
Important settings:
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
--
w: http://schmud.de
e: d <at> schmud.de
t: @dschmudde
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Fri, 29 Mar 2024 18:09:02 GMT)
Message #8 received at 70076 <at> debbugs.gnu.org (full text, mbox):
> Cc: Protesilaos Stavrou <public <at> protesilaos.com>
> From: "D. Schmudde" <d <at> schmud.de>
> Date: Fri, 29 Mar 2024 16:44:48 +0100
>
> Starting with `emacs -Q`:
>
> (require 'xml)
> (xml-escape-string "And now it\342\200\231s all this")
>
> The result is: `xml-escape-string: Invalid XML character: 4194274,
> 11`
>
> I expect that the string will parse correctly with these escape
> characters. Or is this expectation wrong?
Your expectation is wrong, AFAIU: you are inserting a unibyte string
(a string made out of raw bytes) instead of inserting a non-ASCII
multibyte string, which is what XML expects.
Why did you need to insert those bytes, and where did they come from?
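To illustrate the distinction above: the octal escapes \342\200\231 are the three UTF-8 bytes of U+2019 (RIGHT SINGLE QUOTATION MARK), so the literal reads as a unibyte string of raw bytes. A minimal sketch, assuming the bytes really are UTF-8, is to decode them into a multibyte string before escaping:

```elisp
(require 'xml)

(let* ((raw "And now it\342\200\231s all this")      ; unibyte: raw UTF-8 bytes
       (decoded (decode-coding-string raw 'utf-8)))   ; multibyte: real characters
  (list (multibyte-string-p raw)       ; is the literal multibyte?
        (multibyte-string-p decoded)   ; is the decoded string multibyte?
        ;; Escaping the decoded string does not signal an error.
        (xml-escape-string decoded)))
```

The decoded string contains the single character U+2019 instead of three raw bytes. Since U+2019 is a valid XML character and is not the ASCII apostrophe, `xml-escape-string` should pass it through unchanged.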
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Sun, 31 Mar 2024 11:44:02 GMT)
Message #11 received at 70076 <at> debbugs.gnu.org (full text, mbox):
Okay, good to know. Thanks for taking a look.
Here is some additional context. It occurs when using Elfeed's
~elfeed-export-opml~ on my list of RSS feeds. It seems the library
relies on ~xml-escape-string~ to parse each element. It's worth
noting that this happens on several feeds, not just the feed for
leancrew.com listed below.
I can file a bug with the package maintainers but I wasn't sure if
the XML parser was a better place to start. Here is the specific
backtrace, if it's useful:
Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
signal(xml-invalid-character (4194274 11))
xml-escape-string("And now it\342\200\231s all this")
xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) " ")
...
/David
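As a stopgap while the origin of the undecoded titles is tracked down, a caller could decode unibyte strings before escaping them. This is only a sketch: `my-xml-escape-string-decoding` is a hypothetical helper, not part of xml.el or Elfeed, and it assumes the raw bytes are UTF-8 unless another coding system is passed.

```elisp
(require 'xml)

;; Hypothetical helper, not part of xml.el or Elfeed.
(defun my-xml-escape-string-decoding (string &optional coding)
  "Escape STRING for XML, decoding it first if it is unibyte.
CODING is the coding system assumed for unibyte input; it
defaults to `utf-8', which is an assumption about the data."
  (xml-escape-string
   (if (multibyte-string-p string)
       string                                   ; already characters
     (decode-coding-string string (or coding 'utf-8)))))

;; Usage: the feed title from the backtrace, plus an ordinary string.
(list (my-xml-escape-string-decoding "And now it\342\200\231s all this")
      (my-xml-escape-string-decoding "a < b"))
```

This only papers over the symptom. Bytes that are not valid in the assumed coding system stay as raw bytes after decoding and would still signal `xml-invalid-character`, so finding where the strings lose their decoding is the more durable fix.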
Eli Zaretskii <eliz <at> gnu.org> writes:
>> Cc: Protesilaos Stavrou <public <at> protesilaos.com>
>> From: "D. Schmudde" <d <at> schmud.de>
>> Date: Fri, 29 Mar 2024 16:44:48 +0100
>>
>> Starting with `emacs -Q`:
>>
>> (require 'xml)
>> (xml-escape-string "And now it\342\200\231s all this")
>>
>> The result is: `xml-escape-string: Invalid XML character:
>> 4194274,
>> 11`
>>
>> I expect that the string will parse correctly with these escape
>> characters. Or is this expectation wrong?
>
> Your expectation is wrong, AFAIU: you are inserting a unibyte
> string
> (a string made out of raw bytes) instead of inserting a
> non-ASCII
> multibyte string, which is what XML expects.
>
> Why did you need to insert those bytes, and where did they come
> from?
--
w: http://schmud.de
e: d <at> schmud.de
t: @dschmudde
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#70076; Package emacs. (Sun, 31 Mar 2024 13:22:03 GMT)
Message #14 received at 70076 <at> debbugs.gnu.org (full text, mbox):
> From: "D. Schmudde" <d <at> schmud.de>
> Cc: 70076 <at> debbugs.gnu.org, public <at> protesilaos.com
> Date: Sun, 31 Mar 2024 13:15:29 +0200
>
> Okay, good to know. Thanks for taking a look.
>
> Here is some additional context. It occurs when using Elfeed's
> ~elfeed-export-opml~ on my list of RSS feeds. It seems the library
> relies on ~xml-escape-string~ to parse each element. It's worth
> noting that this happens on several feeds, not just the feed for
> leancrew.com listed below.
OK, but still: how did you get to that point? Where did the
problematic string originate from? Was it something that you typed or
copy/pasted, or something else?
> I can file a bug with the package maintainers but I wasn't sure if
> the XML parser was a better place to start.
Yes, I think it is best to start by reporting this to package
maintainers.
Added tag(s) notabug. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 30 Jun 2024 05:55:01 GMT)
Reply sent to Stefan Kangas <stefankangas <at> gmail.com>: You have taken responsibility. (Sun, 30 Jun 2024 06:13:02 GMT)
Notification sent to "D. Schmudde" <d <at> schmud.de>: bug acknowledged by developer. (Sun, 30 Jun 2024 06:13:02 GMT)
Message #21 received at 70076-done <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: "D. Schmudde" <d <at> schmud.de>
>> Cc: 70076 <at> debbugs.gnu.org, public <at> protesilaos.com
>> Date: Sun, 31 Mar 2024 13:15:29 +0200
>>
>> Okay, good to know. Thanks for taking a look.
>>
>> Here is some additional context. It occurs when using Elfeed's
>> ~elfeed-export-opml~ on my list of RSS feeds. It seems the library
>> relies on ~xml-escape-string~ to parse each element. It's worth
>> noting that this happens on several feeds, not just the feed for
>> leancrew.com listed below.
>
> OK, but still: how did you get to that point? Where did the
> problematic string originate from? Was it something that you typed or
> copy/pasted, or something else?
>
>> I can file a bug with the package maintainers but I wasn't sure if
>> the XML parser was a better place to start.
>
> Yes, I think it is best to start by reporting this to package
> maintainers.
This doesn't seem like a bug in Emacs, and we didn't hear anything in 3
months. I'm therefore closing this bug.
If this is still an issue, please reply to this email (use "Reply to
all" in your email client) and we can reopen the bug report.
Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 28 Jul 2024 11:24:17 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.