GNU bug report logs - #70076
28.3; xml-escape-string parse issue


Package: emacs;

Reported by: "D. Schmudde" <d <at> schmud.de>

Date: Fri, 29 Mar 2024 16:03:04 UTC

Severity: normal

Found in version 28.3

To reply to this bug, email your comments to 70076 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#70076; Package emacs. (Fri, 29 Mar 2024 16:03:04 GMT)

Acknowledgement sent to "D. Schmudde" <d <at> schmud.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 29 Mar 2024 16:03:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "D. Schmudde" <d <at> schmud.de>
To: bug-gnu-emacs <at> gnu.org
Cc: Protesilaos Stavrou <public <at> protesilaos.com>
Subject: 28.3; xml-escape-string parse issue
Date: Fri, 29 Mar 2024 16:44:48 +0100
Starting with `emacs -Q`:

(require 'xml)
(xml-escape-string "And now it\342\200\231s all this")

The result is: `xml-escape-string: Invalid XML character: 4194274, 11`

I expect that the string will parse correctly with these escape 
characters. Or is this expectation wrong?

In GNU Emacs 28.3 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33,
cairo version 1.16.0)
of 2023-08-25 built on pop-os
Repository revision: dec958258b133b4c21224c594da433919d852800
Repository branch: emacs-28
System Description: Pop!_OS 22.04 LTS

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY
INOTIFY PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS
X11 XDBE XIM XPM GTK3 ZLIB
Important settings:
 value of $LANG: en_US.UTF-8
 value of $XMODIFIERS: @im=ibus
 locale-coding-system: utf-8-unix



-- 
w: http://schmud.de
e: d <at> schmud.de
t: @dschmudde




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#70076; Package emacs. (Fri, 29 Mar 2024 18:09:02 GMT)

Message #8 received at 70076 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "D. Schmudde" <d <at> schmud.de>
Cc: public <at> protesilaos.com, 70076 <at> debbugs.gnu.org
Subject: Re: bug#70076: 28.3; xml-escape-string parse issue
Date: Fri, 29 Mar 2024 21:08:12 +0300
> Cc: Protesilaos Stavrou <public <at> protesilaos.com>
> From: "D. Schmudde" <d <at> schmud.de>
> Date: Fri, 29 Mar 2024 16:44:48 +0100
> 
> Starting with `emacs -Q`:
> 
> (require 'xml)
> (xml-escape-string "And now it\342\200\231s all this")
> 
> The result is: `xml-escape-string: Invalid XML character: 4194274, 11`
> 
> I expect that the string will parse correctly with these escape 
> characters. Or is this expectation wrong?

Your expectation is wrong, AFAIU: you are inserting a unibyte string
(a string made out of raw bytes) instead of inserting a non-ASCII
multibyte string, which is what XML expects.

Why did you need to insert those bytes, and where did they come from?
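Eli's point about unibyte versus multibyte strings can be checked directly in Emacs Lisp. The sketch below assumes the raw bytes are UTF-8 (E2 80 99 is the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK); the variable names `my-raw-title` and `my-decoded-title` are illustrative only, not part of any library. It reproduces the reported error with the raw string and shows that `decode-coding-string` turns the bytes into a single character that `xml-escape-string` accepts.

```elisp
(require 'xml)

;; Illustrative variable: octal escapes in an otherwise-ASCII literal
;; make the reader build a unibyte string, so E2 80 99 are stored as
;; three raw bytes rather than one character.
(defvar my-raw-title "And now it\342\200\231s all this")

;; Illustrative variable: decoding the bytes as UTF-8 (an assumption
;; about their origin) yields a multibyte string in which U+2019 is a
;; single character.
(defvar my-decoded-title (decode-coding-string my-raw-title 'utf-8))

(multibyte-string-p my-raw-title)      ; => nil
(multibyte-string-p my-decoded-title)  ; => t

;; The raw string reproduces the reported error: byte #xE2 shows up
;; as the raw-byte character 4194274 (#x3FFFE2) at buffer position 11.
(condition-case err
    (xml-escape-string my-raw-title)
  (xml-invalid-character err))         ; => (xml-invalid-character 4194274 11)

;; The decoded string contains no & < > ' " and only valid XML
;; characters, so it comes back unchanged.
(xml-escape-string my-decoded-title)   ; => "And now it’s all this"
```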




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#70076; Package emacs. (Sun, 31 Mar 2024 11:44:02 GMT)

Message #11 received at 70076 <at> debbugs.gnu.org:

From: "D. Schmudde" <d <at> schmud.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: public <at> protesilaos.com, 70076 <at> debbugs.gnu.org
Subject: Re: bug#70076: 28.3; xml-escape-string parse issue
Date: Sun, 31 Mar 2024 13:15:29 +0200
Okay, good to know. Thanks for taking a look.

Here is some additional context. It occurs when using Elfeed's 
~elfeed-export-opml~ on my list of RSS feeds. It seems the library 
relies on ~xml-escape-string~ to parse each element. It's worth 
noting that this happens on several feeds, not just the feed for 
leancrew.com listed below.

I can file a bug with the package maintainers but I wasn't sure if 
the XML parser was a better place to start. Here is the specific 
backtrace, if it's useful:

Debugger entered--Lisp error: (xml-invalid-character 4194274 11)
 signal(xml-invalid-character (4194274 11))
 xml-escape-string("And now it\342\200\231s all this")
 xml-debug-print-internal((outline ((xmlUrl . "https://leancrew.com/all-this/feed/") (title . "And now it\342\200\231s all this"))) "    ")
 ...

/David
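For code that, like the OPML export in the backtrace above, hands feed titles to `xml-escape-string`, one possible caller-side guard is to decode unibyte strings before escaping them. This is a minimal sketch, not Elfeed's actual code or fix: `my-xml-escape-title` is a hypothetical helper, and it assumes that any unibyte title holds UTF-8 bytes.

```elisp
(require 'xml)

(defun my-xml-escape-title (title)
  "Escape TITLE for XML, decoding it as UTF-8 first if it is unibyte.
Hypothetical helper: assumes unibyte input holds UTF-8 bytes."
  (xml-escape-string
   (if (multibyte-string-p title)
       ;; Already made of characters: escape as-is.
       title
     ;; Raw bytes: interpret them as UTF-8, so that e.g. E2 80 99
     ;; becomes the single character U+2019.
     (decode-coding-string title 'utf-8))))

(my-xml-escape-title "And now it\342\200\231s all this")
;; => "And now it’s all this"
(my-xml-escape-title "Q&A")
;; => "Q&amp;A"
```

Pure-ASCII strings are also unibyte, but decoding them as UTF-8 leaves them unchanged, so they are escaped normally. A multibyte string that already contains raw-byte characters would still be rejected; this sketch only handles the unibyte case seen in this report.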





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#70076; Package emacs. (Sun, 31 Mar 2024 13:22:03 GMT)

Message #14 received at 70076 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "D. Schmudde" <d <at> schmud.de>
Cc: public <at> protesilaos.com, 70076 <at> debbugs.gnu.org
Subject: Re: bug#70076: 28.3; xml-escape-string parse issue
Date: Sun, 31 Mar 2024 16:21:34 +0300
> From: "D. Schmudde" <d <at> schmud.de>
> Cc: 70076 <at> debbugs.gnu.org, public <at> protesilaos.com
> Date: Sun, 31 Mar 2024 13:15:29 +0200
> 
> Okay, good to know. Thanks for taking a look.
> 
> Here is some additional context. It occurs when using Elfeed's 
> ~elfeed-export-opml~ on my list of RSS feeds. It seems the library 
> relies on ~xml-escape-string~ to parse each element. It's worth 
> noting that this happens on several feeds, not just the feed for 
> leancrew.com listed below.

OK, but still: how did you get to that point?  Where did the
problematic string originate from?  Was it something that you typed or
copy/pasted, or something else?

> I can file a bug with the package maintainers but I wasn't sure if 
> the XML parser was a better place to start.

Yes, I think it is best to start by reporting this to package
maintainers.






GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.