GNU bug report logs - #59537
`libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation

Package: emacs;

Reported by: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>

Date: Thu, 24 Nov 2022 09:08:01 UTC

Severity: normal

Tags: notabug

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 59537 in the body.
You can then email your comments to 59537 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#59537; Package emacs. (Thu, 24 Nov 2022 09:08:01 GMT)

Acknowledgement sent to Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 24 Nov 2022 09:08:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
To: bug-gnu-emacs <at> gnu.org
Subject: `libxml-parse-xml-region` strips out the namespace information, and
 namespace prefix in the DOM representation
Date: Thu, 24 Nov 2022 14:31:52 +0530
`libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation.

Stripping out the NAMESPACE information is a bug.

FWIW, the XML file in question is an OpenDocument styles file.

See the attached =xml.org= file for more information: execute the Org Babel blocks and follow the inline comments there.


In GNU Emacs 29.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version
 3.24.34, cairo version 1.16.0) of 2022-11-19 built on debian
Repository revision: a6ae13af42ede6618c326855ea4c95e0298fb75b
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Debian GNU/Linux bookworm/sid

Configured using:
 'configure --with-imagemagick --with-xwidgets --with-json
 --without-compress-install'

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ
IMAGEMAGICK JPEG JSON LCMS2 LIBOTF LIBSELINUX LIBSYSTEMD LIBXML2
M17N_FLT MODULES NOTIFY INOTIFY PDUMPER PNG RSVG SECCOMP SOUND SQLITE3
THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM XWIDGETS GTK3
ZLIB

[xml.org (text/org, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59537; Package emacs. (Thu, 24 Nov 2022 10:24:01 GMT)

Message #8 received at 59537 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
Cc: 59537 <at> debbugs.gnu.org
Subject: Re: bug#59537: `libxml-parse-xml-region` strips out the namespace
 information, and namespace prefix in the DOM representation
Date: Thu, 24 Nov 2022 12:23:17 +0200
> Date: Thu, 24 Nov 2022 14:31:52 +0530
> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
> 
> `libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation.
> 
> Stripping out the NAMESPACE information is a bug.

AFAICT, we just call a function from libxml2.  So I guess the bug is in that
library?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59537; Package emacs. (Thu, 24 Nov 2022 10:52:02 GMT)

Message #11 received at 59537 <at> debbugs.gnu.org:

From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 59537 <at> debbugs.gnu.org
Subject: Re: bug#59537: `libxml-parse-xml-region` strips out the namespace
 information, and namespace prefix in the DOM representation
Date: Thu, 24 Nov 2022 16:21:30 +0530
On 24/11/22 15:52, Eli Zaretskii wrote:
>> Date: Thu, 24 Nov 2022 14:31:52 +0530
>> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
>>
>> `libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation.
>>
>> Stripping out the NAMESPACE information is a bug.
> AFAICT, we just call a function from libxml2.  So I guess the bug is in that
> library?

Do I need to upgrade? 

I am on Debian unstable, and my laptop was last updated only three months ago.  I would think that, for all practical purposes, my libraries are "recent".

I am not very familiar with XML or XML libraries.

Maybe Emacs should provide other entry points to libxml that would preserve (or return) the namespace information.

The problem with the current state of affairs is that round-tripping does not work.  That is, if I go from XML1->DOM->XML2, XML1 and XML2 will no longer be the same.
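
As a rough illustration of the round trip described above (not Emacs code, and not a claim about what libxml2 or `libxml-parse-xml-region` do internally), here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample document and the helper name `qualified_names` are made up for the example; only the two OpenDocument namespace URIs are real. It shows a namespace-aware tree keeping the namespace URI as part of each element and attribute name, so that XML1->DOM->XML2 yields the same qualified names.

```python
import xml.etree.ElementTree as ET

# Real OpenDocument namespace URIs; the document below is a made-up sample.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"

XML1 = (
    f'<office:document-styles xmlns:office="{OFFICE_NS}" xmlns:style="{STYLE_NS}">'
    '<office:styles><style:style style:name="Standard"/></office:styles>'
    '</office:document-styles>'
)


def qualified_names(element):
    """Hypothetical helper: collect namespace-qualified tag and
    attribute names of ELEMENT and its descendants, in document order."""
    names = []
    for node in element.iter():
        names.append(node.tag)
        names.extend(sorted(node.attrib))
    return names


# Register the prefixes so that serialization reuses them
# instead of generating ns0, ns1, ...
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("style", STYLE_NS)

dom = ET.fromstring(XML1)                      # XML1 -> DOM
XML2 = ET.tostring(dom, encoding="unicode")    # DOM  -> XML2

# Each name carries its namespace URI in "{uri}local" form.
for name in qualified_names(dom):
    print(name)
```

In this representation the prefix itself is not stored on the node, but the namespace URI is, which is what lets the serializer write equivalent qualified names back out.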

I believe, libxml was introduced to cater to Eww's HTML rendering.  So, would it be "reasonable" to say that the current entry point is sufficient for working with HTML-like XML, but not with ANY XML?


Details of my Debian unstable system:


$ uname -a

Linux debian 5.19.0-1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.6-1 (2022-09-01) x86_64 GNU/Linux

~$ ldd `which emacs` | grep xml

    libxml2.so.2 => /lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f5a62052000)


~$ ls -al /lib/x86_64-linux-gnu/libxml2.so.2
lrwxrwxrwx 1 root root 17 Jul 24 01:33 /lib/x86_64-linux-gnu/libxml2.so.2 -> libxml2.so.2.9.14

~$ dpkg -S libxml2.so.2.9.14

libxml2:amd64: /usr/lib/x86_64-linux-gnu/libxml2.so.2.9.14

~$ apt show libxml2:amd64

Package: libxml2
Version: 2.9.14+dfsg-1+b1
Priority: optional
Section: libs
Source: libxml2 (2.9.14+dfsg-1)
Maintainer: Debian XML/SGML Group <debian-xml-sgml-pkgs <at> lists.alioth.debian.org>
Installed-Size: 1,938 kB
Depends: libc6 (>= 2.33), libicu71 (>= 71.1-1~), liblzma5 (>= 5.1.1alpha+20120614), zlib1g (>= 1:1.2.3.3)
Conflicts: w3c-dtd-xhtml
Homepage: http://xmlsoft.org
Tag: role::shared-lib
Download-Size: 708 kB
APT-Manual-Installed: no
APT-Sources: https://deb.debian.org/debian unstable/main amd64 Packages
Description: GNOME XML library
 XML is a metalanguage to let you design your own markup language.
 A regular markup language defines a way to describe information in
 a certain class of documents (eg HTML). XML lets you define your
 own customized markup languages for many classes of document. It
 can do this because it's written in SGML, the international standard
 metalanguage for markup languages.
 .
 This package provides a library providing an extensive API to handle
 such XML data files.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#59537; Package emacs. (Thu, 24 Nov 2022 11:01:02 GMT)

Message #14 received at 59537 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
Cc: 59537 <at> debbugs.gnu.org
Subject: Re: bug#59537: `libxml-parse-xml-region` strips out the namespace
 information, and namespace prefix in the DOM representation
Date: Thu, 24 Nov 2022 13:00:41 +0200
> Date: Thu, 24 Nov 2022 16:21:30 +0530
> Cc: 59537 <at> debbugs.gnu.org
> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
> 
> On 24/11/22 15:52, Eli Zaretskii wrote:
> >> Date: Thu, 24 Nov 2022 14:31:52 +0530
> >> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
> >>
> >> `libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation.
> >>
> >> Stripping out the NAMESPACE information is a bug.
> > AFAICT, we just call a function from libxml2.  So I guess the bug is in that
> > library?
> 
> Do I need to upgrade? 

I don't know.  I'm not sure the latest libxml2 has this fixed.

I'm saying that you should probably discuss this with the libxml2
developers.

> I believe, libxml was introduced to cater to Eww's HTML rendering.

No, we introduced it for any kind of HTML and XML processing we need.




Added tag(s) notabug. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 24 Nov 2022 18:17:03 GMT)

Reply sent to Stefan Kangas <stefankangas <at> gmail.com>:
You have taken responsibility. (Sat, 03 Dec 2022 00:59:03 GMT)

Notification sent to Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>:
bug acknowledged by developer. (Sat, 03 Dec 2022 00:59:03 GMT)

Message #21 received at 59537-done <at> debbugs.gnu.org:

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 59537-done <at> debbugs.gnu.org,
 Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
Subject: Re: bug#59537: `libxml-parse-xml-region` strips out the namespace
 information, and namespace prefix in the DOM representation
Date: Fri, 2 Dec 2022 16:58:24 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Thu, 24 Nov 2022 16:21:30 +0530
>> Cc: 59537 <at> debbugs.gnu.org
>> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
>>
>> On 24/11/22 15:52, Eli Zaretskii wrote:
>> >> Date: Thu, 24 Nov 2022 14:31:52 +0530
>> >> From: Ramesh Nedunchezian <rameshnedunchezian <at> outlook.com>
>> >>
>> >> `libxml-parse-xml-region` strips out the namespace information, and namespace prefix in the DOM representation.
>> >>
>> >> Stripping out the NAMESPACE information is a bug.
>> > AFAICT, we just call a function from libxml2.  So I guess the bug is in that
>> > library?
>>
>> Do I need to upgrade?
>
> I don't know.  I'm not sure the latest libxml2 has this fixed.
>
> I'm saying that you should probably discuss this with the libxml2
> developers.
>
>> I believe, libxml was introduced to cater to Eww's HTML rendering.
>
> No, we introduced it for any kind of HTML and XML processing we need.

Since this doesn't look like a bug in Emacs, I'm closing it now.  Please
report it to the libxml2 developers.  Thanks.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 31 Dec 2022 12:24:07 GMT)

This bug report was last modified 2 years and 184 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.