GNU bug report logs - #75998
[guile-lib] html->sxml does not decode entities in attributes

Package: guile

Reported by: Tomas Volf <~@wolfsden.cz>

Date: Sat, 1 Feb 2025 20:11:01 UTC

Severity: normal

To reply to this bug, email your comments to 75998 AT debbugs.gnu.org.


Report forwarded to bug-guile <at> gnu.org:
bug#75998; Package guile. (Sat, 01 Feb 2025 20:11:02 GMT)

Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sat, 01 Feb 2025 20:11:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tomas Volf <~@wolfsden.cz>
To: bug-guile <at> gnu.org
Subject: [guile-lib] html->sxml does not decode entities in attributes
Date: Sat, 01 Feb 2025 21:10:04 +0100

Hello,

I think I found a bug in the htmlprag module in guile-lib.  When parsing
attributes, the values are not properly decoded:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use (htmlprag)
scheme@(guile-user)> (html->sxml "<hr aaa=\"bbb&quot;ccc'ddd\" />")
$1 = (*TOP* (hr (@ (aaa "bbb&quot;ccc'ddd"))))
scheme@(guile-user)> (html->sxml "<a href=\"a&amp;b\" />")
$2 = (*TOP* (a (@ (href "a&amp;b"))))
--8<---------------cut here---------------end--------------->8---

I think that $1 should be "bbb\"ccc'ddd" and $2 should be "a&b".

The annoying part is that this cannot really be changed now, because
people (me included) already have workarounds in place, and
automatically decoding now would lead to double decoding.
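
For illustration, such a workaround might look like the sketch below.  It
post-processes the SXML tree and decodes attribute values in a single pass.
The names %entities, decode-entities and decode-attributes are hypothetical
and not part of htmlprag, and the entity table is deliberately tiny.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

;; Hypothetical entity table.  Covering only these five entities is an
;; assumption made for brevity; a real decoder would handle the full
;; HTML table and numeric character references.
(define %entities
  '(("&quot;" . "\"")
    ("&apos;" . "'")
    ("&lt;"   . "<")
    ("&gt;"   . ">")
    ("&amp;"  . "&")))

(define (decode-entities str)
  "Return STR with the entities listed in %entities decoded in one pass."
  (let loop ((i 0) (acc '()))
    (cond
     ((>= i (string-length str))
      (string-concatenate (reverse acc)))
     ;; Does any known entity start at position I of STR?
     ((find (lambda (e)
              (string-prefix? (car e) str 0 (string-length (car e)) i))
            %entities)
      => (lambda (e)
           (loop (+ i (string-length (car e))) (cons (cdr e) acc))))
     (else
      (loop (+ i 1) (cons (string (string-ref str i)) acc))))))

(define (decode-attributes sxml)
  "Return SXML with every string attribute value passed through decode-entities."
  (match sxml
    ;; An attribute list: (@ (name "value") ...)
    (('@ attrs ...)
     (cons '@ (map (match-lambda
                     ((name (? string? value))
                      (list name (decode-entities value)))
                     (other other))
                   attrs)))
    ;; Any other list: recurse into its members.
    ((items ...) (map decode-attributes items))
    ;; Symbols and text nodes are left untouched.
    (atom atom)))

;; Applied to the $2 result shown above:
(decode-attributes '(*TOP* (a (@ (href "a&amp;b")))))
;; => (*TOP* (a (@ (href "a&b"))))
```

With such a helper in place, a parser that started decoding on its own would
cause the value to be decoded twice: running decode-entities two times over
"a&amp;lt;b" gives "a<b" instead of the intended "a&lt;b".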

I see a few ways forward:

1. Document the current behavior and keep it as it is.
2. Add a keyword argument #:decode-attributes, defaulting to #f, to the
   relevant procedures, so that people can opt into the fixed behavior.
3. Introduce a parameter %decode-attributes, so that people can opt into
   the fixed behavior.

I am sure there are also other approaches possible.
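
To make options 2 and 3 concrete, here is a minimal sketch of both opt-in
styles.  Nothing in it exists in guile-lib: finish-attribute-value/key,
finish-attribute-value/param and toy-decode are hypothetical names, the
parameter is only named after the proposal, and toy-decode handles nothing
but &amp; so that the example stays short.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical stand-in for a real entity decoder: it only knows &amp;.
(define (toy-decode str)
  (regexp-substitute/global #f "&amp;" str 'pre "&" 'post))

;; Option 2: opt in per call through a keyword argument (default #f).
(define* (finish-attribute-value/key raw #:key (decode-attributes #f))
  (if decode-attributes (toy-decode raw) raw))

;; Option 3: opt in for a dynamic extent through a parameter (default #f).
(define %decode-attributes (make-parameter #f))

(define (finish-attribute-value/param raw)
  (if (%decode-attributes) (toy-decode raw) raw))

;; Default behavior is unchanged in both styles.
(finish-attribute-value/key "a&amp;b")                         ; => "a&amp;b"
(finish-attribute-value/param "a&amp;b")                       ; => "a&amp;b"

;; Opting in.
(finish-attribute-value/key "a&amp;b" #:decode-attributes #t)  ; => "a&b"
(parameterize ((%decode-attributes #t))
  (finish-attribute-value/param "a&amp;b"))                    ; => "a&b"
```

The keyword form keeps the choice visible at each call site, while the
parameter form also reaches calls made indirectly by other code inside the
parameterize body.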

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.



