X-Loop: help-debbugs@HIDDEN
Subject: bug#38269: SSAX incorrect handling of &gt; in CDATA
Resent-From: Andrew Gierth <andrew@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Tue, 19 Nov 2019 14:50:01 +0000
Resent-Message-ID: <handler.38269.B.157417499422816 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 38269
X-GNU-PR-Package: guile
X-GNU-PR-Keywords:
To: 38269 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.157417499422816
(code B ref -1); Tue, 19 Nov 2019 14:50:01 +0000
Received: (at submit) by debbugs.gnu.org; 19 Nov 2019 14:49:54 +0000
Received: from localhost ([127.0.0.1]:46838 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1iX4pG-0005vv-0N
for submit <at> debbugs.gnu.org; Tue, 19 Nov 2019 09:49:54 -0500
Received: from lists.gnu.org ([209.51.188.17]:53971)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <andrew@HIDDEN>) id 1iX3lk-0002OB-P0
for submit <at> debbugs.gnu.org; Tue, 19 Nov 2019 08:42:13 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:37959)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from <andrew@HIDDEN>) id 1iX3lj-0004sy-6w
for bug-guile@HIDDEN; Tue, 19 Nov 2019 08:42:12 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <andrew@HIDDEN>) id 1iX3li-0008Vp-54
for bug-guile@HIDDEN; Tue, 19 Nov 2019 08:42:11 -0500
Received: from lungold.riddles.org.uk ([82.68.208.19]:57560)
by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from <andrew@HIDDEN>)
id 1iX3lh-0008R4-TM
for bug-guile@HIDDEN; Tue, 19 Nov 2019 08:42:10 -0500
Received: from [192.168.127.1] (port=38258 helo=caithnard.riddles.org.uk)
by lungold.riddles.org.uk with esmtp (Exim 4.92.3 (FreeBSD))
(envelope-from <andrew@HIDDEN>) id 1iX3lT-0006Pt-2v
for bug-guile@HIDDEN; Tue, 19 Nov 2019 13:41:55 +0000
Received: from localhost ([127.0.0.1]:23006 helo=caithnard.riddles.org.uk)
by caithnard.riddles.org.uk with esmtp (Exim 4.92.3 (FreeBSD))
(envelope-from <andrew@HIDDEN>) id 1iX3lS-000286-Qa
for bug-guile@HIDDEN; Tue, 19 Nov 2019 13:41:54 +0000
From: Andrew Gierth <andrew@HIDDEN>
Message-ID: <87zhgsyost.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix)
Date: Tue, 19 Nov 2019 13:41:54 +0000
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: FreeBSD 9.x [fuzzy]
X-Received-From: 82.68.208.19
X-Spam-Score: -2.3 (--)
X-Mailman-Approved-At: Tue, 19 Nov 2019 09:49:52 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)
The bug:
> (xml->sxml "<e><![CDATA[&gt;]]></e>")
$2 = (*TOP* (e ">"))
The expected result is (*TOP* (e "&gt;")).
In upstream/SSAX.scm:
; procedure+: ssax:read-cdata-body PORT STR-HANDLER SEED
[...]
; Within a CDATA section all characters are taken at their face value,
; with only three exceptions:
[..]
; &gt; is treated as an embedded #\> character
This handling of &gt; is contrary to the XML specification, in which
there are no special character sequences inside CDATA except newline and
the "]]>" closing tag. I have confirmed this by checking other XML
parsers. The code seems to be based on a wild misreading of another
section of the specification that does not apply here. (And
unfortunately, the W3C validation suite for XML happens not to contain
any instances of &gt; inside CDATA.)
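As an illustration of the cross-check against another parser, here is a minimal sketch using Python's standard-library ElementTree (which is backed by expat). It is not part of SSAX or Guile; the helper name root_text is made up for this example. It parses the same document as above and, for contrast, one where the entity reference appears in ordinary character data.

```python
import xml.etree.ElementTree as ET


def root_text(doc: str) -> str:
    """Parse `doc` and return the text content of its root element.

    Hypothetical helper for this example only.
    """
    return ET.fromstring(doc).text


# Inside CDATA the four characters "&gt;" should be kept literally.
in_cdata = root_text("<e><![CDATA[&gt;]]></e>")

# Outside CDATA the same sequence is an entity reference and is expanded.
outside_cdata = root_text("<e>&gt;</e>")

print(repr(in_cdata))
print(repr(outside_cdata))
```

With this parser the CDATA case comes back as the literal string "&gt;", while the non-CDATA case is expanded to ">", which matches the expected result stated above.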
I believe the fix should be as simple as removing the entire (#\&) case
from the function (and fixing the test cases).
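To make the intended behaviour concrete, here is a rough sketch in Python (not the SSAX code, and not a proposed patch) of a CDATA body reader with no special case for "&". The function name read_cdata_body is borrowed from the SSAX procedure only for readability; the signature and return value are assumptions made for this example. The only things it treats specially are the closing "]]>" and line endings.

```python
from typing import Tuple


def read_cdata_body(text: str, start: int = 0) -> Tuple[str, int]:
    """Read CDATA content from `text`, beginning at index `start`.

    `start` is assumed to point just past the opening '<![CDATA['.
    Returns the content and the index just past the closing ']]>'.
    """
    end = text.find("]]>", start)
    if end == -1:
        raise ValueError("unterminated CDATA section")
    body = text[start:end]
    # Line-end normalization: CRLF and a lone CR each become one LF.
    body = body.replace("\r\n", "\n").replace("\r", "\n")
    # Note: no handling of '&' at all; "&gt;" stays as four characters.
    return body, end + 3


content, next_index = read_cdata_body("&gt;]]></e>")
print(repr(content), next_index)
```

Here the "&gt;" in the input is returned unchanged, and the returned index points at the "</e>" that follows the section.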
This bug seems to exist in all versions of SSAX.
--
Andrew.
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Andrew Gierth <andrew@HIDDEN>
Subject: bug#38269: Acknowledgement (SSAX incorrect handling of &gt; in CDATA)
Message-ID: <handler.38269.B.157417499422816.ack <at> debbugs.gnu.org>
References: <87zhgsyost.fsf@HIDDEN>
X-Gnu-PR-Message: ack 38269
X-Gnu-PR-Package: guile
Reply-To: 38269 <at> debbugs.gnu.org
Date: Tue, 19 Nov 2019 14:50:01 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 38269 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
38269: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=38269
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems