GNU bug report logs - #32528
http-post breaks with XML response payload containing boundary


Package: guile; Reported by: Ricardo Wurmus <rekado@HIDDEN>; dated Sat, 25 Aug 2018 08:50:02 UTC; Maintainer for guile is bug-guile@HIDDEN.

Message received at 32528 <at> debbugs.gnu.org:


Received: (at 32528) by debbugs.gnu.org; 25 Jun 2019 08:25:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 25 04:25:22 2019
Received: from localhost ([127.0.0.1]:58602 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1hfglW-0007Qe-4l
	for submit <at> debbugs.gnu.org; Tue, 25 Jun 2019 04:25:22 -0400
Received: from eggs.gnu.org ([209.51.188.92]:46141)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1hfglU-0007QP-MT
 for 32528 <at> debbugs.gnu.org; Tue, 25 Jun 2019 04:25:20 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:53326)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <ludo@HIDDEN>)
 id 1hfglP-00047B-Aa; Tue, 25 Jun 2019 04:25:15 -0400
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (port=52782 helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
 (Exim 4.82) (envelope-from <ludo@HIDDEN>)
 id 1hfglI-0002VV-9b; Tue, 25 Jun 2019 04:25:09 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mark H Weaver <mhw@HIDDEN>
Subject: Re: bug#32528: http-post breaks with XML response payload containing
 boundary
References: <874lfiltkg.fsf@HIDDEN> <87bm9mf9d9.fsf@HIDDEN>
 <875zztg8bw.fsf@HIDDEN>
Date: Tue, 25 Jun 2019 10:25:06 +0200
In-Reply-To: <875zztg8bw.fsf@HIDDEN> (Mark H. Weaver's message of "Tue, 28
 Aug 2018 23:28:19 -0400")
Message-ID: <87imsuqdgd.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 32528
Cc: Ricardo Wurmus <rekado@HIDDEN>, 32528 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi Mark,

Mark H Weaver <mhw@HIDDEN> skribis:

>>From 6af35a3997887fe24620fc7448ded3649e04b82b Mon Sep 17 00:00:00 2001
> From: Mark H Weaver <mhw@HIDDEN>
> Date: Tue, 28 Aug 2018 23:15:36 -0400
> Subject: [PATCH 2/2] PRELIMINARY: web: Fix parsing of HTTP Content-Type
>  header.
>
> ---
>  module/web/http.scm | 109 +++++++++++++++++++++++++++++++++++---------
>  1 file changed, 88 insertions(+), 21 deletions(-)

This patch would be nice to have, if and when you can complete it.

Thanks,
Ludo’.




Information forwarded to bug-guile@HIDDEN:
bug#32528; Package guile. Full text available.

Message received at 32528 <at> debbugs.gnu.org:


Received: (at 32528) by debbugs.gnu.org; 29 Aug 2018 10:26:23 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Aug 29 06:26:23 2018
Received: from localhost ([127.0.0.1]:36414 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fuxg6-0004sn-TS
	for submit <at> debbugs.gnu.org; Wed, 29 Aug 2018 06:26:23 -0400
Received: from sender-of-o51.zoho.com ([135.84.80.216]:21067)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <rekado@HIDDEN>) id 1fuxg4-0004se-J4
 for 32528 <at> debbugs.gnu.org; Wed, 29 Aug 2018 06:26:21 -0400
ARC-Seal: i=1; a=rsa-sha256; t=1535538367; cv=none; d=zoho.com; s=zohoarc; 
 b=Q8l+v2E0AJH8rByuFGQ/nWoR3+msqEEUIz4VuBvQq52ofjFRTcHi/YVDyGvLLfJwJLIIr0bQeiPoIZ9DOmdRfXlDHWJtOi4aeu7cJ/Sbm4TuAuswNxsm5f0Wepm9QQLG76hIibae7/5RjupxZW5mFVCfYP8qjkhH/R4sc6AlRZo=
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zoho.com;
 s=zohoarc; t=1535538367;
 h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:MIME-Version:Message-ID:References:Subject:To:ARC-Authentication-Results;
 bh=d/oy29hO5eb7uZZ8UHequ9LCUglrW3H40wfG2B7DVn4=; 
 b=PFTaBz6Ld+rrFZLcF+39rFndlhZsfxIe6VzYO4+WDPPtynzv1ju8qYZ9+J/33e+q6oL1xMdt2OuZbhSPGjw0l0Y2rgDbgCG839DWgy1GO00CawYAMJ+gwE3i8/tXP6gNzcLJjbCBG4Rl3ut9AcWEXqotNcVZr1IZAF69FRFARpc=
ARC-Authentication-Results: i=1; mx.zoho.com; dkim=pass  header.i=elephly.net;
 spf=pass  smtp.mailfrom=rekado@HIDDEN;
 dmarc=pass header.from=<rekado@HIDDEN> header.from=<rekado@HIDDEN>
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; t=1535538367; 
 s=zoho; d=elephly.net; i=rekado@HIDDEN;
 h=References:From:To:Cc:Subject:In-reply-to:Date:Message-ID:MIME-Version:Content-Type:Content-Transfer-Encoding;
 l=1262; bh=d/oy29hO5eb7uZZ8UHequ9LCUglrW3H40wfG2B7DVn4=;
 b=hhJo/spGSBw4Q4c48Hd6K23gWw0gDaxfFS8XelfO9t0mQEYQnMMpG8zUu6wuKziT
 nRHdP7YQiSe06wUAsuPLtU94ytvRoL8orxRp6nODh/rZNLf05l1eOVFFpfGdsJR7fPb
 uhXpEfNv4HGcidMsdQc2aFXOLsRKp19/POrlpwg8=
Received: from localhost (141.80.245.135 [141.80.245.135]) by mx.zohomail.com
 with SMTPS id 1535538365105897.2605372086007;
 Wed, 29 Aug 2018 03:26:05 -0700 (PDT)
References: <874lfiltkg.fsf@HIDDEN> <87bm9mf9d9.fsf@HIDDEN>
User-agent: mu4e 1.0; emacs 26.1
From: Ricardo Wurmus <rekado@HIDDEN>
To: Mark H Weaver <mhw@HIDDEN>
Subject: Re: bug#32528: http-post breaks with XML response payload containing
 boundary
In-reply-to: <87bm9mf9d9.fsf@HIDDEN>
X-URL: https://elephly.net
X-PGP-Key: https://elephly.net/rekado.pubkey
X-PGP-Fingerprint: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
Date: Wed, 29 Aug 2018 12:26:02 +0200
Message-ID: <87d0u1bhad.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-ZohoMailClient: External
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32528
Cc: 32528 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)


Hi Mark,

> Ricardo Wurmus <rekado@HIDDEN> writes:
>
[…]
>> The reason why it fails is that Guile processes the response and treats
>> the *payload* contained in the XML response as HTTP.
>
> No, this was a good guess, but it's not actually the problem.

You are right.  I also ended up trying with “wget --save-headers” after
sending the bug report and noticed the offending header like you did:

>   Content-Type: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="
>
>   <?xml [...]

I assumed it was part of the payload when it really was a regular
header after all.

> The problem is simply that our Content-Type header parser is broken.
> It's very simplistic and merely splits the string wherever ';' is found,
> and then checks to make sure there's only one '=' in each parameter,
> without taking into account that quoted strings in the parameters might
> include those characters.

Right.  I worked around this in guile-debbugs simply by replacing the
Content-Type header parser with one that lacks the check for the unique
“=” in the string part.
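
A workaround of this kind might look roughly like the sketch below. It is
only a guess at the general shape, not the code used in guile-debbugs: the
names `lenient-parse-parameter` and `lenient-parse-content-type` are
hypothetical, and the values keep their surrounding double quotes.

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical sketch, not the guile-debbugs code: split each
;; parameter at its first '=' only, so extra '=' characters in the
;; value are kept instead of triggering an error.
(define (lenient-parse-parameter param)
  (let ((eq (string-index param #\=)))
    (and eq
         (cons (string->symbol (string-trim-both (substring param 0 eq)))
               (string-trim-both (substring param (+ eq 1)))))))

;; Return (MEDIA-TYPE (ATTRIBUTE . VALUE) ...), silently dropping any
;; parameter that contains no '=' at all.
(define (lenient-parse-content-type str)
  (let ((parts (string-split str #\;)))
    (cons (string->symbol (string-trim-both (car parts)))
          (filter-map lenient-parse-parameter (cdr parts)))))
```

This still splits on every ';', so a quoted value that contains ';' would
be cut in the wrong place; it only sidesteps the '=' check.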

> I'll work on a proper parser for Content-Type headers.

Thanks!

--
Ricardo





Information forwarded to bug-guile@HIDDEN:
bug#32528; Package guile. Full text available.

Message received at 32528 <at> debbugs.gnu.org:


Received: (at 32528) by debbugs.gnu.org; 29 Aug 2018 03:30:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Aug 28 23:30:04 2018
Received: from localhost ([127.0.0.1]:36248 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1furBD-0002k4-60
	for submit <at> debbugs.gnu.org; Tue, 28 Aug 2018 23:30:04 -0400
Received: from world.peace.net ([64.112.178.59]:36746)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1furBA-0002j9-TR
 for 32528 <at> debbugs.gnu.org; Tue, 28 Aug 2018 23:30:01 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1furB4-0007hg-7j; Tue, 28 Aug 2018 23:29:54 -0400
From: Mark H Weaver <mhw@HIDDEN>
To: Ricardo Wurmus <rekado@HIDDEN>
Subject: Re: bug#32528: http-post breaks with XML response payload containing
 boundary
References: <874lfiltkg.fsf@HIDDEN> <87bm9mf9d9.fsf@HIDDEN>
Date: Tue, 28 Aug 2018 23:28:19 -0400
In-Reply-To: <87bm9mf9d9.fsf@HIDDEN> (Mark H. Weaver's message of "Tue, 28
 Aug 2018 17:51:14 -0400")
Message-ID: <875zztg8bw.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32528
Cc: 32528 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Mark H Weaver <mhw@HIDDEN> writes:

> Ricardo Wurmus <rekado@HIDDEN> writes:
>
>> I’m having a problem with http-post and I think it might be a bug.  I’m
>> talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
>> request.  The Debbugs SOAP service responds with a string of XML.
[...]
> The problem is simply that our Content-Type header parser is broken.
> It's very simplistic and merely splits the string wherever ';' is found,
> and then checks to make sure there's only one '=' in each parameter,
> without taking into account that quoted strings in the parameters might
> include those characters.
>
> I'll work on a proper parser for Content-Type headers.

I've attached preliminary patches to fix the Content-Type header parser,
and also to fix the parsing of response header lines to support
continuation lines.
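
As an illustration of what folding continuation lines means, here is a
minimal sketch that works on a list of already-read lines rather than on a
port, so it is not the approach taken in the attached patch.  The names
`continuation-line?` and `unfold-header-lines` are made up for this
example.

```scheme
(use-modules (srfi srfi-1))

(define spaces-and-tabs (char-set #\space #\tab))

;; A continuation line is a non-empty line that starts with a space or tab.
(define (continuation-line? line)
  (and (not (string-null? line))
       (char-set-contains? spaces-and-tabs (string-ref line 0))))

;; Append each continuation line to the header line before it,
;; replacing its leading spaces and tabs with a single space.
(define (unfold-header-lines lines)
  (reverse
   (fold (lambda (line acc)
           (if (and (continuation-line? line) (pair? acc))
               (cons (string-append (car acc) " "
                                    (string-trim line spaces-and-tabs))
                     (cdr acc))
               (cons line acc)))
         '()
         lines)))
```

Given the lines "Content-Type: multipart/related;" and a following line
that begins with a tab, the two are joined into one header line separated
by a single space.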

With these patches applied, I'm able to fetch and decode the SOAP
response that you fetched with your 'wget' example, as follows:

--8<---------------cut here---------------start------------->8---
mhw@jojen ~/guile-stable-2.2 [env]$ meta/guile
GNU Guile 2.2.4.10-4c91d
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (use-modules (web http) (web uri) (web client) (sxml simple) (ice-9 receive))
scheme@(guile-user)> ,pp (let ((req-xml "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\" xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><soap:Body><ns1:get_bug_log xmlns:ns1=\"urn:Debbugs/SOAP\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><ns1:bugnumber xsi:type=\"xsd:int\">32514</ns1:bugnumber></ns1:get_bug_log></soap:Body></soap:Envelope>"))
                           (receive (response body-port)
                               (http-post "https://debbugs.gnu.org/cgi/soap.cgi"
                                          #:streaming? #t
                                          #:body req-xml
                                          #:headers
                                          `((content-type . (text/xml))
                                            (content-length . ,(string-length req-xml))))
                             (set-port-encoding! body-port "UTF-8")
                             (xml->sxml body-port #:trim-whitespace? #t)))
$1 = (*TOP* (*PI* xml "version=\"1.0\" encoding=\"UTF-8\"")
       (http://schemas.xmlsoap.org/soap/envelope/:Envelope
         (@ (http://schemas.xmlsoap.org/soap/envelope/:encodingStyle
              "http://schemas.xmlsoap.org/soap/encoding/"))
         (http://schemas.xmlsoap.org/soap/envelope/:Body
           (urn:Debbugs/SOAP:get_bug_logResponse
             (http://schemas.xmlsoap.org/soap/encoding/:Array
               (@ (http://www.w3.org/1999/XMLSchema-instance:type
                    "soapenc:Array")
                  (http://schemas.xmlsoap.org/soap/encoding/:arrayType
                    "xsd:ur-type[4]"))
               (urn:Debbugs/SOAP:item
                 (urn:Debbugs/SOAP:header
                   (@ (http://www.w3.org/1999/XMLSchema-instance:type
                        "xsd:string"))
                   "Received: (at submit) by debbugs.gnu.org; 23 Aug 2018 20:17:46 +0000\nFrom debbugs-submit-bounces <at> debbugs.gnu.org [...]
[...]
--8<---------------cut here---------------end--------------->8---

Note that I needed to make two other changes to your preliminary code,
namely:

* I passed "#:streaming? #t" to 'http-post', to ask for a port to read
  the response body instead of reading it eagerly.

* I explicitly set the port encoding to "UTF-8" on that port before
  using 'xml->sxml' to read it.

Otherwise, the entire 'body' response will be returned as a bytevector,
because the response Content-Type is not recognized as a textual type.
The HTTP Content-Type is "multipart/related", with a parameter:
type="text/xml".  I'm not sure if we should be automatically
interpreting that as a textual type or not.

There's no 'charset' parameter in the Content-Type header, but the XML
internally specifies: encoding="UTF-8".
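
For the non-streaming case, where the body comes back as a bytevector, a
caller could decode it by hand.  The following is only a sketch under the
assumption that the payload is UTF-8 encoded XML with no multipart framing
around it; `body->sxml` is a hypothetical helper, and the sample document
is invented for the example.

```scheme
(use-modules (rnrs bytevectors) (sxml simple))

;; BODY may be a string or a bytevector.  Bytevectors are assumed to
;; hold UTF-8 text, since the header carries no 'charset' parameter.
(define (body->sxml body)
  (xml->sxml (if (bytevector? body)
                 (utf8->string body)
                 body)
             #:trim-whitespace? #t))

;; Stand-in for a response body returned as a bytevector.
(define sample-body
  (string->utf8
   "<?xml version=\"1.0\" encoding=\"UTF-8\"?><reply><item>ok</item></reply>"))

(define sample-sxml (body->sxml sample-body))
```

Streaming the body through a port, as in the session above, avoids holding
the whole response in memory; decoding by hand is mainly useful when the
bytevector has already been read.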

Anyway, here are the preliminary patches.

       Mark



--=-=-=
Content-Type: text/x-patch; charset=utf-8
Content-Disposition: inline;
 filename=0001-web-Add-support-for-HTTP-header-continuation-lines.patch
Content-Transfer-Encoding: quoted-printable
Content-Description: [PATCH 1/2] web: Add support for HTTP header
 continuation lines

From 41764d60dba80126b3c97f883d0225510b55f3fa Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@HIDDEN>
Date: Tue, 28 Aug 2018 18:39:34 -0400
Subject: [PATCH 1/2] web: Add support for HTTP header continuation lines.

* module/web/http.scm (spaces-and-tabs, space-or-tab?): New variables.
(read-header-line): After reading a header, if a space or tab follows,
then read the continuation lines and append them all together.
---
 module/web/http.scm | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index de61c9495..15f173173 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -1,6 +1,6 @@
 ;;; HTTP messages
=20
-;; Copyright (C)  2010-2017 Free Software Foundation, Inc.
+;; Copyright (C)  2010-2018 Free Software Foundation, Inc.
=20
 ;; This library is free software; you can redistribute it and/or
 ;; modify it under the terms of the GNU Lesser General Public
@@ -152,18 +152,35 @@ The default writer will call ‘put-string’."
         (lambda (val port)
           (put-string port val)))))
=20
+(define spaces-and-tabs
+  (char-set #\space #\tab))
+
+(define (space-or-tab? c)
+  (case c
+    ((#\space #\tab) #t)
+    (else #f)))
+
 (define (read-header-line port)
-  "Read an HTTP header line and return it without its final CRLF or LF.
-Raise a 'bad-header' exception if the line does not end in CRLF or LF,
-or if EOF is reached."
+  "Read an HTTP header line, including any continuation lines, and
+return the combined string without its final CRLF or LF.  Raise a
+'bad-header' exception if the line does not end in CRLF or LF, or if EOF
+is reached."
   (match (%read-line port)
     (((? string? line) . #\newline)
      ;; '%read-line' does not consider #\return a delimiter; so if it's
      ;; there, remove it.  We are more tolerant than the RFC in that we
      ;; tolerate LF-only endings.
-     (if (string-suffix? "\r" line)
-         (string-drop-right line 1)
-         line))
+     (let ((line (if (string-suffix? "\r" line)
+                     (string-drop-right line 1)
+                     line)))
+       ;; If the next character is a space or tab, then there's at least
+       ;; one continuation line.  Read the continuation lines by calling
+       ;; 'read-header-line' recursively, and append them to this header
+       ;; line, folding the leading spaces and tabs to a single space.
+       (if (space-or-tab? (lookahead-char port))
+           (string-append line " " (string-trim (read-header-line port)
+                                                spaces-and-tabs))
+           line)))
     ((line . _)                                ;EOF or missing delimiter
      (bad-header 'read-header-line line))))
=20
--=20
2.18.0


--=-=-=
Content-Type: text/x-patch
Content-Disposition: inline;
 filename=0002-PRELIMINARY-web-Fix-parsing-of-HTTP-Content-Type-hea.patch
Content-Description: [PATCH 2/2] PRELIMINARY: web: Fix parsing of HTTP
 Content-Type header

From 6af35a3997887fe24620fc7448ded3649e04b82b Mon Sep 17 00:00:00 2001
From: Mark H Weaver <mhw@HIDDEN>
Date: Tue, 28 Aug 2018 23:15:36 -0400
Subject: [PATCH 2/2] PRELIMINARY: web: Fix parsing of HTTP Content-Type
 header.

---
 module/web/http.scm | 109 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 88 insertions(+), 21 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index 15f173173..6ccd853c1 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -290,16 +290,94 @@ as an ordered alist."
 (define (write-opaque-string val port)
   (put-string port val))
 
-(define separators-without-slash
-  (string->char-set "[^][()<>@,;:\\\"?= \t]"))
-(define (validate-media-type str)
-  (let ((idx (string-index str #\/)))
-    (and idx (= idx (string-rindex str #\/))
-         (not (string-index str separators-without-slash)))))
+(define separators
+  (string->char-set "()<>@,;:\\\"/[]?={} \t"))
+
+(define (ascii-char? c)
+  (char-set-contains? char-set:ascii c))
+
+(define valid-token-chars
+  (char-set-difference char-set:ascii
+                       char-set:iso-control
+                       separators))
+
+(define (valid-token? str)
+  (and (not (string-null? str))
+       (string-every valid-token-chars str)))
+
+(define (string-skip* s pred i)
+  (or (string-skip s pred i)
+      (string-length s)))
+
+(define (parse-token str i)
+  (let* ((i   (string-skip* str spaces-and-tabs i))
+         (end (string-skip* str valid-token-chars i)))
+    (and (< i end)
+         (cons end (substring str i end)))))
+
+(define valid-text-chars
+  (char-set-adjoin (char-set-difference (ucs-range->char-set 0 256)
+                                        char-set:iso-control)
+                   #\space #\tab))
+
+(define (text-char? c)
+  (char-set-contains? valid-text-chars c))
+
+(define (parse-quoted-string str i)
+  (let ((len (string-length str))
+        (i   (string-skip* str spaces-and-tabs i)))
+    (and (< i len)
+         (eqv? #\" (string-ref str i))
+         (let loop ((i (+ i 1))
+                    (accum '()))
+           (and (< i len)
+                (match (string-ref str i)
+                  (#\" (cons (+ i 1) (reverse-list->string accum)))
+                  (#\\ (and (< (+ i 1) len)
+                            (let ((c (string-ref str (+ i 1))))
+                              (and (ascii-char? c)
+                                   (loop (+ i 2) (cons c accum))))))
+                  (c   (and (text-char? c)
+                            (loop (+ i 1) (cons c accum))))))))))
+
+(define (parse-parameter str i)
+  (let* ((eq (string-index str #\= i))
+         (attribute (string-trim-both (substring str i eq)
+                                      spaces-and-tabs)))
+    (and (valid-token? attribute)
+         (match (or (parse-token         str (+ eq 1))
+                    (parse-quoted-string str (+ eq 1)))
+           ((i . val) (cons i (cons (string->symbol attribute) val)))
+           (#f        #f)))))
+
+(define (parse-parameter-list str i)
+  (let ((len (string-length str))
+        (i   (string-skip* str spaces-and-tabs i)))
+    (if (= i len)
+        '()
+        (and (< i len)
+             (eqv? #\; (string-ref str i))
+             (match (parse-parameter str (+ i 1))
+               (#f      #f)
+               ((i . p) (match (parse-parameter-list str i)
+                          (#f  #f)
+                          (lst (cons p lst)))))))))
+
 (define (parse-media-type str)
-  (unless (validate-media-type str)
-    (bad-header-component 'media-type str))
-  (string->symbol str))
+  (let* ((i (or (string-index str #\;)
+                (string-length str)))
+         (params (parse-parameter-list str i)))
+    (or (match (string-split (substring str 0 i) #\/)
+          ((type* subtype*)
+           (let ((type    (string-trim-both type*    spaces-and-tabs))
+                 (subtype (string-trim-both subtype* spaces-and-tabs)))
+             (and (valid-token? type)
+                  (valid-token? subtype)
+                  params
+                  (cons (string->symbol (string-append type "/" subtype))
+                        params))))
+          (_ #f))
+        (bad-header 'content-type str))))
 
 (define* (skip-whitespace str #:optional (start 0) (end (string-length str)))
   (let lp ((i start))
@@ -1617,18 +1695,7 @@ treated specially, and is just returned as a plain string."
 ;; Content-Type = media-type
 ;;
 (declare-header! "Content-Type"
-  (lambda (str)
-    (let ((parts (string-split str #\;)))
-      (cons (parse-media-type (car parts))
-            (map (lambda (x)
-                   (let ((eq (string-index x #\=)))
-                     (unless (and eq (= eq (string-rindex x #\=)))
-                       (bad-header 'content-type str))
-                     (cons
-                      (string->symbol
-                       (string-trim x char-set:whitespace 0 eq))
-                      (string-trim-right x char-set:whitespace (1+ eq)))))
-                 (cdr parts)))))
+  parse-media-type
   (lambda (val)
     (match val
       (((? symbol?) ((? symbol?) . (? string?)) ...) #t)
-- 
2.18.0


--=-=-=--




Information forwarded to bug-guile@HIDDEN:
bug#32528; Package guile. Full text available.

Message received at 32528 <at> debbugs.gnu.org:


Received: (at 32528) by debbugs.gnu.org; 28 Aug 2018 21:52:56 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Aug 28 17:52:56 2018
Received: from localhost ([127.0.0.1]:36107 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1fulux-0005cM-Sv
	for submit <at> debbugs.gnu.org; Tue, 28 Aug 2018 17:52:56 -0400
Received: from world.peace.net ([64.112.178.59]:36358)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mhw@HIDDEN>) id 1fuluw-0005cA-Nw
 for 32528 <at> debbugs.gnu.org; Tue, 28 Aug 2018 17:52:54 -0400
Received: from mhw by world.peace.net with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.89)
 (envelope-from <mhw@HIDDEN>)
 id 1fuluq-0006CE-SR; Tue, 28 Aug 2018 17:52:48 -0400
From: Mark H Weaver <mhw@HIDDEN>
To: Ricardo Wurmus <rekado@HIDDEN>
Subject: Re: bug#32528: http-post breaks with XML response payload containing
 boundary
References: <874lfiltkg.fsf@HIDDEN>
Date: Tue, 28 Aug 2018 17:51:14 -0400
In-Reply-To: <874lfiltkg.fsf@HIDDEN> (Ricardo Wurmus's message of "Sat,
 25 Aug 2018 10:49:19 +0200")
Message-ID: <87bm9mf9d9.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 32528
Cc: 32528 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

Ricardo Wurmus <rekado@HIDDEN> writes:

> I’m having a problem with http-post and I think it might be a bug.  I’m
> talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
> request.  The Debbugs SOAP service responds with a string of XML.
>
> Here’s a simplified version of what I do:
>
>   (use-modules (web client) (sxml simple) (ice-9 receive))
>   (let ((req-xml "<soap:Envelope xmlns:soap...>"))
>     (receive (response body)
>         (http-post uri
>                    #:body req-xml
>                    #:headers
>                    `((content-type . (text/xml))
>                      (content-length . ,(string-length req-xml))))
>      ;; Do something with the response body
>      (xml->sxml body #:trim-whitespace? #t)))
>
> This fails for some requests with an error like this:
>
>     web/http.scm:1609:23: Bad Content-Type header: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="

[...]

> The reason why it fails is that Guile processes the response and treats
> the *payload* contained in the XML response as HTTP.

No, this was a good guess, but it's not actually the problem.

If you add --save-headers to the wget command line, you'll see the full
response, and the HTTP headers are what's being parsed, as it should be.
It looks like this (except that I removed the carriage returns below):

  HTTP/1.1 200 OK
  Date: Tue, 28 Aug 2018 21:40:30 GMT
  Server: Apache
  SOAPServer: SOAP::Lite/Perl/1.11
  Strict-Transport-Security: max-age=63072000
  Content-Length: 32650
  X-Content-Type-Options: nosniff
  X-Frame-Options: sameorigin
  X-XSS-Protection: 1; mode=block
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  Content-Type: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="

  <?xml [...]

The problem is simply that our Content-Type header parser is broken.
It's very simplistic and merely splits the string wherever ';' is found,
and then checks to make sure there's only one '=' in each parameter,
without taking into account that quoted strings in the parameters might
include those characters.
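
To make the failure concrete, here is a small sketch that mimics the
"split on ';', then require exactly one '='" strategy described above.  It
is illustrative only and is not the code from module/web/http.scm; the
names `header`, `single-equals?` and `params` are chosen for this example.

```scheme
;; The Content-Type value from the response shown above.
(define header
  "multipart/related; type=\"text/xml\"; start=\"<main_envelope>\"; boundary=\"=-=-=\"")

;; True when PARAM contains exactly one '=' character.
(define (single-equals? param)
  (let ((idx (string-index param #\=)))
    (and idx (= idx (string-rindex param #\=)))))

;; Naive split: everything after the first ';' is treated as a parameter.
(define params (cdr (string-split header #\;)))

;; Show which parameters pass the check.
(for-each (lambda (p)
            (format #t "~s -> ~a~%"
                    (string-trim p)
                    (if (single-equals? p) "accepted" "rejected")))
          params)
```

The 'type' and 'start' parameters pass the check, while 'boundary' is
rejected because its quoted value itself contains '=' characters.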

I'll work on a proper parser for Content-Type headers.

      Thanks,
        Mark




Information forwarded to bug-guile@HIDDEN:
bug#32528; Package guile. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 25 Aug 2018 08:49:51 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Aug 25 04:49:51 2018
Received: from localhost ([127.0.0.1]:60207 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ftUGV-0001RY-0O
	for submit <at> debbugs.gnu.org; Sat, 25 Aug 2018 04:49:51 -0400
Received: from eggs.gnu.org ([208.118.235.92]:49790)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <rekado@HIDDEN>) id 1ftUGR-0001RK-H0
 for submit <at> debbugs.gnu.org; Sat, 25 Aug 2018 04:49:47 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <rekado@HIDDEN>) id 1ftUGL-0004UE-9F
 for submit <at> debbugs.gnu.org; Sat, 25 Aug 2018 04:49:42 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_05,T_DKIM_INVALID
 autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:50613)
User-agent: mu4e 1.0; emacs 26.1
From: Ricardo Wurmus <rekado@HIDDEN>
To: bug-guile@HIDDEN
Subject: http-post breaks with XML response payload containing boundary
X-URL: https://elephly.net
X-PGP-Key: https://elephly.net/rekado.pubkey
X-PGP-Fingerprint: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
Date: Sat, 25 Aug 2018 10:49:19 +0200
Message-ID: <874lfiltkg.fsf@HIDDEN>

Hi Guilers,

I’m having a problem with http-post and I think it might be a bug.  I’m
talking to a Debbugs SOAP service over HTTP by sending (via POST) an XML
request.  The Debbugs SOAP service responds with a string of XML.

Here’s a simplified version of what I do:

  (use-modules (web client)      ; http-post
               (ice-9 receive)   ; receive
               (sxml simple))    ; xml->sxml
  (let ((req-xml "<soap:Envelope xmlns:soap...>"))
    (receive (response body)
        (http-post uri
                   #:body req-xml
                   #:headers
                   `((content-type . (text/xml))
                     (content-length . ,(string-length req-xml))))
      ;; Do something with the response body
      (xml->sxml body #:trim-whitespace? #t)))

This fails for some requests with an error like this:

    web/http.scm:1609:23: Bad Content-Type header: multipart/related; type="text/xml"; start="<main_envelope>"; boundary="=-=-="

Here’s a backtrace:

--8<---------------cut here---------------start------------->8---
In debbugs/soap.scm:
    101:8  9 (soap-invoke "https://debbugs.gnu.org/cgi/soap.cgi" _ . _)
In web/client.scm:
   386:24  8 (http-request _ #:body _ #:port _ #:method _ #:version _ #:keep-alive? _ #:headers _ #:decode-body? _ #:streaming? _ #:request _)
In web/response.scm:
   200:48  7 (read-response #<input-output: string 2db5690>)
In web/http.scm:
   225:33  6 (read-headers #<input-output: string 2db5690>)
   195:11  5 (read-header #<input-output: string 2db5690>)
  1606:12  4 (_ "multipart/related; type=\"text/xml\"; start=\"<main_envelope>\"; boundary=\"=-=-=\"")
In ice-9/boot-9.scm:
   222:29  3 (map1 (" type=\"text/xml\"" " start=\"<main_envelope>\"" " boundary=\"=-=-=\""))
   222:29  2 (map1 (" start=\"<main_envelope>\"" " boundary=\"=-=-=\""))
   222:17  1 (map1 (" boundary=\"=-=-=\""))
In web/http.scm:
  1609:23  0 (_ " boundary=\"=-=-=\"")
--8<---------------cut here---------------end--------------->8---

The reason why it fails is that Guile processes the response and treats
the *payload* contained in the XML response as HTTP.  In this case it
processes the response and stumbles upon a multipart email that contains
a Content-type header specifying a boundary string.

The Content-type handler in (web http) doesn’t like that the boundary
string contains “=” and aborts.
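
To illustrate the kind of check that could produce this error, here is a
minimal sketch.  It assumes the parser accepts a parameter only when it
contains exactly one “=”; that is a guess based on the backtrace, not
something confirmed against the (web http) source.  Both procedure names
are hypothetical and exist only for this example.

```scheme
;; Hypothetical helpers for illustration only; NOT the code in (web http).

(define (strict-param str)
  ;; Accept "key=value" only when STR contains exactly one "=".
  (let ((first-eq (string-index str #\=))
        (last-eq  (string-rindex str #\=)))
    (if (and first-eq (= first-eq last-eq))
        (cons (string-trim-both (substring str 0 first-eq))
              (substring str (+ first-eq 1)))
        (error "bad parameter" str))))

(define (lenient-param str)
  ;; Split at the first "=" only, so the value may itself contain "=".
  (let ((first-eq (string-index str #\=)))
    (if first-eq
        (cons (string-trim-both (substring str 0 first-eq))
              (substring str (+ first-eq 1)))
        (error "bad parameter" str))))

;; The parameter from the backtrace: the key is "boundary" and the value
;; keeps its surrounding double quotes.
(lenient-param " boundary=\"=-=-=\"")
```

With these definitions, strict-param signals an error on the same input,
while both procedures agree on a parameter such as " type=\"text/xml\"".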

The point is, though, that it shouldn’t even try to parse the payload of
the XML response.  If you want to see the full XML response you can use
wget:

    wget --post-data="<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\" xmlns:soapenc=\"http://schemas.xmlsoap.org/soap/encoding/\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><soap:Body><ns1:get_bug_log xmlns:ns1=\"urn:Debbugs/SOAP\" soapenc:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\"><ns1:bugnumber xsi:type=\"xsd:int\">32514</ns1:bugnumber></ns1:get_bug_log></soap:Body></soap:Envelope>" --header "Content-type: text/xml" -qO - "https://debbugs.gnu.org/cgi/soap.cgi"

Is this a problem with Guile when a response with Content-type text/xml
is received?

--
Ricardo





Acknowledgement sent to Ricardo Wurmus <rekado@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guile@HIDDEN. Full text available.
Report forwarded to bug-guile@HIDDEN:
bug#32528; Package guile. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.