GNU bug report logs - #63941
[PATCH] ; always CRLF before non-first boundary in multipart form

Previous Next

Package: emacs;

Reported by: ozzloy <ozzloy <at> gmail.com>

Date: Wed, 7 Jun 2023 06:53:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 63941 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Wed, 07 Jun 2023 06:53:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to ozzloy <ozzloy <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 07 Jun 2023 06:53:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] ; always CRLF before non-first boundary in multipart form
Date: Tue, 6 Jun 2023 22:25:40 -0700
[Message part 1 (text/plain, inline)]
When I POST a file ending with a newline using EWW, the final newline
gets chomped off of the content.  This is because
mm-url-encode-multipart-form-data inserts CRLF unless it is at the
beginning of a line.  For file uploads, this behavior is incorrect.

To reproduce,

0. Upload a file ending in "\n" using EWW.

1. Observe the POST does not have CRLF between the file content and the
boundary.

Refer to https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 for
details.

I have not tested other kinds of html form posts.  I am not familiar
with what should be tested.  I did include tests for this specific bug
though.


In GNU Emacs 29.0.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.33, cairo version 1.16.0) of 2023-06-06 built on trent-reznor
Repository revision: b5eb43ba289519704c6cb0fe456038dcaec172c3
Repository branch: CRLF-before-noninitial-boundary
Windowing system distributor 'The X.Org Foundation', version 11.0.12101004
System Description: Ubuntu 22.04.2 LTS

Configured features:
CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG
LIBSELINUX LIBXML2 MODULES NOTIFY INOTIFY PDUMPER PNG SECCOMP SOUND
THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  buffer-read-only: t
  line-number-mode: t
  indent-tabs-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message mailcap yank-media puny dired
dired-loaddefs rfc822 mml mml-sec password-cache epa derived epg rfc6068
epg-config gnus-util text-property-search time-date subr-x mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
cl-loaddefs cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util
mail-prsvr mail-utils rmc iso-transl tooltip cconv eldoc paren electric
uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit xinput2 x multi-tty make-network-process
emacs)

Memory information:
((conses 16 36910 11079)
 (symbols 48 5128 0)
 (strings 32 13160 1100)
 (string-bytes 1 372301)
 (vectors 16 9331)
 (vector-slots 8 149255 18597)
 (floats 8 33 21)
 (intervals 56 326 0)
 (buffers 976 11))
[Message part 2 (text/html, inline)]
[0001-always-CRLF-before-non-first-boundary-in-multipart-f.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Wed, 07 Jun 2023 12:32:01 GMT) Full text and rfc822 format available.

Message #8 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ;
 always CRLF before non-first boundary in multipart form
Date: Wed, 07 Jun 2023 15:30:59 +0300
> From: ozzloy <ozzloy <at> gmail.com>
> Date: Tue, 6 Jun 2023 22:25:40 -0700
> 
> When I POST a file ending with a newline using EWW, the final newline
> gets chomped off of the content.  This is because
> mm-url-encode-multipart-form-data inserts CRLF unless it is at the
> beginning of a line.  For file uploads, this behavior is incorrect.
> 
> To reproduce,
> 
> 0. Upload a file ending in "\n" using EWW.
> 
> 1. Observe the POST does not have CRLF between the file content and the
> boundary.

Thanks, but could you please describe in detail what should be done in
step 0? how to "post a file using EWW"?

We need the details to be sure we reproduce the same problem as you
do, and also to test the solution and any alternatives.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 08 Jun 2023 04:31:05 GMT) Full text and rfc822 format available.

Message #11 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Wed, 7 Jun 2023 19:48:29 -0700
[Message part 1 (text/plain, inline)]
> please describe in detail what should be done

Sure!  I'm not sure what would be too much, and what would be too little
explanation.

The short version is to run the following code,
#+begin_src elisp
(require 'mm-url)
(let ((data '(("file"
              ("filedata" . "file content\n")
              ("name"     . "file")
              ("filename" . "filename"))))
      (boundary "BOUNDARY"))
  (mm-url-encode-multipart-form-data data boundary))
#+end_src

#+RESULTS:
: --BOUNDARY^M
: Content-Disposition: form-data; name="file"; filename="filename"^M
: Content-Transfer-Encoding: binary^M
: Content-Type: text/plain^M
: ^M
: file content
: --BOUNDARY--^M
(I've replaced carriage returns with literal '^' and 'M' to avoid email
mangling)

and observe the absence of CRLF between the file content and the boundary.
From https://www.rfc-editor.org/rfc/rfc2046#section-5.1.1 this description,
it seems like there should be.

Here's a longer set of steps.

0. run http server
#+begin_src bash
  git clone https://git.sr.ht/~ozzloy/emacs-bug-63941
  cd emacs-bug-63941
  git checkout reproduce-bug-63941
  ./server.py
#+end_src

1. Then use EWW to browse to localhost:8085 and upload the file =filename=.

Here's my result when doing that, first with EWW.
#+begin_quote
upload_content = b'file content', name = 'filename', size = 12
127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 -
#+end_quote

The bug is that the file content no longer has the final "\n".

2. To confirm, in a separate shell, I posted using curl.
#+begin_src bash
  curl -F "file=@filename;filename=filename" localhost:8085
#+end_src

Here's the output I got
#+begin_quote
upload_content = b'file content\n', name = 'filename', size = 13
127.0.0.1 - - [07/Jun/2023 18:57:12] "POST / HTTP/1.1" 200 -
#+end_quote

The file's content ends with "\n", which is the expected behavior.

This is the second HTTP file upload server I have used and seen this
behavior. The same thing happened for one I wrote in clojure using
Aleph. That one is here https://git.sr.ht/~ozzloy/fupload but I made the
python one because it's probably less set up for other people to
reproduce it.
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 08 Jun 2023 06:10:02 GMT) Full text and rfc822 format available.

Message #14 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Thu, 08 Jun 2023 09:09:12 +0300
> From: ozzloy <ozzloy <at> gmail.com>
> Date: Wed, 7 Jun 2023 19:48:29 -0700
> Cc: 63941 <at> debbugs.gnu.org
> 
> 1. Then use EWW to browse to localhost:8085 and upload the file =filename=.

This part is exactly what I'm asking about: how do I "upload the file"
using EWW?  Can you please show what I should type to do that?

> Here's my result when doing that, first with EWW.
> #+begin_quote
> upload_content = b'file content', name = 'filename', size = 12
> 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 -
> #+end_quote

Where does one see this result?  Is it in some Emacs buffer?  In that
case, what is the name of that buffer?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 08 Jun 2023 06:44:02 GMT) Full text and rfc822 format available.

Message #17 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Wed, 7 Jun 2023 23:43:22 -0700
[Message part 1 (text/plain, inline)]
Thanks for clarifying!

> This part is exactly what I'm asking about: how do I "upload the file"
> using EWW?  Can you please show what I should type to do that?

0. In a terminal, do the following
#+begin_src bash
  git clone https://git.sr.ht/~ozzloy/emacs-bug-63941
  cd emacs-bug-63941
  git checkout reproduce-bug-63941
  ./server.py
#+end_src

1. In an emacs window, do the following

   M-x eww<Enter>
   localhost:8085<Enter>

   Once EWW opens localhost:8085, it will display the web page with a button
   labeled "Browse". Click that button.

2. It will then ask you to choose a file. Choose the file
   =.../emacs-bug-63941/filename= in the same directory where =server.py=
is.

3. Hit <tab> to go to the "Submit" button, and hit <Enter> to upload that
file.


> Where does one see this result?  Is it in some Emacs buffer?  In that
> case, what is the name of that buffer?

The output will be in the terminal where =./server.py= was run.  I ran it
in an
emacs shell buffer named =$emacs-bug-63941=.


On Wed, Jun 7, 2023 at 11:09 PM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: ozzloy <ozzloy <at> gmail.com>
> > Date: Wed, 7 Jun 2023 19:48:29 -0700
> > Cc: 63941 <at> debbugs.gnu.org
> >
> > 1. Then use EWW to browse to localhost:8085 and upload the file
> =filename=.
>
> This part is exactly what I'm asking about: how do I "upload the file"
> using EWW?  Can you please show what I should type to do that?
>
> > Here's my result when doing that, first with EWW.
> > #+begin_quote
> > upload_content = b'file content', name = 'filename', size = 12
> > 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 -
> > #+end_quote
>
> Where does one see this result?  Is it in some Emacs buffer?  In that
> case, what is the name of that buffer?
>
> Thanks.
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 08 Jun 2023 06:54:01 GMT) Full text and rfc822 format available.

Message #20 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Wed, 7 Jun 2023 23:52:31 -0700
[Message part 1 (text/plain, inline)]
Quick fix.  In the last email, I said
>  Once EWW opens localhost:8085, it will display the web page with a
>  button labeled "Browse". Click that button.

My bad.  Clicking does not work for me.  What I actually do is put the
cursor on the button and hit <Enter>.  Then it will bring up the file


On Wed, Jun 7, 2023 at 11:43 PM ozzloy <ozzloy <at> gmail.com> wrote:

>
> Thanks for clarifying!
>
> > This part is exactly what I'm asking about: how do I "upload the file"
> > using EWW?  Can you please show what I should type to do that?
>
> 0. In a terminal, do the following
> #+begin_src bash
>   git clone https://git.sr.ht/~ozzloy/emacs-bug-63941
>   cd emacs-bug-63941
>   git checkout reproduce-bug-63941
>   ./server.py
> #+end_src
>
> 1. In an emacs window, do the following
>
>    M-x eww<Enter>
>    localhost:8085<Enter>
>
>    Once EWW opens localhost:8085, it will display the web page with a
> button
>    labeled "Browse". Click that button.
>
> 2. It will then ask you to choose a file. Choose the file
>    =.../emacs-bug-63941/filename= in the same directory where =server.py=
> is.
>
> 3. Hit <tab> to go to the "Submit" button, and hit <Enter> to upload that
> file.
>
>
> > Where does one see this result?  Is it in some Emacs buffer?  In that
> > case, what is the name of that buffer?
>
> The output will be in the terminal where =./server.py= was run.  I ran it
> in an
> emacs shell buffer named =$emacs-bug-63941=.
>
>
> On Wed, Jun 7, 2023 at 11:09 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>> > From: ozzloy <ozzloy <at> gmail.com>
>> > Date: Wed, 7 Jun 2023 19:48:29 -0700
>> > Cc: 63941 <at> debbugs.gnu.org
>> >
>> > 1. Then use EWW to browse to localhost:8085 and upload the file
>> =filename=.
>>
>> This part is exactly what I'm asking about: how do I "upload the file"
>> using EWW?  Can you please show what I should type to do that?
>>
>> > Here's my result when doing that, first with EWW.
>> > #+begin_quote
>> > upload_content = b'file content', name = 'filename', size = 12
>> > 127.0.0.1 - - [07/Jun/2023 18:55:03] "POST / HTTP/1.1" 200 -
>> > #+end_quote
>>
>> Where does one see this result?  Is it in some Emacs buffer?  In that
>> case, what is the name of that buffer?
>>
>> Thanks.
>>
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Sat, 10 Jun 2023 09:43:02 GMT) Full text and rfc822 format available.

Message #23 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Sat, 10 Jun 2023 12:42:54 +0300
> From: ozzloy <ozzloy <at> gmail.com>
> Date: Wed, 7 Jun 2023 23:52:31 -0700
> Cc: 63941 <at> debbugs.gnu.org
> 
> Quick fix.  In the last email, I said
> >  Once EWW opens localhost:8085, it will display the web page with a
> >  button labeled "Browse". Click that button.
> 
> My bad.  Clicking does not work for me.  What I actually do is put the
> cursor on the button and hit <Enter>.  Then it will bring up the file

Thanks.

Making the change in mm-url.el sounds scary: that code was written
many years ago, and who knows where it is used?  The reason for
testing bolp there is not explained, but I'm guessing they didn't want
to craete an empty line?  It might be better to make a local change in
eww.el: just append "\r\n" to the request data.

I'll leave it to HTTP experts to decide what to do about this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Sun, 11 Jun 2023 01:40:02 GMT) Full text and rfc822 format available.

Message #26 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Sat, 10 Jun 2023 18:38:44 -0700
[Message part 1 (text/plain, inline)]
> Making the change in mm-url.el sounds scary: that code was written
> many years ago, and who knows where it is used?

Yeah, that makes sense. I'll see what I can find about that, at least
within emacs itself.

> The reason for testing bolp there is not explained, but I'm guessing
> they didn't want to craete an empty line?

Not sure. It looks like it was introduced during a patch intended to fix
submitting binary data.  Would it make sense to CC the two authors of the
following commits?

I dug into the history a bit to try to find out where =(unless (bolp)=
came from

#+begin_src bash
git log -L '/(unless (bolp)/,+1:lisp/gnus/mm-url.el' > unless-bolp
#+end_src

It looks like it was introduced in this commit

#+begin_quote
commit fca2f70380dcb054497470aaf8eda6173063928e
Author: Kenjiro Nakayama <nakayamakenjiro <at> gmail.com>
Date:   Mon Nov 10 22:33:55 2014 +0100

    Allow uploading files from eww
#+end_quote

and the "\r\n" was included unconditionally before the boundary

#+begin_src elisp
   ;; use the boundary as a separator
   (concat "\r\n--" boundary "\r\n")
#+end_src

Then it this commit changed it

#+begin_quote
commit a6e0188dffc394698d9ffbef50401f14a31c8722
Author: Lars Ingebrigtsen <larsi <at> gnus.org>
Date:   Thu Oct 13 21:39:29 2016 +0200

    Fix problem with submitting binary data via HTTP forms
#+end_quote

to the following

#+begin_src elisp
 (unless (bolp)
      (insert "\r\n"))
#+end_src

plus a bunch of "\r\n"s scattered throughout the different conditions.
It's not clear to me how the behavior of the function changed.

It looks like maybe instead of the initial diff I proposed, something
like this would produce more accurate output for file uploads, and also
change the surrounding code as little as possible,

#+begin_quote
diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el
index 11847a79f17..686dea20b6a 100644
--- a/lisp/gnus/mm-url.el
+++ b/lisp/gnus/mm-url.el
@@ -430,7 +430,8 @@ mm-url-encode-multipart-form-data
              (insert filedata))
             ;; How can this possibly be useful?
             ((integerp filedata)
-             (insert (number-to-string filedata))))))
+             (insert (number-to-string filedata)))))
+         (insert "\r\n"))
         ((equal name "submit")
          (insert
           "Content-Disposition: form-data;
name=\"submit\"\r\n\r\nSubmit\r\n"))
#+end_quote

But it seems like it would be better to do a larger rewrite that
consolidates the places where "\r\n" is added before the non-initial
boundary.

In order to figure that out, I will read the two versions more closely
to figure out what the differences are and write some ERT tests showing
the difference.  Hopefully that will clear things up.
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Sun, 18 Jun 2023 23:24:02 GMT) Full text and rfc822 format available.

Message #29 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Sun, 18 Jun 2023 16:23:23 -0700
[Message part 1 (text/plain, inline)]
> who knows where it is used?

As far as I can tell, it is only ever used by =(eww-submit ...)= from
=lisp/net/eww.el=.  That's the only place it's used in all of emacs,
and every reference I can find on the web.

I've tested the heck out of it now, and also used firefox, chromium,
and curl to generate http messages for comparison.

Those tests of the different versions of
=mm-url-encode-multipart-form-data=, as well as the http messages
generated by firefox, chromium, and curl can be seen here

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/53a7949a5db21c456c1da3b4add29343c3d02137/item/mm-url-tests.el

The patch I have attached to this email generates output that matches
firefox, chromium, and curl.  It also includes a bunch of tests for
the included version of =mm-url-encode-multipart-form-data=

If there's some change that would make the patch a better fit, let me
know.  I'm happy to modify it.
[Message part 2 (text/html, inline)]
[0001-always-CRLF-before-non-first-boundary-in-mulitpart-f.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Mon, 19 Jun 2023 16:14:02 GMT) Full text and rfc822 format available.

Message #32 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Mon, 19 Jun 2023 19:13:41 +0300
> From: ozzloy <ozzloy <at> gmail.com>
> Date: Sun, 18 Jun 2023 16:23:23 -0700
> Cc: 63941 <at> debbugs.gnu.org
> 
> > who knows where it is used?
> 
> As far as I can tell, it is only ever used by =(eww-submit ...)= from
> =lisp/net/eww.el=.  That's the only place it's used in all of emacs,
> and every reference I can find on the web.
> 
> I've tested the heck out of it now, and also used firefox, chromium,
> and curl to generate http messages for comparison.
> 
> Those tests of the different versions of
> =mm-url-encode-multipart-form-data=, as well as the http messages
> generated by firefox, chromium, and curl can be seen here
> 
> https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/53a7949a5db21c456c1da3b4add29343c3d02137/item/mm-url-tests.el
> 
> 
> The patch I have attached to this email generates output that matches
> firefox, chromium, and curl.  It also includes a bunch of tests for
> the included version of =mm-url-encode-multipart-form-data=
> 
> If there's some change that would make the patch a better fit, let me
> know.  I'm happy to modify it.

Thanks.  I really need some Gnus/eww expert to look into this and
review the patch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 22 Jun 2023 16:50:02 GMT) Full text and rfc822 format available.

Message #35 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Thu, 22 Jun 2023 09:49:12 -0700
[Message part 1 (text/plain, inline)]
> I really need some Gnus/eww expert to look into this and review the
> patch.

ok.


While I was writing this patch, I found some other things to patch.
For example, the multipart message will currently only be generated
if there is a file to send. Instead, it should be used whenever there
is a form that has the attribute "enctype" with its value set to
"mutlipart/form-data".

Is the patch ok other than needing review from an expert in gnus/eww?
Is the formatting good, are the comments all right, did I do the
comment block at the top of the new file =mm-url-tests.el= correctly?

I'm writing and submitting more patches, so if there's something like
that I should fix, feel free to let me know.

thanks!
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 22 Jun 2023 18:26:02 GMT) Full text and rfc822 format available.

Message #38 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Thu, 22 Jun 2023 11:25:05 -0700
[Message part 1 (text/plain, inline)]
> I really need some Gnus/eww expert to look into this and review the patch.

Just checking, is this something I should do? Perhaps by reaching out to
Lars
and/or Kenjiro, the two most recent authors on that function?


I made shorter notes on what the bug is and how to reproduce it here

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/readme.org


thanks again!
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Thu, 22 Jun 2023 18:30:02 GMT) Full text and rfc822 format available.

Message #41 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Thu, 22 Jun 2023 21:29:57 +0300
> From: ozzloy <ozzloy <at> gmail.com>
> Date: Thu, 22 Jun 2023 11:25:05 -0700
> Cc: 63941 <at> debbugs.gnu.org
> 
> > I really need some Gnus/eww expert to look into this and review the patch.
> 
> Just checking, is this something I should do? Perhaps by reaching out to Lars
> and/or Kenjiro, the two most recent authors on that function?

You could try, yes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Fri, 23 Jun 2023 08:23:02 GMT) Full text and rfc822 format available.

Message #44 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>, Kenjiro Nakayama <nakayamakenjiro <at> gmail.com>,
 Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 63941 <at> debbugs.gnu.org
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Fri, 23 Jun 2023 01:22:20 -0700
[Message part 1 (text/plain, inline)]
>> From: ozzloy <ozzloy <at> gmail.com>
>> Date: Thu, 22 Jun 2023 11:25:05 -0700
>> Cc: 63941 <at> debbugs.gnu.org
>>
>>> I really need some Gnus/eww expert to look into this and review the
patch.
>>
>> Just checking, is this something I should do? Perhaps by reaching out to
Lars
>> and/or Kenjiro, the two most recent authors on that function?
>
> You could try, yes.

oh, wow, i'm glad i asked.  i thought maybe you were doing that and i
should just wait until you were done.

Hi Lars and Kenjiro!

Could you review this patch?  Or suggest someone who would be good to ask?

https://git.sr.ht/~ozzloy/emacs/commit/c73ccda90623519434f8ec2c700adf70ac1d6a00

It fixes a problem where files uploaded through EWW have a newline at EOF
chomped.


Here's a set of instructions to reproduce the problem

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/readme.org


and a bunch of tests showing the http messages generated by different
clients
and characterizing the behavior of prior versions of
=mm-url-encode-multipart-form-data=

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/reproduce/item/mm-url-tests.el


thanks!
[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Tue, 18 Jul 2023 19:05:01 GMT) Full text and rfc822 format available.

Message #47 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: ozzloy <ozzloy <at> gmail.com>
Cc: 63941 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Tue, 18 Jul 2023 15:04:45 -0400
OK, I'm not super familiar with this code, and while I really appreciate
your tests, I'd rather not completely replace the old with a brand new
code because that makes it hard for me to see what's really changed.

So, as a first step I took the existing code, and did the following:
- "Inline" the (unless (bolp) (insert "\r\n")) into every branch of the
  preceding `cond`.  We know this makes no difference.
- Remove the (unless (bolp) (insert "\r\n")) in the "submit" branch
  since it follows an insertion that ends in "n" and hence doesn't
  do anything.
- Remove the `(unless (bolp)` test from the "file" branch since that's
  what this bug report is about:
  - when `filedata` is a number, this makes no difference (we just
    inserted that number without a trailing \n).
  - when `filedata` is a string that ends in non-\n (a case that
    currently works right) this makes no difference.
  - when it does end in \n this does make a difference which fixes
    this bug.
  - when `filedata` is an empty string, this add an additional \r\n
    compared to the current code.  This seems right to me (I expect the
    decoding software will skip the \r\n at the of the header and then
    look for \r\n<BOUNDARY>, so it's important to have two \r\n).
- In the default case, I left the code as is, but the `(unless (bolp)`
  test is probably not right.

There remain some questions on this patch:

1- When `filedata` is neither a number nor a string this is treated the
   same as an empty string.  Is that right?  Or should it just never
   happen (in which case we should probably catch this case and signal
   an error).
2- Should we also remove the `(unless (bolp)` in the default case?
   I think we do (and my reading of your tests agrees, tho I couldn't
   run your test suite on this version of the code, among other things
   because it contains multiple tests with the same name, so it gives
   me things like (error "Test
   `test-mm-url-encode-multipart-form-data-A-ab-c' redefined")).

I suspect we can also simply this code by moving the first
(insert "--" boundary "\r\n") before the loop, and the second into the
loop so we can make it insert "\r\n--" boundary "\r\n" (and thus remove
\r\n from the end of each of the preceding branches).


        Stefan


diff --git a/lisp/gnus/mm-url.el b/lisp/gnus/mm-url.el
index 11847a79f17..3041ad16264 100644
--- a/lisp/gnus/mm-url.el
+++ b/lisp/gnus/mm-url.el
@@ -430,16 +430,17 @@ mm-url-encode-multipart-form-data
 	      (insert filedata))
 	     ;; How can this possibly be useful?
 	     ((integerp filedata)
-	      (insert (number-to-string filedata))))))
+	      (insert (number-to-string filedata)))))
+	  (insert "\r\n"))
 	 ((equal name "submit")
 	  (insert
 	   "Content-Disposition: form-data; name=\"submit\"\r\n\r\nSubmit\r\n"))
 	 (t
 	  (insert (format "Content-Disposition: form-data; name=%S\r\n\r\n"
 			  name))
-	  (insert value)))
-	(unless (bolp)
-	  (insert "\r\n"))))
+	  (insert value)
+	  (unless (bolp)
+	    (insert "\r\n"))))))
     (insert "--" boundary "--\r\n")
     (buffer-string)))
 





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Fri, 21 Jul 2023 09:05:01 GMT) Full text and rfc822 format available.

Message #50 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 63941 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Fri, 21 Jul 2023 02:04:27 -0700
[Message part 1 (text/plain, inline)]
Thanks so much for taking the time to review this!

> I'd rather not completely replace the old with a brand new code
> because that makes it hard for me to see what's really changed.

I thought this would be ok, since the existing version is a complete
rewrite of the original (so there's precedent for complete rewrite of
a function in a commit to fix a bug), and if there were tests showing
the behavior to be the same as before, except where this bug is
fixed. (Though I see the tests are currently broken).

Based on your feedback, and some help from #emacs, I made a patch
that is very minimal to the existing code, with better commit
message, and attached it here.

The patch removes the =(unless (bolp) ...)= guarding inserting CRLF.
The RFC says the "boundary delimiter MUST occur at the beginning of a
line, i.e., following a CRLF".  =(bolp)= is not enough to guarantee
the boundary is preceded by CRLF.  It can be true when the point
is after just "\n".

Because CRLF is inserted unconditionally after the =cond=, the code
does not include the boundary's CRLF in each branch of the =cond=.

> when `filedata` is an empty string, this add an additional \r\n
> compared to the current code.  This seems right to me

Me too, and all the other clients I tested.

> I expect the decoding software will skip the \r\n at the of the
> header and then look for \r\n<BOUNDARY>, so it's important to have
> two \r\n

 What you said is true.  In addition, they also accept

"HEADER\r\nfile content\n--BOUNDARY"

as the content "file content", and consider the last "\n" as attached
to the boundary.  That's where the file's final "\n" gets lost if the
file's content was initially "file content\n".

> There remain some questions on this patch:

While fixing this bug, I found a few more problems in addition to the
two that you mention here.  I was not addressing them yet, since I
thought I should fix one bug per patch.

> I suspect we can also simply this code by moving the first (insert
> "--" boundary "\r\n") before the loop, and the second into the loop
> so we can make it insert "\r\n--" boundary "\r\n" (and thus remove
> \r\n from the end of each of the preceding branches).

Almost, but not quite, or at least not without some awkward (to my
eye) repositioning of inserting boundaries, "--", and "\r\n".  The
final boundary complicates things.  It is different from all the
others, it is "--BOUNDARY--" instead of "--BOUNDARY"

Here's what I ended up with when I tried that,

https://git.sr.ht/~ozzloy/emacs-bug-63941/tree/simplify-insert-boundaries-and---/item/mm-url.el#L397

This passes the tests in =emacs/tests/lisp/gnus/mm-url-tests.el=.
[Message part 2 (text/html, inline)]
[0001-allow-uploading-files-ending-in-newline-via-EWW.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Tue, 29 Aug 2023 00:29:01 GMT) Full text and rfc822 format available.

Message #53 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 63941 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Mon, 28 Aug 2023 17:28:00 -0700
[Message part 1 (text/plain, inline)]
i've modified the commit with a couple goals.
  + make the bug fixing part of the diff as small as possible.
  + made the new tests look more like existing ones.
  + rebased onto the head of the emacs-29 branch pulled in
    earlier today.
[Message part 2 (text/html, inline)]
[0001-upload-newline-terminated-files-via-EWW-Bug-63941.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#63941; Package emacs. (Sat, 02 Dec 2023 15:04:02 GMT) Full text and rfc822 format available.

Message #56 received at 63941 <at> debbugs.gnu.org (full text, mbox):

From: ozzloy <ozzloy <at> gmail.com>
To: 63941 <at> debbugs.gnu.org
Cc: Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#63941: [PATCH] ; always CRLF before non-first boundary in
 multipart form
Date: Sat, 2 Dec 2023 07:03:23 -0800
[Message part 1 (text/plain, inline)]
bump

On Mon, Aug 28, 2023 at 5:28 PM ozzloy <ozzloy <at> gmail.com> wrote:

> i've modified the commit with a couple goals.
>   + make the bug fixing part of the diff as small as possible.
>   + made the new tests look more like existing ones.
>   + rebased onto the head of the emacs-29 branch pulled in
>     earlier today.
>
[Message part 2 (text/html, inline)]

This bug report was last modified 153 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.