GNU bug report logs - #34469
26.1; EWW stops renderring web page on null byte

Package: emacs;

Reported by: Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>

Date: Wed, 13 Feb 2019 15:57:02 UTC

Severity: normal

Tags: fixed

Found in version 26.1

Fixed in version 27.1

Done: Robert Pluim <rpluim <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34469 in the body.
You can then email your comments to 34469 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Wed, 13 Feb 2019 15:57:02 GMT)

Acknowledgement sent to Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 13 Feb 2019 15:57:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.1; EWW stops renderring web page on null byte
Date: Wed, 13 Feb 2019 13:27:16 +0100
As in the topic. See this page:
http://blog.eduardofleury.com/archives/2007/09/13
There is a string with a null byte at the beginning. Firefox renders
the page past this point. EWW stops on:
sock.bind(“



In GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+ Version
3.23.2)
 of 2018-08-13 built on buildvm-13.phx2.fedoraproject.org
Windowing system distributor 'Fedora Project', version 11.0.12003000
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Contacting host: blog.eduardofleury.com:80
scroll-up-command: End of buffer [2 times]
Configured using:
 'configure --build=x86_64-redhat-linux-gnu
 --host=x86_64-redhat-linux-gnu --program-prefix=
 --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
 --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
 --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
 --libexecdir=/usr/libexec --localstatedir=/var
 --sharedstatedir=/var/lib --mandir=/usr/share/man
 --infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
 --with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
 --with-gpm=no --with-xwidgets --with-modules
 build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu
 'CFLAGS=-DMAIL_USE_LOCKF -O2 -g -pipe -Wall -Werror=format-security
 -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
 -fstack-protector-strong -grecord-gcc-switches
 -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
 -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
 LDFLAGS=-Wl,-z,relro
 PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'

Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS NOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES THREADS XWIDGETS LCMS2

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: pl_PL.UTF-8
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: C
  value of $XMODIFIERS: @im=ibus
  locale-coding-system: utf-8-unix

Major mode: eww

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader sendmail cl-extra help-mode
network-stream starttls url-http tls gnutls mail-parse rfc2231 url-gw
nsm rmc url-cache url-auth eww easymenu puny mm-url gnus nnheader
gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums mail-utils
wid-edit mm-util mail-prsvr url-queue url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars mailcap shr svg xml seq byte-opt gv bytecomp
byte-compile cconv dom browse-url format-spec cl-loaddefs cl-lib
elec-pair time-date mule-util tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting xwidget-internal
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 137138 10359)
 (symbols 48 23803 2)
 (miscs 40 59 148)
 (strings 32 40308 1635)
 (string-bytes 1 1174212)
 (vectors 16 17956)
 (vector-slots 8 544601 12850)
 (floats 8 73 241)
 (intervals 56 3447 0)
 (buffers 992 12))
-- 
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics







Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Thu, 14 Feb 2019 04:47:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
To: l.pawelczyk <at> samsung.com, bug-gnu-emacs <at> gnu.org
Subject: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 13 Feb 2019 22:44:50 -0600
This looks like a problem with libxml-parse-html-region (or maybe even
lower than that, I have no idea). Put the following in a buffer

  <p>sock.bind(&#8220;\0MyBindName&#8221;)</p>

and execute

  (libxml-parse-html-region (point-min) (point-max))

This returns

  (html nil (body nil (p nil "sock.bind(“")))
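
The reproduction above can be wrapped in a temporary buffer so that it
runs without manual setup. This is only a minimal sketch: it assumes an
Emacs built with libxml2, it assumes that the \0 in the sample stands
for a literal NUL character, and the helper name my-parse-html-string
is hypothetical. The tree it returns depends on the Emacs and libxml2
versions in use.

```elisp
;; Hypothetical helper (not part of Emacs): parse an HTML string with libxml.
(defun my-parse-html-string (html)
  "Parse HTML, a string, and return the tree that libxml produces."
  (with-temp-buffer
    (insert html)
    (libxml-parse-html-region (point-min) (point-max))))

;; "\0" inside an Emacs Lisp string literal is a single NUL character.
;; The call is guarded because the parser only exists in libxml2 builds.
(when (fboundp 'libxml-parse-html-region)
  (my-parse-html-string "<p>sock.bind(&#8220;\0MyBindName&#8221;)</p>"))
```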




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Thu, 14 Feb 2019 19:15:01 GMT)

Message #11 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nicholas Drozd <nicholasdrozd <at> gmail.com>
Cc: 34469 <at> debbugs.gnu.org, l.pawelczyk <at> samsung.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Thu, 14 Feb 2019 21:14:12 +0200
> From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
> Date: Wed, 13 Feb 2019 22:44:50 -0600
> 
> This looks like a problem with libxml-parse-html-region (or maybe even
> lower than that, I have no idea).

libxml-parse-html-region calls parse_region, which passes a C string
to libxml functions.  So there can be no embedded null bytes.

Does libxml have facilities to deal with such cases?  If not, maybe
this should be taken up with libxml developers.
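
The effect of handing text to a consumer of NUL-terminated C strings
can be modelled in Lisp. The sketch below is an illustration only, not
how parse_region or libxml is implemented; my-c-string-view is a
hypothetical name. It returns the part of a string that such a consumer
would see, namely everything before the first NUL.

```elisp
;; Hypothetical model of a NUL-terminated view of a Lisp string.
(defun my-c-string-view (s)
  "Return the prefix of S that precedes its first NUL character.
If S contains no NUL, return S unchanged."
  (let ((nul (string-match "\000" s)))
    (if nul
        (substring s 0 nul)
      s)))

;; Everything from the NUL onward is invisible to the consumer.
(my-c-string-view "sock.bind(\0MyBindName)")  ; => "sock.bind("
```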




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Sat, 16 Feb 2019 18:14:02 GMT)

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Sat, 16 Feb 2019 12:13:03 -0600
This is a known issue with libxml, or at least it was at some point.
Here's a thread from 2008:
https://mail.gnome.org/archives/xml/2008-August/msg00008.html




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Tue, 19 Feb 2019 01:13:01 GMT)

Message #17 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Nicholas Drozd <nicholasdrozd <at> gmail.com>
Cc: eliz <at> gnu.org, 34469 <at> debbugs.gnu.org
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Mon, 18 Feb 2019 20:12:41 -0500
Perhaps eww-display-html should replace null bytes (with whatever the
html standard says is appropriate) before calling
libxml-parse-html-region. It already replaces CRLF.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Tue, 19 Feb 2019 10:07:01 GMT)

Message #20 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 34469 <at> debbugs.gnu.org, Nicholas Drozd <nicholasdrozd <at> gmail.com>
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Tue, 19 Feb 2019 11:06:37 +0100
Glenn Morris <rgm <at> gnu.org> writes:

> Perhaps eww-display-html should replace null bytes (with whatever the
> html standard says is appropriate) before calling
> libxml-parse-html-region. It already replaces CRLF.

Chrome at least just strips the null byte completely.

There is apparently a class of attacks that uses the null character
for nefarious purposes, so how about something like this:

diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 1cc4557ce1..9b57bc43e4 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -448,8 +448,8 @@ eww-display-html
 		    (decode-coding-region (point) (point-max) encode)
 		  (coding-system-error nil))
                 (save-excursion
-                  ;; Remove CRLF before parsing.
-                  (while (re-search-forward "\r$" nil t)
+                  ;; Remove CRLF and NULL before parsing.
+                  (while (re-search-forward "\r$\\|\000" nil t)
                     (replace-match "" t t)))
 		(libxml-parse-html-region (point) (point-max))))))
 	(source (and (null document)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Tue, 19 Feb 2019 16:32:02 GMT)

Message #23 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: rgm <at> gnu.org, 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Tue, 19 Feb 2019 18:30:48 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Tue, 19 Feb 2019 11:06:37 +0100
> Cc: 34469 <at> debbugs.gnu.org, Nicholas Drozd <nicholasdrozd <at> gmail.com>
> 
> Glenn Morris <rgm <at> gnu.org> writes:
> 
> > Perhaps eww-display-html should replace null bytes (with whatever the
> > html standard says is appropriate) before calling
> > libxml-parse-html-region. It already replaces CRLF.
> 
> Chrome at least just strips the null byte completely.
> 
> There is apparently a class of attacks that uses the null character
> for nefarious purposes, so how about something like this:
> 
> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
> index 1cc4557ce1..9b57bc43e4 100644
> --- a/lisp/net/eww.el
> +++ b/lisp/net/eww.el
> @@ -448,8 +448,8 @@ eww-display-html
>  		    (decode-coding-region (point) (point-max) encode)
>  		  (coding-system-error nil))
>                  (save-excursion
> -                  ;; Remove CRLF before parsing.
> -                  (while (re-search-forward "\r$" nil t)
> +                  ;; Remove CRLF and NULL before parsing.
> +                  (while (re-search-forward "\r$\\|\000" nil t)
>                      (replace-match "" t t)))

It is un-Emacsy, IMO, to remove content without a trace.  (CR is
different: we simply convert text to Unix LF-only EOL format.)  So I'd
suggest to replace with "^@" or "\000" or "NUL" or something to that
effect.  Even U+FFFD would be better than removing.

(We could get fancy and have a defcustom for those who do want the
null bytes removed.)

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Tue, 19 Feb 2019 17:38:01 GMT)

Message #26 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Tue, 19 Feb 2019 18:37:26 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Date: Tue, 19 Feb 2019 11:06:37 +0100
>> Cc: 34469 <at> debbugs.gnu.org, Nicholas Drozd <nicholasdrozd <at> gmail.com>
>> 
>> Glenn Morris <rgm <at> gnu.org> writes:
>> 
>> > Perhaps eww-display-html should replace null bytes (with whatever the
>> > html standard says is appropriate) before calling
>> > libxml-parse-html-region. It already replaces CRLF.
>> 
>> Chrome at least just strips the null byte completely.
>> 
>> There is apparently a class of attacks that uses the null character
>> for nefarious purposes, so how about something like this:
>> 
>> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
>> index 1cc4557ce1..9b57bc43e4 100644
>> --- a/lisp/net/eww.el
>> +++ b/lisp/net/eww.el
>> @@ -448,8 +448,8 @@ eww-display-html
>>  		    (decode-coding-region (point) (point-max) encode)
>>  		  (coding-system-error nil))
>>                  (save-excursion
>> -                  ;; Remove CRLF before parsing.
>> -                  (while (re-search-forward "\r$" nil t)
>> +                  ;; Remove CRLF and NULL before parsing.
>> +                  (while (re-search-forward "\r$\\|\000" nil t)
>>                      (replace-match "" t t)))
>
> It is un-Emacsy, IMO, to remove content without a trace.  (CR is
> different: we simply convert text to Unix LF-only EOL format.)  So I'd
> suggest to replace with "^@" or "\000" or "NUL" or something to that
> effect.  Even U+FFFD would be better than removing.
>

Since this is all due to a C-ism in the handling of content, Iʼd vote
for "\0", although this is inside Emacs, so perhaps "^@" is best.

> (We could get fancy and have a defcustom for those who do want the
> null bytes removed.)

I really donʼt think this is something that needs to be configurable.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Tue, 19 Feb 2019 18:12:02 GMT)

Message #29 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Tue, 19 Feb 2019 20:11:17 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 34469 <at> debbugs.gnu.org,  nicholasdrozd <at> gmail.com
> Date: Tue, 19 Feb 2019 18:37:26 +0100
> 
> Since this is all due to a C-ism in the handling of content, Iʼd vote
> for "\0", although this is inside Emacs, so perhaps "^@" is best.

Either is fine with me.

> > (We could get fancy and have a defcustom for those who do want the
> > null bytes removed.)
> 
> I really donʼt think this is something that needs to be configurable.

Neither do I.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Wed, 20 Feb 2019 18:49:03 GMT)

Message #32 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 20 Feb 2019 19:48:50 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: 34469 <at> debbugs.gnu.org,  nicholasdrozd <at> gmail.com
>> Date: Tue, 19 Feb 2019 18:37:26 +0100
>> 
>> Since this is all due to a C-ism in the handling of content, Iʼd vote
>> for "\0", although this is inside Emacs, so perhaps "^@" is best.
>
> Either is fine with me.

Since the web page that triggered this was showing C code, Iʼve gone
for the "\0" option.

2019-02-20  Robert Pluim  <rpluim <at> gmail.com>

	* lisp/net/eww.el (eww-display-html): Replace NULL characters with
	"\0", as libxml can't handle embedded NULLs.
diff --git i/lisp/net/eww.el w/lisp/net/eww.el
index 555b3bd591..06075b1ebd 100644
--- i/lisp/net/eww.el
+++ w/lisp/net/eww.el
@@ -462,10 +462,12 @@ eww-display-html
 		(condition-case nil
 		    (decode-coding-region (point) (point-max) encode)
 		  (coding-system-error nil))
-                (save-excursion
-                  ;; Remove CRLF before parsing.
-                  (while (re-search-forward "\r$" nil t)
-                    (replace-match "" t t)))
+		(save-excursion
+		  ;; Remove CRLF and NULL before parsing.
+                  (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
+                    (replace-match (if (match-beginning 1)
+                                       ""
+                                     "\\0") t t)))
 		(libxml-parse-html-region (point) (point-max))))))
 	(source (and (null document)
 		     (buffer-substring (point) (point-max)))))
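
The search-and-replace loop in the patch above can be tried outside
eww. The following is a sketch under stated assumptions: it copies the
regexp and the replace-match call from the patch, but the wrapper
function my-eww-escape-nul-sketch is hypothetical and works on a string
in a temporary buffer rather than on the eww response buffer.

```elisp
;; Hypothetical wrapper around the loop from the patch, for experimenting.
(defun my-eww-escape-nul-sketch (html)
  "Return HTML with CR at end of line removed and NUL shown as backslash-zero."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Group 1 matches a CR at end of line; group 2 matches a NUL.
    (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
      ;; The final t makes the replacement literal, so "\\0" inserts a
      ;; backslash and a zero instead of referring to the whole match.
      (replace-match (if (match-beginning 1)
                         ""
                       "\\0")
                     t t))
    (buffer-string)))

(my-eww-escape-nul-sketch "a\r\nb\0c")  ; => "a\nb\\0c"
```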




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Wed, 27 Feb 2019 11:32:02 GMT)

Message #35 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 27 Feb 2019 12:31:45 +0100
Robert Pluim <rpluim <at> gmail.com> writes:

Ping!

Eli, release or master?

> 2019-02-20  Robert Pluim  <rpluim <at> gmail.com>
>
> 	* lisp/net/eww.el (eww-display-html): Replace NULL characters with
> 	"\0", as libxml can't handle embedded NULLs.
> diff --git i/lisp/net/eww.el w/lisp/net/eww.el
> index 555b3bd591..06075b1ebd 100644
> --- i/lisp/net/eww.el
> +++ w/lisp/net/eww.el
> @@ -462,10 +462,12 @@ eww-display-html
>  		(condition-case nil
>  		    (decode-coding-region (point) (point-max) encode)
>  		  (coding-system-error nil))
> -                (save-excursion
> -                  ;; Remove CRLF before parsing.
> -                  (while (re-search-forward "\r$" nil t)
> -                    (replace-match "" t t)))
> +		(save-excursion
> +		  ;; Remove CRLF and NULL before parsing.
> +                  (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
> +                    (replace-match (if (match-beginning 1)
> +                                       ""
> +                                     "\\0") t t)))
>  		(libxml-parse-html-region (point) (point-max))))))
>  	(source (and (null document)
>  		     (buffer-substring (point) (point-max)))))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Wed, 27 Feb 2019 15:57:02 GMT)

Message #38 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 27 Feb 2019 17:55:56 +0200
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 34469 <at> debbugs.gnu.org,  nicholasdrozd <at> gmail.com
> Date: Wed, 27 Feb 2019 12:31:45 +0100
> 
> Robert Pluim <rpluim <at> gmail.com> writes:
> 
> Ping!
> 
> Eli, release or master?

Master, please.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Wed, 27 Feb 2019 16:22:02 GMT)

Message #41 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
Subject: Re: bug#34469: 26.1; EWW stops renderring web page on null byte
Date: Wed, 27 Feb 2019 17:21:36 +0100
tags 34469 fixed
close 34469 27.1
quit

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: 34469 <at> debbugs.gnu.org,  nicholasdrozd <at> gmail.com
>> Date: Wed, 27 Feb 2019 12:31:45 +0100
>> 
>> Robert Pluim <rpluim <at> gmail.com> writes:
>> 
>> Ping!
>> 
>> Eli, release or master?
>
> Master, please.

Done as d07f3aae48
Closing.

Robert




Added tag(s) fixed. Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 27 Feb 2019 16:22:02 GMT)

bug marked as fixed in version 27.1, send any further explanations to 34469 <at> debbugs.gnu.org and Lukasz Pawelczyk <l.pawelczyk <at> samsung.com> Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 27 Feb 2019 16:22:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Thu, 28 Feb 2019 01:54:02 GMT)

Message #48 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: Glenn Morris <rgm <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 34469 <at> debbugs.gnu.org, Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>,
 Nicholas Drozd <nicholasdrozd <at> gmail.com>
Subject: 26.1; EWW stops renderring web page on null byte
Date: Wed, 27 Feb 2019 17:52:52 -0800
Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
Even if we assume that a web page contains C-like code, the replacement
would mishandle a NUL followed by an octal digit, since the replacement
would look like \07 which would be interpreted as a BEL character, not
as a NUL followed by a digit 7. And web pages do not typically contain
C code, so the replacement \0 might cause other trouble.

Instead, it sounds better to replace NUL with the four-character
sequence "&#0;", as this is a standard HTML way to represent a NUL
character. I installed the attached patch to do this.

In my little tests with this patch, libxml2 typically handled &#0; by
discarding it and continuing to parse, which is better than ignoring the
rest of the input. In some cases libxml2 handles &#0; by discarding
later input up to a delimiter; although this is bad, it's a libxml2 bug
that attackers can exploit independently of what Emacs does with NUL,
since attackers can simply use &#0;.
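
The difference between the two replacements can be seen on a NUL that
is followed by a digit. The attached patch is not reproduced in this
log, so the code below is not that patch; it is a minimal sketch of the
idea, and my-escape-nul-as-entity is a hypothetical name.

```elisp
;; Hypothetical helper: write each NUL in HTML as the character reference "&#0;".
(defun my-escape-nul-as-entity (html)
  "Return HTML, a string, with every NUL character replaced by \"&#0;\"."
  ;; FIXEDCASE and LITERAL are both t: insert the replacement verbatim.
  (replace-regexp-in-string "\000" "&#0;" html t t))

;; "\0007" is a NUL followed by the digit 7 (an octal escape reads at
;; most three digits).  The entity keeps the two characters distinct,
;; whereas a backslash-zero replacement would give \07, which reads as
;; a single BEL escape in C.
(my-escape-nul-as-entity "x\0007")  ; => "x&#0;7"
```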

[0001-Escape-HTML-NUL-as-0-in-eww.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34469; Package emacs. (Thu, 28 Feb 2019 08:47:03 GMT)

Message #51 received at 34469 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Glenn Morris <rgm <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 34469 <at> debbugs.gnu.org, Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>,
 Nicholas Drozd <nicholasdrozd <at> gmail.com>
Subject: Re: 26.1; EWW stops renderring web page on null byte
Date: Thu, 28 Feb 2019 09:46:46 +0100
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
> Even if we assume that a web page contains C-like code, the replacement
> would mishandle a NUL followed by an octal digit, since the replacement
> would look like \07 which would be interpreted as a BEL character, not
> as a NUL followed by a digit 7. And web pages do not typically contain
> C code, so the replacement \0 might cause other trouble.
>

In my sample of 1 website, 100% of them contained C code :-)

> Instead, it sounds better to replace NUL with the four-character
> sequence "&#0;", as this is a standard HTML way to represent a NUL
> character. I installed the attached patch to do this.
>

OK by me.

Robert




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 28 Mar 2019 11:24:04 GMT)

This bug report was last modified 5 years and 31 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.