GNU bug report logs -
#34469
26.1; EWW stops rendering web page on null byte
Reported by: Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>
Date: Wed, 13 Feb 2019 15:57:02 UTC
Severity: normal
Tags: fixed
Found in version 26.1
Fixed in version 27.1
Done: Robert Pluim <rpluim <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34469 in the body.
You can then email your comments to 34469 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Wed, 13 Feb 2019 15:57:02 GMT)
Acknowledgement sent to Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 13 Feb 2019 15:57:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
As in the topic. See this page:
http://blog.eduardofleury.com/archives/2007/09/13
There is a string with a null byte at the beginning. Firefox renders
the page past this point. EWW stops on:
sock.bind(“
In GNU Emacs 26.1 (build 1, x86_64-redhat-linux-gnu, GTK+ Version
3.23.2)
of 2018-08-13 built on buildvm-13.phx2.fedoraproject.org
Windowing system distributor 'Fedora Project', version 11.0.12003000
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Contacting host: blog.eduardofleury.com:80
scroll-up-command: End of buffer [2 times]
Configured using:
'configure --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
--libexecdir=/usr/libexec --localstatedir=/var
--sharedstatedir=/var/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-dbus --with-gif --with-jpeg --with-png
--with-rsvg --with-tiff --with-xft --with-xpm --with-x-toolkit=gtk3
--with-gpm=no --with-xwidgets --with-modules
build_alias=x86_64-redhat-linux-gnu host_alias=x86_64-redhat-linux-gnu
'CFLAGS=-DMAIL_USE_LOCKF -O2 -g -pipe -Wall -Werror=format-security
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
-fstack-protector-strong -grecord-gcc-switches
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
-fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
LDFLAGS=-Wl,-z,relro
PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
Configured features:
XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND DBUS GSETTINGS NOTIFY ACL
LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT ZLIB
TOOLKIT_SCROLL_BARS GTK3 X11 MODULES THREADS XWIDGETS LCMS2
Important settings:
value of $LC_COLLATE: C
value of $LC_CTYPE: pl_PL.UTF-8
value of $LC_MONETARY: en_US.UTF-8
value of $LC_NUMERIC: en_US.UTF-8
value of $LC_TIME: en_US.UTF-8
value of $LANG: C
value of $XMODIFIERS: @im=ibus
locale-coding-system: utf-8-unix
Major mode: eww
Minor modes in effect:
tooltip-mode: t
global-eldoc-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
buffer-read-only: t
line-number-mode: t
transient-mark-mode: t
Load-path shadows:
None found.
Features:
(shadow sort mail-extr emacsbug message dired dired-loaddefs rfc822 mml
mml-sec epa derived epg epg-config mm-decode mm-bodies mm-encode
mailabbrev gmm-utils mailheader sendmail cl-extra help-mode
network-stream starttls url-http tls gnutls mail-parse rfc2231 url-gw
nsm rmc url-cache url-auth eww easymenu puny mm-url gnus nnheader
gnus-util rmail rmail-loaddefs rfc2047 rfc2045 ietf-drums mail-utils
wid-edit mm-util mail-prsvr url-queue url url-proxy url-privacy
url-expand url-methods url-history url-cookie url-domsuf url-util
url-parse auth-source cl-seq eieio eieio-core cl-macs eieio-loaddefs
password-cache url-vars mailcap shr svg xml seq byte-opt gv bytecomp
byte-compile cconv dom browse-url format-spec cl-loaddefs cl-lib
elec-pair time-date mule-util tooltip eldoc electric uniquify ediff-hook
vc-hooks lisp-float-type mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting xwidget-internal
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)
Memory information:
((conses 16 137138 10359)
(symbols 48 23803 2)
(miscs 40 59 148)
(strings 32 40308 1635)
(string-bytes 1 1174212)
(vectors 16 17956)
(vector-slots 8 544601 12850)
(floats 8 73 241)
(intervals 56 3447 0)
(buffers 992 12))
--
Lukasz Pawelczyk
Samsung R&D Institute Poland
Samsung Electronics
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Thu, 14 Feb 2019 04:47:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org:
This looks like a problem with libxml-parse-html-region (or maybe even
lower than that, I have no idea). Put the following in a buffer
<p>sock.bind(“\0MyBindName”)</p>
and execute
(libxml-parse-html-region (point-min) (point-max))
This returns
(html nil (body nil (p nil "sock.bind(“")))
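The reproduction above can be packaged so that it runs without setting up a buffer by hand. This is only a minimal sketch, assuming an Emacs built with libxml2 support; the helper name `my/parse-html-string` is hypothetical and not part of Emacs. What comes back depends on the libxml2 version in use, so no result is shown beyond the one reported above.

```elisp
;; Hypothetical helper (not part of Emacs): parse an HTML string with libxml.
(defun my/parse-html-string (html)
  "Parse HTML (a string) and return its DOM.
Return nil when this Emacs was built without libxml support."
  (when (fboundp 'libxml-parse-html-region)
    (with-temp-buffer
      (insert html)
      (libxml-parse-html-region (point-min) (point-max)))))

;; The \0 in the literal below is a real NUL character inside the string.
(my/parse-html-string "<p>sock.bind(“\0MyBindName”)</p>")
```

Comparing the returned DOM with the result for the same input without the NUL shows whether the parser dropped the text after it.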
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Thu, 14 Feb 2019 19:15:01 GMT)
Message #11 received at 34469 <at> debbugs.gnu.org:
> From: Nicholas Drozd <nicholasdrozd <at> gmail.com>
> Date: Wed, 13 Feb 2019 22:44:50 -0600
>
> This looks like a problem with libxml-parse-html-region (or maybe even
> lower than that, I have no idea).
libxml-parse-html-region calls parse_region, which passes a C string
to libxml functions. So there can be no embedded null bytes.
Does libxml have facilities to deal with such cases? If not, maybe
this should be taken up with libxml developers.
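To illustrate the point about C strings: an Emacs Lisp string carries its own length, so a NUL is an ordinary character on the Lisp side, while a consumer that treats the same bytes as a NUL-terminated C string sees only the prefix. The sketch below models that view; `my/c-string-view` is a hypothetical name, and it only imitates the truncation rather than calling into libxml.

```elisp
;; Hypothetical helper: model what a NUL-terminated C consumer would see.
(defun my/c-string-view (s)
  "Return the prefix of S that precedes its first NUL character.
Return S unchanged when it contains no NUL."
  (let ((nul (string-match-p "\0" s)))
    (if nul (substring s 0 nul) s)))

;; On the Lisp side the NUL counts as one character of the string.
(length "sock.bind(\0MyBindName)")            ; ⇒ 22

;; A C-style reader stops at the NUL.
(my/c-string-view "sock.bind(\0MyBindName)")  ; ⇒ "sock.bind("
```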
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Sat, 16 Feb 2019 18:14:02 GMT)
Message #14 received at submit <at> debbugs.gnu.org:
This is a known issue with libxml, or at least it was at some point.
Here's a thread from 2008:
https://mail.gnome.org/archives/xml/2008-August/msg00008.html
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Tue, 19 Feb 2019 01:13:01 GMT)
Message #17 received at 34469 <at> debbugs.gnu.org:
Perhaps eww-display-html should replace null bytes (with whatever the
html standard says is appropriate) before calling
libxml-parse-html-region. It already replaces CRLF.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Tue, 19 Feb 2019 10:07:01 GMT)
Message #20 received at 34469 <at> debbugs.gnu.org:
Glenn Morris <rgm <at> gnu.org> writes:
> Perhaps eww-display-html should replace null bytes (with whatever the
> html standard says is appropriate) before calling
> libxml-parse-html-region. It already replaces CRLF.
Chrome at least just strips the null byte completely.
There is apparently a class of attacks that uses the null character
for nefarious purposes, so how about something like this:
diff --git a/lisp/net/eww.el b/lisp/net/eww.el
index 1cc4557ce1..9b57bc43e4 100644
--- a/lisp/net/eww.el
+++ b/lisp/net/eww.el
@@ -448,8 +448,8 @@ eww-display-html
(decode-coding-region (point) (point-max) encode)
(coding-system-error nil))
(save-excursion
- ;; Remove CRLF before parsing.
- (while (re-search-forward "\r$" nil t)
+ ;; Remove CRLF and NULL before parsing.
+ (while (re-search-forward "\r$\\|\000" nil t)
(replace-match "" t t)))
(libxml-parse-html-region (point) (point-max))))))
(source (and (null document)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Tue, 19 Feb 2019 16:32:02 GMT)
Message #23 received at 34469 <at> debbugs.gnu.org:
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Tue, 19 Feb 2019 11:06:37 +0100
> Cc: 34469 <at> debbugs.gnu.org, Nicholas Drozd <nicholasdrozd <at> gmail.com>
>
> Glenn Morris <rgm <at> gnu.org> writes:
>
> > Perhaps eww-display-html should replace null bytes (with whatever the
> > html standard says is appropriate) before calling
> > libxml-parse-html-region. It already replaces CRLF.
>
> Chrome at least just strips the null byte completely.
>
> There is apparently a class of attacks that uses the null character
> for nefarious purposes, so how about something like this:
>
> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
> index 1cc4557ce1..9b57bc43e4 100644
> --- a/lisp/net/eww.el
> +++ b/lisp/net/eww.el
> @@ -448,8 +448,8 @@ eww-display-html
> (decode-coding-region (point) (point-max) encode)
> (coding-system-error nil))
> (save-excursion
> - ;; Remove CRLF before parsing.
> - (while (re-search-forward "\r$" nil t)
> + ;; Remove CRLF and NULL before parsing.
> + (while (re-search-forward "\r$\\|\000" nil t)
> (replace-match "" t t)))
It is un-Emacsy, IMO, to remove content without a trace. (CR is
different: we simply convert text to Unix LF-only EOL format.) So I'd
suggest to replace with "^@" or "\000" or "NUL" or something to that
effect. Even U+FFFD would be better than removing.
(We could get fancy and have a defcustom for those who do want the
null bytes removed.)
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Tue, 19 Feb 2019 17:38:01 GMT)
Message #26 received at 34469 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Robert Pluim <rpluim <at> gmail.com>
>> Date: Tue, 19 Feb 2019 11:06:37 +0100
>> Cc: 34469 <at> debbugs.gnu.org, Nicholas Drozd <nicholasdrozd <at> gmail.com>
>>
>> Glenn Morris <rgm <at> gnu.org> writes:
>>
>> > Perhaps eww-display-html should replace null bytes (with whatever the
>> > html standard says is appropriate) before calling
>> > libxml-parse-html-region. It already replaces CRLF.
>>
>> Chrome at least just strips the null byte completely.
>>
>> There is apparently a class of attacks that uses the null character
>> for nefarious purposes, so how about something like this:
>>
>> diff --git a/lisp/net/eww.el b/lisp/net/eww.el
>> index 1cc4557ce1..9b57bc43e4 100644
>> --- a/lisp/net/eww.el
>> +++ b/lisp/net/eww.el
>> @@ -448,8 +448,8 @@ eww-display-html
>> (decode-coding-region (point) (point-max) encode)
>> (coding-system-error nil))
>> (save-excursion
>> - ;; Remove CRLF before parsing.
>> - (while (re-search-forward "\r$" nil t)
>> + ;; Remove CRLF and NULL before parsing.
>> + (while (re-search-forward "\r$\\|\000" nil t)
>> (replace-match "" t t)))
>
> It is un-Emacsy, IMO, to remove content without a trace. (CR is
> different: we simply convert text to Unix LF-only EOL format.) So I'd
> suggest to replace with "^@" or "\000" or "NUL" or something to that
> effect. Even U+FFFD would be better than removing.
>
Since this is all due to a C-ism in the handling of content, Iʼd vote
for "\0", although this is inside Emacs, so perhaps "^@" is best.
> (We could get fancy and have a defcustom for those who do want the
> null bytes removed.)
I really donʼt think this is something that needs to be configurable.
Robert
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Tue, 19 Feb 2019 18:12:02 GMT)
Message #29 received at 34469 <at> debbugs.gnu.org:
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Tue, 19 Feb 2019 18:37:26 +0100
>
> Since this is all due to a C-ism in the handling of content, Iʼd vote
> for "\0", although this is inside Emacs, so perhaps "^@" is best.
Either is fine with me.
> > (We could get fancy and have a defcustom for those who do want the
> > null bytes removed.)
>
> I really donʼt think this is something that needs to be configurable.
Neither do I.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Wed, 20 Feb 2019 18:49:03 GMT)
Message #32 received at 34469 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
>> Date: Tue, 19 Feb 2019 18:37:26 +0100
>>
>> Since this is all due to a C-ism in the handling of content, Iʼd vote
>> for "\0", although this is inside Emacs, so perhaps "^@" is best.
>
> Either is fine with me.
Since the web page that triggered this was showing C code, Iʼve gone
for the "\0" option.
2019-02-20 Robert Pluim <rpluim <at> gmail.com>
* lisp/net/eww.el (eww-display-html): Replace NULL characters with
"\0", as libxml can't handle embedded NULLs.
diff --git i/lisp/net/eww.el w/lisp/net/eww.el
index 555b3bd591..06075b1ebd 100644
--- i/lisp/net/eww.el
+++ w/lisp/net/eww.el
@@ -462,10 +462,12 @@ eww-display-html
(condition-case nil
(decode-coding-region (point) (point-max) encode)
(coding-system-error nil))
- (save-excursion
- ;; Remove CRLF before parsing.
- (while (re-search-forward "\r$" nil t)
- (replace-match "" t t)))
+ (save-excursion
+ ;; Remove CRLF and NULL before parsing.
+ (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
+ (replace-match (if (match-beginning 1)
+ ""
+ "\\0") t t)))
(libxml-parse-html-region (point) (point-max))))))
(source (and (null document)
(buffer-substring (point) (point-max)))))
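The replacement loop in this patch can be tried outside of eww. The following is a sketch under stated assumptions: `my/sanitize-for-libxml` is a hypothetical wrapper that applies the same regexp and the same `replace-match` call to a string in a temporary buffer, and it is not how eww itself is structured.

```elisp
;; Hypothetical wrapper around the loop from the patch above.
(defun my/sanitize-for-libxml (text)
  "Return TEXT with CR at end of line removed and each NUL shown as \\0."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Group 1 matches a CR at end of line, group 2 matches a NUL.
    (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
      ;; The last argument t makes the replacement literal, so "\\0"
      ;; inserts a backslash and a zero rather than the matched text.
      (replace-match (if (match-beginning 1) "" "\\0") t t))
    (buffer-string)))

;; Input: "a", CR, LF, "b", NUL, "c".
(my/sanitize-for-libxml "a\r\nb\0c")  ; ⇒ "a\nb\\0c"
```

Testing `(match-beginning 1)` is what distinguishes the two alternatives: it is non-nil only when the CR branch matched, so one pass over the buffer handles both cases.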
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Wed, 27 Feb 2019 11:32:02 GMT)
Message #35 received at 34469 <at> debbugs.gnu.org:
Robert Pluim <rpluim <at> gmail.com> writes:
Ping!
Eli, release or master?
> 2019-02-20 Robert Pluim <rpluim <at> gmail.com>
>
> * lisp/net/eww.el (eww-display-html): Replace NULL characters with
> "\0", as libxml can't handle embedded NULLs.
> diff --git i/lisp/net/eww.el w/lisp/net/eww.el
> index 555b3bd591..06075b1ebd 100644
> --- i/lisp/net/eww.el
> +++ w/lisp/net/eww.el
> @@ -462,10 +462,12 @@ eww-display-html
> (condition-case nil
> (decode-coding-region (point) (point-max) encode)
> (coding-system-error nil))
> - (save-excursion
> - ;; Remove CRLF before parsing.
> - (while (re-search-forward "\r$" nil t)
> - (replace-match "" t t)))
> + (save-excursion
> + ;; Remove CRLF and NULL before parsing.
> + (while (re-search-forward "\\(\r$\\)\\|\\(\000\\)" nil t)
> + (replace-match (if (match-beginning 1)
> + ""
> + "\\0") t t)))
> (libxml-parse-html-region (point) (point-max))))))
> (source (and (null document)
> (buffer-substring (point) (point-max)))))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Wed, 27 Feb 2019 15:57:02 GMT)
Message #38 received at 34469 <at> debbugs.gnu.org:
> From: Robert Pluim <rpluim <at> gmail.com>
> Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
> Date: Wed, 27 Feb 2019 12:31:45 +0100
>
> Robert Pluim <rpluim <at> gmail.com> writes:
>
> Ping!
>
> Eli, release or master?
Master, please.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Wed, 27 Feb 2019 16:22:02 GMT)
Message #41 received at 34469 <at> debbugs.gnu.org:
tags 34469 fixed
close 34469 27.1
quit
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Robert Pluim <rpluim <at> gmail.com>
>> Cc: 34469 <at> debbugs.gnu.org, nicholasdrozd <at> gmail.com
>> Date: Wed, 27 Feb 2019 12:31:45 +0100
>>
>> Robert Pluim <rpluim <at> gmail.com> writes:
>>
>> Ping!
>>
>> Eli, release or master?
>
> Master, please.
Done as d07f3aae48
Closing.
Robert
Added tag(s) fixed. Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 27 Feb 2019 16:22:02 GMT)
bug marked as fixed in version 27.1, send any further explanations to
34469 <at> debbugs.gnu.org and Lukasz Pawelczyk <l.pawelczyk <at> samsung.com>
Request was from Robert Pluim <rpluim <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 27 Feb 2019 16:22:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Thu, 28 Feb 2019 01:54:02 GMT)
Message #48 received at 34469 <at> debbugs.gnu.org:
Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
Even if we assume that a web page contains C-like code, the replacement
would mishandle a NUL followed by an octal digit, since the replacement
would look like \07 which would be interpreted as a BEL character, not
as a NULL followed by a digit 7. And web pages do not typically contain
C code, so the replacement \0 might cause other trouble.
Instead, it sounds better to replace NUL with the four-character
sequence "&#0;", as this is a standard HTML way to represent a NUL
character. I installed the attached patch to do this.
In my little tests with this patch, libxml2 typically handled &#0; by
discarding it and continuing to parse, which is better than ignoring the
rest of the input. In some cases libxml2 handles &#0; by discarding
later input up to a delimiter; although this is bad, it's a libxml2 bug
that attackers can exploit independently of what Emacs does with NUL,
since attackers can simply use &#0;.
[0001-Escape-HTML-NUL-as-0-in-eww.patch (text/x-patch, attachment)]
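The attached patch is not reproduced in this log, so the following is only a sketch of the idea described in this message, not the installed change. It assumes hypothetical helper names (`my/escape-nul-as-html`, `my/escape-nul-as-backslash-zero`) and works on strings rather than on the eww buffer. It also shows the ambiguity raised above when a NUL is followed by an octal digit.

```elisp
;; Hypothetical helpers; the installed patch may be structured differently.
(defun my/escape-nul-as-html (text)
  "Return TEXT with each NUL replaced by the four characters &#0;."
  (replace-regexp-in-string "\0" "&#0;" text t t))

(defun my/escape-nul-as-backslash-zero (text)
  "Return TEXT with each NUL replaced by a backslash and a zero."
  (replace-regexp-in-string "\0" "\\0" text t t))

;; A NUL followed by the digit 7, built without octal escapes for clarity.
(defvar my/nul-then-seven (concat "x" (string 0) "7"))

;; The character reference keeps the NUL and the digit visibly separate.
(my/escape-nul-as-html my/nul-then-seven)            ; ⇒ "x&#0;7"

;; The backslash form yields x, backslash, 0, 7, which a C reader would
;; take as the single octal escape for BEL.
(my/escape-nul-as-backslash-zero my/nul-then-seven)  ; ⇒ "x\\07"
```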
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#34469; Package emacs. (Thu, 28 Feb 2019 08:47:03 GMT)
Message #51 received at 34469 <at> debbugs.gnu.org:
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> Thanks for fixing that bug. However, replacing NUL with \0 sounds iffy.
> Even if we assume that a web page contains C-like code, the replacement
> would mishandle a NUL followed by an octal digit, since the replacement
> would look like \07 which would be interpreted as a BEL character, not
> as a NULL followed by a digit 7. And web pages do not typically contain
> C code, so the replacement \0 might cause other trouble.
>
In my sample of 1 website, 100% of them contained C code :-)
> Instead, it sounds better to replace NUL with the four-character
> sequence "&#0;", as this is a standard HTML way to represent a NUL
> character. I installed the attached patch to do this.
>
OK by me.
Robert
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 28 Mar 2019 11:24:04 GMT)
This bug report was last modified 5 years and 31 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.