GNU bug report logs - #17245
GREP BUG: grep -P and binary files
Reported by: damon <dh <at> bug-grep.usrbin.org>
Date: Sat, 12 Apr 2014 00:28:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17245 in the body.
You can then email your comments to 17245 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Sat, 12 Apr 2014 00:28:01 GMT)
Acknowledgement sent to damon <dh <at> bug-grep.usrbin.org>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 12 Apr 2014 00:28:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi there -
I recently noticed a bug after upgrading grep and have tracked it
through a few versions now.
I was using grep -P (PCRE grep) in some scripts to grep through
a directory of files, and the process would keep aborting with a
segmentation fault.
The last known good version is grep-2.14. Every version after that has
failed in a slightly different way, making me think this could be a bug
in grep, not in pcre.
I tried compiling greps 2.14 through 2.18 against the latest pcre
library, pcre-8.33. Here's what happens when i try each version against
a random binary file, attached to this message as test-image.png. This
file was just one of many that caused the errors, though not every
binary file does.
Below are some results demonstrating what's going wrong. Note that all
of these seem to work fine with regular grep or with grep -E. Please
let me know what else i can do to help track this down!
# grep-2.14/src/grep -P '\[.?max' test-image.png
(works, does not match)
# grep-2.15/src/grep -P '\[.?max' test-image.png
Aborted
# grep-2.16/src/grep -P '\[.?max' test-image.png
Binary file test-image.png matches
(erroneous - should not match)
# grep-2.16/src/grep -P '.?max' test-image.png
Segmentation fault
# grep-2.17/src/grep -P '\[.?max' test-image.png
Segmentation fault
# grep-2.18/src/grep -P '\[.?max' test-image.png
Segmentation fault
# grep-2.18/src/grep -P '.?ma' test-image.png
Segmentation fault
# grep-2.18/src/grep -P '.?m' test-image.png
Binary file test-image.png matches
-damon
[test-image.png (image/png, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Sat, 12 Apr 2014 16:17:02 GMT)
Message #8 received at 17245 <at> debbugs.gnu.org:
This bug is similar to bug#16586.
It seems that the pointer `eptr', which tracks the current position in
the text, moved back past the starting position during backward
searching. It seems that the PCRE library assumes the text contains no
invalid UTF-8 sequences.
Could you retry them in a non-UTF-8 locale?
Norihiro
[backtrace.log (text/plain, attachment)]
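To illustrate the failure mode described above, here is a minimal Python sketch of stepping backward to the start of a UTF-8 character. It is only a rough model of what a BACKCHAR-style step does, not PCRE's actual code. The name back_to_char_start is hypothetical, and the return value -1 stands in for "the pointer has left the buffer", which in C would be an out-of-bounds read like the one in the backtrace.

```python
def back_to_char_start(buf: bytes, pos: int) -> int:
    """Step backward from pos while the byte is a UTF-8 continuation byte.

    Continuation bytes have the bit pattern 10xxxxxx. The loop trusts that
    a lead byte or ASCII byte exists somewhere before pos, which is only
    guaranteed for valid UTF-8. Returns -1 when the scan runs off the
    start of the buffer (hypothetical convention for this sketch).
    """
    while (buf[pos] & 0xC0) == 0x80:
        pos -= 1
        if pos < 0:
            return -1  # a C pointer would now be before the buffer
    return pos


# Valid UTF-8: 'a' followed by U+00E9 (bytes C3 A9). The scan stops at C3.
valid = b"a\xc3\xa9"
print(back_to_char_start(valid, 2))

# Invalid UTF-8: stray continuation bytes at the very start of the buffer.
invalid = b"\x81\x81b"
print(back_to_char_start(invalid, 1))
```

With valid input the scan always stops on a lead byte. With stray continuation bytes at the start of the subject there is nothing to stop it, which matches the observation that the crash depends on a UTF-8 locale.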
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Sat, 12 Apr 2014 16:24:02 GMT)
Message #11 received at 17245 <at> debbugs.gnu.org:
Hi there -
Bingo, that does it. I have LANG set to en_CA.utf-8. If i run:
LANG=en_CA grep-2.18/src/grep -P '\[.?max' test-image.png
It works fine, reporting no match.
Same for every other version i have compiled.
So definitely utf-8 related. Let me know if i can provide anything
else.
-damon
On 13 Apr, Norihiro Tanaka wrote:
> This bug is similar to bug#16586.
>
> It seems that the pointer `eptr', which tracks the current position in
> the text, moved back past the starting position during backward
> searching. It seems that the PCRE library assumes the text contains no
> invalid UTF-8 sequences.
>
> Could you retry them in a non-UTF-8 locale?
>
> Norihiro
> $ gdb src/grep core.1430
> GNU gdb (GDB) 7.6.2
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i386-pc-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/staff/b/grep-2.18/src/grep...done.
> [New LWP 1430]
>
> warning: Can't read pathname for load map: Input/output error.
> Core was generated by `src/grep -P .?ma test-image.png'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x001612ca in match (eptr=0x9a24fff <Address 0x9a24fff out of bounds>,
> ecode=0x9a25e65 "\035m\035ax",
> mstart=0x9a26e9d
> "\272\374;\017\233\323\230:\364\005+\373a&\367\032X\304\216
> \342y\274\301\357\361\005",
> offset_top=2, md=0xbfe18a64, eptrb=0x0, rdepth=0) at pcre_exec.c:5943
> 5943 BACKCHAR(eptr);
> (gdb) bt
> #0 0x001612ca in match (eptr=0x9a24fff <Address 0x9a24fff out of bounds>,
> ecode=0x9a25e65 "\035m\035ax",
> mstart=0x9a26e9d
> "\272\374;\017\233\323\230:\364\005+\373a&\367\032X\304\216
> \342y\274\301\357\361\005",
> offset_top=2, md=0xbfe18a64, eptrb=0x0, rdepth=0) at pcre_exec.c:5943
> #1 0x0016308a in pcre_exec (argument_re=0x9a25e28, extra_data=0x9a25e78,
> subject=0x9a26e9d
> "\272\374;\017\233\323\230:\364\005+\373a&\367\032X\304\216
> \342y\274\301\357\361\005",
> length=101, start_offset=0, options=8192, offsets=0xbfe18bdc,
> offsetcount=300) at pcre_exec.c:6941
> #2 0x0805a472 in Pexecute (buf=0x9a26000 "\211PNG\r\n\032\n", size=6568,
> match_size=0xbfe19114, start_ptr=0x0)
> at pcresearch.c:174
> #3 0x0804ba07 in do_execute (buf=0x9a26000 "\211PNG\r\n\032\n", size=6568,
> match_size=0xbfe19114, start_ptr=0x0)
> at grep.c:1073
> #4 0x0804bc98 in grepbuf (beg=0x9a26000 "\211PNG\r\n\032\n",
> lim=0x9a279a8
> "\217\222(\016\001c\025R\221c\233S\250\327\177m\002\344Q\022\362$\320\066\3
> 76\327\245{\f\035D\001\260\251\326a\247{T\200_\bj8\274") at grep.c:1109
> #5 0x0804bfb3 in grep (fd=3, st=0xbfe19200) at grep.c:1220
> #6 0x0804c9ab in grepdesc (desc=3, command_line=1) at grep.c:1474
> #7 0x0804c650 in grepfile (dirdesc=-100, name=0xbfe19889 "test-image.png",
> follow=1, command_line=1) at grep.c:1375
> #8 0x0804cc22 in grep_command_line_arg (arg=0xbfe19889 "test-image.png") at
> grep.c:1526
> #9 0x0804e358 in main (argc=4, argv=0xbfe194a4) at grep.c:2362
--
Damon Harper _/\_ Nothing is as simple as it seems at
damon <at> usrbin.ca __\ /__ first, as hopeless as it seems in
\ / the middle, or as finished as it
www.usrbin.ca/damon |/||\| seems in the end.
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Sun, 13 Apr 2014 19:14:03 GMT)
Message #14 received at 17245 <at> debbugs.gnu.org:
On Fri, Apr 11, 2014 at 4:47 PM, damon <dh <at> bug-grep.usrbin.org> wrote:
> Hi there -
>
> I recently noticed a bug after upgrading grep and have tracked it
> through a few versions now.
>
> I was using grep -P (PCRE grep) in some scripts to grep through
> a directory of files, and the process would keep aborting with a
> segmentation fault.
>
> The last known good version is grep-2.14. Every version after that has
> failed in a slightly different way, making me think this could be a bug
> in grep, not in pcre.
>
> I tried compiling greps 2.14 through 2.18 against the latest pcre
> library, pcre-8.33. Here's what happens when i try each version against
> a random binary file, attached to this message as test-image.png. This
> file was just one of many that caused the errors, though not every
> binary file does.
>
> Below are some results demonstrating what's going wrong. Note that all
> of these seem to work fine with regular grep or with grep -E. Please
> let me know what else i can do to help track this down!
>
> # grep-2.14/src/grep -P '\[.?max' test-image.png
> (works, does not match)
...
> # grep-2.18/src/grep -P '\[.?max' test-image.png
> Segmentation fault
>
> # grep-2.18/src/grep -P '.?ma' test-image.png
> Segmentation fault
>
> # grep-2.18/src/grep -P '.?m' test-image.png
> Binary file test-image.png matches
Thank you for the bug report.
That is due to a bug in libpcre. I've confirmed that it is still
triggered even when using the latest grep.git linked with
the latest from pcre.git (latest commit has "Final tidies for
8.35 release." as the subject). I built grep as usual, and
then ran this:
rm src/grep; make LIB_PCRE=$PWD/../pcre/.libs/libpcre.a
Confirm that grep is not using a shared libpcre (this must print nothing):
ldd src/grep|grep pcre
That presumes I had already built the latest pcre/ in ../pcre.
Then, run this to test it with a non-UTF8 locale, and it is
error-free, correctly finding no match:
LC_ALL=ja_JP.eucJP valgrind src/grep -P '\[.?max' test-image.png
Repeat using a UTF8 locale, and you see that valgrind reports
numerous buffer overrun and heap-use-after-free errors:
LC_ALL=en_US.utf8 valgrind src/grep -P '\[.?max' test-image.png
Here is an equivalent but much smaller test case:
$ printf 'a\201b\r'|LC_ALL=en_US.utf8 valgrind src/grep -P 'a.?XXb'
That segfaults. Interestingly, if I replace each X with a ".",
grep gets into an infinite loop within libpcre's match function.
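For reference, the small test case above feeds grep the four bytes 0x61 0x81 0x62 0x0D, since \201 in printf is the octal escape for 0x81. A quick way to check that such input is not valid UTF-8 is sketched below in Python. The helper name is_valid_utf8 is hypothetical, and the check uses Python's own strict decoder rather than anything from grep or libpcre.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Same bytes as: printf 'a\201b\r'
# 0x81 is a continuation byte with no lead byte before it.
sample = b"a\201b\r"
print(is_valid_utf8(sample))

# The PNG signature seen in the backtrace starts with 0x89, another
# bare continuation byte, so a PNG file is never valid UTF-8 either.
png_signature = b"\211PNG\r\n\032\n"
print(is_valid_utf8(png_signature))
```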
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Sun, 13 Apr 2014 23:18:02 GMT)
Message #17 received at 17245 <at> debbugs.gnu.org:
On Sun, Apr 13, 2014 at 12:13 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Fri, Apr 11, 2014 at 4:47 PM, damon <dh <at> bug-grep.usrbin.org> wrote:
>> Hi there -
>>
>> I recently noticed a bug after upgrading grep and have tracked it
>> through a few versions now.
>>
>> I was using grep -P (PCRE grep) in some scripts to grep through
>> a directory of files, and the process would keep aborting with a
>> segmentation fault.
>>
>> The last known good version is grep-2.14. Every version after that has
>> failed in a slightly different way, making me think this could be a bug
>> in grep, not in pcre.
>>
>> I tried compiling greps 2.14 through 2.18 against the latest pcre
>> library, pcre-8.33. Here's what happens when i try each version against
>> a random binary file, attached to this message as test-image.png. This
>> file was just one of many that caused the errors, though not every
>> binary file does.
>>
>> Below are some results demonstrating what's going wrong. Note that all
>> of these seem to work fine with regular grep or with grep -E. Please
>> let me know what else i can do to help track this down!
>>
>> # grep-2.14/src/grep -P '\[.?max' test-image.png
>> (works, does not match)
> ...
>> # grep-2.18/src/grep -P '\[.?max' test-image.png
>> Segmentation fault
>>
>> # grep-2.18/src/grep -P '.?ma' test-image.png
>> Segmentation fault
>>
>> # grep-2.18/src/grep -P '.?m' test-image.png
>> Binary file test-image.png matches
>
> Thank you for the bug report.
> That is due to a bug in libpcre. I've confirmed that it is still
> triggered even when using the latest grep.git linked with
> the latest from pcre.git (latest commit has "Final tidies for
> 8.35 release." as the subject). I built grep as usual, and
> then ran this:
>
> rm src/grep; make LIB_PCRE=$PWD/../pcre/.libs/libpcre.a
>
> Confirm that grep is not using a shared libpcre (this must print nothing):
>
> ldd src/grep|grep pcre
>
> That presumes I had already built the latest pcre/ in ../pcre.
> Then, run this to test it with a non-UTF8 locale, and it is
> error-free, correctly finding no match:
>
> LC_ALL=ja_JP.eucJP valgrind src/grep -P '\[.?max' test-image.png
>
> Repeat using a UTF8 locale, and you see that valgrind reports
> numerous buffer overrun and heap-use-after-free errors:
>
> LC_ALL=en_US.utf8 valgrind src/grep -P '\[.?max' test-image.png
>
> Here is an equivalent but much smaller test case:
>
> $ printf 'a\201b\r'|LC_ALL=en_US.utf8 valgrind src/grep -P 'a.?XXb'
>
> That segfaults. Interestingly, if I replace each X with a ".",
> grep gets into an infinite loop within libpcre's match function.
FYI, I'm pushing the attached patch, to add a test for this.
It fails with the latest pcre from git (8.35), but passes with debian
unstable's libpcre3 8.31-3:
[k.txt (text/plain, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Tue, 15 Apr 2014 23:49:02 GMT)
Message #20 received at 17245 <at> debbugs.gnu.org:
I confirmed that this bug is also avoided by re-compiling PCRE with
--enable-git option.
PCRE without --enable-git:
$ env LC_ALL=en_US.utf8 src/grep -P '.?ma' test-image.png
Segmentation fault (core dumped)
PCRE with --enable-git:
$ env LC_ALL=en_US.utf8 src/grep -P '.?ma' test-image.png
Binary file ../test-image.png matches
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Tue, 15 Apr 2014 23:59:02 GMT)
Message #23 received at 17245 <at> debbugs.gnu.org:
Norihiro Tanaka wrote:
> I confirmed that this bug is also avoided by re-compiling PCRE with
> --enable-git option.
Sorry, what's --enable-git?
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Wed, 16 Apr 2014 00:04:02 GMT)
Message #26 received at 17245 <at> debbugs.gnu.org:
On Tue, Apr 15, 2014 at 4:48 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> I confirmed that this bug is also avoided by re-compiling PCRE with
> --enable-git option.
>
> PCRE without --enable-git:
> $ env LC_ALL=en_US.utf8 src/grep -P '.?ma' test-image.png
> Segmentation fault (core dumped)
>
> PCRE with --enable-git:
> $ env LC_ALL=en_US.utf8 src/grep -P '.?ma' test-image.png
> Binary file ../test-image.png matches
Thank you.
I presume you meant --enable-jit.
However, even when building the latest pcre like this:
./configure --enable-unicode-properties --enable-utf8 --enable-jit && make
and linking grep with its resulting .a file, my new pcre-infloop test
still failed.
However, with the attached patch to pcre, it passes:
[k.txt (text/plain, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Wed, 16 Apr 2014 12:14:01 GMT)
Message #29 received at 17245 <at> debbugs.gnu.org:
Jim Meyering wrote:
> I presume you meant --enable-jit.
Sorry, you are right. It's --enable-jit.
I reported it to PCRE project.
http://bugs.exim.org/show_bug.cgi?id=1468
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Mon, 21 Apr 2014 18:04:04 GMT)
Notification sent to damon <dh <at> bug-grep.usrbin.org>: bug acknowledged by developer. (Mon, 21 Apr 2014 18:04:05 GMT)
Message #34 received at 17245-done <at> debbugs.gnu.org:
On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
> http://bugs.exim.org/show_bug.cgi?id=1468
Thanks. The response there makes it clear that if grep passes arbitrary
binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined
behavior will result (maybe infinite loop, core dump, etc.). We can't
have undefined behavior in grep. A simple fix is to avoid using
PCRE_NO_UTF8_CHECK so I installed the attached patch to do that.
Perhaps we can think of a better way at some point. In the meantime I'm
taking the liberty of closing Bug#17245 and Bug#16586.
[0001-grep-P-now-rejects-invalid-input-sequences-in-UTF-8-.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Mon, 21 Apr 2014 22:09:01 GMT)
Message #37 received at 17245-done <at> debbugs.gnu.org:
Paul Eggert wrote:
> A simple fix is to avoid using PCRE_NO_UTF8_CHECK
Thanks. I also agree with your thoughts.
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Thu, 24 Apr 2014 02:32:03 GMT)
Message #40 received at 17245-done <at> debbugs.gnu.org:
On Mon, Apr 21, 2014 at 11:03 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
>>
>> http://bugs.exim.org/show_bug.cgi?id=1468
>
>
> Thanks. The response there makes it clear that if grep passes arbitrary
> binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined behavior
> will result (maybe infinite loop, core dump, etc.). We can't have undefined
> behavior in grep. A simple fix is to avoid using PCRE_NO_UTF8_CHECK so I
> installed the attached patch to do that. Perhaps we can think of a better
> way at some point. In the meantime I'm taking the liberty of closing
> Bug#17245 and Bug#16586.
Thanks for the patch, but I'm not sure I like the consequences:
that anyone using grep -P to search data that is even a tiny bit
inconsistent with their UTF-8 locale will now get an exit status of
2 rather than the matches they used to get. I would prefer to test for
working PCRE support and disable -P if it is deemed inadequate,
but that may have to wait for the release of a new version of
libpcre.
In any case, I found that this additional change is required,
at least on OS/X, to avoid a test failure:
[k.txt (text/plain, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Thu, 24 Apr 2014 05:40:03 GMT)
Message #43 received at 17245 <at> debbugs.gnu.org:
Jim Meyering wrote:
> anyone using grep -P to search data that is even a tiny bit
> inconsistent with their UTF-8 locale will now get an exit status of
> 2 rather than the matches they used to get.
Yes, I don't like that either, but <http://bugs.exim.org/1468> says
libpcre intends to have undefined behavior here. If so, it wouldn't
help to wait until the next libpcre release, which may well have a
serious bug of this form in a different area, a bug that's not easy to
test for.
Perhaps somebody should modify grep -P to discard input lines containing
non-UTF-8 data instead of presenting them to libpcre. That way, it
would be safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P
should report an error and exit with status 2 if it discards input due
to encoding errors, it can also report matches in lines that do not
contain encoding errors, so that users can see both the error messages
and the matches.
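The behavior proposed above could look roughly like the following Python sketch. It is an illustration only: it uses Python's re module instead of libpcre, the function name grep_skipping_invalid is hypothetical, and the exit statuses follow grep's usual convention (0 for a match, 1 for no match, 2 for an error).

```python
import re
from typing import List, Tuple


def grep_skipping_invalid(data: bytes, pattern: str) -> Tuple[List[str], List[int], int]:
    """Match pattern against each line of data that is valid UTF-8.

    Lines with encoding errors are never handed to the regex engine.
    Returns (matching lines, 1-based numbers of discarded lines, status).
    """
    regex = re.compile(pattern)
    matches: List[str] = []
    discarded: List[int] = []

    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start another line

    for lineno, raw in enumerate(lines, start=1):
        try:
            text = raw.decode("utf-8")  # validation step
        except UnicodeDecodeError:
            discarded.append(lineno)    # report it, but keep going
            continue
        if regex.search(text):          # matching step, valid input only
            matches.append(text)

    if discarded:
        status = 2  # encoding errors were seen, even if some lines matched
    elif matches:
        status = 0
    else:
        status = 1
    return matches, discarded, status


result = grep_skipping_invalid(b"max ok\n\x81max bad\nnothing\n", "max")
print(result)
```

In this sketch every valid line is scanned twice, once by the decoder and once by the regex engine.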
Information forwarded to bug-grep <at> gnu.org: bug#17245; Package grep. (Thu, 24 Apr 2014 15:30:03 GMT)
Message #46 received at 17245 <at> debbugs.gnu.org:
On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> anyone using grep -P to search data that is even a tiny bit
>> inconsistent with their UTF-8 locale will now get an exit status of
>> 2 rather than the matches they used to get.
>
>
> Yes, I don't like that either, but <http://bugs.exim.org/1468> says libpcre
Oh! I had not read that. That is disappointing.
> intends to have undefined behavior here. If so, it wouldn't help to wait
> until the next libpcre release, which may well have a serious bug of this
> form in a different area, a bug that's not easy to test for.
Indeed.
> Perhaps somebody should modify grep -P to discard input lines containing
> non-UTF-8 data instead of presenting them to libpcre. That way, it would be
> safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P should report
> an error and exit with status 2 if it discards input due to encoding errors,
> it can also report matches in lines that do not contain encoding errors, so
> that users can see both the error messages and the matches.
That sounds reasonable, but I don't like the requirement that
one make two passes over each subject text.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 May 2014 11:24:04 GMT)
This bug report was last modified 10 years and 352 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.