GNU bug report logs - #22181: endless loop in grep 2.22
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22181 in the body.
You can then email your comments to 22181 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Tue, 15 Dec 2015 20:47:01 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Christian Boltz <grep-bug <at> cboltz.de>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 15 Dec 2015 20:47:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
I hit an endless loop in grep 2.22. I can reproduce it with
# grep -obUa -P '\x04\x08\x00profile\x00\x07' cache--usr.sbin.smbldap-useradd
16profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
27801profile
[...]
I get this line over and over (some minutes long) - but for testing,
you might want to use ... | head -n50 to avoid heating your office
using your computer ;-)
The file needed for the reproducer is attached.
To make sure you have an unmodified copy - its sha256sum is
89f458796dcb1cdcaec534fec84c6c3440844dbd6dc014e51a5d74e9800c2aab
I have more files that can reproduce the endless loop - basically it
looks like lots of (or all?) AppArmor cache files of profiles that
contain subprofiles or hats trigger this. OTOH, cache files from single
profiles don't trigger the endless loop.
As the subject says, I'm using grep 2.22 on openSUSE Tumbleweed.
This bug seems to be a regression. I wasn't able to reproduce this bug
with grep 2.14, and sarnold on #apparmor also couldn't reproduce it with
grep 2.21 on Ubuntu. OTOH, he could reproduce the endless loop with
grep 2.22 on Ubuntu.
I also downloaded and compiled the grep 2.21 and 2.22 tarballs.
Result (not too surprising):
- 2.21 works as expected
- 2.22 enters an endless loop
-> This is clearly a regression between 2.21 and 2.22.
For comparison: The expected output (with grep 2.21) is:
#2.21# ./grep -obUa -P '\x04\x08\x00profile\x00\x07' cache--usr.sbin.smbldap-useradd
16profile
27801profile
Regards,
Christian Boltz
PS: usually I use a random signature, but I'll use a hand-picked quote
for this mail ;-)
--
<sarnold> I don't know how cboltz survives, everything he touches
breaks into several pieces .. I fear for his car.. [from #apparmor]
[cache--usr.sbin.smbldap-useradd (application/octet-stream, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Fri, 18 Dec 2015 20:53:02 GMT)
Message #8 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Tue, Dec 15, 2015 at 12:20 PM, Christian Boltz <grep-bug <at> cboltz.de> wrote:
> Hello,
>
> I hit an endless loop in grep 2.22. I can reproduce it with
>
> # grep -obUa -P '\x04\x08\x00profile\x00\x07' cache--usr.sbin.smbldap-useradd
> 16profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> 27801profile
> [...]
>
> I get this line over and over (some minutes long) - but for testing,
> you might want to use ... | head -n50 to avoid heating your office
> using your computer ;-)
>
> The file needed for the reproducer is attached.
> To make sure you have an unmodified copy - its sha256sum is
> 89f458796dcb1cdcaec534fec84c6c3440844dbd6dc014e51a5d74e9800c2aab
>
> I have more files that can reproduce the endless loop - basically it
> looks like lots of (or all?) AppArmor cache files of profiles that
> contain subprofiles or hats trigger this. OTOH, cache files from single
> profiles don't trigger the endless loop.
>
> As the subject says, I'm using grep 2.22 on openSUSE Tumbleweed.
>
> This bug seems to be a regression. I wasn't able to reproduce this bug
> with grep 2.14, and sarnold on #apparmor also couldn't reproduce it with
> grep 2.21 on Ubuntu. OTOH, he could reproduce the endless loop with
> grep 2.22 on Ubuntu.
>
> I also downloaded and compiled the grep 2.21 and 2.22 tarballs.
> Result (not too surprising):
> - 2.21 works as expected
> - 2.22 enters an endless loop
>
> -> This is clearly a regression between 2.21 and 2.22.
>
>
> For comparison: The expected output (with grep 2.21) is:
>
> #2.21# ./grep -obUa -P '\x04\x08\x00profile\x00\x07' cache--usr.sbin.smbldap-useradd
> 16profile
> 27801profile
Thank you for the report. That is indeed a bug in the latest.
Here's a small reproducer:
printf '\201ab\0'|LC_ALL=en_US.utf8 grep -oa -P ab
And here is the patch that will form the basis of a complete fix:
[infloop.patch (text/x-patch, attachment)]
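A minimal sketch of how the small reproducer above might be run safely. The input is four bytes, and the first one, octal 201 (0x81), is a UTF-8 continuation byte with no lead byte before it, so it is an encoding error in a UTF-8 locale. The helper names repro_input and run_repro are made up for illustration; run_repro additionally assumes a `timeout` utility (as shipped in GNU coreutils), a grep built with -P support, and an installed en_US.utf8 locale.

```shell
# Emit the reproducer's four input bytes: 0x81, 'a', 'b', NUL.
# Octal escapes are used because POSIX printf guarantees them.
repro_input() { printf '\201ab\0'; }

# Dump the bytes in octal to confirm what grep will actually read.
repro_input | od -An -t o1

# Run the reproducer with a time limit and an output cap, so that an
# affected grep (2.22, per the report) cannot spin indefinitely.
# Assumptions: `timeout` exists, grep supports -P, en_US.utf8 is installed.
run_repro() {
  repro_input | LC_ALL=en_US.utf8 timeout 5 grep -oa -P ab | head -n 5
}
```

Calling run_repro against a grep built from the 2.21 and 2.22 tarballs in turn is one way to compare the two versions without risking a runaway process.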
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Fri, 18 Dec 2015 21:52:01 GMT)
Message #11 received at 22181 <at> debbugs.gnu.org (full text, mbox):
On 12/18/2015 12:52 PM, Jim Meyering wrote:
> And here is the patch that will form the basis of a complete fix:
Thanks for looking into that; you beat me to it!
POSIX says grep has undefined behavior when given an encoding error, and
looping forever sure fills the bill :-).
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Sat, 19 Dec 2015 06:25:01 GMT)
Notification sent to Christian Boltz <grep-bug <at> cboltz.de>: bug acknowledged by developer. (Sat, 19 Dec 2015 06:25:02 GMT)
Message #16 received at 22181-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Fri, Dec 18, 2015 at 1:50 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 12/18/2015 12:52 PM, Jim Meyering wrote:
>>
>> And here is the patch that will form the basis of a complete fix:
>
> Thanks for looking into that; you beat me to it!
>
> POSIX says grep has undefined behavior when given an encoding error, and
> looping forever sure fills the bill :-).
:-)
Here is the patch I expect to push tomorrow. I am using
the occasion of this reply to close the bug report by inserting
"-done" in the bug email address. Any reply will still go both to
the mailing list and to the bug-tracking system.
[0001-grep-oP-don-t-infloop-when-processing-invalid-UTF8-p.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Thu, 07 Jan 2016 06:47:01 GMT)
Message #19 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thanks to everyone who reported and fixed this bug. I looked over the fix and
this inspired me to improve on it. I installed the attached patch, which doesn't
fix any functionality bugs, but does improve performance significantly in some
cases.
[0001-Improve-on-fix-for-Bug-22181.patch (text/x-diff, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Fri, 08 Jan 2016 05:49:02 GMT)
Message #22 received at 22181 <at> debbugs.gnu.org (full text, mbox):
On Wed, Jan 6, 2016 at 10:46 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Thanks to everyone who reported and fixed this bug. I looked over the fix
> and this inspired me to improve on it. I installed the attached patch, which
> doesn't fix any functionality bugs, but does improve performance
> significantly in some cases.
Thanks for all your work.
I've just noticed a test failure on debian unstable systems. Doesn't
seem to matter, but will mention I was compiling with this:
gcc version 5.3.1 20160101 (Debian 5.3.1-5)
FAIL: encoding-error
====================
...
--- exp 2016-01-07 21:39:42.018646618 -0800
+++ out 2016-01-07 21:39:42.018646618 -0800
@@ -1 +1 @@
-Binary file in matches
+Pedro P\xe9rez
+ fail=1
Sorry I don't have time to investigate now.
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Fri, 08 Jan 2016 16:51:02 GMT)
Message #25 received at 22181 <at> debbugs.gnu.org (full text, mbox):
On 01/07/2016 09:47 PM, Jim Meyering wrote:
> FAIL: encoding-error
> ====================
> ...
> --- exp 2016-01-07 21:39:42.018646618 -0800
> +++ out 2016-01-07 21:39:42.018646618 -0800
> @@ -1 +1 @@
> -Binary file in matches
> +Pedro P\xe9rez
> + fail=1
I can't reproduce that in Fedora 23 x86-64, which is using gcc 5.3.1
20151207 (Red Hat 5.3.1-2).
One hypothetical explanation is a bug or incompatibility in the
bleeding-edge Debian shell, which I suppose could cause
require_en_utf8_locale_ to do the wrong thing (i.e., to fail to report
that the en_US.UTF-8 locale is missing). You might check the output of
the command './get-mb-cur-max en_US.UTF-8' when you have the time.
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Fri, 08 Jan 2016 21:33:02 GMT)
Message #28 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Fri, Jan 8, 2016 at 8:50 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 01/07/2016 09:47 PM, Jim Meyering wrote:
>>
>> FAIL: encoding-error
>> ====================
>> ...
>> --- exp 2016-01-07 21:39:42.018646618 -0800
>> +++ out 2016-01-07 21:39:42.018646618 -0800
>> @@ -1 +1 @@
>> -Binary file in matches
>> +Pedro P\xe9rez
>> + fail=1
>
>
> I can't reproduce that in Fedora 23 x86-64, which is using gcc 5.3.1
> 20151207 (Red Hat 5.3.1-2).
>
> One hypothetical explanation is a bug or incompatibility in the
> bleeding-edge Debian shell, which I suppose could cause
> require_en_utf8_locale_ to do the wrong thing (i.e., to fail to report that
> the en_US.UTF-8 locale is missing). You might check the output of the
> command './get-mb-cur-max en_US.UTF-8' when you have the time.
Will investigate. In the meantime, here's a patch for the
false-positive failure I mentioned:
[0001-mb-non-UTF8-performance-avoid-FP-test-failure-on-fas.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 01:03:01 GMT)
Message #31 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Fri, Jan 8, 2016 at 1:32 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Fri, Jan 8, 2016 at 8:50 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>> On 01/07/2016 09:47 PM, Jim Meyering wrote:
>>>
>>> FAIL: encoding-error
>>> ====================
>>> ...
>>> --- exp 2016-01-07 21:39:42.018646618 -0800
>>> +++ out 2016-01-07 21:39:42.018646618 -0800
>>> @@ -1 +1 @@
>>> -Binary file in matches
>>> +Pedro P\xe9rez
>>> + fail=1
>>
>>
>> I can't reproduce that in Fedora 23 x86-64, which is using gcc 5.3.1
>> 20151207 (Red Hat 5.3.1-2).
>>
>> One hypothetical explanation is a bug or incompatibility in the
>> bleeding-edge Debian shell, which I suppose could cause
>> require_en_utf8_locale_ to do the wrong thing (i.e., to fail to report that
>> the en_US.UTF-8 locale is missing). You might check the output of the
>> command './get-mb-cur-max en_US.UTF-8' when you have the time.
>
> Will investigate. In the meantime, here's a patch for the
> false-positive failure I mentioned:
It was trivial: printf does not necessarily support \xHH hexadecimal
escapes. I switched the input generation to use printf with an octal
escaped byte instead, and now it works.
I've just pushed the attached along with the preceding patch.
[0001-tests-fix-encoding-error-test-failure-to-use-of-prin.patch (text/x-patch, attachment)]
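A short sketch of the portability point above, assuming the test input was a Latin-1 encoded name like the one in the failing diff. POSIX printf specifies octal escapes (\ddd) in the format string but not \xHH, so a shell whose printf lacks the hexadecimal extension (dash is believed to be one) emits the escape literally. The function name latin1_name is hypothetical and is not taken from the actual patch.

```shell
# Portable: the octal escape \351 yields the single byte 0xE9
# (e-acute in Latin-1), which is an encoding error in a UTF-8 locale.
latin1_name() { printf 'Pedro P\351rez\n'; }

# Show the bytes in octal; the eighth byte should be 351.
latin1_name | od -An -t o1

# Not portable: printf 'Pedro P\xe9rez\n' relies on a \xHH extension.
# Where it is unsupported, the output contains a literal backslash, 'x',
# 'e', '9' and is plain ASCII, so no encoding error is present.
```

That would be consistent with the failure shown earlier, where the output contained the literal text `\xe9` and the input was no longer treated as binary.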
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 02:54:01 GMT)
Message #34 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Jim Meyering wrote:
> It was trivial: printf does not necessarily support \xHH hexadecimal
> escapes.
Thanks for catching that. I looked and found one other problem of that kind. I
tried running the tests on Solaris and AIX and found a few more porting issues
in the grep tests, and installed the attached.
[0001-tests-port-to-other-POSIXish-platforms.patch (text/x-diff, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 09:22:02 GMT)
Message #37 received at 22181 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Fri, Jan 8, 2016 at 6:53 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> It was trivial: printf does not necessarily support \xHH hexadecimal
>> escapes.
>
> Thanks for catching that. I looked and found one other problem of that kind.
> I tried running the tests on Solaris and AIX and found a few more porting
> issues in the grep tests, and installed the attached.
Hah! TIL head -N and yes are not portable. Thank you.
I've been spoiled/corrupted by writing coreutils tests for so long.
I would prefer to continue to use "yes" via the following, at least
in the first test. That way is clearer. In the second, I could go either
way, since your awk process replaces both yes and head, at the
expense of being a bit less concise and less readable.
What do you think of this patch?
[0001-tests-do-use-yes-but-via-an-AWK-replacement.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 18:21:01 GMT)
Message #40 received at 22181 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering wrote:
> Hah! TIL head -N and yes are not portable. Thank you.
> I've been spoiled/corrupted by writing coreutils tests for so long.
>
> I would prefer to continue to use "yes" via the following, at least
> in the first test. That way is clearer. In the second, I could go either
> way, since your awk process replaces both yes and head, at the
> expense of being a bit less concise and less readable.
I could go either way too.
Though it's not needed for these particular tests, the shell function can be
tweaked to default to 'y' and to output quotes and backslashes in the arg as-is,
like BSD 'yes'. Something like this, perhaps?
yes() { line=${*-y} ${AWK-awk} 'BEGIN{for (;;) print ENVIRON["line"]}'; }
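A brief usage sketch of the function above, showing the two properties described: it defaults to 'y' when called without arguments, and it prints quotes and backslashes unchanged because the text reaches awk through ENVIRON rather than through awk source or a -v assignment (which would process escape sequences). Using `sed Nq` to stop the stream is a choice made here for illustration, not something taken from the patches.

```shell
# The function as proposed in the message above.
yes() { line=${*-y} ${AWK-awk} 'BEGIN{for (;;) print ENVIRON["line"]}'; }

# No argument: prints the default line; sed quits after three lines,
# and awk then stops when its output pipe is closed.
yes | sed 3q

# Backslashes and double quotes in the argument are printed as-is.
yes 'a\tb "quoted"' | sed 1q
```

Because this defines a shell function named yes, it shadows any system `yes` only within the script that defines it.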
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 19:28:02 GMT)
Message #43 received at 22181 <at> debbugs.gnu.org (full text, mbox):
On Sat, Jan 9, 2016 at 10:20 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Jim Meyering wrote:
>>
>> Hah! TIL head -N and yes are not portable. Thank you.
>> I've been spoiled/corrupted by writing coreutils tests for so long.
>>
>> I would prefer to continue to use "yes" via the following, at least
>> in the first test. That way is clearer. In the second, I could go either
>> way, since your awk process replaces both yes and head, at the
>> expense of being a bit less concise and less readable.
>
>
> I could go either way too.
>
> Though it's not needed for these particular tests, the shell function can be
> tweaked to default to 'y' and to output quotes and backslashes in the arg
> as-is, like BSD 'yes'. Something like this, perhaps?
>
> yes() { line=${*-y} ${AWK-awk} 'BEGIN{for (;;) print ENVIRON["line"]}'; }
Indeed, I thought of quotes and backslashes a little too late.
Nice hack. I will use that, probably with an added "local ",
since init.sh ensures that the test-run shell supports that.
Hmm... I see that gnulib's init.sh has a stray (new) use
of local. Will remove.
Information forwarded to bug-grep <at> gnu.org: bug#22181; Package grep. (Sat, 09 Jan 2016 19:35:01 GMT)
Message #46 received at 22181 <at> debbugs.gnu.org (full text, mbox):
On Sat, Jan 9, 2016 at 11:27 AM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sat, Jan 9, 2016 at 10:20 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>> Jim Meyering wrote:
>>>
>>> Hah! TIL head -N and yes are not portable. Thank you.
>>> I've been spoiled/corrupted by writing coreutils tests for so long.
>>>
>>> I would prefer to continue to use "yes" via the following, at least
>>> in the first test. That way is clearer. In the second, I could go either
>>> way, since your awk process replaces both yes and head, at the
>>> expense of being a bit less concise and less readable.
>>
>>
>> I could go either way too.
>>
>> Though it's not needed for these particular tests, the shell function can be
>> tweaked to default to 'y' and to output quotes and backslashes in the arg
>> as-is, like BSD 'yes'. Something like this, perhaps?
>>
>> yes() { line=${*-y} ${AWK-awk} 'BEGIN{for (;;) print ENVIRON["line"]}'; }
>
> Indeed, I thought of quotes and backslashes a little too late.
> Nice hack. I will use that, probably with an added "local ",
> since init.sh ensures that the test-run shell supports that.
> Hmm... I see that gnulib's init.sh has a stray (new) use
> of local. Will remove.
I will *not* be adding a "local " prefix. Not required. That "line" is
an envvar, so just fine as-is.
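A small sketch of why no "local" is needed, assuming awk is an external command (as it normally is): an assignment written as a prefix to an external command is placed in that command's environment only and does not set a variable in the calling shell. The variable names seen_by_awk and seen_by_shell are illustrative.

```shell
# Start from a known state: no shell variable named "line".
unset line

# The prefix assignment is exported to awk's environment only.
seen_by_awk=$(line=hello ${AWK-awk} 'BEGIN { print ENVIRON["line"] }')

# Back in the shell, "line" is still unset.
seen_by_shell=${line-unset}

echo "awk saw: $seen_by_awk; shell sees: $seen_by_shell"
```

Note that this holds for external commands; prefix assignments to shell functions or special built-ins can behave differently in some shells.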
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 07 Feb 2016 12:24:03 GMT)
This bug report was last modified 9 years and 76 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.