GNU bug report logs - #20974
Weird newline matching behaviour in --null-data mode
Reported by: Balazs Kezes <rlblaster <at> gmail.com>
Date: Fri, 3 Jul 2015 17:00:07 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20974 in the body.
You can then email your comments to 20974 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#20974; Package grep. (Fri, 03 Jul 2015 17:00:07 GMT)
Acknowledgement sent to Balazs Kezes <rlblaster <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 03 Jul 2015 17:00:08 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello!
I'm running into issues with grep in -z mode. I've managed to minimize
it into this:
$ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
0
$ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
1
I'd expect the two expressions to mean the same. I've tried this with
the latest version built from the official sources, 2.21. I've also
found [1] which might be related but it wasn't updated for almost 2
years. Or is this expected?
Thanks!
[1] http://savannah.gnu.org/bugs/?40009
--
Balazs
Information forwarded to bug-grep <at> gnu.org: bug#20974; Package grep. (Sat, 04 Jul 2015 00:37:02 GMT)
Message #8 received at 20974 <at> debbugs.gnu.org (full text, mbox):
On Fri, 3 Jul 2015 17:59:19 +0100
Balazs Kezes <rlblaster <at> gmail.com> wrote:
> I'm running into issues with grep in -z mode. I've managed to minimize
> it into this:
>
> $ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
> 0
> $ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
> 1
>
> I'd expect the two expressions to mean the same. I've tried this with
> the latest version built from the official sources, 2.21. I've also
> found [1] which might be related but it wasn't updated for almost 2
> years. Or is this expected?
$ seq 2 | env LC_ALL=C grep --null-data --quiet '[12].2' ; echo $?
0
$ seq 2 | env LC_ALL=C grep --null-data --quiet '[1-2].2' ; echo $?
0
$ seq 2 | env LC_ALL=en_US.iso88591 grep --null-data --quiet '[12].2' ; echo $?
0
$ seq 2 | env LC_ALL=en_US.iso88591 grep --null-data --quiet '[1-2].2' ; echo $?
1
grep relies on regex only in the last case, in order to support collating
elements, but regex does not support substituting NUL for LF as the newline
character with --null-data.
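The comparison above can be repeated across locales with a short script. This is only a sketch: it assumes GNU grep is installed as `grep`, the helper name `try_pattern` is made up for illustration, and en_US.iso88591 must be installed (check with `locale -a`), otherwise the C library silently falls back to the C locale. No expected result is given for the non-C locale, because it depends on the grep version.

```sh
#!/bin/sh
# try_pattern LOCALE PATTERN: hypothetical helper that prints grep's exit
# status for a two-line input searched as one NUL-terminated record.
try_pattern() {
    printf '1\n2\n' | LC_ALL=$1 grep --null-data --quiet -- "$2"
    echo $?
}

# Print one row per locale/pattern pair: locale, pattern, exit status.
for loc in C en_US.iso88591; do
    for pat in '[12].2' '[1-2].2'; do
        printf '%s\t%s\t%s\n' "$loc" "$pat" "$(try_pattern "$loc" "$pat")"
    done
done
```

If the two patterns report different statuses within the same locale, that grep build shows the inconsistency described in this report.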
Information forwarded to bug-grep <at> gnu.org: bug#20974; Package grep. (Sat, 04 Jul 2015 03:04:02 GMT)
Message #11 received at 20974 <at> debbugs.gnu.org (full text, mbox):
On Fri, Jul 3, 2015 at 9:59 AM, Balazs Kezes <rlblaster <at> gmail.com> wrote:
> Hello!
>
> I'm running into issues with grep in -z mode. I've managed to minimize
> it into this:
>
> $ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
> 0
> $ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
> 1
Thank you for the report.
I too would like those two commands to work the same way.
The problem is that when the regular expression contains a
bracket expression with a range, grep switches from using
its DFA matcher to relying on regex, but as Norihiro Tanaka
mentioned, grep's use of the regex matcher with the
--null-data (-z) option cannot match multi-line results.
One can demonstrate the problem in the C locale too,
by using a back-reference, since that construct also causes
grep to use regex:
$ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '1.1'
$ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '(1).\1'
[Exit 1]
$ printf '1\n1\n' |LC_ALL=C src/grep -Ezq '(1).\1'
[Exit 1]
It'd be great to fix this, but it is not on my short-term radar,
though I will add some expected-to-fail tests.
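To check whether a given grep build still disagrees with itself here, the back-reference case above can be wrapped in a small script. This is a sketch that assumes GNU grep is installed as `grep` (rather than the in-tree src/grep used above); the variable names are illustrative only. The back-reference result is printed but not assumed, since it depends on the grep version.

```sh
#!/bin/sh
# Same input for both patterns: two lines, searched as one record under -z.
input='1\n1\n'

# Plain pattern; per the explanation above, grep's DFA matcher handles it.
printf "$input" | LC_ALL=C grep -Ezq '1.1'
plain_status=$?

# Back-reference; per the explanation above, this makes grep use regex.
printf "$input" | LC_ALL=C grep -Ezq '(1).\1'
backref_status=$?

echo "plain=$plain_status backref=$backref_status"
if [ "$plain_status" -ne "$backref_status" ]; then
    echo "the two patterns disagree on this grep build"
fi
```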
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Sat, 04 Jul 2015 03:11:02 GMT)
Notification sent to Balazs Kezes <rlblaster <at> gmail.com>: bug acknowledged by developer. (Sat, 04 Jul 2015 03:11:03 GMT)
Message #16 received at 20974-done <at> debbugs.gnu.org (full text, mbox):
On Fri, Jul 3, 2015 at 8:03 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Fri, Jul 3, 2015 at 9:59 AM, Balazs Kezes <rlblaster <at> gmail.com> wrote:
>> Hello!
>>
>> I'm running into issues with grep in -z mode. I've managed to minimize
>> it into this:
>>
>> $ seq 2 | grep --null-data --quiet '[12].2' ; echo $?
>> 0
>> $ seq 2 | grep --null-data --quiet '[1-2].2' ; echo $?
>> 1
>
> Thank you for the report.
> I too would like those two commands to work the same way.
> The problem is that when the regular expression contains a
> bracket expression with a range, grep switches from using
> its DFA matcher to relying on regex, but as Norihiro Tanaka
> mentioned, grep's use of the regex matcher with the
> --null-data (-z) option cannot match multi-line results.
>
> One can demonstrate the problem in the C locale too,
> by using a back-reference, since that construct also causes
> grep to use regex:
>
> $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '1.1'
> $ printf '1\n1\n' |LC_ALL=en_US.UTF-8 src/grep -Ezq '(1).\1'
> [Exit 1]
> $ printf '1\n1\n' |LC_ALL=C src/grep -Ezq '(1).\1'
> [Exit 1]
>
> It'd be great to fix this, but it is not on my short-term radar,
> though I will add some expected-to-fail tests.
Oh, nice! I see that Paul Eggert has just fixed this with
the following patch:
http://git.sv.gnu.org/cgit/grep.git/commit/?id=0e8fda0d880cccd0
So I'm closing this ticket.
Information forwarded to bug-grep <at> gnu.org: bug#20974; Package grep. (Sat, 04 Jul 2015 04:42:02 GMT)
Message #19 received at 20974 <at> debbugs.gnu.org (full text, mbox):
On Fri, 3 Jul 2015 20:10:08 -0700
Jim Meyering <jim <at> meyering.net> wrote:
> Oh, nice! I see that Paul Eggert has just fixed this with
> the following patch:
> http://git.sv.gnu.org/cgit/grep.git/commit/?id=0e8fda0d880cccd0
>
> So I'm closing this ticket.
>
Paul's fix is very nice; I could not find it myself.
However, the following case is not fixed yet. Not only '.' but also a hat
list (e.g. [^a]) should match newline with -z, so we need to clear the
RE_HAT_LISTS_NOT_NEWLINE bit.
$ seq 2 | LC_ALL=C grep --null-data '[1-2][^a][1-2]'
1
2
$ seq 2 | LC_ALL=en_US.iso88591 grep --null-data '[1-2][^a][1-2]'
[0001-grep-z-a-now-consistently-matches-newline.patch (text/plain, attachment)]
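The hat-list case can be compared against the '.' case by counting matching records instead of printing them. This sketch assumes GNU grep is installed as `grep`; the variable names are illustrative. It runs in the C locale, where the output above shows a match. Changing LC_ALL to en_US.iso88591 (if installed) retests the failing case, and that result depends on the grep version.

```sh
#!/bin/sh
# With --null-data the whole two-line input is a single record, so -c
# prints 1 when the pattern matches across the newline and 0 otherwise.
hat_count=$(printf '1\n2\n' | LC_ALL=C grep --null-data -c '[1-2][^a][1-2]')
dot_count=$(printf '1\n2\n' | LC_ALL=C grep --null-data -c '[1-2].[1-2]')

echo "hat list: $hat_count, dot: $dot_count"
```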
Information forwarded to bug-grep <at> gnu.org: bug#20974; Package grep. (Sat, 04 Jul 2015 15:51:02 GMT)
Message #22 received at 20974 <at> debbugs.gnu.org (full text, mbox):
Norihiro Tanaka wrote:
> Not only '.' but also hat
> list (e.g. [^a]) should match newline with -z. So we need clear
> RE_HAT_LISTS_NOT_NEWLINE bit.
Thanks for reporting that. I also noticed some related bugs in dfa.c that
'grep' does not exercise (so no grep test cases, alas). Plus, it's long past
time that we fixed RE_SYNTAX_GREP and RE_SYNTAX_EGREP to match grep's actual
behavior. So I installed a Gnulib patch to update RE_SYNTAX_GREP and
RE_SYNTAX_EGREP to the fixed behavior (see
<http://lists.gnu.org/archive/html/bug-gnulib/2015-07/msg00016.html>) and
installed grep patches to sync to gnulib and fix the other problems.
The first attached patch I installed yesterday (and you've commented on it) but
I didn't have time to send email about it so am attaching it now. The other
five attached patches fix the bugs noted above.
Here's the justification for the first attached patch. The grep documentation
says that '.' matches any character, and this includes both NUL and LF.
Ordinarily, LF terminates a line and so is never part of match data, but '.'
should still match NUL. Conversely with -z, NUL terminates a line and so is
never part of match data, but '.' should still match LF.
[0001-grep-z-.-now-consistently-matches-newline.patch (text/x-diff, attachment)]
[0002-grep-z-x-now-consistently-matches-newline.patch (text/x-diff, attachment)]
[0003-dfa-.-and-x-now-consistently-match-newline.patch (text/x-diff, attachment)]
[0004-build-update-gnulib-submodule-to-latest.patch (text/x-diff, attachment)]
[0005-maint-ignore-gendocs_template_min.patch (text/x-diff, attachment)]
[0006-grep-use-recent-gnulib-syntax-bits.patch (text/x-diff, attachment)]
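The justification given for the first patch can be tried out with the two mirror-image cases it describes. This is a sketch assuming GNU grep installed as `grep`; the variable names are illustrative, and the counts are printed rather than assumed, since they depend on the grep version.

```sh
#!/bin/sh
# With -z, NUL ends each record, so the LF between 'a' and 'b' is ordinary
# data that '.' is expected to match.
lf_count=$(printf 'a\nb\0' | LC_ALL=C grep -zc 'a.b')

# Without -z, LF ends each line, so the NUL between 'a' and 'b' is ordinary
# data; -a keeps grep from treating the input as a binary file.
nul_count=$(printf 'a\0b\n' | LC_ALL=C grep -ac 'a.b')

echo "LF matched under -z: $lf_count; NUL matched without -z: $nul_count"
```

A count of 1 in each case means '.' matched the byte that is not acting as the line terminator in that mode.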
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 02 Aug 2015 11:24:04 GMT)
This bug report was last modified 9 years and 292 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.