GNU bug report logs -
#34951
[PATCH] grep: a kwset matcher not work in a grep matcher
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 23 Mar 2019 02:30:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34951 in the body.
You can then email your comments to 34951 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sat, 23 Mar 2019 02:30:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Sat, 23 Mar 2019 02:30:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
A kwset matcher is not built in a grep matcher after token re-order is
introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa.
It caused performance degradation in some typical cases. This bug is
introduced in grep-3.2.
DFAMUST() does not work if tokens which are parsed in dfa matcher are
re-ordered. Therefore, change as it is called after parse and before
tokens re-order.
BTW, this change does not affect programs that do not use DFAMUST(),
such as sed or gawk.
$ yes $(printf '%040d' 0) | head -10000000 >inp
$ grep-2.2/src/grep 01.2 inp
real 1.61
user 1.53
sys 0.07
$ grep-2.3/src/grep 01.2 inp
real 1.57
user 1.48
sys 0.08
$ grep-2.4/src/grep 01.2 inp
real 1.50
user 1.44
sys 0.05
$ grep-2.4.1/src/grep 01.2 inp
real 1.53
user 1.48
sys 0.05
$ grep-2.4.2/src/grep 01.2 inp
real 1.52
user 1.47
sys 0.04
$ grep-2.5.4/src/grep 01.2 inp
real 1.53
user 1.47
sys 0.05
$ grep-2.6/src/grep 01.2 inp
real 1.51
user 1.47
sys 0.04
$ grep-2.6.1/src/grep 01.2 inp
real 1.50
user 1.44
sys 0.05
$ grep-2.6.2/src/grep 01.2 inp
real 1.52
user 1.46
sys 0.05
$ grep-2.6.3/src/grep 01.2 inp
real 1.52
user 1.47
sys 0.05
$ grep-2.7/src/grep 01.2 inp
real 1.53
user 1.49
sys 0.04
$ grep-2.8/src/grep 01.2 inp
real 1.52
user 1.46
sys 0.05
$ grep-2.9/src/grep 01.2 inp
real 1.54
user 1.50
sys 0.04
$ grep-2.10/src/grep 01.2 inp
real 1.51
user 1.46
sys 0.05
$ grep-2.11/src/grep 01.2 inp
real 1.53
user 1.48
sys 0.05
$ grep-2.12/src/grep 01.2 inp
real 1.51
user 1.47
sys 0.03
$ grep-2.13/src/grep 01.2 inp
real 1.52
user 1.47
sys 0.03
$ grep-2.14/src/grep 01.2 inp
real 1.52
user 1.47
sys 0.04
$ grep-2.15/src/grep 01.2 inp
real 1.55
user 1.49
sys 0.05
$ grep-2.16/src/grep 01.2 inp
real 1.53
user 1.48
sys 0.04
$ grep-2.17/src/grep 01.2 inp
real 1.53
user 1.48
sys 0.05
$ grep-2.18/src/grep 01.2 inp
real 1.51
user 1.44
sys 0.06
$ grep-2.19/src/grep 01.2 inp
real 0.06
user 0.02
sys 0.04
$ grep-2.20/src/grep 01.2 inp
real 0.07
user 0.01
sys 0.05
$ grep-2.21/src/grep 01.2 inp
real 0.06
user 0.02
sys 0.04
$ grep-2.22/src/grep 01.2 inp
real 0.06
user 0.01
sys 0.05
$ grep-2.23/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.05
$ grep-2.24/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.04
$ grep-2.25/src/grep 01.2 inp
real 0.09
user 0.05
sys 0.04
$ grep-2.26/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.05
$ grep-2.27/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.04
$ grep-2.28/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.04
$ grep-3.0/src/grep 01.2 inp
real 0.09
user 0.04
sys 0.04
$ grep-3.1/src/grep 01.2 inp
real 0.11
user 0.05
sys 0.06
$ grep-3.2/src/grep 01.2 inp
real 0.37
user 0.32
sys 0.04
$ grep-3.3/src/grep 01.2 inp
real 0.29
user 0.25
sys 0.04
Thanks,
Norihiro
[0001-dfa-separate-parse-and-compile-phase.patch (text/plain, attachment)]
[0001-grep-a-kwset-matcher-not-work-in-a-grep-matcher.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sat, 23 Mar 2019 02:50:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 34951 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sat, 23 Mar 2019 08:06:35 +0900
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> A kwset matcher is not built in a grep matcher after token re-order is
> introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa.
> It caused performance degradation in some typical cases. This bug is
> introduced in grep-3.2.
>
> DFAMUST() does not work if tokens which are parsed in dfa matcher are
> re-ordered. Therefore, change as it is called after parse and before
> tokens re-order.
>
> BTW, this change does not affect programs that do not use DFAMUST(),
> such as sed or gawk.
>
> $ yes $(printf '%040d' 0) | head -10000000 >inp
> $ grep-2.2/src/grep 01.2 inp
> real 1.61
> user 1.53
> sys 0.07
> $ grep-2.3/src/grep 01.2 inp
> real 1.57
> user 1.48
> sys 0.08
> $ grep-2.4/src/grep 01.2 inp
> real 1.50
> user 1.44
> sys 0.05
> $ grep-2.4.1/src/grep 01.2 inp
> real 1.53
> user 1.48
> sys 0.05
> $ grep-2.4.2/src/grep 01.2 inp
> real 1.52
> user 1.47
> sys 0.04
> $ grep-2.5.4/src/grep 01.2 inp
> real 1.53
> user 1.47
> sys 0.05
> $ grep-2.6/src/grep 01.2 inp
> real 1.51
> user 1.47
> sys 0.04
> $ grep-2.6.1/src/grep 01.2 inp
> real 1.50
> user 1.44
> sys 0.05
> $ grep-2.6.2/src/grep 01.2 inp
> real 1.52
> user 1.46
> sys 0.05
> $ grep-2.6.3/src/grep 01.2 inp
> real 1.52
> user 1.47
> sys 0.05
> $ grep-2.7/src/grep 01.2 inp
> real 1.53
> user 1.49
> sys 0.04
> $ grep-2.8/src/grep 01.2 inp
> real 1.52
> user 1.46
> sys 0.05
> $ grep-2.9/src/grep 01.2 inp
> real 1.54
> user 1.50
> sys 0.04
> $ grep-2.10/src/grep 01.2 inp
> real 1.51
> user 1.46
> sys 0.05
> $ grep-2.11/src/grep 01.2 inp
> real 1.53
> user 1.48
> sys 0.05
> $ grep-2.12/src/grep 01.2 inp
> real 1.51
> user 1.47
> sys 0.03
> $ grep-2.13/src/grep 01.2 inp
> real 1.52
> user 1.47
> sys 0.03
> $ grep-2.14/src/grep 01.2 inp
> real 1.52
> user 1.47
> sys 0.04
> $ grep-2.15/src/grep 01.2 inp
> real 1.55
> user 1.49
> sys 0.05
> $ grep-2.16/src/grep 01.2 inp
> real 1.53
> user 1.48
> sys 0.04
> $ grep-2.17/src/grep 01.2 inp
> real 1.53
> user 1.48
> sys 0.05
> $ grep-2.18/src/grep 01.2 inp
> real 1.51
> user 1.44
> sys 0.06
> $ grep-2.19/src/grep 01.2 inp
> real 0.06
> user 0.02
> sys 0.04
> $ grep-2.20/src/grep 01.2 inp
> real 0.07
> user 0.01
> sys 0.05
> $ grep-2.21/src/grep 01.2 inp
> real 0.06
> user 0.02
> sys 0.04
> $ grep-2.22/src/grep 01.2 inp
> real 0.06
> user 0.01
> sys 0.05
> $ grep-2.23/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.05
> $ grep-2.24/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.04
> $ grep-2.25/src/grep 01.2 inp
> real 0.09
> user 0.05
> sys 0.04
> $ grep-2.26/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.05
> $ grep-2.27/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.04
> $ grep-2.28/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.04
> $ grep-3.0/src/grep 01.2 inp
> real 0.09
> user 0.04
> sys 0.04
> $ grep-3.1/src/grep 01.2 inp
> real 0.11
> user 0.05
> sys 0.06
> $ grep-3.2/src/grep 01.2 inp
> real 0.37
> user 0.32
> sys 0.04
> $ grep-3.3/src/grep 01.2 inp
> real 0.29
> user 0.25
> sys 0.04
>
> Thanks,
> Norihiro
Missing a patch for dfa. Re-send correct patch file.
[0001-dfa-separate-parse-and-compile-phase.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sat, 23 Mar 2019 02:59:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 34951 <at> debbugs.gnu.org (full text, mbox):
How make grep walinh through FS by scanning breadth first instead of
the usual depth
On 3/23/19, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> On Sat, 23 Mar 2019 08:06:35 +0900
> Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
>> A kwset matcher is not built in a grep matcher after token re-order is
>> introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa.
>> It caused performance degradation in some typical cases. This bug is
>> introduced in grep-3.2.
>>
>> DFAMUST() does not work if tokens which are parsed in dfa matcher are
>> re-ordered. Therefore, change as it is called after parse and before
>> tokens re-order.
>>
>> BTW, this change does not affect programs that do not use DFAMUST(),
>> such as sed or gawk.
>>
>> $ yes $(printf '%040d' 0) | head -10000000 >inp
>> $ grep-2.2/src/grep 01.2 inp
>> real 1.61
>> user 1.53
>> sys 0.07
>> $ grep-2.3/src/grep 01.2 inp
>> real 1.57
>> user 1.48
>> sys 0.08
>> $ grep-2.4/src/grep 01.2 inp
>> real 1.50
>> user 1.44
>> sys 0.05
>> $ grep-2.4.1/src/grep 01.2 inp
>> real 1.53
>> user 1.48
>> sys 0.05
>> $ grep-2.4.2/src/grep 01.2 inp
>> real 1.52
>> user 1.47
>> sys 0.04
>> $ grep-2.5.4/src/grep 01.2 inp
>> real 1.53
>> user 1.47
>> sys 0.05
>> $ grep-2.6/src/grep 01.2 inp
>> real 1.51
>> user 1.47
>> sys 0.04
>> $ grep-2.6.1/src/grep 01.2 inp
>> real 1.50
>> user 1.44
>> sys 0.05
>> $ grep-2.6.2/src/grep 01.2 inp
>> real 1.52
>> user 1.46
>> sys 0.05
>> $ grep-2.6.3/src/grep 01.2 inp
>> real 1.52
>> user 1.47
>> sys 0.05
>> $ grep-2.7/src/grep 01.2 inp
>> real 1.53
>> user 1.49
>> sys 0.04
>> $ grep-2.8/src/grep 01.2 inp
>> real 1.52
>> user 1.46
>> sys 0.05
>> $ grep-2.9/src/grep 01.2 inp
>> real 1.54
>> user 1.50
>> sys 0.04
>> $ grep-2.10/src/grep 01.2 inp
>> real 1.51
>> user 1.46
>> sys 0.05
>> $ grep-2.11/src/grep 01.2 inp
>> real 1.53
>> user 1.48
>> sys 0.05
>> $ grep-2.12/src/grep 01.2 inp
>> real 1.51
>> user 1.47
>> sys 0.03
>> $ grep-2.13/src/grep 01.2 inp
>> real 1.52
>> user 1.47
>> sys 0.03
>> $ grep-2.14/src/grep 01.2 inp
>> real 1.52
>> user 1.47
>> sys 0.04
>> $ grep-2.15/src/grep 01.2 inp
>> real 1.55
>> user 1.49
>> sys 0.05
>> $ grep-2.16/src/grep 01.2 inp
>> real 1.53
>> user 1.48
>> sys 0.04
>> $ grep-2.17/src/grep 01.2 inp
>> real 1.53
>> user 1.48
>> sys 0.05
>> $ grep-2.18/src/grep 01.2 inp
>> real 1.51
>> user 1.44
>> sys 0.06
>> $ grep-2.19/src/grep 01.2 inp
>> real 0.06
>> user 0.02
>> sys 0.04
>> $ grep-2.20/src/grep 01.2 inp
>> real 0.07
>> user 0.01
>> sys 0.05
>> $ grep-2.21/src/grep 01.2 inp
>> real 0.06
>> user 0.02
>> sys 0.04
>> $ grep-2.22/src/grep 01.2 inp
>> real 0.06
>> user 0.01
>> sys 0.05
>> $ grep-2.23/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.05
>> $ grep-2.24/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.04
>> $ grep-2.25/src/grep 01.2 inp
>> real 0.09
>> user 0.05
>> sys 0.04
>> $ grep-2.26/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.05
>> $ grep-2.27/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.04
>> $ grep-2.28/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.04
>> $ grep-3.0/src/grep 01.2 inp
>> real 0.09
>> user 0.04
>> sys 0.04
>> $ grep-3.1/src/grep 01.2 inp
>> real 0.11
>> user 0.05
>> sys 0.06
>> $ grep-3.2/src/grep 01.2 inp
>> real 0.37
>> user 0.32
>> sys 0.04
>> $ grep-3.3/src/grep 01.2 inp
>> real 0.29
>> user 0.25
>> sys 0.04
>>
>> Thanks,
>> Norihiro
>
> Missing a patch for dfa. Re-send correct patch file.
>
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sat, 23 Mar 2019 03:00:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 34951 <at> debbugs.gnu.org (full text, mbox):
How make grep walking through FS by scanning breadth first instead of
On 3/23/19, Budi <budikusasi <at> gmail.com> wrote:
> How make grep walinh through FS by scanning breadth first instead of
> the usual depth
>
> On 3/23/19, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>> On Sat, 23 Mar 2019 08:06:35 +0900
>> Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>>
>>> A kwset matcher is not built in a grep matcher after token re-order is
>>> introduced in commit 5c7a0371823876cca7a1347fa09ca26bbbff0c98 in dfa.
>>> It caused performance degradation in some typical cases. This bug is
>>> introduced in grep-3.2.
>>>
>>> DFAMUST() does not work if tokens which are parsed in dfa matcher are
>>> re-ordered. Therefore, change as it is called after parse and before
>>> tokens re-order.
>>>
>>> BTW, this change does not affect programs that do not use DFAMUST(),
>>> such as sed or gawk.
>>>
>>> $ yes $(printf '%040d' 0) | head -10000000 >inp
>>> $ grep-2.2/src/grep 01.2 inp
>>> real 1.61
>>> user 1.53
>>> sys 0.07
>>> $ grep-2.3/src/grep 01.2 inp
>>> real 1.57
>>> user 1.48
>>> sys 0.08
>>> $ grep-2.4/src/grep 01.2 inp
>>> real 1.50
>>> user 1.44
>>> sys 0.05
>>> $ grep-2.4.1/src/grep 01.2 inp
>>> real 1.53
>>> user 1.48
>>> sys 0.05
>>> $ grep-2.4.2/src/grep 01.2 inp
>>> real 1.52
>>> user 1.47
>>> sys 0.04
>>> $ grep-2.5.4/src/grep 01.2 inp
>>> real 1.53
>>> user 1.47
>>> sys 0.05
>>> $ grep-2.6/src/grep 01.2 inp
>>> real 1.51
>>> user 1.47
>>> sys 0.04
>>> $ grep-2.6.1/src/grep 01.2 inp
>>> real 1.50
>>> user 1.44
>>> sys 0.05
>>> $ grep-2.6.2/src/grep 01.2 inp
>>> real 1.52
>>> user 1.46
>>> sys 0.05
>>> $ grep-2.6.3/src/grep 01.2 inp
>>> real 1.52
>>> user 1.47
>>> sys 0.05
>>> $ grep-2.7/src/grep 01.2 inp
>>> real 1.53
>>> user 1.49
>>> sys 0.04
>>> $ grep-2.8/src/grep 01.2 inp
>>> real 1.52
>>> user 1.46
>>> sys 0.05
>>> $ grep-2.9/src/grep 01.2 inp
>>> real 1.54
>>> user 1.50
>>> sys 0.04
>>> $ grep-2.10/src/grep 01.2 inp
>>> real 1.51
>>> user 1.46
>>> sys 0.05
>>> $ grep-2.11/src/grep 01.2 inp
>>> real 1.53
>>> user 1.48
>>> sys 0.05
>>> $ grep-2.12/src/grep 01.2 inp
>>> real 1.51
>>> user 1.47
>>> sys 0.03
>>> $ grep-2.13/src/grep 01.2 inp
>>> real 1.52
>>> user 1.47
>>> sys 0.03
>>> $ grep-2.14/src/grep 01.2 inp
>>> real 1.52
>>> user 1.47
>>> sys 0.04
>>> $ grep-2.15/src/grep 01.2 inp
>>> real 1.55
>>> user 1.49
>>> sys 0.05
>>> $ grep-2.16/src/grep 01.2 inp
>>> real 1.53
>>> user 1.48
>>> sys 0.04
>>> $ grep-2.17/src/grep 01.2 inp
>>> real 1.53
>>> user 1.48
>>> sys 0.05
>>> $ grep-2.18/src/grep 01.2 inp
>>> real 1.51
>>> user 1.44
>>> sys 0.06
>>> $ grep-2.19/src/grep 01.2 inp
>>> real 0.06
>>> user 0.02
>>> sys 0.04
>>> $ grep-2.20/src/grep 01.2 inp
>>> real 0.07
>>> user 0.01
>>> sys 0.05
>>> $ grep-2.21/src/grep 01.2 inp
>>> real 0.06
>>> user 0.02
>>> sys 0.04
>>> $ grep-2.22/src/grep 01.2 inp
>>> real 0.06
>>> user 0.01
>>> sys 0.05
>>> $ grep-2.23/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.05
>>> $ grep-2.24/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.04
>>> $ grep-2.25/src/grep 01.2 inp
>>> real 0.09
>>> user 0.05
>>> sys 0.04
>>> $ grep-2.26/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.05
>>> $ grep-2.27/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.04
>>> $ grep-2.28/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.04
>>> $ grep-3.0/src/grep 01.2 inp
>>> real 0.09
>>> user 0.04
>>> sys 0.04
>>> $ grep-3.1/src/grep 01.2 inp
>>> real 0.11
>>> user 0.05
>>> sys 0.06
>>> $ grep-3.2/src/grep 01.2 inp
>>> real 0.37
>>> user 0.32
>>> sys 0.04
>>> $ grep-3.3/src/grep 01.2 inp
>>> real 0.29
>>> user 0.25
>>> sys 0.04
>>>
>>> Thanks,
>>> Norihiro
>>
>> Missing a patch for dfa. Re-send correct patch file.
>>
>
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sat, 23 Mar 2019 12:40:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 34951 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 3/22/19 9:59 PM, Budi wrote:
> How make grep walking through FS by scanning breadth first instead of
>
> On 3/23/19, Budi <budikusasi <at> gmail.com> wrote:
>> How make grep walinh through FS by scanning breadth first instead of
>> the usual depth
>>
>> On 3/23/19, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>>> On Sat, 23 Mar 2019 08:06:35 +0900
>>> Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>>>
>>>> A kwset matcher is not built in a grep matcher after token re-order is
Budi,
Hijacking a tread on a posted patch to ask an unrelated question via
top-posting is not very nice netiquette. Better is to start a new
thread for asking questions, and to use bottom posting for technical lists.
That said, the answer to your question is that there is no way to change
the way that grep walks the file system when using 'grep -r'. And when
you consider that 'grep -r' is a GNU extension not required by POSIX
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html),
and that we are reluctant to bloat grep any further when 'find' already
exists as the POSIX-sanctioned file walker, you are better off getting
'find' to do the traversal you want (where find or xargs is used to
invoke plain 'grep' on the resulting files) rather than trying to
convince us to patch 'grep -r' to have more flexibility.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
[signature.asc (application/pgp-signature, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 29 Mar 2019 11:00:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Hi.
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Missing a patch for dfa. Re-send correct patch file.
Paul - is this going to be merged into GNULIB? If so, I'll put it into
gawk now; I want to make a release soon.
Thanks,
Arnold
[
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Wed, 11 Dec 2019 23:26:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 34951 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 3/22/19 7:49 PM, Norihiro Tanaka wrote:
> Missing a patch for dfa. Re-send correct patch file.
Thanks, I installed the DFA-relevant parts of your proposed fix into
Gnulib. (The grep parts still need doing.) I also installed the attached
commentary followup.
While I was at it I installed a patch to fix an unlikely integer
overflow that I noticed while reviewing your fix. I also installed some
internal changes to prefer signed to unsigned integers for indexes, as
this should make future integer overflows easier to catch. See:
https://lists.gnu.org/r/bug-gnulib/2019-12/msg00058.html
https://lists.gnu.org/r/bug-gnulib/2019-12/msg00059.html
I'd also like to change dfa.h's API to prefer ptrdiff_t to size_t, for
the same integer-overflow reason. This would be a (minor) API change so
I thought I'd ask first. Any objections?
PS. Arnold, the above discusses all the changes I know about for dfa.c
and dfa.h. The proposed API change (size_t->ptrdiff_t) could be
installed either before or after the next Gawk release.
[0001-dfa-update-commentary-for-previous-change.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Thu, 12 Dec 2019 07:24:01 GMT)
Full text and
rfc822 format available.
Message #26 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Hi Paul.
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 3/22/19 7:49 PM, Norihiro Tanaka wrote:
> > Missing a patch for dfa. Re-send correct patch file.
>
> Thanks, I installed the DFA-relevant parts of your proposed fix into
> Gnulib. (The grep parts still need doing.) I also installed the attached
> commentary followup.
>
> While I was at it I installed a patch to fix an unlikely integer
> overflow that I noticed while reviewing your fix. I also installed some
> internal changes to prefer signed to unsigned integers for indexes, as
> this should make future integer overflows easier to catch. See:
>
> https://lists.gnu.org/r/bug-gnulib/2019-12/msg00058.html
> https://lists.gnu.org/r/bug-gnulib/2019-12/msg00059.html
I am reviewing these. In general using signed integers internally
looks OK to me.
> I'd also like to change dfa.h's API to prefer ptrdiff_t to size_t, for
> the same integer-overflow reason. This would be a (minor) API change so
> I thought I'd ask first. Any objections?
Yes. I object. Strongly.
We're passing length and count values and those are supposed
to be size_t. If you REALLY want signed values, then I could
live with ssize_t (as returned by read(2), for example), but I
would find ptrdiff_t to be ugly and unintuitive.
> PS. Arnold, the above discusses all the changes I know about for dfa.c
> and dfa.h. The proposed API change (size_t->ptrdiff_t) could be
> installed either before or after the next Gawk release.
Thanks. I'm skimming the other changes now.
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Thu, 12 Dec 2019 07:32:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Hi Paul.
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> https://lists.gnu.org/r/bug-gnulib/2019-12/msg00058.html
> https://lists.gnu.org/r/bug-gnulib/2019-12/msg00059.html
Looking at this:
| @@ -1733,11 +1733,11 @@ add_utf8_anychar (struct dfa *dfa)
| /* f0-f7: 4-byte sequence. */
| CHARCLASS_INIT (0, 0, 0, 0, 0, 0, 0, 0xff0000)
| };
| - const unsigned int n = sizeof (utf8_classes) / sizeof (utf8_classes[0]);
| + int n = sizeof utf8_classes / sizeof *utf8_classes;
Why are you throwing away const here?
Other than this, I think internally too, I'd prefer that you
1,$s/ptrdiff_t/ssize_t/g
(and fix any printf calls). It just feels like an abuse of
the type, which is for representing differences between pointers,
and not regular large signed integeers.
However, I'm not going to insist about it internally, whereas
I would object strongly to the use of ptrdiff_t in the API.
Thanks!
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Thu, 12 Dec 2019 07:49:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 34951 <at> debbugs.gnu.org (full text, mbox):
arnold <at> skeeve.com wrote:
> Other than this, I think internally too, I'd prefer that you
>
> 1,$s/ptrdiff_t/ssize_t/g
I did this, just to see. gawk passes its test suite, both in
64- and 32-bit mode.
FWIW.
Thanks,
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Thu, 12 Dec 2019 22:27:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 34951 <at> debbugs.gnu.org (full text, mbox):
On 12/11/19 11:31 PM, arnold <at> skeeve.com wrote:
> 1,$s/ptrdiff_t/ssize_t/g
ssize_t can be narrower than ptrdiff_t, so it's not a good type to use
for this notion. Its original motivation was "the type that 'read'
returns", and on systems where 'read' can return at most INT_MAX,
ssize_t can be 32 bits even if size_t is 64 bits.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 13 Dec 2019 08:11:01 GMT)
Full text and
rfc822 format available.
Message #38 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Hi Paul.
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 12/11/19 11:31 PM, arnold <at> skeeve.com wrote:
>
> > 1,$s/ptrdiff_t/ssize_t/g
>
> ssize_t can be narrower than ptrdiff_t, so it's not a good type to use
> for this notion. Its original motivation was "the type that 'read'
> returns", and on systems where 'read' can return at most INT_MAX,
> ssize_t can be 32 bits even if size_t is 64 bits.
In practice, how many system are there where ssize_t is 32 bits and size_t
is 64? If that number is <= 5 then I wouldn't worry about using ssize_t.
In any case, as I said, I can live with ptrdiff_t in the implementation,
even though I don't like it that much. (A nice block comment at the
top of dfa.c explaining why ptrdiff_t is used would be appropriate.)
But I really don't want ptrdiff_t in the API.
Thanks,
Arnold
Thanks,
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 13 Dec 2019 12:09:02 GMT)
Full text and
rfc822 format available.
Message #41 received at 34951 <at> debbugs.gnu.org (full text, mbox):
arnold <at> skeeve.com wrote:
> But I really don't want ptrdiff_t in the API.
I see that Paul has made the change to the API over my objections.
Jim --- do you have an opinion on this?
Thanks,
Arnold
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 13 Dec 2019 17:54:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 34951 <at> debbugs.gnu.org (full text, mbox):
On Fri, Dec 13, 2019 at 4:08 AM <arnold <at> skeeve.com> wrote:
> arnold <at> skeeve.com write:
>
> > But I really don't want ptrdiff_t in the API.
>
> I see that Paul has made the change to the API over my objections.
>
> Jim --- do you have an opinion on this?
Hi Aharon,
I used to feel the way you do. However, given the way compilers and
static/dynamic analysis have evolved, I have come around to Paul's
point of view. It still feels "wrong" in some sense, but using the
signed type makes the code more robust, enabling automatic
detection/avoidance of more bugs than with unsigned types. Thus, a net
improvement.
Paul, can you point to a link that lists the benefits/tradeoffs? If I
had such a link handy, I would have provided it here.
Jim
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 13 Dec 2019 20:02:01 GMT)
Full text and
rfc822 format available.
Message #47 received at 34951 <at> debbugs.gnu.org (full text, mbox):
>> I see that Paul has made the change to the API over my objections.
I made the change while responding to Bruno's objections, but before
seeing yours. Ooops. Sorry about that. However, I hope the followup
emails have addressed your comments, at least to some extent.
> Paul, can you point to a link that lists the benefits/tradeoffs? If I
> had such a link handy, I would have provided it here.
Avoiding unsigned types for indexes and sizes seems to be a growing
movement. Admittedly there are arguments for unsigned, but these
arguments are getting weaker with time. Here are a couple of links, the
first for C and the second for C++:
https://www.gnu.org/software/emacs/manual/html_node/elisp/C-Integer-Types.html
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
As for ssize_t vs ptrdiff_t: ssize_t is less central to the C language
(ptrdiff_t is in the C standard but ssize_t is not). And ssize_t is less
convenient: for example, there's no simple, portable way to printf an
ssize_t value, as there is with "%td" and ptrdiff_t. So there are
technical reasons for preferring ptrdiff_t to ssize_t for this sort of
thing (even though "ssize_t" is a shorter and better name). Thich is why
Emacs, other parts of Gnulib, and other Gnu applications have used
ptrdiff_t instead of ssize_t for this sort of thing.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Sun, 15 Dec 2019 08:15:02 GMT)
Full text and
rfc822 format available.
Message #50 received at 34951 <at> debbugs.gnu.org (full text, mbox):
OK. I skimmed the links. But why not write the code to say what
we mean? For example:
#include <stdint.h>
typedef int64_t dfa_size_t;
extern void dfaparse (char const *, dfa_size_t, struct dfa *);
extern void dfacomp (char const *, dfa_size_t, struct dfa *, bool);
bool allow_nl, dfa_size_t *count, bool *backref);
Using ptrdiff_t directly simply because it is defined to be the
largest signed integer remains ugly (and Paul has already moved to
a typedef in the implementation.)
int64_t is just as standard as ptrdiff_t and just as clear.
Thanks,
Arnold
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> >> I see that Paul has made the change to the API over my objections.
>
> I made the change while responding to Bruno's objections, but before
> seeing yours. Ooops. Sorry about that. However, I hope the followup
> emails have addressed your comments, at least to some extent.
>
> > Paul, can you point to a link that lists the benefits/tradeoffs? If I
> > had such a link handy, I would have provided it here.
>
> Avoiding unsigned types for indexes and sizes seems to be a growing
> movement. Admittedly there are arguments for unsigned, but these
> arguments are getting weaker with time. Here are a couple of links, the
> first for C and the second for C++:
>
> https://www.gnu.org/software/emacs/manual/html_node/elisp/C-Integer-Types.html
>
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0.pdf
>
> As for ssize_t vs ptrdiff_t: ssize_t is less central to the C language
> (ptrdiff_t is in the C standard but ssize_t is not). And ssize_t is less
> convenient: for example, there's no simple, portable way to printf an
> ssize_t value, as there is with "%td" and ptrdiff_t. So there are
> technical reasons for preferring ptrdiff_t to ssize_t for this sort of
> thing (even though "ssize_t" is a shorter and better name). Thich is why
> Emacs, other parts of Gnulib, and other Gnu applications have used
> ptrdiff_t instead of ssize_t for this sort of thing.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Mon, 16 Dec 2019 09:57:01 GMT)
Full text and
rfc822 format available.
Message #53 received at 34951 <at> debbugs.gnu.org (full text, mbox):
On 12/15/19 12:14 AM, arnold <at> skeeve.com wrote:
> int64_t is just as standard as ptrdiff_t and just as clear.
Actually, int64_t is optional (as even C18 and POSIX-2018 do not require it),
whereas ptrdiff_t has been required since C89. More importantly, int64_t would
be overkill on 32-bit GNU/Linux, whereas ptrdiff_t suffices and is typically
more efficient.
(Besides, what would we do if 72-bit pointers came back into vogue? :-)
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Mon, 16 Dec 2019 10:13:02 GMT)
Full text and
rfc822 format available.
Message #56 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 12/15/19 12:14 AM, arnold <at> skeeve.com wrote:
>
> > int64_t is just as standard as ptrdiff_t and just as clear.
>
> Actually, int64_t is optional (as even C18 and POSIX-2018 do not require it),
> whereas ptrdiff_t has been required since C89. More importantly, int64_t would
> be overkill on 32-bit GNU/Linux, whereas ptrdiff_t suffices and is typically
> more efficient.
>
> (Besides, what would we do if 72-bit pointers came back into vogue? :-)
Fine.
What about
typedef ptrdiff_t dfa_size_t
?
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 20 Dec 2019 03:19:02 GMT)
Full text and
rfc822 format available.
Message #59 received at 34951 <at> debbugs.gnu.org (full text, mbox):
On 12/16/19 2:12 AM, arnold <at> skeeve.com wrote:
> What about
>
> typedef ptrdiff_t dfa_size_t
That declaration would imply that the type is specific to DFAs. However, the
type is used (with exactly the same meaning) in a lot of other places. This is
why I used the more-generic name "idx_t" internally dfa.c.
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Fri, 20 Dec 2019 03:42:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Fri, 20 Dec 2019 03:42:02 GMT)
Full text and
rfc822 format available.
Message #64 received at 34951-done <at> debbugs.gnu.org (full text, mbox):
On 12/11/19 3:25 PM, Paul Eggert wrote:
> On 3/22/19 7:49 PM, Norihiro Tanaka wrote:
>> Missing a patch for dfa. Re-send correct patch file.
>
> Thanks, I installed the DFA-relevant parts of your proposed fix into Gnulib.
> (The grep parts still need doing.)
I finally got around to reviewing the grep parts, and installed them into 'grep'
master. Thanks again for the fix, and sorry about the delay. Closing the bug
report, as the original bug has been fixed (though we can still talk about what
name to give ptrdiff_t, in bug-gnulib perhaps).
I followed up with this NEWS entry:
A performance bug has been fixed for patterns like '01.2' that
cause grep to reorder tokens internally.
[Bug#34951 introduced in grep 3.2]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#34951
; Package
grep
.
(Fri, 20 Dec 2019 10:36:02 GMT)
Full text and
rfc822 format available.
Message #67 received at 34951 <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 12/16/19 2:12 AM, arnold <at> skeeve.com wrote:
> > What about
> >
> > typedef ptrdiff_t dfa_size_t
>
> That declaration would imply that the type is specific to DFAs. However, the
> type is used (with exactly the same meaning) in a lot of other places. This is
> why I used the more-generic name "idx_t" internally dfa.c.
I give up. Leave it ptrdiff_t. I may submit comment changes for dfa.h
later.
Arnold
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 17 Jan 2020 12:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 94 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.