GNU bug report logs -
#16893
[PATCH] Avoid matching line-by-line for case-insensitive with grep
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Thu, 27 Feb 2014 16:03:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16893 in the body.
You can then email your comments to 16893 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
help-debbugs <at> gnu.org
:
bug#16893
; Package
debbugs.gnu.org
.
(Thu, 27 Feb 2014 16:03:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
help-debbugs <at> gnu.org
.
(Thu, 27 Feb 2014 16:03:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Now grep and awk matchers doesn't waste buffer in case-sensisitive matching.
So I think that we can avoid line-by-line matching for them.
It enable to speed up case-sensitive matching with grep or awk matcher
without trivial_case_ignore as fast as when with it.
In bug#16232:
> The following times 2.16, 2.17 and 2.17+patch two ways:
>
> $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
> $ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time
> /p/p/grep-2.$i/bin/grep -i foobar k; done
> 16
> 15.96 real 14.57 user 0.12 sys
> 17
> 1.13 real 1.07 user 0.06 sys
> 18
> 1.96 real 1.89 user 0.06 sys
>
> The above search takes more than 70% longer with the proposed patch.
Therefore, I think 30% slow-down is caused by the line-by-line matching
for them.
[avoid_line_by_line.txt (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Thu, 27 Feb 2014 16:55:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 16893 <at> debbugs.gnu.org (full text, mbox):
This message was sent to the submit <at> debbugs address with no
"Package:" specified in the body. So it ended up on the help-debbugs
mailing list rather than bug-grep. I have assigned it to grep and am
sending this mail, which will appear on the bug-grep list.
For new reports, either use the bug-grep address, or remember to use
Package: grep at the start of the body. They both have identical results.
Norihiro Tanaka wrote:
> Now grep and awk matchers doesn't waste buffer in case-sensisitive matching.
> So I think that we can avoid line-by-line matching for them.
>
> It enable to speed up case-sensitive matching with grep or awk matcher
> without trivial_case_ignore as fast as when with it.
>
> In bug#16232:
>> The following times 2.16, 2.17 and 2.17+patch two ways:
>>
>> $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
>> $ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time
>> /p/p/grep-2.$i/bin/grep -i foobar k; done
>> 16
>> 15.96 real 14.57 user 0.12 sys
>> 17
>> 1.13 real 1.07 user 0.06 sys
>> 18
>> 1.96 real 1.89 user 0.06 sys
>>
>> The above search takes more than 70% longer with the proposed patch.
>
> Therefore, I think 30% slow-down is caused by the line-by-line matching
> for them.
[See attachment at http://debbugs.gnu.org/16893]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Fri, 28 Feb 2014 02:55:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 16893 <at> debbugs.gnu.org (full text, mbox):
On Thu, Feb 27, 2014 at 8:54 AM, Glenn Morris <rgm <at> gnu.org> wrote:
>
> This message was sent to the submit <at> debbugs address with no
> "Package:" specified in the body. So it ended up on the help-debbugs
> mailing list rather than bug-grep. I have assigned it to grep and am
> sending this mail, which will appear on the bug-grep list.
>
> For new reports, either use the bug-grep address, or remember to use
> Package: grep at the start of the body. They both have identical results.
>
> Norihiro Tanaka wrote:
>
>> Now grep and awk matchers doesn't waste buffer in case-sensisitive matching.
>> So I think that we can avoid line-by-line matching for them.
>>
>> It enable to speed up case-sensitive matching with grep or awk matcher
>> without trivial_case_ignore as fast as when with it.
>>
>> In bug#16232:
>>> The following times 2.16, 2.17 and 2.17+patch two ways:
>>>
>>> $ yes jjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjjj | head -10000000 > k
>>> $ for i in 16 17 18; do echo $i; env LC_ALL=en_US.UTF-8 time
>>> /p/p/grep-2.$i/bin/grep -i foobar k; done
>>> 16
>>> 15.96 real 14.57 user 0.12 sys
>>> 17
>>> 1.13 real 1.07 user 0.06 sys
>>> 18
>>> 1.96 real 1.89 user 0.06 sys
>>>
>>> The above search takes more than 70% longer with the proposed patch.
>>
>> Therefore, I think 30% slow-down is caused by the line-by-line matching
>> for them.
>
> [See attachment at http://debbugs.gnu.org/16893]
Thank you for forwarding that, Glenn.
Thank you for the patch, Norihiro.
However, your removal of the "MB_CUR_MAX == 1" disjunct
would cause unibyte grep -i with (-F or -P) to match line-by-line,
whereas currently it uses the buffer-matching code.
That would make a search like this take 3 times longer:
seq 30000000 > in
LC_ALL=C grep -iF foo in
Without your patch, best of 5 wall clock time is 0.31s,
yet with the patch, it takes 1.00s.
I presume you will want to retain that disjunct?
If you submit an adjusted patch, please also include in the
commit log some timing examples showing how the change affects
grep's performance.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Fri, 28 Feb 2014 15:03:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 16893 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hi Jim,
I thank you for your review and pointing the bug for the patch. You are
right. I have written the wrong if conditions. I think that behavior
shouldn't be changed for the pcre or fgrep matcher by the patch. I have
fixed its bug, and re-send the patch and results of tests.
Norihiro
[avoid_line_by_line.txt (application/octet-stream, attachment)]
[tests.txt (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Fri, 28 Feb 2014 15:13:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 16893 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I used the attachment on this mail to test for "removal of
trivial_case_ignore".
Norihiro
[removal_of_trivial_case_ignore.txt (application/octet-stream, attachment)]
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Sat, 01 Mar 2014 06:27:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Sat, 01 Mar 2014 06:27:04 GMT)
Full text and
rfc822 format available.
Message #22 received at 16893-done <at> debbugs.gnu.org (full text, mbox):
Thanks, I applied that. It simplifies another patch I've been meaning
to put in, yay!
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Sat, 01 Mar 2014 06:35:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 16893-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I also applied the attached followup patch to clean out macros no longer
used.
[0001-grep-remove-lint.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#16893
; Package
grep
.
(Sat, 01 Mar 2014 23:18:02 GMT)
Full text and
rfc822 format available.
Message #28 received at 16893-done <at> debbugs.gnu.org (full text, mbox):
Thank you both for all of the patches!
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 30 Mar 2014 11:24:07 GMT)
Full text and
rfc822 format available.
This bug report was last modified 11 years and 9 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.