GNU bug report logs -
#38690
grep -P quadratic on long lines
Previous Next
To reply to this bug, email your comments to 38690 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#38690
; Package
grep
.
(Fri, 20 Dec 2019 15:33:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Martin Raszyk <m.raszyk <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Fri, 20 Dec 2019 15:33:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear grep maintainers,
I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.
The root causes for this behavior are as follows:
1. in src/pcresearch.c on l. 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
searched each time a new match is searched; this already results in
quadratic runtime in the above mentioned case
2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c
on l. 71 for each match checks if the provided string is valid UTF-8
(code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime
On your side, it is possible to fix the first root cause. I'll post an
e-mail to PCRE mailing list about the second root cause.
Best regards,
Martin
This bug report was last modified 4 years and 340 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.