GNU bug report logs - #38690
grep -P quadratic on long lines

Previous Next

Package: grep;

Reported by: Martin Raszyk <m.raszyk <at> gmail.com>

Date: Fri, 20 Dec 2019 15:33:02 UTC

Severity: normal

To reply to this bug, email your comments to 38690 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#38690; Package grep. (Fri, 20 Dec 2019 15:33:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Martin Raszyk <m.raszyk <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 20 Dec 2019 15:33:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Martin Raszyk <m.raszyk <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: grep -P quadratic on long lines
Date: Fri, 20 Dec 2019 10:54:14 +0100
Dear grep maintainers,

I've realized that grep -P takes quadratic time on long lines with
many short matches. For instance, executing './src/grep -P -o "a"
a.txt > a.out' on a file 'a.txt' consisting of N characters 'a' takes
time quadratic in N. I've used grep-3.3 and pcre-8.43 for the
benchmarks.

The root causes for this behavior are as follows:
1. in src/pcresearch.c on l. 222 (at commit
cf09252295c554dd3eba4cdb8eb53911fb495f40), the end of the line is
searched each time a new match is searched; this already results in
quadratic runtime in the above mentioned case
2. the function 'pcre_exec' from pcre-8.43 called in src/pcresearch.c
on l. 71 for each match checks if the provided string is valid UTF-8
(code implemented in pcre_valid_utf8.c); this also results in
quadratic runtime

On your side, it is possible to fix the first root cause. I'll post an
e-mail to PCRE mailing list about the second root cause.

Best regards,
Martin




This bug report was last modified 5 years and 9 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.