GNU bug report logs -
#29124
[PATCH] grep: simplify out_file handling
Previous Next
Reported by: Zev Weiss <zev <at> bewilderbeest.net>
Date: Fri, 3 Nov 2017 08:10:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29124 in the body.
You can then email your comments to 29124 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#29124
; Package
grep
.
(Fri, 03 Nov 2017 08:10:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Zev Weiss <zev <at> bewilderbeest.net>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Fri, 03 Nov 2017 08:10:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
* src/grep.c (grepdirent): Don't twiddle out_file back and forth
during recursion.
(main): Set out_file once based on command-line arguments.
* bootstrap.conf (gnulib_modules): Add isdir gnulib module.
---
bootstrap.conf | 1 +
src/grep.c | 18 ++++++++----------
2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/bootstrap.conf b/bootstrap.conf
index 1c50974..da7b711 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -50,6 +50,7 @@ intprops
inttypes
isatty
isblank
+isdir
iswctype
largefile
locale
diff --git a/src/grep.c b/src/grep.c
index 0643a7e..66d9e91 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -42,6 +42,7 @@
#include "getprogname.h"
#include "grep.h"
#include "intprops.h"
+#include "isdir.h"
#include "propername.h"
#include "quote.h"
#include "safe-read.h"
@@ -1590,11 +1591,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
command_line &= ent->fts_level == FTS_ROOTLEVEL;
if (ent->fts_info == FTS_DP)
- {
- if (directories == RECURSE_DIRECTORIES && command_line)
- out_file &= ~ (2 * !no_filenames);
- return true;
- }
+ return true;
if (!command_line
&& skipped_file (ent->fts_name, false,
@@ -1615,10 +1612,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
{
case FTS_D:
if (directories == RECURSE_DIRECTORIES)
- {
- out_file |= 2 * !no_filenames;
- return true;
- }
+ return true;
fts_set (fts, ent, FTS_SKIP);
break;
@@ -2885,7 +2879,11 @@ main (int argc, char **argv)
&match_size, NULL) == 0)
== out_invert);
- if ((argc - optind > 1 && !no_filenames) || with_filenames)
+ if (((argc - optind > 1
+ || (directories == RECURSE_DIRECTORIES
+ && !(argc - optind == 1 && !isdir (argv[optind]))))
+ && !no_filenames)
+ || with_filenames)
out_file = 1;
if (binary)
--
2.15.0
Information forwarded
to
bug-grep <at> gnu.org
:
bug#29124
; Package
grep
.
(Fri, 03 Nov 2017 08:48:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 29124 <at> debbugs.gnu.org (full text, mbox):
This adds a syscall, which sounds like both a performance loss and a correctness
issue, as it introduces a race if the file system is being modified while we
grep. Is there some way to simplify out_file without adding a syscall or a race?
Information forwarded
to
bug-grep <at> gnu.org
:
bug#29124
; Package
grep
.
(Fri, 03 Nov 2017 15:45:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 29124 <at> debbugs.gnu.org (full text, mbox):
On Fri, Nov 03, 2017 at 03:46:59AM CDT, Paul Eggert wrote:
>This adds a syscall, which sounds like both a performance loss
This I considered while writing the patch, but decided was unlikely to
be significant. The isdir() is only evaluated in the case of receiving
exactly one command-line argument and recursion being enabled (e.g.
'-r'). So I think the case where the overhead of the syscall would be
highest relative to the total execution time would be a recursive grep
of a single empty file.
Benchmarking that (src/grep-patched includes the simplification patch;
src/grep does not):
[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null
Performance counter stats for './src/grep -r /dev/null' (1000 runs):
0 dummy:u
0.001551488 seconds time elapsed ( +- 0.15% )
[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null
Performance counter stats for './src/grep-patched -r /dev/null' (1000 runs):
0 dummy:u
0.001554579 seconds time elapsed ( +- 0.15% )
[zev <at> hatter: grep]% echo 5 k 0.001554579 0.001551488 / p | dc
1.00199
So a 3-microsecond difference (<0.2%, which will of course rapidly
approach 0% in the case of any actual work to do). In fact, it doesn't
even seem to be reliably above the threshold of measurement noise --
here's another pair of runs (bearing in mind these are averages of a
thousand iterations each):
[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null
Performance counter stats for './src/grep-patched -r /dev/null' (1000 runs):
0 dummy:u
0.001548791 seconds time elapsed ( +- 0.15% )
[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null
Performance counter stats for './src/grep -r /dev/null' (1000 runs):
0 dummy:u
0.001549041 seconds time elapsed ( +- 0.17% )
[zev <at> hatter: grep]% echo 5 k 0.001548791 0.001549041 / p | dc
.99983
One could perhaps also argue that asymptotically, with a *large* amount
of work to do, the time savings of not updating out_file would
eventually outweigh the cost of the added stat call.
>and a correctness issue, as it introduces a race if the file system is
>being modified while we grep.
This I did not consider. However, I'm wondering how robust (paranoid?)
grep really needs to be in the face of concurrent filesystem
modifications. Say grep is searching for the pattern AB (where A and B
are each sufficiently lengthy substrings) in a file that contains it,
has successfully matched A, and is in the process of matching B. If
before it finishes doing so another process modifies the file and pokes
a byte in A making it no longer match, should grep be expected to detect
that and not print the match? (I would certainly think not.)
Similarly, a user, script, etc. expecting any sort of reliable,
well-defined behavior from grep while *changing the type of a path grep
is currently operating on* would be...sort of unreasonable I'd think?
Zev
Information forwarded
to
bug-grep <at> gnu.org
:
bug#29124
; Package
grep
.
(Sat, 04 Nov 2017 06:56:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 29124 <at> debbugs.gnu.org (full text, mbox):
Zev Weiss wrote:
> I'm wondering how robust (paranoid?) grep really needs to be in the face of
> concurrent filesystem modifications.
Not that paranoid. Still, it's better for grep to avoid races when it's easy, as
is the case here.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#29124
; Package
grep
.
(Sat, 04 Nov 2017 21:01:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 29124 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sat, Nov 04, 2017 at 01:55:51AM CDT, Paul Eggert wrote:
>Zev Weiss wrote:
>>I'm wondering how robust (paranoid?) grep really needs to be in the
>>face of concurrent filesystem modifications.
>
>Not that paranoid. Still, it's better for grep to avoid races when
>it's easy, as is the case here.
OK, how about the attached patch then -- it both avoids updating
out_file during directory recursion (simpler, and perhaps a tiny
performance win) and doesn't add any syscalls or races.
Zev
[0001-grep-simplify-out_file-handling.patch (text/x-diff, attachment)]
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Tue, 05 Nov 2019 23:07:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Zev Weiss <zev <at> bewilderbeest.net>
:
bug acknowledged by developer.
(Tue, 05 Nov 2019 23:07:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 29124-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 11/4/17 2:00 PM, Zev Weiss wrote:
> OK, how about the attached patch then -- it both avoids updating
> out_file during directory recursion (simpler, and perhaps a tiny
> performance win) and doesn't add any syscalls or races.
Thanks, I (finally) installed that cleanup patch, and followed it up
with two more simplifying patches. The combined change looks like the
attached.
[grep.diff (text/x-patch, attachment)]
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Wed, 04 Dec 2019 12:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 4 years and 137 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.