GNU bug report logs - #29124
[PATCH] grep: simplify out_file handling

Previous Next

Package: grep;

Reported by: Zev Weiss <zev <at> bewilderbeest.net>

Date: Fri, 3 Nov 2017 08:10:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29124 in the body.
You can then email your comments to 29124 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#29124; Package grep. (Fri, 03 Nov 2017 08:10:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Zev Weiss <zev <at> bewilderbeest.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 03 Nov 2017 08:10:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Zev Weiss <zev <at> bewilderbeest.net>
To: bug-grep <at> gnu.org
Cc: Zev Weiss <zev <at> bewilderbeest.net>
Subject: [PATCH] grep: simplify out_file handling
Date: Fri,  3 Nov 2017 03:09:25 -0500
* src/grep.c (grepdirent): Don't twiddle out_file back and forth
during recursion.
(main): Set out_file once based on command-line arguments.
* bootstrap.conf (gnulib_modules): Add isdir gnulib module.
---
 bootstrap.conf |  1 +
 src/grep.c     | 18 ++++++++----------
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/bootstrap.conf b/bootstrap.conf
index 1c50974..da7b711 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -50,6 +50,7 @@ intprops
 inttypes
 isatty
 isblank
+isdir
 iswctype
 largefile
 locale
diff --git a/src/grep.c b/src/grep.c
index 0643a7e..66d9e91 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -42,6 +42,7 @@
 #include "getprogname.h"
 #include "grep.h"
 #include "intprops.h"
+#include "isdir.h"
 #include "propername.h"
 #include "quote.h"
 #include "safe-read.h"
@@ -1590,11 +1591,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
   command_line &= ent->fts_level == FTS_ROOTLEVEL;
 
   if (ent->fts_info == FTS_DP)
-    {
-      if (directories == RECURSE_DIRECTORIES && command_line)
-        out_file &= ~ (2 * !no_filenames);
-      return true;
-    }
+    return true;
 
   if (!command_line
       && skipped_file (ent->fts_name, false,
@@ -1615,10 +1612,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
     {
     case FTS_D:
       if (directories == RECURSE_DIRECTORIES)
-        {
-          out_file |= 2 * !no_filenames;
-          return true;
-        }
+        return true;
       fts_set (fts, ent, FTS_SKIP);
       break;
 
@@ -2885,7 +2879,11 @@ main (int argc, char **argv)
                                 &match_size, NULL) == 0)
                       == out_invert);
 
-  if ((argc - optind > 1 && !no_filenames) || with_filenames)
+  if (((argc - optind > 1
+        || (directories == RECURSE_DIRECTORIES
+            && !(argc - optind == 1 && !isdir (argv[optind]))))
+       && !no_filenames)
+      || with_filenames)
     out_file = 1;
 
   if (binary)
-- 
2.15.0





Information forwarded to bug-grep <at> gnu.org:
bug#29124; Package grep. (Fri, 03 Nov 2017 08:48:02 GMT) Full text and rfc822 format available.

Message #8 received at 29124 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Zev Weiss <zev <at> bewilderbeest.net>, 29124 <at> debbugs.gnu.org
Subject: Re: bug#29124: [PATCH] grep: simplify out_file handling
Date: Fri, 3 Nov 2017 01:46:59 -0700
This adds a syscall, which sounds like both a performance loss and a correctness 
issue, as it introduces a race if the file system is being modified while we 
grep. Is there some way to simplify out_file without adding a syscall or a race?




Information forwarded to bug-grep <at> gnu.org:
bug#29124; Package grep. (Fri, 03 Nov 2017 15:45:01 GMT) Full text and rfc822 format available.

Message #11 received at 29124 <at> debbugs.gnu.org (full text, mbox):

From: Zev Weiss <zev <at> bewilderbeest.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 29124 <at> debbugs.gnu.org
Subject: Re: bug#29124: [PATCH] grep: simplify out_file handling
Date: Fri, 3 Nov 2017 10:44:31 -0500
On Fri, Nov 03, 2017 at 03:46:59AM CDT, Paul Eggert wrote:
>This adds a syscall, which sounds like both a performance loss 

This I considered while writing the patch, but decided was unlikely to 
be significant.  The isdir() is only evaluated in the case of receiving 
exactly one command-line argument and recursion being enabled (e.g.  
'-r').  So I think the case where the overhead of the syscall would be 
highest relative to the total execution time would be a recursive grep 
of a single empty file.

Benchmarking that (src/grep-patched includes the simplification patch;
src/grep does not):

[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null

Performance counter stats for './src/grep -r  /dev/null' (1000 runs):

                0      dummy:u

      0.001551488 seconds time elapsed                                          ( +-  0.15% )

[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null

Performance counter stats for './src/grep-patched -r  /dev/null' (1000 runs):

                0      dummy:u

      0.001554579 seconds time elapsed                                          ( +-  0.15% )

[zev <at> hatter: grep]% echo 5 k 0.001554579 0.001551488 / p | dc
1.00199

So a 3-microsecond difference (<0.2%, which will of course rapidly 
approach 0% in the case of any actual work to do).  In fact, it doesn't 
even seem to be reliably above the threshold of measurement noise -- 
here's another pair of runs (bearing in mind these are averages of a 
thousand iterations each):

[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep-patched -r '' /dev/null

Performance counter stats for './src/grep-patched -r  /dev/null' (1000 runs):

                0      dummy:u

      0.001548791 seconds time elapsed                                          ( +-  0.15% )

[zev <at> hatter: grep]% perf stat -e dummy -r 1000 ./src/grep -r '' /dev/null

Performance counter stats for './src/grep -r  /dev/null' (1000 runs):

                0      dummy:u

      0.001549041 seconds time elapsed                                          ( +-  0.17% )

[zev <at> hatter: grep]% echo 5 k 0.001548791 0.001549041 / p | dc
.99983

One could perhaps also argue that asymptotically, with a *large* amount 
of work to do, the time savings of not updating out_file would 
eventually outweigh the cost of the added stat call.

>and a correctness issue, as it introduces a race if the file system is 
>being modified while we grep.

This I did not consider.  However, I'm wondering how robust (paranoid?) 
grep really needs to be in the face of concurrent filesystem 
modifications.  Say grep is searching for the pattern AB (where A and B 
are each sufficiently lengthy substrings) in a file that contains it, 
has successfully matched A, and is in the process of matching B.  If 
before it finishes doing so another process modifies the file and pokes 
a byte in A making it no longer match, should grep be expected to detect 
that and not print the match?  (I would certainly think not.)  
Similarly, a user, script, etc. expecting any sort of reliable, 
well-defined behavior from grep while *changing the type of a path grep 
is currently operating on* would be...sort of unreasonable I'd think?


Zev





Information forwarded to bug-grep <at> gnu.org:
bug#29124; Package grep. (Sat, 04 Nov 2017 06:56:01 GMT) Full text and rfc822 format available.

Message #14 received at 29124 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Zev Weiss <zev <at> bewilderbeest.net>
Cc: 29124 <at> debbugs.gnu.org
Subject: Re: bug#29124: [PATCH] grep: simplify out_file handling
Date: Fri, 3 Nov 2017 23:55:51 -0700
Zev Weiss wrote:
> I'm wondering how robust (paranoid?) grep really needs to be in the face of 
> concurrent filesystem modifications.

Not that paranoid. Still, it's better for grep to avoid races when it's easy, as 
is the case here.




Information forwarded to bug-grep <at> gnu.org:
bug#29124; Package grep. (Sat, 04 Nov 2017 21:01:01 GMT) Full text and rfc822 format available.

Message #17 received at 29124 <at> debbugs.gnu.org (full text, mbox):

From: Zev Weiss <zev <at> bewilderbeest.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 29124 <at> debbugs.gnu.org
Subject: Re: bug#29124: [PATCH] grep: simplify out_file handling
Date: Sat, 4 Nov 2017 16:00:29 -0500
[Message part 1 (text/plain, inline)]
On Sat, Nov 04, 2017 at 01:55:51AM CDT, Paul Eggert wrote:
>Zev Weiss wrote:
>>I'm wondering how robust (paranoid?) grep really needs to be in the 
>>face of concurrent filesystem modifications.
>
>Not that paranoid. Still, it's better for grep to avoid races when 
>it's easy, as is the case here.

OK, how about the attached patch then -- it both avoids updating 
out_file during directory recursion (simpler, and perhaps a tiny 
performance win) and doesn't add any syscalls or races.


Zev

[0001-grep-simplify-out_file-handling.patch (text/x-diff, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 05 Nov 2019 23:07:01 GMT) Full text and rfc822 format available.

Notification sent to Zev Weiss <zev <at> bewilderbeest.net>:
bug acknowledged by developer. (Tue, 05 Nov 2019 23:07:02 GMT) Full text and rfc822 format available.

Message #22 received at 29124-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Zev Weiss <zev <at> bewilderbeest.net>
Cc: 29124-done <at> debbugs.gnu.org
Subject: Re: bug#29124: [PATCH] grep: simplify out_file handling
Date: Tue, 5 Nov 2019 15:06:10 -0800
[Message part 1 (text/plain, inline)]
On 11/4/17 2:00 PM, Zev Weiss wrote:

> OK, how about the attached patch then -- it both avoids updating 
> out_file during directory recursion (simpler, and perhaps a tiny 
> performance win) and doesn't add any syscalls or races.

Thanks, I (finally) installed that cleanup patch, and followed it up 
with two more simplifying patches. The combined change looks like the 
attached.
[grep.diff (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 04 Dec 2019 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 137 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.