GNU bug report logs - #38792
man grep


Package: grep

Reported by: Martin Simons <martin <at> webhuis.nl>

Date: Sun, 29 Dec 2019 15:19:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




Report forwarded to bug-grep <at> gnu.org:
bug#38792; Package grep. (Sun, 29 Dec 2019 15:19:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Martin Simons <martin <at> webhuis.nl>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 29 Dec 2019 15:19:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Martin Simons <martin <at> webhuis.nl>
To: bug-grep <at> gnu.org
Subject: man grep
Date: Sun, 29 Dec 2019 15:24:47 +0100
Dear Friend,

At the moment I am working part time as a Unix / Linux teacher and also 
as an AIX system administrator.
Privately I am using Debian / Ubuntu and I use it while in class, giving 
presentations and on the fly examples of statements and so on.

In class I tell the students how grep should be used in a directory 
called test. Please find the contents of the directory and some files 
below. My example of use is as follows:
martin <at> laptop:~/test$ grep 'Jantje*' school.txt
Which delivers the desired output:
Jantje
Ik las dat Jantje
Jantje voortaan op tijd op school komt,
Hoogachtend, Jantjes vader, Piet Bel.

So much for the class. In practice, however, the command is usually
issued like this (objections are laughed away):
martin <at> laptop:~/test$ grep Jantje* school.txt
Delivering this undesired output:
Jantjes:Jantje zag eens pruimen hangen
Jantjes:en Jantje wilde pruimen plukken voor zijn moeder
Jantjes:toen zei Jantjes moeder
Jantjes:pas toch op Jantje!
school.txt:Jantje
school.txt:Ik las dat Jantje
school.txt:Jantje voortaan op tijd op school komt,
school.txt:Hoogachtend, Jantjes vader, Piet Bel.

In trying to give more weight to the argument I pointed the students to 
the man page of grep, but it struck me that there is not a single 
reference to the need of using quotes in the search pattern. This does 
not help.

I checked the AIX man page for grep and they at least give some, not 
all, examples of using quotes around the search pattern.

The example above makes me suspect that grave errors occur in production
environments, simply because users are not led to write search patterns
correctly.

It may not be the task of the grep project to provide a man page, but
even then I feel there is an opportunity for improvement here by 
providing a basic man page with a couple of good examples. I would be 
more than glad to contribute.

Files and Contents of test.
Directory test:
-rw-r--r-- 1 martin martin 208 May 20  2019 Jantje
-rw-r--r-- 1 martin martin 124 Apr 30  2019 Jantjes
-rw-r--r-- 1 martin martin 242 Jun 27  2019 school.txt

The contents of Jantjes and school.txt are:
file Jantjes:
Jantje zag eens pruimen hangen
en Jantje wilde pruimen plukken voor zijn moeder
toen zei Jantjes moeder
pas toch op Jantje!

file school.txt:
Geachte Heer de Vries,

Jantje

Ik las dat Jantje
weer eens te laat op school kwam.

Ik kan u verzekeren dat wij er alles aan
zullen doen om er voor te zorgen dat
Jantje voortaan op tijd op school komt,

Hoogachtend, Jantjes vader, Piet Bel.

Met vriendelijke groet,
Martin.

LinkedIn: https://www.linkedin.com/in/martinsimons1/
GitHub:   https://github.com/Webhuis/CFEngine-Roadshow/tree/master/
Twitter:  https://twitter.com/webhuis @Webhuis #TheCFEngineRoadshow






Message #8 received at 38792 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Martin Simons <martin <at> webhuis.nl>
Cc: 38792 <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Sun, 29 Dec 2019 10:46:10 -0800
On 12/29/19 6:24 AM, Martin Simons wrote:
> It may not be the task of the grep project to provide a man page, but even then I
> feel there is an opportunity for improvement here

Right on both counts. I installed the attached patches to improve things a bit
in the next version of grep. Thanks for reporting the problem.

The GNU guidelines deprecate man pages, so extended examples should go into
doc/grep.texi where students can view them at
<https://www.gnu.org/software/grep/manual/>. So if you'd like to improve the
examples further, please suggest them as patches to doc/grep.texi in "git
format-patch" style. If they're extensive, we'll also need to get the copyright
formalities done, which I can fill you in on if you're interested in pursuing
this.
[0001-doc-document-quoting-better.patch (text/x-patch, attachment)]
[0002-doc-fix-typo-in-previous-patch.patch (text/x-patch, attachment)]


Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: Re: bug#38792: man grep
Date: Sun, 29 Dec 2019 18:56:30 +0000
2019-12-29 15:24:47 +0100, Martin Simons:
[...]
> martin <at> laptop:~/test$ grep 'Jantje*' school.txt
> Which delivers the desired output:
> Jantje

It also matches "Jantj". The * regexp operator matches 0 or
more of the preceding atom, so e* matches 0 or more "e"s. It's
not to be confused with the "*" shell wildcard operator.

In effect, that's equivalent to

   grep Jantj school.txt
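
A minimal sketch of that equivalence, using a throwaway file whose name and contents are made up for illustration (they are not from the report):

```shell
# Throwaway sample file; the name and contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' 'Jantje' 'Jantj' 'Jantjeee' 'Jan' > "$tmp/sample.txt"

# As a regular expression, 'Jantje*' means "Jantj" followed by zero or more "e".
with_star=$(grep 'Jantje*' "$tmp/sample.txt")
# So it selects exactly the lines that contain "Jantj".
without_e=$(grep 'Jantj' "$tmp/sample.txt")

printf '%s\n' "$with_star"
rm -rf "$tmp"
```

Both commands select the same three lines; only the "Jan" line is left out.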

[...]
> martin <at> laptop:~/test$ grep Jantje* school.txt
> Delivering this undesired output:
> Jantjes:Jantje zag eens pruimen hangen
> Jantjes:en Jantje wilde pruimen plukken voor zijn moeder
> Jantjes:toen zei Jantjes moeder
> Jantjes:pas toch op Jantje!
> school.txt:Jantje
> school.txt:Ik las dat Jantje
> school.txt:Jantje voortaan op tijd op school komt,
> school.txt:Hoogachtend, Jantjes vader, Piet Bel.
> 
> In trying to give more weight to the argument I pointed the students to the
> man page of grep, but it struck me that there is not a single reference to
> the need of using quotes in the search pattern. This does not help.

All grep receives is a list of arguments. That * needs to be
quoted for the shell not to treat it as a glob operator, not for
grep.

You'll find an explanation of shell globbing (aka filename
generation, aka pathname expansion, aka filename expansion) and
the effect of quoting on it in your shell manual.

For the GNU shell, that's with:

   info bash 'filename expansion'

What needs to be quoted for a given shell varies with the shell
implementation; grep has nothing to do with that.
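
To make the point concrete, here is a small sketch that recreates the file names from the report as empty files in a scratch directory. It uses printf as a stand-in for grep, purely as an illustration device, so that the argument list the command receives becomes visible:

```shell
# Recreate the file names from the report in a scratch directory (contents omitted).
tmp=$(mktemp -d)
: > "$tmp/Jantje"
: > "$tmp/Jantjes"
: > "$tmp/school.txt"

# printf stands in for grep here so we can see the argument list a command receives.
# Unquoted: the shell replaces Jantje* with the matching file names first.
unquoted=$(cd "$tmp" && printf '<%s>' grep Jantje* school.txt)
# Quoted: the pattern is passed through unchanged.
quoted=$(cd "$tmp" && printf '<%s>' grep 'Jantje*' school.txt)

printf '%s\n%s\n' "$unquoted" "$quoted"
rm -rf "$tmp"
```

In the unquoted case grep would take Jantje as the pattern and search Jantjes and school.txt, which is consistent with the file-name prefixes in the reported output.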

-- 
Stephane






Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: Re: bug#38792: man grep
Date: Sun, 29 Dec 2019 19:10:51 +0000
2019-12-29 10:46:10 -0800, Paul Eggert:
> On 12/29/19 6:24 AM, Martin Simons wrote:
> > It may not be the task of the grep project to provide a man page, but even then I
> > feel there is an opportunity for improvement here
> 
> Right on both counts. I installed the attached patches to improve things a bit
> in the next version of grep.
[...]


> From 8b7da49786e613c6ae9a2b299b1ce2187b32ed26 Mon Sep 17 00:00:00 2001
> Subject: [PATCH 1/2] doc: document quoting better
[...]
> +.SH "EXIT STATUS"
> +Normally the exit status is 0 if a line is selected, 1 if no lines
> +were selected, and 2 if an error occurred.
[...]

Note that that wording makes it unclear what the exit status
should be if -o is in use.

[...]
> +$ \fBgrep\fP \-n 'f.*\e.c$' *g*.h /dev/null
[...]

It should be

grep -n -- 'f.*\.c$' *g*.h /dev/null

Or:

grep -ne 'f.*\.c$' -- *g*.h /dev/null

(unless $POSIXLY_CORRECT is set).

grep pattern *.h

is fine in POSIX-compliant greps, but not in GNU grep, as GNU
getopt*() accepts options after non-option arguments. IMO, it's
worth pointing out, as it's a common gotcha with GNU utilities.

grep -e pattern *.h

is not fine in any grep.
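
A rough sketch of the failure mode, assuming a scratch directory that contains a file literally named "-l" (the file names and contents are made up for illustration). What the unguarded command prints depends on the grep implementation and on $POSIXLY_CORRECT, so only the guarded form is relied on here:

```shell
# Scratch directory with a file whose name looks like an option (names are made up).
tmp=$(mktemp -d)
printf 'nothing to see\n' > "$tmp/-l"
printf 'hello world\n' > "$tmp/notes.txt"

# Without "--", the expansion of * puts "-l" on the command line, where a grep
# that accepts options after operands may treat it as the -l option.
unguarded=$(cd "$tmp" && grep 'hello' * 2>&1) || true

# With "--", every later argument is a pattern or file operand,
# so "-l" is searched as a file.
guarded=$(cd "$tmp" && grep -- 'hello' *)

printf 'unguarded: %s\nguarded: %s\n' "$unguarded" "$guarded"
rm -rf "$tmp"
```

With GNU grep in its default mode, the unguarded command would typically list only the matching file name instead of the matching line, while the guarded command reports notes.txt:hello world.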

(Why not use -H instead of /dev/null, btw?)

[...]
> +@example
> +$ @kbd{grep -n 'f.*\.c$' *g*.h /dev/null}
> +argmatch.h:1:/* definitions and prototypes for argmatch.c
> +@end example
[...]

same in texinfo.

-- 
Stephane






Message #17 received at 38792 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 38792 <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Sun, 29 Dec 2019 18:59:22 -0800
On 12/29/19 11:10 AM, Stephane Chazelas wrote:

> Note that that wording makes it unclear what the exit status
> should be if -o is in use.

It seems reasonably clear that a line would be selected if any part of it is
selected. Anyway, that text is exactly the same as before, so rewordsmithing it
could be a different thread.

> It should be
> 
> grep -n -- 'f.*\.c$' *g*.h /dev/null

Thanks, done in the attached patch.

> (Why not use -H instead of /dev/null, btw?)

I wanted the example to be portable.
[0001-doc-Add-to-more-complex-example.patch (text/x-patch, attachment)]
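
For readers unfamiliar with the /dev/null idiom, a minimal sketch follows. The scratch file and its contents are made up for illustration:

```shell
# Scratch file standing in for a single search target (contents are made up).
tmp=$(mktemp -d)
printf 'eli:x:2098\n' > "$tmp/passwd"

# One file operand: grep prints only the matching line.
one_file=$(grep 'eli' "$tmp/passwd")
# Adding the empty /dev/null makes it two operands, so the file name is printed too.
with_null=$(grep 'eli' "$tmp/passwd" /dev/null)

printf '%s\n%s\n' "$one_file" "$with_null"
rm -rf "$tmp"
```

Because /dev/null is empty it can never contribute a match; it only changes how many file operands grep sees.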


Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: Re: bug#38792: man grep
Date: Mon, 30 Dec 2019 08:52:26 +0000
2019-12-29 18:59:22 -0800, Paul Eggert:
[...]
> > It should be
> > 
> > grep -n -- 'f.*\.c$' *g*.h /dev/null
> 
> Thanks, done in the attached patch.
[...]

There were a few other instances. The patch below attempts to
fix those (includes your patch).

I've also replaced find|xargs with the standard, more portable
(and simpler/faster...) find -exec + (GNU find used not to
support that, but that was fixed over 10 years ago).

I've replaced the POSIX+XSI ps -e with POSIX ps -A (for BSD
compatibility).

I've moved the grep -H and grep -- FAQ entries to the front so
they are explained before being used.
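
As a rough illustration of the find -exec + form, here is a sketch run against a made-up scratch tree. It uses the /dev/null idiom rather than -H to stay portable, which is a choice made for this sketch and differs from the patch below:

```shell
# Scratch tree with one C file and one non-C file (names and contents are made up).
tmp=$(mktemp -d)
mkdir "$tmp/src"
printf 'hello\n' > "$tmp/src/a.c"
printf 'hello\n' > "$tmp/src/b.txt"

# "-exec ... {} +" hands find's matches to grep in batches, with no pipe or xargs.
# /dev/null is added so the file name is printed even if only one file is found.
found=$(find "$tmp" -name '*.c' -exec grep 'hello' /dev/null {} +)

printf '%s\n' "$found"
rm -rf "$tmp"
```

Only a.c is reported; b.txt contains the same text but is filtered out by -name '*.c'.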

diff --git a/doc/grep.in.1 b/doc/grep.in.1
index a382966..a91b2a6 100644
--- a/doc/grep.in.1
+++ b/doc/grep.in.1
@@ -1333,13 +1333,18 @@ The following example outputs the location and contents of any line
 containing \*(lqf\*(rq and ending in \*(lq.c\*(rq,
 within all files in the current directory whose names
 contain \*(lqg\*(rq and end in \*(lq.h\*(rq.
-The command also searches the empty file /dev/null,
-so that file names are displayed
+The
+.B \-n
+option outputs line numbers, the
+.B \-\-
+argument treats expansions of \*(lq*g*.h\*(rq starting with \*(lq\-\*(rq
+as file names not options,
+and the empty file /dev/null causes file names to be output
 even if only one file name happens to be of the form \*(lq*g*.h\*(rq.
 .PP
 .in +2n
 .EX
-$ \fBgrep\fP \-n 'f.*\e.c$' *g*.h /dev/null
+$ \fBgrep\fP \-n \-\- 'f.*\e.c$' *g*.h /dev/null
 argmatch.h:1:/* definitions and prototypes for argmatch.c
 .EE
 .in
diff --git a/doc/grep.texi b/doc/grep.texi
index 873b53c..de7d028 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1585,12 +1585,13 @@ showing the location and contents of any line
 containing @samp{f} and ending in @samp{.c},
 within all files in the current directory whose names
 contain @samp{g} and end in @samp{.h}.
-The command also searches the empty file @file{/dev/null},
-so that file names are displayed
+The @option{-n} option outputs line numbers, the @option{--} argument
+treats any later arguments starting with @samp{-} as file names not
+options, and the empty file @file{/dev/null} causes file names to be output
 even if only one file name happens to be of the form @samp{*g*.h}.
 
 @example
-$ @kbd{grep -n 'f.*\.c$' *g*.h /dev/null}
+$ @kbd{grep -n -- 'f.*\.c$' *g*.h /dev/null}
 argmatch.h:1:/* definitions and prototypes for argmatch.c
 @end example
 
@@ -1608,11 +1609,78 @@ Here are some common questions and answers about @command{grep} usage.
 
 @enumerate
 
+@item
+How do I force @command{grep} to print the name of the file?
+
+Append @file{/dev/null}:
+
+@example
+grep 'eli' /etc/passwd /dev/null
+@end example
+
+gets you:
+
+@example
+/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
+@end example
+
+Alternatively, use @option{-H}, which is a GNU extension:
+
+@example
+grep -H 'eli' /etc/passwd
+@end example
+
+Using that trick is generally wanted when using @command{find}, shell
+globbing or other forms of expansions or more generally when we don't
+know in advance what file are being searched in and want that
+information to be returned. Without it
+
+@example
+grep -- pattern *.txt
+@end example
+
+could output the matched lines without indication of which file they
+were found in if there was only one non-hidden @samp{.txt} file in the
+current directory.
+
+@item
+What if a pattern or filename has a leading @samp{-}?
+
+@example
+grep -H -- '--cut here--' *
+@end example
+
+@noindent
+searches for all lines matching @samp{--cut here--}.
+
+@samp{--} marks the end of options. None of the arguments after
+@samp{--} are treated as options even if they start with
+@samp{-}.
+
+Alternatively, one can use @option{-e} to specify the pattern:
+
+@example
+grep -H -e '--cut here--' -- *
+@end example
+
+But @samp{--} is still needed to guard against filenames that
+start with @samp{-}.
+
+Another approach would be to use:
+
+@example
+grep -H -e '--cut here--' ./*
+@end example
+
+Which also guards against a file called @samp{-} (which
+@command{grep} would otherwise interpret as meaning standard
+input).
+
 @item
 How can I list just the names of matching files?
 
 @example
-grep -l 'main' *.c
+grep -l -- 'main' *.c
 @end example
 
 @noindent
@@ -1630,45 +1698,33 @@ grep -r 'hello' /home/gigi
 searches for @samp{hello} in all files
 under the @file{/home/gigi} directory.
 For more control over which files are searched,
-use @command{find}, @command{grep}, and @command{xargs}.
+use @command{find} in combination with @command{grep}.
 For example, the following command searches only C files:
 
 @example
-find /home/gigi -name '*.c' -print0 | xargs -0r grep -H 'hello'
+find /home/gigi -name '*.c' -exec grep -H 'hello' '@{@}' +
 @end example
 
 This differs from the command:
 
 @example
-grep -H 'hello' *.c
+grep -H -- 'hello' *.c
 @end example
 
-which merely looks for @samp{hello} in all files in the current
-directory whose names end in @samp{.c}.
+which merely looks for @samp{hello} in all (non-hidden) files in
+the current directory whose names end in @samp{.c}.
 The @samp{find ...} command line above is more similar to the command:
 
 @example
-grep -rH --include='*.c' 'hello' /home/gigi
+grep -r --include='*.c' 'hello' /home/gigi
 @end example
 
-@item
-What if a pattern has a leading @samp{-}?
-
-@example
-grep -e '--cut here--' *
-@end example
-
-@noindent
-searches for all lines matching @samp{--cut here--}.
-Without @option{-e},
-@command{grep} would attempt to parse @samp{--cut here--} as a list of
-options.
 
 @item
 Suppose I want to search for a whole word, not a part of a word?
 
 @example
-grep -w 'hello' *
+grep -H -w -- 'hello' *
 @end example
 
 @noindent
@@ -1679,7 +1735,7 @@ For more control, use @samp{\<} and
 For example:
 
 @example
-grep 'hello\>' *
+grep -H -- 'hello\>' *
 @end example
 
 @noindent
@@ -1690,38 +1746,17 @@ searches only for words ending in @samp{hello}, so it matches the word
 How do I output context around the matching lines?
 
 @example
-grep -C 2 'hello' *
+grep -H -C 2 -- 'hello' *
 @end example
 
 @noindent
 prints two lines of context around each matching line.
 
-@item
-How do I force @command{grep} to print the name of the file?
-
-Append @file{/dev/null}:
-
-@example
-grep 'eli' /etc/passwd /dev/null
-@end example
-
-gets you:
-
-@example
-/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash
-@end example
-
-Alternatively, use @option{-H}, which is a GNU extension:
-
-@example
-grep -H 'eli' /etc/passwd
-@end example
-
 @item
 Why do people use strange regular expressions on @command{ps} output?
 
 @example
-ps -ef | grep '[c]ron'
+ps -Af | grep '[c]ron'
 @end example
 
 If the pattern had been written without the square brackets, it would
@@ -1788,6 +1823,9 @@ Use the special file name @samp{-}:
 cat /etc/passwd | grep 'alain' - /etc/motd
 @end example
 
+To match in a file called @samp{-} in the current directory, use
+@samp{./-}.
+
 @item
 @cindex palindromes
 How to express palindromes in a regular expression?
diff --git a/gnulib b/gnulib
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 575b0ecbad2f34d5777f9562eebc2d0c815bfc5c
+Subproject commit 575b0ecbad2f34d5777f9562eebc2d0c815bfc5c-dirty







Message #23 received at 38792 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 38792 <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Mon, 30 Dec 2019 01:59:09 -0800
Thanks, I installed the attached, which is a bit more conservative than what you
suggested but which I hope catches all the issues.
[0001-doc-robustify-some-examples.patch (text/x-patch, attachment)]


Message #26 received at 38792 <at> debbugs.gnu.org (full text, mbox):

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 38792 <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Mon, 30 Dec 2019 17:54:26 +0000
2019-12-30 01:59:09 -0800, Paul Eggert:
> Thanks, I installed the attached, which is a bit more conservative than what you
> suggested but which I hope catches all the issues.
[...]
>  @example
> -grep -l 'main' *.c
> +grep -l 'main' test-*.c
>  @end example
[...]

Thanks,

though I find it a bit of a shame to /hide/ the problem like
that here.

It's so common for people to forget that "--" (I just came
across https://en.wikipedia.org/wiki/Glob_(programming), which
makes that mistake in the very first paragraph, for instance) that
it would be beneficial IMO if the manual showed the right way to do it,
especially for the GNU implementation of grep, which makes the
problem worse by accepting options after non-option arguments.

Please consider making it


grep -l -- 'main' *.c

to raise awareness and try and teach people "the right way".

-- 
Stephane





Message #29 received at 38792 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 38792 <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Mon, 30 Dec 2019 10:45:35 -0800
On 12/30/19 9:54 AM, Stephane Chazelas wrote:
> Please consider making it
> 
> 
> grep -l -- 'main' *.c
> 
> to raise awareness and try and teach people "the right way".

Interactively, I invariably use 'grep foo *.c' without the '--' and it works
just fine because the files under my control have names that do not start with
'-'. This is standard practice and simplifies the common-usage examples. So
although '--' should (and does) appear in examples to document a defensive
measure for scripts that must work even in nonstandard environments, '--'
needn't be used in examples everywhere that it could conceivably apply.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Mon, 21 Sep 2020 19:11:02 GMT) Full text and rfc822 format available.

Notification sent to Martin Simons <martin <at> webhuis.nl>:
bug acknowledged by developer. (Mon, 21 Sep 2020 19:11:02 GMT) Full text and rfc822 format available.

Message #34 received at 38792-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stephane Chazelas <stephane.chazelas <at> gmail.com>
Cc: 38792-done <at> debbugs.gnu.org
Subject: Re: bug#38792: man grep
Date: Mon, 21 Sep 2020 12:09:56 -0700
Discussion on this wordsmithing issue died down in December so I'm taking the 
liberty of closing the bug report. We can reopen it or start a new one as necessary.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Oct 2020 11:24:14 GMT) Full text and rfc822 format available.


