GNU bug report logs - #51860
[PATCH] Reinstate Binary file matches to stdout

Previous Next

Package: grep;

Reported by: Duncan Roe <duncan_roe <at> optusnet.com.au>

Date: Mon, 15 Nov 2021 06:08:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 51860 in the body.
You can then email your comments to 51860 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Mon, 15 Nov 2021 06:08:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Duncan Roe <duncan_roe <at> optusnet.com.au>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 15 Nov 2021 06:08:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Duncan Roe <duncan_roe <at> optusnet.com.au>
To: bug-grep <at> gnu.org
Cc: Duncan Roe <duncan_roe <at> optusnet.com.au>
Subject: [PATCH] Reinstate Binary file matches to stdout
Date: Mon, 15 Nov 2021 17:06:29 +1100
Revert commit 271793f0.

* NEWS: Mention this.
* doc/grep.texi: remove 1 line added in 271793f0.
* src/grep.c (grep): Send traditional "Binary file FOO matches"
to stdout again.
* tests/surrogate-pair: test for traditional behaviour (test updated
since 271793f0).
* tests/null-byte: Reverted (non-conflicting changes since 271793f0).
* tests/encoding-error, tests/invalid-multibyte-infloop:
* tests/pcre-count, tests/symlink, tests/unibyte-binary:
Reverted (unchanged since 271793f0).
* tests/binary-file-matches: test for traditional behavior (new test).

Signed-off-by: Duncan Roe <duncan_roe <at> optusnet.com.au>
---
 NEWS                            |  6 ++++++
 doc/grep.texi                   |  3 +--
 src/grep.c                      |  8 ++++++--
 tests/binary-file-matches       |  6 +++---
 tests/encoding-error            |  5 +++--
 tests/invalid-multibyte-infloop |  4 +++-
 tests/null-byte                 |  2 +-
 tests/pcre-count                |  5 +++--
 tests/surrogate-pair            | 12 ++++++++----
 tests/symlink                   |  6 +++++-
 tests/unibyte-binary            |  2 +-
 11 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/NEWS b/NEWS
index 2f63071..0173187 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,12 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Changes in behavior
 
+  The message that a binary file matches is again sent to standard output
+  with the wording as it was before release 3.5. The -I option is available
+  to suppress this message if so desired. The behavior in grep 3.5
+  through 3.7 prevented a user seeing binary matches when using
+  grep -s to suppress error messages.
+
   The -P option is now based on PCRE2 instead of the older PCRE,
   thanks to code contributed by Carlo Arenas.
 
diff --git a/doc/grep.texi b/doc/grep.texi
index c3c4bbf..2482fa0 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1819,8 +1819,7 @@ to output lines even from files that appear to be binary, use the
 @option{-a} or @samp{--binary-files=text} option.
 To eliminate the
 ``Binary file matches'' messages, use the @option{-I} or
-@samp{--binary-files=without-match} option,
-or the @option{-s} or @option{--no-messages} option.
+@samp{--binary-files=without-match} option.
 
 @item
 Why doesn't @samp{grep -lv} print non-matching file names?
diff --git a/src/grep.c b/src/grep.c
index a55194c..0e60023 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1646,10 +1646,14 @@ grep (int fd, struct stat const *st, bool *ineof)
  finish_grep:
   done_on_match = done_on_match_0;
   out_quiet = out_quiet_0;
-  if (binary_files == BINARY_BINARY_FILES && ! (out_quiet | suppress_errors)
+  if (binary_files == BINARY_BINARY_FILES && !out_quiet
       && (encoding_error_output
           || (0 <= nlines_first_null && nlines_first_null < nlines)))
-    error (0, 0, _("%s: binary file matches"), input_filename ());
+    {
+      printf_errno (_("Binary file %s matches\n"), input_filename ());
+      if (line_buffered)
+        fflush_errno ();
+    }
   return nlines;
 }
 
diff --git a/tests/binary-file-matches b/tests/binary-file-matches
index 7fc4a11..dc4c3e7 100755
--- a/tests/binary-file-matches
+++ b/tests/binary-file-matches
@@ -11,11 +11,11 @@
 
 fail=0
 
-echo "grep: (standard input): binary file matches" > exp \
+echo "Binary file (standard input) matches" > exp \
   || framework_failure_
 
 printf 'a\0' | grep a > out 2> err || fail=1
-compare /dev/null out || fail=1
-compare exp err || fail=1
+compare /dev/null err || fail=1
+compare exp out || fail=1
 
 Exit $fail
diff --git a/tests/encoding-error b/tests/encoding-error
index 2ec71c9..9fb496c 100755
--- a/tests/encoding-error
+++ b/tests/encoding-error
@@ -25,7 +25,8 @@ grep '^A' in >out || fail=1
 compare a out || fail=1
 
 grep '^P' in >out || fail=1
-compare /dev/null out || fail=1
+printf 'Binary file in matches\n' >exp || framework_failure_
+compare exp out || fail=1
 
 grep -I '^P' in >out 2>err || fail=1
 compare /dev/null out || fail=1
@@ -38,7 +39,7 @@ returns_ 1 grep '^X' in >out || fail=1
 compare /dev/null out || fail=1
 
 grep . in >out || fail=1
-cat a j >exp || framework_failure_
+(cat a j && printf 'Binary file in matches\n') >exp || framework_failure_
 compare exp out || fail=1
 
 grep -I . in >out 2>err || fail=1
diff --git a/tests/invalid-multibyte-infloop b/tests/invalid-multibyte-infloop
index b4ad14b..5b3bdfc 100755
--- a/tests/invalid-multibyte-infloop
+++ b/tests/invalid-multibyte-infloop
@@ -24,10 +24,12 @@ else
   test $status -eq 2
 fi || fail=1
 
+echo 'Binary file input matches' >binary-file-matches
+
 LC_ALL=en_US.UTF-8 timeout 10 grep -F $(encode A) input > out
 status=$?
 if test $status -eq 0; then
-  compare /dev/null out
+  compare binary-file-matches out
 elif test $status -eq 1; then
   compare_dev_null_ /dev/null out
 else
diff --git a/tests/null-byte b/tests/null-byte
index d86c249..9354aaf 100755
--- a/tests/null-byte
+++ b/tests/null-byte
@@ -56,7 +56,7 @@ echo xxx >exp || framework_failure_
 grep xxx in >out || fail=1
 compare exp out || fail=1
 
-printf 'xxx\n' > exp || framework_failure_
+printf '%s\n' xxx 'Binary file in matches' > exp || framework_failure_
 grep -E 'xxx|z' in >out || fail=1
 compare exp out || fail=1
 
diff --git a/tests/pcre-count b/tests/pcre-count
index 9eda54b..656780e 100755
--- a/tests/pcre-count
+++ b/tests/pcre-count
@@ -17,9 +17,10 @@ printf 'a\n%032768d\nb\0\n%032768d\na\n' 0 0 > in || framework_failure_
 
 # grep will discover that the input is a binary file sooner if the
 # page size is larger, so allow for either possible output.
-printf 'a\n' >exp1a || framework_failure_
+printf 'a\nBinary file in matches\n' >exp1a || framework_failure_
+printf 'Binary file in matches\n' >exp1b || framework_failure_
 LC_ALL=C grep -P 'a' in >out || fail=1
-compare exp1a out || compare /dev/null out || fail=1
+compare exp1a out || compare exp1b out || fail=1
 
 printf '2\n' >exp2 || framework_failure_
 LC_ALL=C grep -Pc 'a' in >out || fail=1
diff --git a/tests/surrogate-pair b/tests/surrogate-pair
index a91fa36..7f8373a 100755
--- a/tests/surrogate-pair
+++ b/tests/surrogate-pair
@@ -26,6 +26,10 @@ fail=0
 s_pair=$(printf '\360\220\220\205')
 printf '%s\n' "$s_pair" > in || framework_failure_
 
+# On platforms where wchar_t is only 16 bits, wchar_t cannot represent
+# the character encoded in 'in', so accept that behavior too.
+printf 'Binary file in matches\n' > out16 || framework_failure_
+
 LC_ALL=en_US.UTF-8
 export LC_ALL
 
@@ -41,12 +45,12 @@ grep . in > out 2> err || fail=1
 # On platforms where wchar_t is only 16 bits, wchar_t cannot represent
 # the character encoded in 'in'.
 
-# On such old systems the above prints nothing on stdout and a diagnostic
-# on stderr.  In that case, return early; otherwise, the following tests
+# On such old systems the above prints  diagnostic on stdout.
+# In that case, return early; otherwise, the following tests
 # would all fail.
 io_pair=$(cat out):$(cat err)
 case $io_pair in
-  :'grep: in: binary file matches') Exit $fail;;
+  'Binary file in matches:') Exit $fail;;
   $s_pair:) ;;
   *) fail_ "unexpected output: $io_pair"; fail=1;;
 esac
@@ -54,7 +58,7 @@ esac
 # Also test whether a surrogate-pair in the search string works.
 for opt in '' -i -E -F -iE -iF; do
   grep --file=in $opt in > out 2>&1 || fail=1
-  compare out in || fail=1
+  compare out in || compare out out16 || fail=1
 done
 
 Exit $fail
diff --git a/tests/symlink b/tests/symlink
index b580ce7..427dfed 100755
--- a/tests/symlink
+++ b/tests/symlink
@@ -58,7 +58,11 @@ do
 
     printf "$exp" >exp || framework_failure_
 
-    LC_ALL=C sort grepout >out || fail=1
+    LC_ALL=C sort grepout >out-t || fail=1
+
+    # Ignore "Binary file d matches" on systems for which
+    # reading from a directory actually succeeds.
+    grep -v Binary out-t > out; case $? in 0|1) ;; *) fail=1;; esac
 
     compare exp out || fail=1
   done
diff --git a/tests/unibyte-binary b/tests/unibyte-binary
index f76276f..7e8f2aa 100755
--- a/tests/unibyte-binary
+++ b/tests/unibyte-binary
@@ -22,7 +22,7 @@ require_unibyte_locale
 fail=0
 
 printf 'a\n\200\nb\n' >in || framework_failure_
-printf 'a\n' >exp || framework_failure_
+printf 'a\nBinary file in matches\n' >exp || framework_failure_
 grep . in >out || fail=1
 
 # In some unibyte locales, \200 is an encoding error;
-- 
2.33.1





Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Tue, 16 Nov 2021 18:09:01 GMT) Full text and rfc822 format available.

Message #8 received at 51860 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Duncan Roe <duncan_roe <at> optusnet.com.au>
Cc: 51860 <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Tue, 16 Nov 2021 10:08:33 -0800
The change that you're objecting to was put in to fix issues with the 
way 'grep' used to behave[1][2]. Could you describe what problems the 
change caused for you?

[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29668
[2] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=33552




Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Fri, 19 Nov 2021 01:20:01 GMT) Full text and rfc822 format available.

Message #11 received at 51860 <at> debbugs.gnu.org (full text, mbox):

From: Duncan Roe <duncan_roe <at> optusnet.com.au>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51860 <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Fri, 19 Nov 2021 12:19:12 +1100
Bug#33552 refers to -I not working which was already fixed by c3245083 before
271793f0. (c3245083 shows up before 271793f0 in `git log` although commit times
are in reverse order).

With regard to Bug#29668: they can use `grep -s -I`.
Their real problem was that `-I` didn't work.

Now to the problems the change caused for me:

Background: I have a collection[1] of find/grep commands all designed to be
piped into `less`. They all use `grep --line-buffered` (since `less` to the tty
will be line buffered), grep -s (to avoid stderr output which would mess up
`less`) and `sed -u` (approximately line buffered as well).

My distro is Slackware and most of the time I use the 14.2 bare metal system
which has grep 2.25. So I didn't see the altered grep behaviour until I had a
Slackware Current (15.0 Beta) VM.

On the 15.0 VM I was searching in a build directory for why a program was
complaining about a shared library it had no business using. So I searched via
the `yfl` symlink (which "always" finds everything) and found ... nothing. Not
even a binary match in the executable. I spent some time reviewing the `sfl`
script before I twigged it was changed grep behaviour.

(BTW the executable was the only match. I had left LIBS in the environment from
something else I was working on).

I could go on, but hope I have said enough.

[1] https://github.com/duncan-roe/command_line_tools/blob/master/bin/sfl
    There are 21 symlinks to this script, for a total of 22 commands.




Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Fri, 19 Nov 2021 09:23:01 GMT) Full text and rfc822 format available.

Message #14 received at 51860 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Duncan Roe <duncan_roe <at> optusnet.com.au>
Cc: 51860 <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Fri, 19 Nov 2021 01:22:23 -0800
On 11/18/21 17:19, Duncan Roe wrote:
> With regard to Bug#29668: they can use `grep -s -I`.
> Their real problem was that `-I` didn't work.

The problem I was referring to was described here:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29668#17

This is the problem of what an ordinary 'grep' user would expect with 
just 'grep PATTERN FILE | wc', without any options. When the "Binary 
file FILE matches" message is sent to 'wc' its information is lost to 
the user. When the message is sent to stderr, the user sees it and has a 
helpful indication that the usage is problematic.

Using 'grep -s -I' wouldn't have helped with this problem.


> They all use `grep --line-buffered` (since `less` to the tty
> will be line buffered), grep -s (to avoid stderr output which would mess up
> `less`)

You can use "grep PATTERN FILE 2>&1 | less". This shouldn't mess up 'less'.




Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Sun, 21 Nov 2021 01:40:02 GMT) Full text and rfc822 format available.

Message #17 received at 51860 <at> debbugs.gnu.org (full text, mbox):

From: Duncan Roe <duncan_roe <at> optusnet.com.au>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51860 <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Sun, 21 Nov 2021 12:38:51 +1100
On Fri, Nov 19, 2021 at 01:22:23AM -0800, Paul Eggert wrote:
> On 11/18/21 17:19, Duncan Roe wrote:
> > With regard to Bug#29668: they can use `grep -s -I`.
> > Their real problem was that `-I` didn't work.
>
> The problem I was referring to was described here:
>
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29668#17
>
> This is the problem of what an ordinary 'grep' user would expect with just
> 'grep PATTERN FILE | wc', without any options.

"ordinary 'grep' user"? Like in "grep user who doesn't RTFM"?

This was *documented* *behaviour*. And it still is documented behaviour, except
now grep doesn't abide by its documentation(*).

> When the "Binary file FILE
> matches" message is sent to 'wc' its information is lost to the user. When
> the message is sent to stderr, the user sees it and has a helpful indication
> that the usage is problematic.

That's a nice idea, but I suggest it was a mis-step to unilaterally change grep
to achieve it. A new (GNU extension) option to send binary file matches to
stderr would have been better IMHO.

I'd further suggest biaary matches output to stderr by the new option be not
affected by the setting of `-s`, only by `-I`. This reinstates orthogonality of
-s and -I which 271793f0 broke. Documenting the new option would be easy. On
which subject:

Quoting commit 271793f0
> Adjust tests to match new behavior.  In all cases this
> simplifies the tests, which is a good sign.
A good sign? Maybe. I have found that a much better sign is "How easy is
updating the documentation to reflect this change?". If it's awkward, perhaps
the change was a bad idea.

The man page is not yet updated to reflect the new `-s` behaviour. Nor is the
texi(*). For option `-s` the man page says:
> Suppress error messages about nonexistent or unreadable files.
Binary file matches are neither of the above. You need to update the man page to
explain that binary file matches are now treated as errors. Personally, I'd find
that embasassing.

>
> Using 'grep -s -I' wouldn't have helped with this problem.

Why did you offer an updated `-s` as a solution then?
>
>
> > They all use `grep --line-buffered` (since `less` to the tty
> > will be line buffered), grep -s (to avoid stderr output which would mess up
> > `less`)
>
> You can use "grep PATTERN FILE 2>&1 | less". This shouldn't mess up 'less'.

As it happens, I find I can. I don't need `grep -s` inside `sfl` - it's a
holdover from before when I first wrote `sfl` last century. Back then, I was
using `grep -swn` to hide errors about unreadable directories.

Since you have provided me with a workaround, I have no further interset in this
bug. I still think the 271793f0 functionality change was wrong and could and
should be improved on. But that's up to you.

(*) I'm discounting the 1-liner in the FAQ as wholly inadequate.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 21 Nov 2021 07:00:02 GMT) Full text and rfc822 format available.

Notification sent to Duncan Roe <duncan_roe <at> optusnet.com.au>:
bug acknowledged by developer. (Sun, 21 Nov 2021 07:00:02 GMT) Full text and rfc822 format available.

Message #22 received at 51860-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Duncan Roe <duncan_roe <at> optusnet.com.au>
Cc: 51860-done <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Sat, 20 Nov 2021 22:59:43 -0800
[Message part 1 (text/plain, inline)]
On 11/20/21 17:38, Duncan Roe wrote:

> This was *documented* *behaviour*.

No, it wasn't documented whether the "Binary file matches" message is 
sent to stdout or to stderr. But now that you mention it, this should be 
documented. First patch attached, and installed into the development 
version.


> I'd further suggest biaary matches output to stderr by the new option be not
> affected by the setting of `-s`, only by `-I`. This reinstates orthogonality of
> -s and -I which 271793f0 broke.

Hmm, you have a point that -s is documented to "Suppress error messages 
about nonexistent or unreadable files" and it's plausible to say that 
the "binary file matches" diagnostic, although it is about a binary file 
that 'grep' has some trouble reading, is not about a file that is 
"unreadable". So, let's fix the code to match that part of the spec 
again. Second patch attached and installed.

> Since you have provided me with a workaround, I have no further interset in this
> bug.

OK, closing the bug report. Thanks for reporting the issues.
[0001-doc-binary-file-matches-stderr-Bug-51860.patch (text/x-patch, attachment)]
[0002-grep-s-does-not-suppress-binary-file-matches.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Tue, 23 Nov 2021 00:38:01 GMT) Full text and rfc822 format available.

Message #25 received at 51860-done <at> debbugs.gnu.org (full text, mbox):

From: Duncan Roe <duncan_roe <at> optusnet.com.au>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51860-done <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Tue, 23 Nov 2021 11:37:31 +1100
On Sat, Nov 20, 2021 at 10:59:43PM -0800, Paul Eggert wrote:
> On 11/20/21 17:38, Duncan Roe wrote:
> 
> > I'd further suggest biaary matches output to stderr by the new option be not
> > affected by the setting of `-s`, only by `-I`. This reinstates orthogonality of
> > -s and -I which 271793f0 broke.
> 
> Hmm, you have a point that -s is documented to "Suppress error messages
> about nonexistent or unreadable files" and it's plausible to say that the
> "binary file matches" diagnostic, although it is about a binary file that
> 'grep' has some trouble reading, is not about a file that is "unreadable".
> So, let's fix the code to match that part of the spec again. Second patch
> attached and installed.
> 
Excellent. But I think you still need the patch below.

Cheers ... Duncan.


=======================

diff --git a/doc/grep.texi b/doc/grep.texi
index c3c4bbf..2482fa0 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1819,8 +1819,7 @@ to output lines even from files that appear to be binary, use the
 @option{-a} or @samp{--binary-files=text} option.
 To eliminate the
 ``Binary file matches'' messages, use the @option{-I} or
-@samp{--binary-files=without-match} option,
-or the @option{-s} or @option{--no-messages} option.
+@samp{--binary-files=without-match} option.
 
 @item
 Why doesn't @samp{grep -lv} print non-matching file names?




Information forwarded to bug-grep <at> gnu.org:
bug#51860; Package grep. (Tue, 23 Nov 2021 02:17:01 GMT) Full text and rfc822 format available.

Message #28 received at 51860-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Duncan Roe <duncan_roe <at> optusnet.com.au>
Cc: 51860-done <at> debbugs.gnu.org
Subject: Re: bug#51860: [PATCH] Reinstate Binary file matches to stdout
Date: Mon, 22 Nov 2021 18:16:35 -0800
On 11/22/21 16:37, Duncan Roe wrote:
> I think you still need the patch below.

Thanks for reporting that; I installed the patch.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 21 Dec 2021 12:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 127 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.