GNU bug report logs - #30525
Unexpected matches for input data from a patch file


Package: grep;

Reported by: SF Markus Elfring <elfring <at> users.sourceforge.net>

Date: Mon, 19 Feb 2018 14:33:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 30525 in the body.
You can then email your comments to 30525 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Mon, 19 Feb 2018 14:33:02 GMT)

Acknowledgement sent to SF Markus Elfring <elfring <at> users.sourceforge.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 19 Feb 2018 14:33:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: SF Markus Elfring <elfring <at> users.sourceforge.net>
To: bug-grep <at> gnu.org
Subject: Unexpected matches for input data from a patch file
Date: Mon, 19 Feb 2018 15:32:38 +0100

Hello,

I tried the command “`printf -- '-\tif\n'|grep -E '^-\s+[^i]'`”
with version “3.1-1.4”. I get no match, which is what I expect in this use case.

But I wonder why matches are displayed when “if” lines are passed from
a patch file instead. Shouldn't such a small regular expression also
exclude the unwanted characters here?

Regards,
Markus




Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Mon, 19 Feb 2018 20:17:02 GMT)

Message #8 received at 30525 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: SF Markus Elfring <elfring <at> users.sourceforge.net>, 30525 <at> debbugs.gnu.org
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Mon, 19 Feb 2018 12:16:25 -0800

SF Markus Elfring wrote:
> I tried the command “`printf -- '-\tif\n'|grep -E '^-\s+[^i]'`”
> with version “3.1-1.4”. I get no match, which is what I expect in this use case.
> 
> But I wonder why matches are displayed when “if” lines are passed from
> a patch file instead.

I assume it's because the if-lines match the regular expression in question, 
which is as it should be. If not, could you give a self-contained example 
illustrating the bug? Thanks.




Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Wed, 28 Feb 2018 08:24:02 GMT)

Message #11 received at 30525 <at> debbugs.gnu.org:

From: SF Markus Elfring <elfring <at> users.sourceforge.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 30525 <at> debbugs.gnu.org
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Wed, 28 Feb 2018 09:23:20 +0100

> I assume it's because the if-lines match the regular expression in question,
> which is as it should be.

I had different expectations of the desired software behaviour.

I should take possessive quantifiers (or atomic grouping) into account more carefully.


> If not, could you give a self-contained example illustrating the bug?

Which test result do you get for the command example
“printf -- '\t\tif\n'|grep -E '^\s+[^i]'”?


* Does the tool “grep” output any extra colour information also for
  matched tab characters?

* Is the match marking for them just missing from the display in a
  “selection” of terminal programs?


A similar clarification attempt gave me a somewhat better understanding
of the software situation.
https://github.com/beyondgrep/ack2/issues/661

Regards,
Markus




Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Wed, 28 Feb 2018 16:31:02 GMT)

Message #14 received at 30525 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: SF Markus Elfring <elfring <at> users.sourceforge.net>, 30525 <at> debbugs.gnu.org
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Wed, 28 Feb 2018 08:29:57 -0800

SF Markus Elfring wrote:

> Which test result do you get for the command example
> “printf -- '\t\tif\n'|grep -E '^\s+[^i]'”?

I get a line like this:

		if

and this is the correct answer.

> * Does the tool “grep” output any extra colour information also for
>    matched tab characters?

Not with that test case, no. There is no color information at all. grep's 
--color option was not used.

> A similar clarification attempt resulted in a bit of better understanding
> for the software situation.
> https://github.com/beyondgrep/ack2/issues/661

That's a long page, which I don't have the patience to read to the end. As near 
as I can make out, the basic problem is that you didn't understand how grep is 
supposed to work, and reported its behavior as a bug. However, as far as I can 
see from the above example, grep is working as specified.
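
Why the two-tab line is a correct match can be seen by asking grep to print only the matched text. The following is a minimal sketch rather than part of the original exchange: the helper name matched_part and the stricter pattern '^\s+[^i[:space:]]' are assumptions made for illustration, and \s is a GNU extension. The -o option prints only the matching part, and each tab is turned into ">" so that it becomes visible.

```shell
# matched_part PATTERN: read lines on stdin and print only the text that
# PATTERN matches (grep -o), with each tab shown as ">" so it is visible.
# (Hypothetical helper, named here for illustration only.)
matched_part() {
    grep -oE "$1" | tr '\t' '>' || true
}

# One tab before "if": \s+ must take the tab, and [^i] then meets "i".
printf -- '-\tif\n'   | matched_part '^-\s+[^i]'          # prints nothing

# Two tabs: \s+ takes the first tab and [^i] takes the second one.
printf -- '-\t\tif\n' | matched_part '^-\s+[^i]'          # prints "->>"
printf '\t\tif\n'     | matched_part '^\s+[^i]'           # prints ">>"

# Assumed stricter variant: the character after the indentation must be
# neither "i" nor white space, so a tab can no longer satisfy the bracket.
printf '\t\tif\n'     | matched_part '^\s+[^i[:space:]]'  # prints nothing
printf '\t\tfor\n'    | matched_part '^\s+[^i[:space:]]'  # prints ">>f"
```

POSIX extended regular expressions have no possessive quantifiers, so excluding white space in the bracket expression is one way to get a comparable effect for this pattern.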




Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Thu, 01 Mar 2018 07:53:01 GMT)

Message #17 received at 30525 <at> debbugs.gnu.org:

From: SF Markus Elfring <elfring <at> users.sourceforge.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 30525 <at> debbugs.gnu.org
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Thu, 1 Mar 2018 08:52:13 +0100

> and this is the correct answer.

This view is appropriate for the core functionality.


>> * Does the tool “grep” output any extra colour information also for
>>    matched tab characters?
> 
> Not with that test case, no.

I am still unsure.


> There is no color information at all.

I suggest taking another look there.


> grep's --color option was not used.

The matched characters are marked in red (for other search patterns)
on my test system even if this command parameter is not passed explicitly.
It seems that it belongs to my default configuration for this program.

Will the visual representation change anywhere (because of escape sequences)
if you add it?


> As near as I can make out, the basic problem is
> that you didn't understand how grep is supposed to work,
> and reported its behavior as a bug.

My knowledge of the application had room for improvement.


> However, as far as I can see from the above example,
> grep is working as specified.

The auxiliary match-colouring functionality gave me doubts.
There are further challenges to consider for special characters.

Would it make sense to replace them by printable variants?
https://en.wikipedia.org/wiki/C0_and_C1_control_codes
https://en.wikipedia.org/wiki/Unicode_control_characters#Control_pictures

Regards,
Markus




Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Thu, 01 Mar 2018 09:07:02 GMT)

Message #20 received at 30525 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: SF Markus Elfring <elfring <at> users.sourceforge.net>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 30525 <at> debbugs.gnu.org
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Thu, 1 Mar 2018 02:06:48 -0700

Hello Markus,

I believe there are actually several different issues here,
perhaps it's worth stating them explicitly to ensure we're
on the same page.


On 2018-03-01 12:52 AM, SF Markus Elfring wrote:
>>> * Does the tool “grep” output any extra colour information also for
>>>     matched tab characters?
[...]
>> grep's --color option was not used.
> 
> The matched characters are marked in red (for other search patterns)
> on my test system even if this command parameter is not passed explicitly.
[...]
> There are further challenges to consider for special characters.
> Would it make sense to replace them by printable variants?

First,

grep will print color information in the following situations:
1. If you use "--color=always"
2. If you use "--color=auto" and the output is a terminal
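3. If you don't pass "--color" yourself, but it is added for you by a
shell alias (many distributions define alias grep='grep --color=auto')
or by the environment variable GREP_OPTIONS, and the output is a terminal.
Without any of these, grep does not color its output.

Observe the following:

If you type this on the terminal with coloring enabled in one of these
ways, the letter "A" should be colored: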

  $ printf "AB\n" | grep A
  AB

With this command, grep's output is a pipe (not a terminal),
and by default there will be no color:

  $ printf "AB\n" | grep A | cat
  AB

You can force color output with:

  $ printf "AB\n" | grep --color=always A | cat
  AB

And you can examine the color escape sequences with:

  $ printf "AB\n" | grep --color=always A | od -An -c
   033   [   0   1   ;   3   1   m 033   [   K   A 033   [   m 033
     [   K   B  \n

The characters "\033[01;31m" are the sequence to change the color (bold
red text), and "\033[m" is the sequence to reset the colors; each is
followed by "\033[K", which erases to the end of the line.
These are technically called "ANSI terminal control escape sequences",
more here: https://en.wikipedia.org/wiki/ANSI_escape_code .

Therefore, when discussing grep's coloring options
it's important to say if the output is a terminal or not,
and whether coloring is on or off (and when troubleshooting,
it is best to explicitly use --color=XXX).
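
When troubleshooting, it can help to check mechanically whether grep emitted any escape sequences at all, independent of what the terminal displays. The following is a small sketch under that assumption; the helper name count_esc is hypothetical and simply counts ESC bytes (octal 033) on standard input.

```shell
# count_esc: count the ESC (octal 033) bytes arriving on standard input.
# (Hypothetical helper, named here for illustration only.)
count_esc() {
    tr -cd '\033' | wc -c | tr -d ' '
}

printf 'AB\n' | grep --color=never  A | count_esc   # 0: coloring is off
printf 'AB\n' | grep --color=always A | count_esc   # greater than 0
printf 'AB\n' | grep --color=auto   A | count_esc   # 0: output is a pipe
```

A non-zero count means the color sequences are present in grep's output, so a missing highlight would then be a matter of how the terminal renders them.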



Second,
When coloring is enabled (e.g. with "--color=always"), grep will
surround the TAB characters with the color escape sequences.
Observe the following:

  $ printf "\t\t\n" | grep -E --color=always '\s+' | od -An -c
   033   [   0   1   ;   3   1   m 033   [   K  \t  \t 033   [   m
   033   [   K  \n

Notice there is an ANSI color escape sequence, followed by two tabs 
(\t), followed by the "reset color" sequence, followed by "\n".




Third,
Grep's default coloring is red text and default background.
But TAB (and space) are empty characters - they do not have text.
Because the default background color is not changed, you will not
see them highlighted with a color.
You can change the default color with the GREP_COLORS environment variable.

For example:

Here you should see "A" and "B" in color,
with empty (non-colored) spaces between them:

  $ printf "A \tB\n" | grep --color=always '.*'
  A       B

You can force the background color to be something else
like so:

  $ printf "A \tB\n" | GREP_COLORS="mt=41" grep --color=always '.*'
  A       B

The above command should print "A" and "B" with red background
and default text color (See the next item regarding the white-space 
colors). To learn more about using GREP_COLORS, read "man grep".
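
The effect of GREP_COLORS can also be checked without relying on the terminal, by dumping the bytes grep writes. This is a minimal sketch; the helper name colored_dump is hypothetical, and 'mt=01;31' is used on the assumption that it reproduces grep's default match color.

```shell
# colored_dump COLORS: run the '.*' example with GREP_COLORS set to COLORS
# and dump the resulting bytes with od.
# (Hypothetical helper, named here for illustration only.)
colored_dump() {
    printf 'A \tB\n' | GREP_COLORS="$1" grep --color=always '.*' | od -An -c
}

colored_dump 'mt=01;31'   # bold red text: the dump contains 033 [ 0 1 ; 3 1 m
colored_dump 'mt=41'      # red background: the dump contains 033 [ 4 1 m
```

In both dumps the space and the tab sit between the start sequence and the reset sequence, so they are part of the highlighted match even when nothing visible changes on screen.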



Fourth,
This is an unexpected "gotcha" - some terminal programs
do NOT color tab characters at all! They just move the cursor,
while others print multiple spaces which are colored.

(by terminal programs I mean "xterm" or "gnome-terminal" or "konsole",
and this is also affected by using tmux or screen, etc.).

To test this, try the following on several terminals:

  printf "\033[41m A B\tC\n\033[m"

This sequence means:
background red color, space, A, space, B, tab, C, new line, reset-color.


On gnome-terminal, I get the entire line in red background.
On xterm, I get black space between B and C - meaning TAB is just moving 
the cursor.
You might see different results on your terminal - which means "grep" is 
not the problem at all here.
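
One possible workaround for terminals that skip over tabs, sketched here as an assumption rather than something proposed in the thread, is to combine a background color with the standard expand utility, which turns tabs into spaces before they reach the terminal. The helper name show_ws_match is hypothetical. Note that expand counts the bytes of the escape sequences as columns, so the indentation width may differ from the original.

```shell
# show_ws_match PATTERN: highlight matches with a red background and
# convert tabs to spaces, so the highlighted white space is drawn as
# colored cells even on terminals that only move the cursor for a tab.
# (Hypothetical helper, named here for illustration only.)
show_ws_match() {
    GREP_COLORS='mt=41' grep --color=always -E "$1" | expand
}

printf '\t\tif\n' | show_ws_match '^\s+[^i]'
```

On a terminal this should show a red block in place of the two matched tabs, followed by the uncolored "if".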



Fifth and last,
You asked about replacing non-printable characters.
This is easy enough to do with existing programs,
so not likely to be added as a new option to grep.

If you just want to replace TAB characters:

  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' | cat -T
or
  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' | tr '\t' x

To replace other characters, add more characters to 'tr'; for something 
more complicated, use sed:

  printf "\t\tif\n" | grep --color=always -E '^\s+[^i]' \
                          | sed 's/\t/<TAB>/g ; s/ /<SPACE>/g'
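
The same idea can use the Unicode control pictures as replacement text. The following is a sketch only: the function name show_ctrl is hypothetical, it covers just TAB and space rather than all control characters, and it assumes a UTF-8 terminal and font that can display U+2409 and U+2420.

```shell
# show_ctrl: replace each TAB with U+2409 and each space with U+2420.
# (Hypothetical helper, named here for illustration only.)
show_ctrl() {
    tab=$(printf '\t')             # literal tab, portable across sed versions
    sed "s/$tab/␉/g; s/ /␠/g"
}

printf '\t\tif x\n' | show_ctrl    # prints "␉␉if␠x"
```

Because the color escape sequences contain neither tabs nor spaces, the filter can be placed after grep --color=always without damaging the highlighting.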



Hope this helps,
regards,
 - assaf









Information forwarded to bug-grep <at> gnu.org:
bug#30525; Package grep. (Fri, 02 Mar 2018 10:11:01 GMT)

Message #23 received at 30525 <at> debbugs.gnu.org:

From: "SF Markus Elfring" <elfring <at> users.sourceforge.net>
To: "Assaf Gordon" <assafgordon <at> gmail.com>, 30525 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#30525: Unexpected matches for input data from a patch file
Date: Fri, 2 Mar 2018 11:10:16 +0100

> I believe there are actually several different issues here,

Yes. - I agree.


> perhaps it's worth stating them explicitly to ensure we're
> on the same page.

I thank you for your very detailed answer.


> Grep's default coloring is red text and default background.

This setting can be fine in several use cases.


> But TAB (and space) are empty characters - they do not have text.

I know this detail also.


> Because the default background color is not changed,

Can this display detail be adjusted somehow (the tool “ack” provides
this by default)?


> you will not see them highlighted with a color.

This can trigger questionable interpretations of the involved
software behaviour.


> This is an unexpected "gotcha" - some terminal programs
> do NOT color tab characters at all!

I am curious if such a situation should become better known.



> they just move the cursor,

Indentation is performed then at least.


> while others print multiple spaces which are colored.

This would be nice if you could actually see a different colour.


> You might see different results on your terminal

This can happen because of variations in involved programming interfaces
and supported display capabilities.


> - which means "grep" is not the problem at all here.

There are additional constraints to consider.


> You asked about replacing non-printable characters.
> This is easy enough to do with existing programs,

Do you know of an existing approach that replaces all such characters
with a usable visual representation?


> so not likely to be added as a new option to grep.

The match colouring evolved. So I imagine that a corresponding
character replacement could be performed by a companion tool
(similar to your command examples).


> Hope this helps,

I am also curious about how the application knowledge will evolve
further for regular expressions in recent software versions.
Patch code filtering might become safer and easier.

Regards,
Markus




Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 01 Jan 2020 07:26:02 GMT)

bug closed, send any further explanations to 30525 <at> debbugs.gnu.org and SF Markus Elfring <elfring <at> users.sourceforge.net> Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 01 Jan 2020 07:26:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 29 Jan 2020 12:24:04 GMT)

This bug report was last modified 4 years and 60 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.