GNU bug report logs - #41004
Documentation:enhancement - search for hexvalue

Package: grep

Reported by: Radisson97 <at> web.de

Date: Fri, 1 May 2020 17:07:01 UTC

Severity: wishlist

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Report forwarded to bug-grep <at> gnu.org:
bug#41004; Package grep. (Fri, 01 May 2020 17:07:01 GMT)

Acknowledgement sent to Radisson97 <at> web.de:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 01 May 2020 17:07:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Radisson97 <at> web.de
To: bug-grep <at> gnu.org
Subject: Documentation:enhancement - search for hexvalue
Date: Fri, 1 May 2020 19:05:28 +0200
Hi,
I had the problem of searching for a non-printable character in a long
list of strings. I found nothing in the documentation, and the several
discussions I found on how to do that were either complicated or did not fit
my case. Maybe I was unlucky; nonetheless, I found a simple solution that
should be mentioned in the documentation.

problem: grep for a character where only the hex code is known.

solution:        use $'\xNN'
                     the shell then expands this to the required code

example:       printf "A\nB\nC\n" | grep $'\x41'

note: the example uses only printable characters, but it also works with
         anything else except \0 (I guess).

I found that solution nice: it did not require any flags etc., and for my
problem it worked like a charm.
(I am not a member of the list, so please reply directly to this address.)

hope that helps,
 radisson




Information forwarded to bug-grep <at> gnu.org:
bug#41004; Package grep. (Sun, 03 May 2020 19:26:01 GMT)

Message #8 received at 41004 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Radisson97 <at> web.de
Cc: 41004 <at> debbugs.gnu.org
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Date: Sun, 3 May 2020 12:25:04 -0700
On Fri, May 1, 2020 at 10:07 AM <Radisson97 <at> web.de> wrote:
> [...]
> problem: grep for a character where only the hex code is known.
>
> solution:        use $'\xNN'
>                      then shell expands this to the required code
>
> example:       printf "A\nB\nC\n" | grep $'\x41'
> [...]

Thank you for the suggestion. Another approach is to use grep's -P option:

$ printf '%s\n' A B C | grep -P '\x41'
A

If you'd like to add an example to the documentation, please send a
patch, but I'm not sure how much of PCRE syntax we want to document in
grep's own manual.




Information forwarded to bug-grep <at> gnu.org:
bug#41004; Package grep. (Sun, 10 May 2020 17:00:03 GMT)

Message #11 received at 41004 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Radisson97 <at> web.de
Cc: 41004 <at> debbugs.gnu.org
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Date: Sun, 10 May 2020 17:46:44 +0100
2020-05-01 19:05:28 +0200, Radisson97 <at> web.de:
[...]
> problem: grep for a character where only the hex code is known.
> 
> solution:        use $'\xNN'
>                      then shell expands this to the required code
> 
> example:       printf "A\nB\nC\n" | grep $'\x41'
[...]

The $'\x41' ksh93 quoting operator expands to *byte* values.

To get a character based on the Unicode codepoint value, you'd
need the $'\u41' zsh operator (or $'\U10000' for code points
above 0xffff).

But in any case, that is done by the shell and has nothing to
do with grep, and the syntax of those shell operators varies
between shells.

In the fish shell you'd use:

grep \u41

or

grep \x41

instead.

Also, since it's done by the shell, things like:

grep $'\u2e'

where U+002E is "FULL STOP", would not only match on "."
characters but on any character. All grep sees is a "."
character. That would be different from grep -P '\x2e' which
matches "." (U+002E) only.
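
A minimal sketch of the difference described above, assuming a POSIX shell and a POSIX grep. The pattern is built with printf's octal escape rather than $'\u2e' so that the sketch does not depend on one shell's quoting syntax. The -F option (fixed-string matching) stands in for grep -P '\x2e' as the way to make the dot literal; that is a substitution made for this sketch, not something proposed in this thread.

```sh
# Build the one-byte pattern "." from its code: octal 056 = hex 2E.
dot=$(printf '\056')

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(printf '%s\n' 'a.b' 'abc' | grep -c -- "$dot")

# With -F the pattern is a fixed string, so only the line that really
# contains a "." matches.
fixed_hits=$(printf '%s\n' 'a.b' 'abc' | grep -F -c -- "$dot")

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
# prints: regex: 2, fixed: 1
```

The same caution would apply to any byte that is a regular-expression metacharacter, such as "*" (hex 2A) or "[" (hex 5B).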

Note that:

grep -P '\xE9'

matches on the byte 0xE9 in single-byte locales (regardless of
what character that byte represents in the locale's charset) and
on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
of bytes, not byte 0xe9).

-- 
Stephane
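
Since the $'...' syntax varies between shells, one possible shell-independent way to produce a byte from its hex code is POSIX printf with an octal escape. The following is only a sketch: byte_from_hex is a hypothetical helper name introduced here, not part of grep or of any shell, and the sample input is made up for illustration.

```sh
# byte_from_hex (hypothetical helper): print the single byte whose
# hexadecimal code is given. The inner printf converts the number to a
# three-digit octal string; the outer printf interprets the \ooo escape.
# Octal is used because \x escapes are not required by POSIX printf.
byte_from_hex() {
    printf "\\$(printf '%03o' "0x$1")"
}

# Hex 09 is TAB. Command substitution drops trailing newlines, so this
# approach does not work for hex 0a, and a shell variable cannot hold NUL.
tab=$(byte_from_hex 09)

# Count the input lines that contain a TAB byte.
tab_hits=$(printf 'one\ttwo\nthree four\n' | grep -c -- "$tab")

printf 'lines containing TAB: %s\n' "$tab_hits"
# prints: lines containing TAB: 1
```

As with the $'\xNN' form, grep only ever sees the resulting byte, so the caveats above about metacharacters and about bytes versus characters still apply.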




Information forwarded to bug-grep <at> gnu.org:
bug#41004; Package grep. (Wed, 13 May 2020 03:20:01 GMT)

Message #14 received at 41004 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stephane Chazelas <stephane <at> chazelas.org>
Cc: 41004 <at> debbugs.gnu.org, Radisson97 <at> web.de
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Date: Tue, 12 May 2020 20:19:05 -0700
On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas
<stephane <at> chazelas.org> wrote:
> [...]

Thank you for the thorough reply, Stephane!
Bearing that in mind, Radisson, please consider submitting a revised patch.
I suggest recommending something like this:

$ printf '%s\n' A B C | LC_ALL=C grep -P '\x41'
A

so that the example is independent of both the current locale and the shell.
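
A sketch of the same recommendation applied to a byte that really is non-printable (BEL, hex 07). It assumes a grep that supports -P, which is optional in GNU grep builds, so the sketch probes for it first. The probe, the sample function, and the sample input are illustrative assumptions, not part of the suggestion above.

```sh
# Sample input: the second line contains a BEL byte (octal 007 = hex 07).
sample() {
    printf 'plain line\nline with bell\007here\n'
}

# Probe for -P support using the example from the message above.
if printf 'A\n' | LC_ALL=C grep -P '\x41' >/dev/null 2>&1; then
    have_pcre=yes
    # LC_ALL=C makes \x07 refer to a single byte regardless of the
    # user's locale; -c prints the number of matching lines.
    bell_lines=$(sample | LC_ALL=C grep -c -P '\x07')
else
    have_pcre=no
    bell_lines=
fi

printf 'PCRE available: %s, lines with BEL: %s\n' "$have_pcre" "$bell_lines"
```

Because the pattern is single-quoted, the shell passes \x07 through unchanged and grep's PCRE matcher interprets it, which is what makes this form independent of the shell.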




Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 21 Sep 2020 19:35:02 GMT)

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 22 Sep 2020 03:26:02 GMT)

Notification sent to Radisson97 <at> web.de:
bug acknowledged by developer. (Tue, 22 Sep 2020 03:26:02 GMT)

Message #21 received at 41004-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Stephane Chazelas <stephane <at> chazelas.org>
Cc: 41004-done <at> debbugs.gnu.org, Radisson97 <at> web.de
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Date: Mon, 21 Sep 2020 20:25:15 -0700
I installed the attached doc patch, which I hope addresses the issues mentioned 
in this bug report, and am boldly closing the bug report.
[0001-doc-say-how-to-match-chars-by-code.patch (text/x-patch, attachment)]

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Oct 2020 11:24:04 GMT)

This bug report was last modified 3 years and 182 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.