GNU bug report logs - #41004
Documentation:enhancement - search for hexvalue
Reported by: Radisson97 <at> web.de
Date: Fri, 1 May 2020 17:07:01 UTC
Severity: wishlist
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41004 in the body.
You can then email your comments to 41004 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#41004; Package grep. (Fri, 01 May 2020 17:07:01 GMT)
Acknowledgement sent to Radisson97 <at> web.de: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 01 May 2020 17:07:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hi,
I had the problem of searching for a non-printable character in a long
list of strings. I found nothing in the documentation, and the several
discussions I found on how to do it were either complicated or did not fit
my case. Maybe I was unlucky; nonetheless, I found a simple solution that
should be mentioned in the documentation.

problem: grep for a character when only its hex code is known.

solution: use $'\xNN'
then the shell expands this to the required code

example: printf "A\nB\nC\n" | grep $'\x41'

note: this example uses only printable characters, but it also works with
anything else except \0 (I guess).

I found that solution nice; it did not require any flags etc., and for my
problem it worked like a charm.
(I am not a member of the list; please reply directly to this address.)
hope that helps,
radisson
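A minimal sketch of the $'\xNN' technique described above. It assumes bash, zsh or ksh93, which are shells that implement $'...' quoting. The shell turns $'\x09' into a literal TAB before grep starts, so grep receives an ordinary one-character pattern. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Assumes bash, zsh or ksh93; a plain POSIX sh may not support $'...'.

# $'\x09' is expanded by the shell (not by grep) to a single TAB byte.
tab=$'\x09'

# Three input lines; only the second contains a TAB.
# grep -c prints the number of matching lines.
tab_lines=$(printf 'alpha\nbe\tta\ngamma\n' | grep -c "$tab")
echo "lines containing TAB: $tab_lines"   # prints: lines containing TAB: 1

# The same works for other control bytes, e.g. ESC (0x1b) in ANSI colour codes.
esc_lines=$(printf 'plain\n\033[1mbold\033[0m\n' | grep -c $'\x1b')
echo "lines containing ESC: $esc_lines"   # prints: lines containing ESC: 1
```

The guess about \0 holds. Command-line arguments are NUL-terminated strings, so a NUL byte can never reach grep as part of a pattern argument.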
Information forwarded to bug-grep <at> gnu.org: bug#41004; Package grep. (Sun, 03 May 2020 19:26:01 GMT)
Message #8 received at 41004 <at> debbugs.gnu.org:
On Fri, May 1, 2020 at 10:07 AM <Radisson97 <at> web.de> wrote:
> [...]
> solution: use $'\xNN'
> then shell expands this to the required code
> [...]
Thank you for the suggestion. Another approach is to use grep's -P option:

$ printf '%s\n' A B C | grep -P '\x41'
A

If you'd like to add an example to the documentation, please send a
patch, but I'm not sure how much PCRE syntax we want to document in
grep's own manual.
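A sketch of the -P route, assuming GNU grep built with PCRE support (some builds lack -P). The single quotes pass \x1b to grep unchanged, so PCRE decodes the escape rather than the shell. The command therefore needs no $'...' support from the shell. The variable names are illustrative.

```shell
#!/bin/sh
# Assumes GNU grep compiled with PCRE support (-P).
# Single quotes stop the shell from interpreting '\x1b'; PCRE decodes it.
esc_count=$(printf 'plain\n\033[31mred\033[0m\n' | grep -c -P '\x1b')
echo "$esc_count"   # prints: 1

# The example from the reply above: 0x41 is 'A'.
match=$(printf '%s\n' A B C | grep -P '\x41')
echo "$match"       # prints: A
```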
Information forwarded to bug-grep <at> gnu.org: bug#41004; Package grep. (Sun, 10 May 2020 17:00:03 GMT)
Message #11 received at 41004 <at> debbugs.gnu.org:
2020-05-01 19:05:28 +0200, Radisson97 <at> web.de:
[...]
> problem: grep for a character where only the hexcode in known.
>
> solution: use $'\xNN'
> then shell expands this to the required code
>
> example: printf "A\nB\nC\n" | grep $'\x41'
[...]
The $'\x41' ksh93 quoting operator expands to *byte* values.

To get a character based on its Unicode code point value, you'd
need the $'\u41' zsh operator (or $'\U10000' for code points
above 0xffff).

But in any case, that is done by the shell and has nothing to
do with grep, and the syntax of those shell operators varies
between shells.

In the fish shell you'd use:

grep \u41

or

grep \x41

instead.
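$'...' and fish's bare \x41 are shell-specific. One shell-neutral alternative is to let printf produce the byte. POSIX printf understands octal escapes such as \011 in its format string, and command substitution captures the result. This is a sketch assuming only a POSIX sh. Hex escapes like \x09 also work in bash's builtin printf and GNU printf, but POSIX only guarantees octal. Command substitution strips trailing newlines, so this cannot build a newline pattern.

```shell
#!/bin/sh
# Portable sketch: any POSIX sh, no $'...' needed.
# printf's format string accepts octal escapes: \011 is TAB (0x09).
tab=$(printf '\011')

# \101 is octal for 0x41, i.e. 'A' (the example from this thread).
a=$(printf '\101')

tab_count=$(printf 'alpha\nbe\tta\ngamma\n' | grep -c "$tab")
a_match=$(printf '%s\n' A B C | grep "$a")

echo "$tab_count"   # prints: 1
echo "$a_match"     # prints: A
```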
Also, since it's done by the shell, things like:

grep $'\u2e'

where U+002E is "FULL STOP", would match not only "."
characters but any character: all grep sees is a "."
character. That would be different from grep -P '\x2e', which
matches "." (U+002E) only.

Note that:

grep -P '\xE9'

matches the byte 0xE9 in single-byte locales (regardless of
what character that byte represents in the locale's charset) and
the character U+00E9 in UTF-8 locales (that is, the byte sequence
0xc3 0xa9, not the byte 0xe9).
--
Stephane
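The FULL STOP point above can be checked directly. This sketch assumes bash (for $'...') and GNU grep with -P support. It also shows -F, which makes grep treat the pattern as a fixed string, as a second way to neutralise the metacharacter. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Assumes bash for $'...' and GNU grep with PCRE (-P) support.
input='a.b
axb'

# The shell turns $'\x2e' into ".", which grep reads as "any character":
# both lines match.
shell_count=$(printf '%s\n' "$input" | grep -c $'\x2e')

# PCRE decodes \x2e itself as a literal FULL STOP: only "a.b" matches.
pcre_count=$(printf '%s\n' "$input" | grep -c -P '\x2e')

# -F (fixed strings) also makes the shell-expanded "." literal.
fixed_count=$(printf '%s\n' "$input" | grep -c -F $'\x2e')

echo "$shell_count $pcre_count $fixed_count"   # prints: 2 1 1
```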
Information forwarded to bug-grep <at> gnu.org: bug#41004; Package grep. (Wed, 13 May 2020 03:20:01 GMT)
Message #14 received at 41004 <at> debbugs.gnu.org:
On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas
<stephane <at> chazelas.org> wrote:
> [...]
> The $'\x41' ksh93 quoting operator expands to *byte* values.
> [...]
Thank you for the thorough reply, Stephane!
Bearing that in mind, Radisson, please consider submitting a revised patch.
I suggest recommending something like this:

$ printf '%s\n' A B C | LC_ALL=C grep -P '\x41'
A

so that the example is independent of both the current locale and the shell.
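This sketch applies the locale-independent form suggested above to the \xE9 case. It assumes GNU grep with -P support. Under LC_ALL=C, \xE9 denotes the single byte 0xE9. It therefore matches a Latin-1 encoded e-acute but not the two-byte UTF-8 encoding (0xC3 0xA9). The input is built with printf octal escapes: \351 is 0xE9, and \303\251 is 0xC3 0xA9.

```shell
#!/bin/sh
# Assumes GNU grep with PCRE (-P) support.
# Line 1: the single byte 0xE9 (e-acute in Latin-1), octal \351.
# Line 2: the UTF-8 encoding of U+00E9, bytes 0xC3 0xA9 (octal \303\251).
# In the C locale every byte is one character, so '\xE9' means byte 0xE9.
c_count=$(printf '\351\n\303\251\n' | LC_ALL=C grep -c -P '\xE9')
echo "$c_count"   # prints: 1 (only the Latin-1 line matches)

# The suggested example, independent of locale and shell quoting features:
match=$(printf '%s\n' A B C | LC_ALL=C grep -P '\x41')
echo "$match"     # prints: A
```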
Severity set to 'wishlist' from 'normal'
Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 21 Sep 2020 19:35:02 GMT)
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Tue, 22 Sep 2020 03:26:02 GMT)
Notification sent to Radisson97 <at> web.de: bug acknowledged by developer. (Tue, 22 Sep 2020 03:26:02 GMT)
Message #21 received at 41004-done <at> debbugs.gnu.org:
I installed the attached doc patch, which I hope addresses the issues mentioned
in this bug report, and am boldly closing the bug report.
[0001-doc-say-how-to-match-chars-by-code.patch (text/x-patch, attachment)]
Bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Oct 2020 11:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.