GNU bug report logs - #18987
the bourne shell printf-vs-\xHH portability trap


Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Fri, 7 Nov 2014 17:15:03 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18987 in the body.
You can then email your comments to 18987 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Fri, 07 Nov 2014 17:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jim Meyering <jim <at> meyering.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 07 Nov 2014 17:15:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org
Subject: the bourne shell printf-vs-\xHH portability trap
Date: Fri, 7 Nov 2014 11:14:21 -0600
[Message part 1 (text/plain, inline)]
I ran grep's tests on a debian system this morning and was
surprised to see the word-multibyte test fail...
Until I realized it was because that system was configured
to use dash for /bin/sh, and this test relied on the unportable
printf '\xc3\xa1\n' to print an á (a-acute).  Using \xHH
hexadecimal constants works with bash and zsh, but that
is not portable, and dash's printf built-in emits the 9 bytes
rather than the expected three.

This isn't the first time this has happened, so I'll be writing
a syntax-check rule to help avoid another repeat.

Here's how I've fixed it:
[0001-maint-move-helper-function-hex_printf-to-init.cfg.patch (application/octet-stream, attachment)]
[0002-tests-avoid-printf-xHH-portability-trap.patch (application/octet-stream, attachment)]
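
A minimal sketch of the portable alternative, for readers without the attached patches: POSIX specifies octal \OOO escapes in printf format strings but not \xHH, so the same two UTF-8 bytes (0xC3 0xA1) can be written as \303\241. The variable names are illustrative only and are not taken from the patches.

```shell
# Octal escapes are part of the POSIX printf format syntax; \xHH is not,
# which is why dash prints '\xc3\xa1' literally.
byte_count=$(printf '\303\241\n' | wc -c | tr -d ' ')
hex_dump=$(printf '\303\241' | od -An -tx1 | tr -d ' \n')

# Two bytes for the character plus one newline.
printf '%s bytes, hex %s\n' "$byte_count" "$hex_dump"
```

The byte count is the same whether /bin/sh is bash, dash, or another POSIX shell, which is the property the failing test needed.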

Reply sent to Jim Meyering <jim <at> meyering.net>:
You have taken responsibility. (Fri, 07 Nov 2014 20:14:02 GMT) Full text and rfc822 format available.

Notification sent to Jim Meyering <jim <at> meyering.net>:
bug acknowledged by developer. (Fri, 07 Nov 2014 20:14:02 GMT) Full text and rfc822 format available.

Message #10 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 18987-done <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Fri, 7 Nov 2014 12:12:55 -0800
I've pushed these, and will make a new snapshot soon.
Holler if there's anything else you think should be included.




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Fri, 07 Nov 2014 22:31:02 GMT) Full text and rfc822 format available.

Message #13 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 08 Nov 2014 07:30:09 +0900
Jim Meyering <jim <at> meyering.net> wrote:

> I ran grep's tests on a debian system this morning and was
> surprised to see the word-multibyte test fail...
> Until I realized it was because that system was configured
> to use dash for /bin/sh, and this test relied on the unportable
> printf '\xc3\xa1\n' to print an á (a-acute).  Using \xHH
> hexadecimal constants works with bash and zsh, but that
> is not portable, and dash's printf built-in emits the 9 bytes
> rather than the expected three.
> 
> This isn't the first time this has happened, so I'll be writing
> a syntax-check rule to help avoid another repeat.
> 
> Here's how I've fixed it:

Thanks, but it seems that this is also unportable.  See the output below
from Solaris 10 and AIX 7.  Do the tests need gawk?

$ awk 'BEGIN { printf "\x41" }' </dev/null
\x41

BTW, /usr/bin/printf behaves the same way on Solaris 10, AIX 7 and HP-UX 11.23:

$ /usr/bin/printf '\x41'
\x41





Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Fri, 07 Nov 2014 22:44:02 GMT) Full text and rfc822 format available.

Message #16 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18987 <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Fri, 7 Nov 2014 14:42:54 -0800
On Fri, Nov 7, 2014 at 2:30 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> I ran grep's tests on a debian system this morning and was
>> surprised to see the word-multibyte test fail...
>> Until I realized it was because that system was configured
>> to use dash for /bin/sh, and this test relied on the unportable
>> printf '\xc3\xa1\n' to print an á (a-acute).  Using \xHH
>> hexadecimal constants works with bash and zsh, but that
>> is not portable, and dash's printf built-in emits the 9 bytes
>> rather than the expected three.
>>
>> This isn't the first time this has happened, so I'll be writing
>> a syntax-check rule to help avoid another repeat.
>>
>> Here's how I've fixed it:
>
> Thanks, but it seem that it is also unportable.  On Solaris 10 and AIX 7,
> below.  Need Gawk for tests?
>
> $ awk 'BEGIN { printf "\x41" }' </dev/null
> \x41
>
> BTW, On Solaris 10, AIX 7, HP-UX 11.23, below.
>
> $ /usr/bin/printf '\x41'
> \x41

Thank you for testing and reporting that!
I have a marked preference for using hexadecimal (readability),
but if I can't find a good, universally-portable converter that is
sufficiently simple, I'll just revert to using octal.




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Fri, 07 Nov 2014 22:46:01 GMT) Full text and rfc822 format available.

Message #19 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18987 <18987 <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Fri, 7 Nov 2014 14:45:06 -0800
On Fri, Nov 7, 2014 at 2:30 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>
>> I ran grep's tests on a debian system this morning and was
>> surprised to see the word-multibyte test fail...
>> Until I realized it was because that system was configured
>> to use dash for /bin/sh, and this test relied on the unportable
>> printf '\xc3\xa1\n' to print an á (a-acute).  Using \xHH
>> hexadecimal constants works with bash and zsh, but that
>> is not portable, and dash's printf built-in emits the 9 bytes
>> rather than the expected three.
>>
>> This isn't the first time this has happened, so I'll be writing
>> a syntax-check rule to help avoid another repeat.
>>
>> Here's how I've fixed it:
>
> Thanks, but it seem that it is also unportable.  On Solaris 10 and AIX 7,
> below.  Need Gawk for tests?
>
> $ awk 'BEGIN { printf "\x41" }' </dev/null
> \x41

By the way, "no", we cannot rely on gawk for these tests.




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sat, 08 Nov 2014 07:57:01 GMT) Full text and rfc822 format available.

Message #22 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 08 Nov 2014 16:56:45 +0900
[Message part 1 (text/plain, inline)]
Jim Meyering <jim <at> meyering.net> wrote:
> Thank you for testing and reporting that!
> I have a marked preference for using hexadecimal (readability),
> but if I can't find a good, universally-portable converter that is
> sufficiently simple, I'll just revert to using octal.

Thanks.  I fixed the remaining test, multibyte-white-space.  I have not
tried it on Debian, but it passes on CentOS, Solaris, HP-UX and AIX.
[0001-tests-avoid-awk-printf-xHH-portability-trap.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sat, 08 Nov 2014 15:58:01 GMT) Full text and rfc822 format available.

Message #25 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18987 <18987 <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 8 Nov 2014 07:56:48 -0800
[Message part 1 (text/plain, inline)]
On Fri, Nov 7, 2014 at 11:56 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> Thank you for testing and reporting that!
>> I have a marked preference for using hexadecimal (readability),
>> but if I can't find a good, universally-portable converter that is
>> sufficiently simple, I'll just revert to using octal.
>
> Thanks, I fixed left multibyte-white-space.  Although I do not try it
> on Debian, passed on CentOS, Solaris, HP-UX and AIX.

Thank you for working on that.
I've improved your patch: update the now-shared hex_printf_
rather than making a copy, use a better definition of that function
(knowing that "printf %s a b c d e" reuses the format string and
prints just 5 bytes helps), also update word-multibyte to work
with the new definition, and rewrite the commit log.

I'll push after you ACK:
[0001-tests-avoid-awk-printf-xHH-portability-trap.patch (application/octet-stream, attachment)]
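
The format-reuse behavior mentioned above can be sketched as follows. This is not the hex_printf_ definition from the attached patch (which is not reproduced in this log); it only illustrates the printf property being relied on, and it assumes the local printf accepts 0xHH numeric operands, as POSIX describes for C-style integer constants.

```shell
# printf reapplies its format until every argument is consumed,
# so one "%s" and five one-character arguments yield exactly five bytes.
joined=$(printf %s a b c d e)
joined_len=$(printf %s a b c d e | wc -c | tr -d ' ')
printf '%s (%s bytes)\n' "$joined" "$joined_len"

# The same reuse turns a list of hex byte values into an octal format string.
octal_format=$(printf '\\%03o' 0xc3 0xa1)
printf '%s\n' "$octal_format"

# Feeding that string back to printf as a format emits the two raw bytes.
printf "$octal_format" | od -An -tx1
```

The intermediate string is printed with printf '%s\n' rather than echo, because some echo built-ins (dash's, for example) would interpret the backslash escapes themselves.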

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sat, 08 Nov 2014 19:47:02 GMT) Full text and rfc822 format available.

Message #28 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: arnold <at> skeeve.com
To: noritnk <at> kcn.ne.jp, jim <at> meyering.net
Cc: 18987 <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 08 Nov 2014 12:46:15 -0700
Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:

> Thanks, but it seem that it is also unportable.  On Solaris 10 and AIX 7,
> below.  Need Gawk for tests?
>
> $ awk 'BEGIN { printf "\x41" }' </dev/null
> \x41

If you use octal it should work with any awk.

Arnold
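
A small sketch of the octal form suggested above, using the same letter as the failing \x41 example (octal 101 is hexadecimal 41). The variable names are illustrative.

```shell
# "\101" is the octal escape for "A"; awk string literals accept
# octal escapes, unlike the \x41 form shown failing above.
letter=$(awk 'BEGIN { printf "\101\n" }' </dev/null)
printf '%s\n' "$letter"

# The two UTF-8 bytes 0xC3 0xA1 written in octal.
awk_bytes=$(awk 'BEGIN { printf "\303\241" }' </dev/null | wc -c | tr -d ' ')
printf '%s\n' "$awk_bytes"
```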




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sat, 08 Nov 2014 20:11:01 GMT) Full text and rfc822 format available.

Message #31 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Aharon Robbins <arnold <at> skeeve.com>
Cc: 18987 <18987 <at> debbugs.gnu.org>, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 8 Nov 2014 12:09:45 -0800
On Sat, Nov 8, 2014 at 11:46 AM,  <arnold <at> skeeve.com> wrote:
> Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>
>> Thanks, but it seem that it is also unportable.  On Solaris 10 and AIX 7,
>> below.  Need Gawk for tests?
>>
>> $ awk 'BEGIN { printf "\x41" }' </dev/null
>> \x41
>
> If you use octal it should work with any awk.

Thanks, but octal would also work with printf.
My only reason to use awk was because I thought (wrongly)
that it could portably handle that use of hex constants.
The trouble was that I have a strong preference for using hex
constants in the tests.

With the updated patch I sent today, the problem is resolved.




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sat, 08 Nov 2014 23:39:02 GMT) Full text and rfc822 format available.

Message #34 received at 18987 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <at> debbugs.gnu.org
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 09 Nov 2014 08:38:18 +0900
[Message part 1 (text/plain, inline)]
On Sat, 8 Nov 2014 07:56:48 -0800
Jim Meyering <jim <at> meyering.net> wrote:
> Thank you for working on that.
> I've improved your patch: update the now-shared hex_printf_
> rather than making a copy, use a better definition of that function
> (knowing that "printf %s a b c d e" reuses the format string and
> prints just 5 bytes helps), also update word-multibyte to work
> with the new definition, and rewrite the commit log.
> 
> I'll push after you ACK:

Thanks for the review.  I changed the patch so that word-multibyte simply
uses \OOO octal escapes in printf, because other test modules (e.g.
euc-mb) already do that.
[0001-tests-avoid-awk-printf-xHH-portability-trap.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 01:03:02 GMT) Full text and rfc822 format available.

Message #37 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 8 Nov 2014 17:02:28 -0800
[Message part 1 (text/plain, inline)]
On Sat, Nov 8, 2014 at 3:38 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> On Sat, 8 Nov 2014 07:56:48 -0800
> Jim Meyering <jim <at> meyering.net> wrote:
>> Thank you for working on that.
>> I've improved your patch: update the now-shared hex_printf_
>> rather than making a copy, use a better definition of that function
>> (knowing that "printf %s a b c d e" reuses the format string and
>> prints just 5 bytes helps), also update word-multibyte to work
>> with the new definition, and rewrite the commit log.
>>
>> I'll push after you ACK:
>
> Thanks for the review.  I added to a change the patch as word-multibyte
> uses \OOO in printf simply, because use it in other test module e.g.
> euc-mb.

For reference, just because something is used in another test
does not necessarily mean it is desirable. There is nontrivial
variance in the style/quality of grep's test scripts.

However, I can see how one might prefer to use printf
directly, so since I've left your name on this change, I'll
also let you choose octal here.

To help with readability, I've given a name to
the character we're using as input: e_acute, since that
one is used in other tests.
[0001-tests-avoid-awk-printf-xHH-portability-trap.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 01:55:01 GMT) Full text and rfc822 format available.

Message #40 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 8 Nov 2014 17:54:03 -0800
[Message part 1 (text/plain, inline)]
On Sat, Nov 8, 2014 at 5:02 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sat, Nov 8, 2014 at 3:38 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>> On Sat, 8 Nov 2014 07:56:48 -0800
>> Jim Meyering <jim <at> meyering.net> wrote:
>>> Thank you for working on that.
>>> I've improved your patch: update the now-shared hex_printf_
>>> rather than making a copy, use a better definition of that function
>>> (knowing that "printf %s a b c d e" reuses the format string and
>>> prints just 5 bytes helps), also update word-multibyte to work
>>> with the new definition, and rewrite the commit log.
>>>
>>> I'll push after you ACK:
>>
>> Thanks for the review.  I added to a change the patch as word-multibyte
>> uses \OOO in printf simply, because use it in other test module e.g.
>> euc-mb.
>
> For reference, just because something is used in another test
> does not necessarily mean it is desirable. There is nontrivial
> variance in the style/quality of grep's test scripts.
>
> However, I can see how one might prefer to use printf
> directly, so since I've left your name on this change, I'll
> also let you choose octal here.
>
> To help with readability, I've given a name to
> the character we're using as input: e_acute, since that
> one is used in other tests.

I pushed that, then tested more and found an error I'd introduced.
Here's the fix:
[0001-tests-fix-typo-in-previous-change.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 03:53:02 GMT) Full text and rfc822 format available.

Message #43 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 08 Nov 2014 19:52:13 -0800
I have some qualms about that patch.  It assumes the C locale, and it's a bit 
safer to spell it out as in '0-9abcdefABCDEF'.  Also, the temporary streams 
(i.e., the output of 'COMMAND' inside '$(COMMAND)') are not text, and arguably
this does not conform to POSIX (POSIX is murky here) and anyway I suspect some 
picky shells will complain.  Third and most important, it'd be nicer if 
hex_printf_ worked like 'printf', except with support for hexadecimal escapes.

How about something like the following instead?  It's brute-force, but it should 
be portable.

  hex_printf_()
  {
    hex_printf_format=$(printf '%s\n' "$1" | sed '
      s/^/_/
      s/$/_/
      s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
      s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
      s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
      s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
      s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89aAbBcCdDeEfF]\)/\1,3\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
      s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
      s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
      s/^_//
      s/_$//
    ')
    shift
    printf "$hex_printf_format" "$@"
  }

  hex_printf_ '\x34\\x%dX\x45\n' 100





Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 04:21:01 GMT) Full text and rfc822 format available.

Message #46 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sat, 8 Nov 2014 20:19:44 -0800
On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>   hex_printf_()
>   {
>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>       s/^/_/
>       s/$/_/
>       s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>       s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>       s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>       s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>       s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89abcdef]\)/\1,3\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>       s/^_//
>       s/_$//
>     ')
>     shift
>     printf "$hex_printf_format" "$@"
>   }

How elegantly twisted ;-)
I like it.

Do you have time to write the complete patch?
I'd like to make a pre-release snapshot tomorrow.




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 05:02:02 GMT) Full text and rfc822 format available.

Message #49 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 09 Nov 2014 14:01:43 +0900
On Sat, 8 Nov 2014 17:54:03 -0800
Jim Meyering <jim <at> meyering.net> wrote:
> I pushed that, then tested more and found an error I'd introduced.
> Here's the fix:

Ah, I could not find it.  Thanks.





Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 18:21:02 GMT) Full text and rfc822 format available.

Message #52 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 9 Nov 2014 10:19:57 -0800
2014-11-08 20:19 GMT-08:00 Jim Meyering <jim <at> meyering.net>:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>   hex_printf_()
>>   {
>>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>>       s/^/_/
>>       s/$/_/
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89abcdef]\)/\1,3\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>>       s/^_//
>>       s/_$//
>>     ')
>>     shift
>>     printf "$hex_printf_format" "$@"
>>   }
>
> How elegantly twisted ;-)
> I like it.
>
> Do you have time to write the complete patch?
> I'd like to make a pre-release snapshot tomorrow.

I tried it, and found that this new function makes the multibyte-white-space
test fail with GNU sed. Here's a simplified example showing where
it goes wrong. This shows that only the first \x285 is transformed
into \x2,05:

  $ printf '%s\n' '_\x285\x285\n_' \
     |sed 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'
  _\x2,05\x285\n_

The intent was that it transform both, of course.
The trouble arises when the regexp consumes all 3 hex
digits.  Then there is no longer a non-backslash remaining
to be consumed on 2nd and subsequent iterations.

There is also a portability problem in that Solaris 5.10's /bin/sed
seems unable to handle some of that code. For example,
using that same example with its /bin/sed, neither \x285
string is transformed.
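
The non-overlapping-match behavior described above can be reproduced with a much smaller pattern. In this sketch the digit before "x" plays the role of the non-backslash character the real regexp needs; the loop at the end is a hypothetical workaround and is not something proposed in this thread.

```shell
# With the g flag, sed resumes scanning after the end of each match,
# so the middle "5" cannot serve both matches and the second "x" survives.
single_pass=$(printf '%s\n' '5x5x5' | sed 's/\([0-9]\)x\([0-9]\)/\1*\2/g')
printf '%s\n' "$single_pass"

# Hypothetical workaround: repeat a single substitution until none applies.
looped=$(printf '%s\n' '5x5x5' |
  sed -e ':again' -e 's/\([0-9]\)x\([0-9]\)/\1*\2/' -e 't again')
printf '%s\n' "$looped"
```

The first command leaves the second "x" in place, just as the second \x285 is left untouched in the example above; the looped form rewrites both.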




Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 18:30:03 GMT) Full text and rfc822 format available.

Message #55 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 9 Nov 2014 10:29:00 -0800
[Message part 1 (text/plain, inline)]
On Sun, Nov 9, 2014 at 10:19 AM, Jim Meyering <jim <at> meyering.net> wrote:
> 2014-11-08 20:19 GMT-08:00 Jim Meyering <jim <at> meyering.net>:
>> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>>   hex_printf_()
>>>   {
>>>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>>>       s/^/_/
>>>       s/$/_/
...
>> Do you have time to write the complete patch?
>> I'd like to make a pre-release snapshot tomorrow.
>
> I tried it, and found that this new function makes the multibyte-white-space
> test fail with GNU sed. Here's a simplified example showing where
...

I do like the idea, but now prefer to defer that until after the release.
Instead, I'll address the portability issues you mentioned, with this:
[0001-tests-avoid-hex_printf_-portability-problems.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 18:33:01 GMT) Full text and rfc822 format available.

Message #58 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: 18987-done <18987-done <at> debbugs.gnu.org>
Subject: Fwd: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 9 Nov 2014 10:31:51 -0800
[Message part 1 (text/plain, inline)]
Forwarding to the bug tracking system:

---------- Forwarded message ----------
From: Jim Meyering <jim <at> meyering.net>
Date: Sun, Nov 9, 2014 at 10:23 AM
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
To: Paul Eggert <eggert <at> cs.ucla.edu>


On Sun, Nov 9, 2014 at 9:36 AM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>> I have some qualms about that patch.  It assumes the C locale, and it's a
>> bit safer to spell it out as in '0-9abcdefABCDEF'.  Also, the temporary
>> streams (i.e., the output of 'COMMAND inside '$(COMMAND)') are not text, and
>> arguably this does not conform to POSIX (POSIX is murky here) and anyway I
>> suspect some picky shells will complain.  Third and most important, it'd be
>> nicer if hex_printf_ worked like 'printf', except with support for
>> hexadecimal escapes.
>>
>> How about something like the following instead?  It's brute-force, but it
>> should be portable.
>>
>>   hex_printf_()
>>   {
>>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>>       s/^/_/
>>       s/$/_/
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>
> I do like the idea, but now prefer to defer that until after the release.
> Instead, I'll address the portability issues you mentioned, with this:
[0001-tests-avoid-hex_printf_-portability-problems.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18987; Package grep. (Sun, 09 Nov 2014 20:06:02 GMT) Full text and rfc822 format available.

Message #61 received at 18987-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: Re: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 09 Nov 2014 12:04:56 -0800
Jim Meyering wrote:
> I tried it, and found that this new function makes the multibyte-white-space
> test fail with GNU sed.

Yes, and the more I look at it the less I like it.  I'm afraid I'm now going 
back to the idea that we should just use octal.  This outputting-hex business is 
more trouble than it's worth.  What happens with the current code, for example, 
when one of the printfs fails?  No error is reported.  If we just used octal it'd 
all be saner.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 08 Dec 2014 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 142 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.