GNU bug report logs -
#36887
coreutils-8.31: printf chokes on \u0041
Reported by: Ulrich Mueller <ulm <at> gentoo.org>
Date: Thu, 1 Aug 2019 11:03:01 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Thu, 01 Aug 2019 11:03:01 GMT)
Acknowledgement sent to Ulrich Mueller <ulm <at> gentoo.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 01 Aug 2019 11:03:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
[Forwarding bug https://bugs.gentoo.org/680244 as requested by the
Gentoo package maintainer.]
According to printf(1):
Interpreted sequences are:
[...]
\uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
\UHHHHHHHH
Unicode character with hex value HHHHHHHH (8 digits)
It does not work, though:
$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041
$ /usr/bin/printf '\U00000041\n'
/usr/bin/printf: invalid universal character name \U00000041
Other tools interpret the sequence correctly:
$ printf '\u0041\n' # bash
A
$ echo -e '\u0041' # bash
A
$ zsh -c "echo -e '\u0041'"
A
$ emacs -Q --batch --eval '(princ "\u0041\n")'
A
$ python -c "print ('\u0041')"
A
$ ruby -e 'print("\u0041\n")'
A
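The difference reported above comes down to which printf gets run: the shell builtin or the external program. A minimal sketch for probing the external one, assuming "env" is available to bypass the builtin (the variable name "result" is only illustrative):

```shell
# Probe the first printf found in PATH; env bypasses any shell builtin.
# An error or any output other than "A" is treated as rejection.
if out=$(env printf '\u0041' 2>/dev/null) && [ "$out" = "A" ]; then
    result="accepts"
else
    result="rejects"
fi
echo "external printf: $result"
```

Which answer appears depends on the installed printf and its version.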
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Thu, 01 Aug 2019 13:10:02 GMT)
Message #8 received at 36887 <at> debbugs.gnu.org:
On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
>
> According to printf(1):
>
> Interpreted sequences are:
> [...]
>
> \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
>
> \UHHHHHHHH
> Unicode character with hex value HHHHHHHH (8 digits)
>
> It does not work, though:
>
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
>
> Other tools interpret the sequence correctly:
>
> $ printf '\u0041\n' # bash
> A
> $ echo -e '\u0041' # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A
I agree this is a bit surprising.
The full manual states:
"Unicode characters in the ranges
U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
except for U+0024 ($), U+0040 (@), and U+0060 (`)."
This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
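The restriction quoted from the manual can be sketched as a small range check. This is only an illustration of the quoted rule, not coreutils code; the function name "ucn_allowed" is hypothetical, and it takes the code point as hex digits without a prefix.

```shell
# Hypothetical helper: succeeds if the code point (hex digits, no prefix)
# may be written as \u or \U under the restriction quoted from the manual.
ucn_allowed() {
    cp=$((0x$1))
    case $cp in
        36|64|96) return 0 ;;   # exceptions: U+0024, U+0040, U+0060
    esac
    # U+0000...U+009F is excluded (0x9F = 159)
    if [ "$cp" -le 159 ]; then return 1; fi
    # Surrogates U+D800...U+DFFF are excluded (55296...57343)
    if [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ]; then return 1; fi
    return 0
}

# U+0041 falls in the excluded range, which matches the error in the report.
if ucn_allowed 0041; then echo "allowed"; else echo "rejected"; fi
```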
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Thu, 01 Aug 2019 20:19:02 GMT)
Message #11 received at 36887 <at> debbugs.gnu.org:
>>>>> On Thu, 01 Aug 2019, Pádraig Brady wrote:
> I agree this is a bit surprising.
Indeed, it most certainly violates the principle of least surprise.
In particular, it means that a shell script that runs in bash may fail
in a shell that doesn't have a built-in printf.
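For scripts that must run regardless of which printf they get, one option is to avoid \u for such characters and use octal escapes, which POSIX specifies for the printf format string. A minimal sketch; the second line assumes the output is meant to be UTF-8:

```shell
# \101 is the octal escape for 0x41, i.e. 'A'.
printf '\101\n'

# Non-ASCII characters need their encoded bytes spelled out.
# Assumption: UTF-8 output. U+00E9 encodes as bytes 0xC3 0xA9 (octal 303 251).
printf '\303\251\n'
```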
> The full manual states:
> "Unicode characters in the ranges
> U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
> except for U+0024 ($), U+0040 (@), and U+0060 (`)."
> This was previously discussed at:
> https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
So, there are reasons for this restriction in C99. However, I fail to
see how those reasons would apply to printf. Except for the surrogates
U+D800...U+DFFF, it looks like an arbitrary restriction, which only
makes the printf implementation incompatible with other GNU programs
(like Bash and Emacs).
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Thu, 01 Aug 2019 23:38:02 GMT)
Message #14 received at 36887 <at> debbugs.gnu.org:
Ulrich Mueller wrote:
> Except for the surrogates
> U+D800...U+DFFF, it looks like an arbitrary restriction
It's not entirely arbitrary. Because of the restriction, coreutils printf
doesn't have to worry about what this command should do:
printf '\u0025d\n' 1 2
Does this print a single line "%d", or two lines "1" and "2"? There are good
arguments either way, and one can easily construct even-stranger examples.
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Fri, 02 Aug 2019 08:01:02 GMT)
Message #17 received at 36887 <at> debbugs.gnu.org:
>>>>> On Fri, 02 Aug 2019, Paul Eggert wrote:
> It's not entirely arbitrary. Because of the restriction, coreutils
> printf doesn't have to worry about what this command should do:
> printf '\u0025d\n' 1 2
Seems quite obvious: it should do the same as these commands:
printf '\045d\n' 1 2
printf '\x25d\n' 1 2
This is different from C behaviour, because printf(3) doesn't deal with
backslash escapes at all, which are interpreted earlier during parsing
of the string literal. That's why I think the C reasoning doesn't apply
here.
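What the octal and hex forms above do in a given shell can be checked with a short sketch (the helper name "show" is hypothetical). With bash's builtin printf, both are expected to print a literal %d once, since an escape in the format produces a character rather than starting a conversion. Other printf implementations may differ, and some do not support \x at all.

```shell
# Hypothetical helper: run printf with the given format and two data arguments.
# stderr is discarded because some implementations warn about unused arguments.
show() {
    printf "$1" 1 2 2>/dev/null
}

show '\045d\n'   # octal escape for '%' (0x25)
show '\x25d\n'   # hex escape for '%'; not supported by every printf
```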
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Fri, 02 Aug 2019 10:16:01 GMT)
Message #20 received at 36887 <at> debbugs.gnu.org:
On 2019/08/01 16:37, Paul Eggert wrote:
> Ulrich Mueller wrote:
>
>> Except for the surrogates
>> U+D800...U+DFFF, it looks like an arbitrary restriction
>>
>
> It's not entirely arbitrary. Because of the restriction, coreutils printf
> doesn't have to worry about what this command should do:
>
> printf '\u0025d\n' 1 2
>
> Does this print a single line "%d", or two lines "1" and "2"? There are good
> arguments either way, and one can easily construct even-stranger examples.
>
There are no format characters in the initial line, so only the 1st
argument is interpreted. You can't do multiple interpretations, since if
you did there would be no stopping point (e.g. a hex-encoding of a
hex-encoding of '%d\n').
Information forwarded to bug-coreutils <at> gnu.org: bug#36887; Package coreutils. (Wed, 07 Jun 2023 14:17:01 GMT)
Message #23 received at 36887 <at> debbugs.gnu.org:
Can this bug be closed? AFAICS it is fixed since coreutils-9.2.
Relevant commit:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/printf.c?id=0925e8a0f413ecf9004153d89b312b385b20d0ee
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Wed, 07 Jun 2023 14:58:01 GMT)
Notification sent to Ulrich Mueller <ulm <at> gentoo.org>: bug acknowledged by developer. (Wed, 07 Jun 2023 14:58:02 GMT)
Message #28 received at 36887-done <at> debbugs.gnu.org:
On 07/06/2023 15:16, Ulrich Mueller wrote:
> Can this bug be closed? AFAICS it is fixed since coreutils-9.2.
>
> Relevant commit:
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/printf.c?id=0925e8a0f413ecf9004153d89b312b385b20d0ee
Marked as done.
thanks!
Pádraig
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Jul 2023 11:24:10 GMT)