GNU bug report logs - #36887
coreutils-8.31: printf chokes on \u0041

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Reported by: Ulrich Mueller <ulm@HIDDEN>; dated Thu, 1 Aug 2019 11:03:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at 36887 <at> debbugs.gnu.org:


Received: (at 36887) by debbugs.gnu.org; 1 Aug 2019 20:18:53 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Aug 01 16:18:53 2019
Received: from localhost ([127.0.0.1]:55427 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1htHXJ-0001pe-4x
	for submit <at> debbugs.gnu.org; Thu, 01 Aug 2019 16:18:53 -0400
Received: from smtp.gentoo.org ([140.211.166.183]:35206)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ulm@HIDDEN>) id 1htHXH-0001pQ-Bp
 for 36887 <at> debbugs.gnu.org; Thu, 01 Aug 2019 16:18:52 -0400
Received: from a1i15 (host2092.kph.uni-mainz.de [134.93.134.92])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested) (Authenticated sender: ulm)
 by smtp.gentoo.org (Postfix) with ESMTPSA id 25B10349280;
 Thu,  1 Aug 2019 20:18:43 +0000 (UTC)
From: Ulrich Mueller <ulm@HIDDEN>
To: Pádraig Brady <P@HIDDEN>
Subject: Re: bug#36887: coreutils-8.31: printf chokes on \u0041
References: <w6gh871tagt.fsf@HIDDEN>
 <4041c88e-d9cb-ad28-df50-7cefde550733@HIDDEN>
Date: Thu, 01 Aug 2019 22:18:41 +0200
In-Reply-To: <4041c88e-d9cb-ad28-df50-7cefde550733@HIDDEN>
 ("Pádraig Brady"'s message of "Thu, 1 Aug 2019 14:09:08 +0100")
Message-ID: <w6gblx8tza6.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: 36887
Cc: base-system@HIDDEN, 36887 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -6.0 (------)

>>>>> On Thu, 01 Aug 2019, Pádraig Brady wrote:

> I agree this is a bit surprising.

Indeed, it most certainly violates the principle of least surprise.
In particular, it means that a shell script that runs in bash won't
run in a shell that lacks a built-in printf.

> The full manual states:

>   "Unicode characters in the ranges
>   U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
>   except for U+0024 ($), U+0040 (@), and U+0060 (`)."

> This was previously discussed at:
> https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067

So, there are reasons for this restriction in C99. However, I fail to
see how those reasons would apply to printf. Except for the surrogates
U+D800...U+DFFF, it looks like an arbitrary restriction, which only
makes the printf implementation incompatible with other GNU programs
(like Bash and Emacs).
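For comparison, the relaxed rule argued for here (reject only the surrogates) can be written as a one-line predicate. This is a minimal sketch, not code from coreutils, bash or Emacs: the name `permissive_ucn_allowed` is hypothetical, and the U+10FFFF upper bound is an added assumption (the end of the Unicode code space), not something stated in this thread.

```python
def permissive_ucn_allowed(cp: int) -> bool:
    """Hypothetical relaxed rule: accept any code point except the
    UTF-16 surrogates U+D800..U+DFFF.  The U+10FFFF upper bound is an
    assumption added for this sketch."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


# Same sample code points as one might feed to printf via \uHHHH.
for cp in (0x41, 0x40, 0xE9, 0xD800, 0x20AC):
    verdict = "accepted" if permissive_ucn_allowed(cp) else "rejected"
    print(f"U+{cp:04X} {verdict}")
```

Under this rule only U+D800 in the sample list is rejected; in particular U+0041 ("A") is accepted, which matches what bash, zsh, Emacs, Python and Ruby do in the original report.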




Information forwarded to bug-coreutils@HIDDEN:
bug#36887; Package coreutils.

Message received at 36887 <at> debbugs.gnu.org:


Received: (at 36887) by debbugs.gnu.org; 1 Aug 2019 13:09:16 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Aug 01 09:09:16 2019
Received: from localhost ([127.0.0.1]:53667 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1htApX-0006fh-IB
	for submit <at> debbugs.gnu.org; Thu, 01 Aug 2019 09:09:15 -0400
Received: from mail.magicbluesmoke.com ([82.195.144.49]:59570)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <P@HIDDEN>) id 1htApV-0006fW-6V
 for 36887 <at> debbugs.gnu.org; Thu, 01 Aug 2019 09:09:13 -0400
Received: from localhost.localdomain (unknown [109.77.225.34])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by mail.magicbluesmoke.com (Postfix) with ESMTPSA id D2AFA990D;
 Thu,  1 Aug 2019 14:09:10 +0100 (IST)
Subject: Re: bug#36887: coreutils-8.31: printf chokes on \u0041
To: Ulrich Mueller <ulm@HIDDEN>, 36887 <at> debbugs.gnu.org
References: <w6gh871tagt.fsf@HIDDEN>
From: Pádraig Brady <P@HIDDEN>
Message-ID: <4041c88e-d9cb-ad28-df50-7cefde550733@HIDDEN>
Date: Thu, 1 Aug 2019 14:09:08 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <w6gh871tagt.fsf@HIDDEN>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 36887
Cc: base-system@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

On 01/08/19 12:02, Ulrich Mueller wrote:
> [Forwarding bug https://bugs.gentoo.org/680244 as requested by the
> Gentoo package maintainer.]
> 
> According to printf(1):
> 
>    Interpreted sequences are:
>    [...]
>    
>    \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)
> 
>    \UHHHHHHHH
>           Unicode character with hex value HHHHHHHH (8 digits)
> 
> It does not work, though:
> 
> $ /usr/bin/printf '\u0041\n'
> /usr/bin/printf: invalid universal character name \u0041
> $ /usr/bin/printf '\U00000041\n'
> /usr/bin/printf: invalid universal character name \U00000041
> 
> Other tools interpret the sequence correctly:
> 
> $ printf '\u0041\n'   # bash
> A
> $ echo -e '\u0041'    # bash
> A
> $ zsh -c "echo -e '\u0041'"
> A
> $ emacs -Q --batch --eval '(princ "\u0041\n")'
> A
> $ python -c "print ('\u0041')"
> A
> $ ruby -e 'print("\u0041\n")'
> A

I agree this is a bit surprising.
The full manual states:

  "Unicode characters in the ranges
  U+0000...U+009F, U+D800...U+DFFF cannot be specified by this syntax,
  except for U+0024 ($), U+0040 (@), and U+0060 (`)."

This was previously discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2008-05/threads.html#00067
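The restriction quoted from the manual (which mirrors the C99 rule for universal character names) is easy to state as a predicate, and doing so shows why `\u0041` is refused: 0x41 lies in U+0000...U+009F and is not one of the three exceptions. This is a minimal Python sketch of the quoted rule only; `c99_ucn_allowed` is a hypothetical name and this is not the coreutils source.

```python
def c99_ucn_allowed(cp: int) -> bool:
    """Hypothetical helper mirroring the rule quoted from the manual:
    reject U+0000..U+009F and U+D800..U+DFFF, except U+0024 ($),
    U+0040 (@) and U+0060 (`)."""
    if cp in (0x24, 0x40, 0x60):
        return True
    if cp <= 0x9F:
        return False
    if 0xD800 <= cp <= 0xDFFF:
        return False
    return True


for cp in (0x41, 0x40, 0xE9, 0xD800, 0x20AC):
    verdict = "accepted" if c99_ucn_allowed(cp) else "rejected"
    print(f"U+{cp:04X} {verdict}")
```

With these samples, U+0041 and U+D800 are rejected while U+0040, U+00E9 and U+20AC are accepted.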




Information forwarded to bug-coreutils@HIDDEN:
bug#36887; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 1 Aug 2019 11:02:37 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Aug 01 07:02:37 2019
Received: from localhost ([127.0.0.1]:53542 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ht8qz-0005cF-KI
	for submit <at> debbugs.gnu.org; Thu, 01 Aug 2019 07:02:37 -0400
Received: from lists.gnu.org ([209.51.188.17]:48377)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ulm@HIDDEN>) id 1ht8qx-0005c5-JE
 for submit <at> debbugs.gnu.org; Thu, 01 Aug 2019 07:02:36 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:55624)
 by lists.gnu.org with esmtp (Exim 4.86_2)
 (envelope-from <ulm@HIDDEN>) id 1ht8qw-0003xp-MR
 for bug-coreutils@HIDDEN; Thu, 01 Aug 2019 07:02:35 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_20 autolearn=disabled
 version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <ulm@HIDDEN>) id 1ht8qv-0000ZW-OI
 for bug-coreutils@HIDDEN; Thu, 01 Aug 2019 07:02:34 -0400
Received: from smtp.gentoo.org ([2001:470:ea4a:1:5054:ff:fec7:86e4]:50215)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)
 (Exim 4.71) (envelope-from <ulm@HIDDEN>) id 1ht8qv-0000Yg-J0
 for bug-coreutils@HIDDEN; Thu, 01 Aug 2019 07:02:33 -0400
Received: from a1i15 (host2092.kph.uni-mainz.de [134.93.134.92])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested) (Authenticated sender: ulm)
 by smtp.gentoo.org (Postfix) with ESMTPSA id 607F634915C;
 Thu,  1 Aug 2019 11:02:30 +0000 (UTC)
From: Ulrich Mueller <ulm@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: coreutils-8.31: printf chokes on \u0041
Date: Thu, 01 Aug 2019 13:02:26 +0200
Message-ID: <w6gh871tagt.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-Received-From: 2001:470:ea4a:1:5054:ff:fec7:86e4
X-Spam-Score: -1.6 (-)
X-Debbugs-Envelope-To: submit
Cc: base-system@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.6 (--)

[Forwarding bug https://bugs.gentoo.org/680244 as requested by the
Gentoo package maintainer.]

According to printf(1):

   Interpreted sequences are:
   [...]
   
   \uHHHH Unicode (ISO/IEC 10646) character with hex value HHHH (4 digits)

   \UHHHHHHHH
          Unicode character with hex value HHHHHHHH (8 digits)

It does not work, though:

$ /usr/bin/printf '\u0041\n'
/usr/bin/printf: invalid universal character name \u0041
$ /usr/bin/printf '\U00000041\n'
/usr/bin/printf: invalid universal character name \U00000041

Other tools interpret the sequence correctly:

$ printf '\u0041\n'   # bash
A
$ echo -e '\u0041'    # bash
A
$ zsh -c "echo -e '\u0041'"
A
$ emacs -Q --batch --eval '(princ "\u0041\n")'
A
$ python -c "print ('\u0041')"
A
$ ruby -e 'print("\u0041\n")'
A
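The two escapes in the printf(1) excerpt above take exactly 4 and exactly 8 hex digits. The sketch below expands them the way the "other tools" in the report behave, i.e. `\u0041` becomes "A". It is an illustration under stated assumptions, not coreutils' or bash's implementation: `expand_unicode_escapes` is a hypothetical name, other backslash escapes are ignored, only surrogates and values above U+10FFFF are rejected (an assumption), and a Python `str` stands in for the conversion to the locale's character set that a real printf would perform.

```python
import re

# \uHHHH (exactly 4 hex digits) or \UHHHHHHHH (exactly 8), as in printf(1).
_UCN = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_unicode_escapes(fmt: str) -> str:
    """Hypothetical expander: replace \\u/\\U escapes with the character
    they name, leaving all other text (and other escapes) untouched."""

    def repl(m):
        # Exactly one of the two groups matched.
        cp = int(m.group(1) or m.group(2), 16)
        # Assumed validity rule: reject surrogates and non-Unicode values.
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid universal character name {m.group(0)}")
        return chr(cp)

    return _UCN.sub(repl, fmt)


print(expand_unicode_escapes(r"\u0041"))
print(expand_unicode_escapes(r"\U00000041"))
```

Both calls print "A". A short escape such as `\u004` (three digits) does not match the pattern and is left as-is in this sketch, whereas GNU printf reports an error for it.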




Acknowledgement sent to Ulrich Mueller <ulm@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#36887; Package coreutils.
Last modified: Thu, 1 Aug 2019 20:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.