GNU logs - #17196, boring messages


Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Jan Novak <jn@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sat, 05 Apr 2014 23:22:01 +0000
Resent-Message-ID: <handler.17196.B.139674009412386 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: 17196 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-coreutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.139674009412386
          (code B ref -1); Sat, 05 Apr 2014 23:22:01 +0000
Received: (at submit) by debbugs.gnu.org; 5 Apr 2014 23:21:34 +0000
Received: from localhost ([127.0.0.1]:37178 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WWZu1-0003Dh-7N
	for submit <at> debbugs.gnu.org; Sat, 05 Apr 2014 19:21:33 -0400
Received: from eggs.gnu.org ([208.118.235.92]:40757)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <jn@HIDDEN>) id 1WWZqP-000375-Bi
 for submit <at> debbugs.gnu.org; Sat, 05 Apr 2014 19:17:49 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jn@HIDDEN>) id 1WWZqF-00042O-7k
 for submit <at> debbugs.gnu.org; Sat, 05 Apr 2014 19:17:49 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: *
X-Spam-Status: No, score=1.3 required=5.0 tests=BAYES_40,
 RCVD_IN_BL_SPAMCOP_NET autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:57101)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <jn@HIDDEN>)
 id 1WWZqF-00042K-4T
 for submit <at> debbugs.gnu.org; Sat, 05 Apr 2014 19:17:39 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42472)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <jn@HIDDEN>)
 id 1WWZq7-0001j1-KZ
 for bug-coreutils@HIDDEN; Sat, 05 Apr 2014 19:17:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jn@HIDDEN>) id 1WWZq0-00041i-6a
 for bug-coreutils@HIDDEN; Sat, 05 Apr 2014 19:17:31 -0400
Received: from smtp1.gts.sk ([195.168.0.153]:52608 helo=smtp5.gts.sk)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <jn@HIDDEN>)
 id 1WWZpz-00041S-VS
 for bug-coreutils@HIDDEN; Sat, 05 Apr 2014 19:17:24 -0400
Received: from localhost (localhost [127.0.0.1])
 by smtp5.gts.sk (Postfix) with ESMTP id E9920E8069
 for <bug-coreutils@HIDDEN>; Sun,  6 Apr 2014 01:17:20 +0200 (CEST)
X-Virus-Scanned: amavisd-new at nextra.sk
Received: from smtp5.gts.sk ([195.168.0.153])
 by localhost (smtp.gts.sk [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id FCLwGCwX2sYd for <bug-coreutils@HIDDEN>;
 Sun,  6 Apr 2014 01:17:19 +0200 (CEST)
Received: from [10.1.2.4] (188-167-225-220.dynamic.chello.sk [188.167.225.220])
 (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits))
 (No client certificate requested)
 (Authenticated sender: nkame@HIDDEN)
 by smtp5.gts.sk (Postfix) with ESMTPSA id 6C90DE807B
 for <bug-coreutils@HIDDEN>; Sun,  6 Apr 2014 01:17:19 +0200 (CEST)
Message-ID: <53408EFF.7050601@HIDDEN>
Date: Sun, 06 Apr 2014 01:17:19 +0200
From: Jan Novak <jn@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -2.8 (--)
X-Mailman-Approved-At: Sat, 05 Apr 2014 19:21:31 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.8 (--)

Hello,

printf string format counts bytes instead of chars, which leads to broken output ...
(the same problem occurs with bash built in printf)


just try this:

$ echo $LANG
us_US.UTF-8


$ printf "|%3s|\n" "a"
|  a|

$ printf "|%3s|\n" "á"     (char is a-acute)
| á|

expected output:
|  á|

Is there some easy solution ?

TIA for the answer


Best regards
Novak




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Jan Novak <jn@HIDDEN>
Subject: bug#17196: Acknowledgement (UTF-8 printf string formating  problem)
Message-ID: <handler.17196.B.139674009412386.ack <at> debbugs.gnu.org>
References: <53408EFF.7050601@HIDDEN>
X-Gnu-PR-Message: ack 17196
X-Gnu-PR-Package: coreutils
Reply-To: 17196 <at> debbugs.gnu.org
Date: Sat, 05 Apr 2014 23:22:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-coreutils@HIDDEN

If you wish to submit further information on this problem, please
send it to 17196 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
17196: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17196
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sun, 06 Apr 2014 10:16:01 +0000
Resent-Message-ID: <handler.17196.B17196.139677935028639 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Jan Novak <jn@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139677935028639
          (code B ref 17196); Sun, 06 Apr 2014 10:16:01 +0000
Received: (at 17196) by debbugs.gnu.org; 6 Apr 2014 10:15:50 +0000
Received: from localhost ([127.0.0.1]:37447 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WWk7C-0007Rr-2C
	for submit <at> debbugs.gnu.org; Sun, 06 Apr 2014 06:15:50 -0400
Received: from mail1.vodafone.ie ([213.233.128.43]:17840)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <P@HIDDEN>) id 1WWk79-0007Rf-Q1
 for 17196 <at> debbugs.gnu.org; Sun, 06 Apr 2014 06:15:48 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApQBAEUnQVNtT6Td/2dsb2JhbAANS4NBg2HBBoErgxkBAQEEIw8BRhALDQEKAgIFFgsCAgkDAgECAUUGDQEHAQEXh2MIqg12oXoXgSmNSAeCb4FJAQOfVI53
Received: from unknown (HELO [192.168.1.79]) ([109.79.164.221])
 by mail1.vodafone.ie with ESMTP; 06 Apr 2014 11:15:45 +0100
Message-ID: <53412952.1040506@HIDDEN>
Date: Sun, 06 Apr 2014 11:15:46 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN>
In-Reply-To: <53408EFF.7050601@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

On 04/06/2014 12:17 AM, Jan Novak wrote:
> Hello,
> 
> printf string format counts bytes instead of chars, which leads to broken output ...
> (the same problem occurs with bash built in printf)
> 
> 
> just try this:
> 
> $ echo $LANG
> us_US.UTF-8
> 
> 
> $ printf "|%3s|\n" "a"
> |  a|
> 
> $ printf "|%3s|\n" "á"     (char is a-acute)
> | á|
> 
> expected output:
> |  á|
> 
> Is there some easy solution ?
> 
> TIA for the answer

Yes printf follows the C standard which only considers bytes.
awk does respect characters in width specifiers though:

  $ awk 'BEGIN{printf "|%3s|\n", "á"}'
  |  á|

I don't think we'd be able to change the current operation of printf
due to backwards compat reasons? Though we might be able to somehow leverage
the existing multibyte character aware alignment/truncation code in:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD

thanks,
Pádraig.
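
One locale-independent workaround, given only as a rough sketch: count the UTF-8 characters by hand and do the padding outside the %s width. The helper names u8len and pad_left are made up for this illustration and are not part of coreutils or bash. The sketch assumes valid UTF-8 input and counts code points, not display cells, so combining or double-width characters would still misalign.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; assumes valid UTF-8 input.

# Count UTF-8 characters (code points) in $1: delete the continuation
# bytes (0x80-0xBF) and count the bytes that remain. LC_ALL=C makes tr
# work on bytes regardless of the caller's locale.
u8len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# Right-align $2 in a field of $1 characters, like "%Ns" would if it
# counted characters instead of bytes.
pad_left() {
    width=$1
    str=$2
    pad=$((width - $(u8len "$str")))
    if [ "$pad" -gt 0 ]; then
        printf "%${pad}s" ''          # emit the padding spaces
    fi
    printf '%s' "$str"
}

a_acute=$(printf '\303\241')          # U+00E1 as its two UTF-8 bytes

printf '|%s|\n' "$(pad_left 3 a)"            # prints |  a|
printf '|%s|\n' "$(pad_left 3 "$a_acute")"   # prints |  á|
```

Because the counting is done on raw bytes under LC_ALL=C, the result should not depend on whether the shell's printf or awk is multibyte-aware.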




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sun, 06 Apr 2014 18:14:02 +0000
Resent-Message-ID: <handler.17196.B17196.139680800629081 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Jan Novak <jn@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139680800629081
          (code B ref 17196); Sun, 06 Apr 2014 18:14:02 +0000
Received: (at 17196) by debbugs.gnu.org; 6 Apr 2014 18:13:26 +0000
Received: from localhost ([127.0.0.1]:38323 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WWrZN-0007Yy-TT
	for submit <at> debbugs.gnu.org; Sun, 06 Apr 2014 14:13:26 -0400
Received: from mail1.vodafone.ie ([213.233.128.43]:63816)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <P@HIDDEN>) id 1WWrZL-0007Yl-De
 for 17196 <at> debbugs.gnu.org; Sun, 06 Apr 2014 14:13:24 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApQBALyYQVNtT6Td/2dsb2JhbAANS4civX+DDoErgxkBAQEEIw8BRhALDQEKAgIFFgsCAgkDAgECAUUGDQEHAQEXh2OoSXaiFReBKY1IB4JvgUkBA59Ujnc
Received: from unknown (HELO [192.168.1.79]) ([109.79.164.221])
 by mail1.vodafone.ie with ESMTP; 06 Apr 2014 19:13:21 +0100
Message-ID: <53419941.7090105@HIDDEN>
Date: Sun, 06 Apr 2014 19:13:21 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
In-Reply-To: <53412952.1040506@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

On 04/06/2014 11:15 AM, Pádraig Brady wrote:
> On 04/06/2014 12:17 AM, Jan Novak wrote:
>> Hello,
>>
>> printf string format counts bytes instead of chars, which leads to broken output ...
>> (the same problem occurs with bash built in printf)
>>
>>
>> just try this:
>>
>> $ echo $LANG
>> us_US.UTF-8
>>
>>
>> $ printf "|%3s|\n" "a"
>> |  a|
>>
>> $ printf "|%3s|\n" "á"     (char is a-acute)
>> | á|
>>
>> expected output:
>> |  á|
>>
>> Is there some easy solution ?
>>
>> TIA for the answer
> 
> Yes printf follows the C standard which only considers bytes.
> awk does respect characters in width specifiers though:
> 
>   $ awk 'BEGIN{printf "|%3s|\n", "á"}'
>   |  á|

Jan points out to me that the awk solution is not portable
to mawk 1.3.3 at least. I used GNU Awk 3.1.8 above.

Pádraig.





Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Bob Proulx <bob@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sun, 06 Apr 2014 18:25:02 +0000
Resent-Message-ID: <handler.17196.B17196.139680869330260 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: 17196 <at> debbugs.gnu.org
Cc: Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139680869330260
          (code B ref 17196); Sun, 06 Apr 2014 18:25:02 +0000
Received: (at 17196) by debbugs.gnu.org; 6 Apr 2014 18:24:53 +0000
Received: from localhost ([127.0.0.1]:38329 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WWrkS-0007s0-9M
	for submit <at> debbugs.gnu.org; Sun, 06 Apr 2014 14:24:52 -0400
Received: from joseki.proulx.com ([216.17.153.58]:48570)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <bob@HIDDEN>) id 1WWrkO-0007rk-QD
 for 17196 <at> debbugs.gnu.org; Sun, 06 Apr 2014 14:24:50 -0400
Received: from hysteria.proulx.com (hysteria.proulx.com [192.168.230.119])
 by joseki.proulx.com (Postfix) with ESMTP id 9224721233;
 Sun,  6 Apr 2014 12:24:47 -0600 (MDT)
Received: by hysteria.proulx.com (Postfix, from userid 1000)
 id 62F292DC9A; Sun,  6 Apr 2014 12:24:47 -0600 (MDT)
Date: Sun, 6 Apr 2014 12:24:47 -0600
From: Bob Proulx <bob@HIDDEN>
Message-ID: <20140406182447.GA1381@HIDDEN>
References: <53408EFF.7050601@HIDDEN>
 <53412952.1040506@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <53412952.1040506@HIDDEN>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Score: -0.3 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.3 (/)

Pádraig Brady wrote:
> Yes printf follows the C standard which only considers bytes.
> ...
> I don't think we'd be able to change the current operation of printf
> due to backwards compat reasons? Though we might be able to somehow leverage
> the existing multibyte character aware alignment/truncation code in:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD

Dan Douglas pointed out in the corresponding discussion in bug-bash
that ksh uses the L modifier.

  http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html

  Dan Douglas wrote:
  > ksh93 already has this feature using the "L" modifier:
  > 
  > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
  > ★★★

At least there is prior art for it.

Bob




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Mon, 07 Apr 2014 13:09:02 +0000
Resent-Message-ID: <handler.17196.B17196.13968760931433 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Bob Proulx <bob@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.13968760931433
          (code B ref 17196); Mon, 07 Apr 2014 13:09:02 +0000
Received: (at 17196) by debbugs.gnu.org; 7 Apr 2014 13:08:13 +0000
Received: from localhost ([127.0.0.1]:38921 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WX9HY-0000N1-MP
	for submit <at> debbugs.gnu.org; Mon, 07 Apr 2014 09:08:13 -0400
Received: from mail2.vodafone.ie ([213.233.128.44]:10186)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <P@HIDDEN>) id 1WX9HV-0000Mp-MS
 for 17196 <at> debbugs.gnu.org; Mon, 07 Apr 2014 09:08:10 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApUBAFCiQlNtTJL0/2dsb2JhbAANTINBg2G5WYc3gTeDGQEBAQQBAiAPAUYQCQINCwICBRYLAgIJAwIBAgEWLwYNAQcBAYd6CI0JmyJ2oiAXgSmNSAeCb4FJAQOWBIQLhUWOdw
Received: from unknown (HELO [192.168.1.79]) ([109.76.146.244])
 by mail2.vodafone.ie with ESMTP; 07 Apr 2014 14:08:08 +0100
Message-ID: <5342A337.9000407@HIDDEN>
Date: Mon, 07 Apr 2014 14:08:07 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
In-Reply-To: <20140406182447.GA1381@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

On 04/06/2014 07:24 PM, Bob Proulx wrote:
> Pádraig Brady wrote:
>> Yes printf follows the C standard which only considers bytes.
>> ...
>> I don't think we'd be able to change the current operation of printf
>> due to backwards compat reasons? Though we might be able to somehow leverage
>> the existing multibyte character aware alignment/truncation code in:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
> 
> Dan Douglas pointed out in the corresponding discussion in bug-bash
> that ksh uses the L modifier.
> 
>   http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
> 
>   Dan Douglas wrote:
>   > ksh93 already has this feature using the "L" modifier:
>   > 
>   > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>   > ★★★
> 
> At least there is prior art for it.

So we can count bytes, chars or cells (graphemes).

Thinking a bit more about it, I think shell level printf
should be dealing in text of the current encoding and counting cells.
In the edge case where you want to deal in bytes one can do:
  LC_ALL=C printf ...

I see that ksh behaves as I would expect and counts cells,
though requires the explicit %L enabler:
  $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★★
  $ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
  Ａ★
  $ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
  Ａ

zsh seems to just count characters:
  $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
  Ａ★★

I see that dash gives invalid directive for any of %ls %Ls %S.

Pity there is no consensus here.
Personally I would go for:
  printf '%3s' 'blah'  # count cells
  printf '%3Ls' 'blah' # count chars
  LANG=C printf '%3Ls' 'blah' # count bytes
  LANG=C printf '%3s' 'blah'  # count bytes

Pádraig.





Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Jan Novak <jn@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Mon, 07 Apr 2014 21:42:02 +0000
Resent-Message-ID: <handler.17196.B17196.139690687126574 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>,  Bob Proulx <bob@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139690687126574
          (code B ref 17196); Mon, 07 Apr 2014 21:42:02 +0000
Received: (at 17196) by debbugs.gnu.org; 7 Apr 2014 21:41:11 +0000
Received: from localhost ([127.0.0.1]:39963 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WXHHy-0006uX-Th
	for submit <at> debbugs.gnu.org; Mon, 07 Apr 2014 17:41:11 -0400
Received: from smtp1.gts.sk ([195.168.0.153]:49961 helo=smtp5.gts.sk)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <jn@HIDDEN>) id 1WXHHv-0006uJ-H1
 for 17196 <at> debbugs.gnu.org; Mon, 07 Apr 2014 17:41:08 -0400
Received: from localhost (localhost [127.0.0.1])
 by smtp5.gts.sk (Postfix) with ESMTP id EBF68E805D;
 Mon,  7 Apr 2014 23:41:05 +0200 (CEST)
X-Virus-Scanned: amavisd-new at nextra.sk
Received: from smtp5.gts.sk ([195.168.0.153])
 by localhost (smtp.gts.sk [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id 9YEBQzx29SJh; Mon,  7 Apr 2014 23:41:04 +0200 (CEST)
Received: from [10.1.2.4] (188-167-225-220.dynamic.chello.sk [188.167.225.220])
 (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits))
 (No client certificate requested)
 (Authenticated sender: nkame@HIDDEN)
 by smtp5.gts.sk (Postfix) with ESMTPSA id 352F0E8006;
 Mon,  7 Apr 2014 23:41:04 +0200 (CEST)
Message-ID: <53431B6F.1040108@HIDDEN>
Date: Mon, 07 Apr 2014 23:41:03 +0200
From: Jan Novak <jn@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux i686;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN> <5342A337.9000407@HIDDEN>
In-Reply-To: <5342A337.9000407@HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

Pádraig Brady wrote:
> Pity there is no consensus here.
> Personally I would go for:
>    printf '%3s' 'blah'  # count cells
>    printf '%3Ls' 'blah' # count chars
>    LANG=C printf '%3Ls' 'blah' # count bytes
>    LANG=C printf '%3s' 'blah'  # count bytes

I vote for it ...
it is excellent idea, that "standard" notation works properly in localized environment !
(because this is exactly what users expect)

Thanks !
novak




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Eric Blake <eblake@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Mon, 07 Apr 2014 21:58:02 +0000
Resent-Message-ID: <handler.17196.B17196.139690783128140 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Bob Proulx <bob@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Austin Group <austin-group-l@HIDDEN>, Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139690783128140
          (code B ref 17196); Mon, 07 Apr 2014 21:58:02 +0000
Received: (at 17196) by debbugs.gnu.org; 7 Apr 2014 21:57:11 +0000
Received: from localhost ([127.0.0.1]:39976 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WXHXS-0007Jn-5M
	for submit <at> debbugs.gnu.org; Mon, 07 Apr 2014 17:57:10 -0400
Received: from mx1.redhat.com ([209.132.183.28]:50461)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eblake@HIDDEN>) id 1WXHXO-0007Jd-Is
 for 17196 <at> debbugs.gnu.org; Mon, 07 Apr 2014 17:57:08 -0400
Received: from int-mx02.intmail.prod.int.phx2.redhat.com
 (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s37Lv4Hw005827
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Mon, 7 Apr 2014 17:57:05 -0400
Received: from [10.3.113.181] (ovpn-113-181.phx2.redhat.com [10.3.113.181])
 by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id
 s37Lv3Y2001250; Mon, 7 Apr 2014 17:57:04 -0400
Message-ID: <53431F2F.8060701@HIDDEN>
Date: Mon, 07 Apr 2014 15:57:03 -0600
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN>
 <53412952.1040506@HIDDEN>	<20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN>
In-Reply-To: <5342A337.9000407@HIDDEN>
X-Enigmail-Version: 1.6
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="IT8uTs5CGj9Cnq7XtFEWmDVt7rWHNjtJ8"
X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12
X-Spam-Score: -5.3 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.3 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--IT8uTs5CGj9Cnq7XtFEWmDVt7rWHNjtJ8
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

[adding the Austin Group]

On 04/07/2014 07:08 AM, Pádraig Brady wrote:
> On 04/06/2014 07:24 PM, Bob Proulx wrote:
>> Pádraig Brady wrote:
>>> Yes printf follows the C standard which only considers bytes.
>>> ...
>>> I don't think we'd be able to change the current operation of printf
>>> due to backwards compat reasons? Though we might be able to somehow leverage
>>> the existing multibyte character aware alignment/truncation code in:
>>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>>
>> Dan Douglas pointed out in the corresponding discussion in bug-bash
>> that ksh uses the L modifier.
>>
>>   http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
>>
>>   Dan Douglas wrote:
>>   > ksh93 already has this feature using the "L" modifier:
>>   > 
>>   > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>>   > ★★★
>>
>> At least there is prior art for it.
> 
> So we can count bytes, chars or cells (graphemes).
> 
> Thinking a bit more about it, I think shell level printf
> should be dealing in text of the current encoding and counting cells.
> In the edge case where you want to deal in bytes one can do:
>   LC_ALL=C printf ...
> 
> I see that ksh behaves as I would expect and counts cells,
> though requires the explicit %L enabler:
>   $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>   á★★
>   $ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>   Ａ★
>   $ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
>   Ａ
> 
> zsh seems to just count characters:
>   $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>   á★
>   $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
>   á★
>   $ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>   Ａ★★
> 
> I see that dash gives invalid directive for any of %ls %Ls %S.
> 
> Pity there is no consensus here.
> Personally I would go for:
>   printf '%3s' 'blah'  # count cells
>   printf '%3Ls' 'blah' # count chars
>   LANG=C printf '%3Ls' 'blah' # count bytes
>   LANG=C printf '%3s' 'blah'  # count bytes

Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
and currently states that %Ls is undefined.  But I would LOVE to have a
standardized spelling for counting characters instead of bytes.  The
extension %Ls looks like a good candidate for standardization, precisely
because counting characters when printing a multibyte string is more
useful than counting bytes (you do NOT want to end in the middle of a
multibyte character), and because ksh offers it as existing practice.

Your idea for counting "cells" (by which I'm assuming you mean one or
more characters that all display within the same cell of the terminal,
as if the end user saw only one grapheme), on the other hand, does not
seem to have any precedent, and I would strongly object to having %s
count by cells because %s already has a standardized (if unfortunate)
meaning of counting by bytes.  Maybe yet another extension is warranted
(perhaps %LLs?) as a new notion for counting by cells instead of
characters, but it's harder to justify that without existing practice.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--IT8uTs5CGj9Cnq7XtFEWmDVt7rWHNjtJ8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJTQx8vAAoJEKeha0olJ0NqbWkH/AtqespL088wPpB5djiIJwc6
L4oyBo3wMGOdB3XIV4eeJzGm9shYMA9aVw+8y1VH/5xTi52FqTmy0EkVsJ/nDrb0
ZU3OyXQC5U5s/ufcgY5oIo0IBVSduetbR0rgG1/I7rNyqiLV0+AK5RJcwDcAxmaT
5mhrpYMnKHIhDwKBlZ+Fm224o8jDHvg46C7R2XmHCAQ5ayKfw6mMYqyyup0pHDyO
/Bu8dhdLmIsj+prRw5JkqvyEO1gfo0rJC005kktqD4zr3NWpkwDSG7O8CAW67ZMV
G305iLrgEkr6knbmLt/BjDci6OyPvmNqSYataieBWkmUKoYl4GPjfY9sQsi93Fw=
=vBNo
-----END PGP SIGNATURE-----

--IT8uTs5CGj9Cnq7XtFEWmDVt7rWHNjtJ8--




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Tue, 08 Apr 2014 00:12:01 +0000
Resent-Message-ID: <handler.17196.B17196.13969158838877 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Eric Blake <eblake@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Austin Group <austin-group-l@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.13969158838877
          (code B ref 17196); Tue, 08 Apr 2014 00:12:01 +0000
Received: (at 17196) by debbugs.gnu.org; 8 Apr 2014 00:11:23 +0000
Received: from localhost ([127.0.0.1]:40018 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WXJdG-0002J3-TE
	for submit <at> debbugs.gnu.org; Mon, 07 Apr 2014 20:11:23 -0400
Received: from mail2.vodafone.ie ([213.233.128.44]:3379)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <P@HIDDEN>) id 1WXJdE-0002Iu-KO
 for 17196 <at> debbugs.gnu.org; Mon, 07 Apr 2014 20:11:17 -0400
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApUBAHY9Q1NtTJL0/2dsb2JhbAANTINBg2G5bIc3gT2DGQEBAQMBAQIgDwFGBQsJAg0BCgICBRYLAgIJAwIBAgEWLwYNAQcBAYdtDQiMc5sidqIwF4EpjUgHgm+BSQEDlgSEC4VFjnc
Received: from unknown (HELO [192.168.1.79]) ([109.76.146.244])
 by mail2.vodafone.ie with ESMTP; 08 Apr 2014 01:11:14 +0100
Message-ID: <53433EA1.4010204@HIDDEN>
Date: Tue, 08 Apr 2014 01:11:13 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:17.0) Gecko/20130110 Thunderbird/17.0.2
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN>
 <53412952.1040506@HIDDEN>	<20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
In-Reply-To: <53431F2F.8060701@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

On 04/07/2014 10:57 PM, Eric Blake wrote:
> [adding the Austin Group]
> 
> On 04/07/2014 07:08 AM, Pádraig Brady wrote:
>> On 04/06/2014 07:24 PM, Bob Proulx wrote:
>>> Pádraig Brady wrote:
>>>> Yes printf follows the C standard which only considers bytes.
>>>> ...
>>>> I don't think we'd be able to change the current operation of printf
>>>> due to backwards compat reasons? Though we might be able to somehow leverage
>>>> the existing multibyte character aware alignment/truncation code in:
>>>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>>>
>>> Dan Douglas pointed out in the corresponding discussion in bug-bash
>>> that ksh uses the L modifier.
>>>
>>>   http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
>>>
>>>   Dan Douglas wrote:
>>>   > ksh93 already has this feature using the "L" modifier:
>>>   > 
>>>   > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>>>   > ★★★
>>>
>>> At least there is prior art for it.
>>
>> So we can count bytes, chars or cells (graphemes).
>>
>> Thinking a bit more about it, I think shell level printf
>> should be dealing in text of the current encoding and counting cells.
>> In the edge case where you want to deal in bytes one can do:
>>   LC_ALL=C printf ...
>>
>> I see that ksh behaves as I would expect and counts cells,
>> though requires the explicit %L enabler:
>>   $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>>   á★★
>>   $ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>>   Ａ★
>>   $ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
>>   Ａ
>>
>> zsh seems to just count characters:
>>   $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>>   á★
>>   $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
>>   á★
>>   $ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>>   Ａ★★
>>
>> I see that dash gives invalid directive for any of %ls %Ls %S.
>>
>> Pity there is no consensus here.
>> Personally I would go for:
>>   printf '%3s' 'blah'  # count cells
>>   printf '%3Ls' 'blah' # count chars
>>   LANG=C printf '%3Ls' 'blah' # count bytes
>>   LANG=C printf '%3s' 'blah'  # count bytes
> 
> Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
> and currently states that %Ls is undefined.  But I would LOVE to have a
> standardized spelling for counting characters instead of bytes.  The
> extension %Ls looks like a good candidate for standardization, precisely
> because counting characters when printing a multibyte string is more
> useful than counting bytes (you do NOT want to end in the middle of a
> multibyte character), and because ksh offers it as existing practice.

Note that ksh seems to count cells, not characters, with %Ls.

> Your idea for counting "cells" (by which I'm assuming you mean one or
> more characters that all display within the same cell of the terminal,
> as if the end user saw only one grapheme), on the other hand, does not
> seem to have any precedent, and I would strongly object to having %s
> count by cells because %s already has a standardized (if unfortunate)
> meaning of counting by bytes.  Maybe yet another extension is warranted
> (perhaps %LLs?) as a new notion for counting by cells instead of
> characters, but it's harder to justify that without existing practice.

At the shell level I expect that the vast majority
of uses would prefer to specify cell counts.
I thought there would not be many backwards compat issues
with doing that, especially since zsh and gawk already adjust
the meaning of %s according to the locale
(albeit to count chars rather than cells).

But it's a fair point that there may be scripts
that don't consider the zsh behavior.

If we had to make it explicit for backwards compat reasons,
then I suppose counting by characters is the least useful,
so we could just standardize the existing ksh behavior and have:

   printf '%3s' 'blah'  # count bytes
   printf '%3Ls' 'blah' # count cells
   LANG=C printf '%3Ls' 'blah' # count bytes

This has the disadvantage of not degrading gracefully
on dash for example where %Ls is rejected.
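
The difference between counting chars and counting cells can be sketched
in Python. The width rule below (0 for combining marks, 2 for East Asian
Wide or Fullwidth code points, 1 otherwise) is a simplifying assumption and
not the wcwidth(3) table of any libc; the function names are hypothetical:

```python
import unicodedata


def cell_width(ch):
    """Approximate number of terminal cells for one code point (assumed rule)."""
    if unicodedata.combining(ch):
        return 0                      # combining marks share the previous cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # wide and fullwidth forms take two cells
    return 1


def truncate_chars(s, n):
    """Keep at most n code points."""
    return s[:n]


def truncate_cells(s, n):
    """Keep the longest prefix of s that fits in n terminal cells."""
    out = []
    used = 0
    for ch in s:
        w = cell_width(ch)
        if used + w > n:
            break
        out.append(ch)
        used += w
    return "".join(out)


accented = "a\u0301\u2605\u2605\u2605"    # a + combining acute, three stars
fullwidth = "\uff21\u2605\u2605\u2605"    # fullwidth A, three stars
print(truncate_cells(accented, 3), truncate_chars(accented, 3))
print(truncate_cells(fullwidth, 3), truncate_chars(fullwidth, 3))
```

On the inputs from the ksh and zsh runs quoted above, truncate_cells(s, 3)
keeps what ksh printed and truncate_chars(s, 3) keeps what zsh printed.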

thanks,
Pádraig.




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Eric Blake <eblake@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Tue, 08 Apr 2014 01:29:01 +0000
Resent-Message-ID: <handler.17196.B17196.139692049816503 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Austin Group <austin-group-l@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139692049816503
          (code B ref 17196); Tue, 08 Apr 2014 01:29:01 +0000
Received: (at 17196) by debbugs.gnu.org; 8 Apr 2014 01:28:18 +0000
Received: from localhost ([127.0.0.1]:40037 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WXKpl-0004I6-Kj
	for submit <at> debbugs.gnu.org; Mon, 07 Apr 2014 21:28:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:19405)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eblake@HIDDEN>) id 1WXKph-0004Hu-Rm
 for 17196 <at> debbugs.gnu.org; Mon, 07 Apr 2014 21:28:15 -0400
Received: from int-mx13.intmail.prod.int.phx2.redhat.com
 (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s381SBwj003254
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Mon, 7 Apr 2014 21:28:12 -0400
Received: from [10.3.113.181] (ovpn-113-181.phx2.redhat.com [10.3.113.181])
 by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id
 s381SAUH012178; Mon, 7 Apr 2014 21:28:10 -0400
Message-ID: <534350AA.2050803@HIDDEN>
Date: Mon, 07 Apr 2014 19:28:10 -0600
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN>
 <53412952.1040506@HIDDEN>	<20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
 <53433EA1.4010204@HIDDEN>
In-Reply-To: <53433EA1.4010204@HIDDEN>
X-Enigmail-Version: 1.6
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="WKOPh9dwtCRoU95obFdvxAM1SCC5dAhQe"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.26
X-Spam-Score: -5.3 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--WKOPh9dwtCRoU95obFdvxAM1SCC5dAhQe
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 04/07/2014 06:11 PM, Pádraig Brady wrote:

> 
> If we had to make it explicit for backwards compat reasons,
> then I suppose counting by characters is the least useful,
> so we could just standardize the existing ksh behavior and have:
> 
>    printf '%3s' 'blah'  # count bytes
>    printf '%3Ls' 'blah' # count cells
>    LANG=C printf '%3Ls' 'blah' # count bytes

If we add %3Ls to the shell, we should also add it to libc's printf(3),
which means coordinating with the C committee.

> 
> This has the disadvantage of not degrading gracefully
> on dash for example where %Ls is rejected.

If a future version of the standard mandates behavior for %Ls, I suspect
dash would be made compliant fairly quickly - the dash maintainers
strive hard to comply with POSIX.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--WKOPh9dwtCRoU95obFdvxAM1SCC5dAhQe
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJTQ1CqAAoJEKeha0olJ0Nq1fMH/iocyOefBelzJjRFQe9OpSZH
U4Od8i/T8FNt+2kaUbaYud8Hq7hlciSdp1vbB1GFur89qQ9hH5fzvQMEdZyhaazx
Rurfq8nT1hBjUkNbbb60TYovJY71Pqkmuop32BrmpwYNoM/K2cthcHD9RO7djXQ0
lN/zAEFtrs7/ETJT2/FrieIBci98bCjggEMQ15rbkpTPZ6sWJLk03aHqpDZKQ/+j
8GD7fZJwCKWV4g3Rn13Qc+enT9Wnxx1L5Y+6P5fGbx7pxPD6mK3pUmyCewwjFong
iKM9H7fb2iUaWphMlefooeWhnvtvb38E9Srm78N0ZQsIH/iMbTknOfT07I5mw48=
=XKN5
-----END PGP SIGNATURE-----

--WKOPh9dwtCRoU95obFdvxAM1SCC5dAhQe--




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Steffen Nurpmeso <sdaoden@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Wed, 09 Apr 2014 15:48:03 +0000
Resent-Message-ID: <handler.17196.B17196.139705844421657 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Eric Blake <eblake@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Bob Proulx <bob@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, Jan Novak <jn@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139705844421657
          (code B ref 17196); Wed, 09 Apr 2014 15:48:03 +0000
Received: (at 17196) by debbugs.gnu.org; 9 Apr 2014 15:47:24 +0000
Received: from localhost ([127.0.0.1]:39200 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WXuig-0005dD-In
	for submit <at> debbugs.gnu.org; Wed, 09 Apr 2014 11:47:23 -0400
Received: from forward4l.mail.yandex.net ([84.201.143.137]:47067)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <sdaoden@HIDDEN>) id 1WXrwp-00010T-Jg
 for 17196 <at> debbugs.gnu.org; Wed, 09 Apr 2014 08:49:48 -0400
Received: from smtp1h.mail.yandex.net (smtp1h.mail.yandex.net [84.201.187.144])
 by forward4l.mail.yandex.net (Yandex) with ESMTP id A5BE81441127;
 Wed,  9 Apr 2014 16:49:39 +0400 (MSK)
Received: from smtp1h.mail.yandex.net (localhost [127.0.0.1])
 by smtp1h.mail.yandex.net (Yandex) with ESMTP id B63851340F6C;
 Wed,  9 Apr 2014 16:49:38 +0400 (MSK)
Received: from unknown (unknown [82.113.106.166])
 by smtp1h.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id WBC3dR9mYn-naD4f4Cf; 
 Wed,  9 Apr 2014 16:49:37 +0400
 (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits))
 (Client certificate not present)
X-Yandex-Uniq: a0da012c-a10d-40b9-bc00-e1c953c90020
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.com; s=mail;
 t=1397047777; bh=CIw9qbQeAYBBogQWUXflnowQwxxIj8pfP4P/KUAWE2g=;
 h=Date:From:To:Cc:Subject:Message-ID:References:In-Reply-To:
 User-Agent:MIME-Version:Content-Type:Content-Transfer-Encoding;
 b=pBmJsEEBPEZU+9UxD87VZlFFasK4VUWKRpiwP1g+mz4W283R/aJarhrLG5STNS1TU
 GXyTUrA9CwVw5K6khosOA3krKyIWUPzmP7blmBxi0GdXWDhrk4gHU2gRXt5J7hz8ea
 qsq+F4t0/cVps584v90Jv8hDIyaPVLRzhhcpiaxc=
Authentication-Results: smtp1h.mail.yandex.net; dkim=pass header.i=@yandex.com
Date: Wed, 09 Apr 2014 14:49:37 +0200
From: Steffen Nurpmeso <sdaoden@HIDDEN>
Message-ID: <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
In-Reply-To: <53431F2F.8060701@HIDDEN>
User-Agent: s-nail v14.6.4-1-ga39836e
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-Mailman-Approved-At: Wed, 09 Apr 2014 11:47:20 -0400

Eric Blake <eblake@HIDDEN> wrote:
 |>>   Dan Douglas wrote:
 |>>> ksh93 already has this feature using the "L" modifier:
 |>>> 
 |>>> ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
 |>>> ★★★
 |>>
 |>> At least there is prior art for it.
 |> 
 |> So we can count bytes, chars or cells (graphemes).
 |> 
 |> Thinking a bit more about it, I think shell level printf
 |> should be dealing in text of the current encoding and counting cells.
 |> In the edge case where you want to deal in bytes one can do:
 |>   LC_ALL=C printf ...
 |> 
 |> I see that ksh behaves as I would expect and counts cells,
 |> though requires the explicit %L enabler:
 |>   $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
 |>   á★★
 |>   $ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
 |>   Ａ★
 |>   $ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
 |>   Ａ
 |> 
 |> zsh seems to just count characters:
 |>   $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
 |>   á★
 |>   $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
 |>   á★
 |>   $ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
 |>   Ａ★★
 |> 
 |> I see that dash gives invalid directive for any of %ls %Ls %S.
 |> 
 |> Pity there is no consensus here.
 |> Personally I would go for:
 |>   printf '%3s' 'blah'  # count cells
 |>   printf '%3Ls' 'blah' # count chars
 |>   LANG=C printf '%3Ls' 'blah' # count bytes
 |>   LANG=C printf '%3s' 'blah'  # count bytes
 |
 |Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
 |and currently states that %Ls is undefined.  But I would LOVE to have a
 |standardized spelling for counting characters instead of bytes.  The
 |extension %Ls looks like a good candidate for standardization, precisely
 |because counting characters when printing a multibyte string is more
 |useful than counting bytes (you do NOT want to end in the middle of a
 |multibyte character), and because ksh offers it as existing practice.
 |
 |Your idea for counting "cells" (by which I'm assuming you mean one or
 |more characters that all display within the same cell of the terminal,
 |as if the end user saw only one grapheme), on the other hand, does not
 |seem to have any precedent, and I would strongly object to having %s
 |count by cells because %s already has a standardized (if unfortunate)
 |meaning of counting by bytes.  Maybe yet another extension is warranted
 |(perhaps %LLs?) as a new notion for counting by cells instead of
 |characters, but it's harder to justify that without existing practice.

I see you are trying to use the word "character" for code points
and reserve the term "grapheme" for user-perceived characters.
This is in line with the GNU library, whose existing
practice is to let wcwidth(3) return the value 1 for accents and
other combining code points as well as so-called (Unicode)
noncharacters.  And who would call wcwidth(3) on something that is
not to be drawn onto the screen directly afterwards?  And, of
course, which terminal will perform the composition of code points
written via standard I/O into characters on its own?
I think that for quite a while it will be up to the input methods to
combine sequences into something precomposed in order to let POSIX
programs finally work with it.

--steffen




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Rich Felker <dalias@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Thu, 10 Apr 2014 07:57:02 +0000
Resent-Message-ID: <handler.17196.B17196.13971165845058 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Steffen Nurpmeso <sdaoden@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Eric Blake <eblake@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.13971165845058
          (code B ref 17196); Thu, 10 Apr 2014 07:57:02 +0000
Received: (at 17196) by debbugs.gnu.org; 10 Apr 2014 07:56:24 +0000
Received: from localhost ([127.0.0.1]:39544 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WY9qQ-0001JU-UV
	for submit <at> debbugs.gnu.org; Thu, 10 Apr 2014 03:56:23 -0400
Received: from 216-12-86-13.cv.mvl.ntelos.net ([216.12.86.13]:44012
 helo=brightrain.aerifal.cx) by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <dalias@HIDDEN>) id 1WY9qN-0001JE-H0
 for 17196 <at> debbugs.gnu.org; Thu, 10 Apr 2014 03:56:20 -0400
Received: from dalias by brightrain.aerifal.cx with local (Exim 3.15 #2)
 id 1WY9qE-0005Ha-00; Thu, 10 Apr 2014 07:56:10 +0000
Date: Thu, 10 Apr 2014 03:56:10 -0400
Message-ID: <20140410075610.GO26358@HIDDEN>
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
From: Rich Felker <dalias@HIDDEN>
X-Spam-Score: 0.4 (/)

On Wed, Apr 09, 2014 at 02:49:37PM +0200, Steffen Nurpmeso wrote:
> Eric Blake <eblake@HIDDEN> wrote:
>  |>>   Dan Douglas wrote:
>  |>>> ksh93 already has this feature using the "L" modifier:
>  |>>> 
>  |>>> ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>  |>>> ★★★
>  |>>
>  |>> At least there is prior art for it.
>  |> 
>  |> So we can count bytes, chars or cells (graphemes).
>  |> 
>  |> Thinking a bit more about it, I think shell level printf
>  |> should be dealing in text of the current encoding and counting cells.
>  |> In the edge case where you want to deal in bytes one can do:
>  |>   LC_ALL=C printf ...
>  |> 
>  |> I see that ksh behaves as I would expect and counts cells,
>  |> though requires the explicit %L enabler:
>  |>   $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★★
>  |>   $ ksh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>  |>   Ａ★
>  |>   $ ksh -c "printf '%.3Ls\n' $'ＡＡ\u2605\u2605\u2605'"
>  |>   Ａ
>  |> 
>  |> zsh seems to just count characters:
>  |>   $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★
>  |>   $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★
>  |>   $ zsh -c "printf '%.3Ls\n' $'Ａ\u2605\u2605\u2605'"
>  |>   Ａ★★
>  |> 
>  |> I see that dash gives invalid directive for any of %ls %Ls %S.
>  |> 
>  |> Pity there is no consensus here.
>  |> Personally I would go for:
>  |>   printf '%3s' 'blah'  # count cells
>  |>   printf '%3Ls' 'blah' # count chars
>  |>   LANG=C printf '%3Ls' 'blah' # count bytes
>  |>   LANG=C printf '%3s' 'blah'  # count bytes
>  |
>  |Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
>  |and currently states that %Ls is undefined.  But I would LOVE to have a
>  |standardized spelling for counting characters instead of bytes.  The
>  |extension %Ls looks like a good candidate for standardization, precisely
>  |because counting characters when printing a multibyte string is more
>  |useful than counting bytes (you do NOT want to end in the middle of a
>  |multibyte character), and because ksh offers it as existing practice.
>  |
>  |Your idea for counting "cells" (by which I'm assuming you mean one or
>  |more characters that all display within the same cell of the terminal,
>  |as if the end user saw only one grapheme), on the other hand, does not
>  |seem to have any precedent, and I would strongly object to having %s
>  |count by cells because %s already has a standardized (if unfortunate)
>  |meaning of counting by bytes.  Maybe yet another extension is warranted
>  |(perhaps %LLs?) as a new notion for counting by cells instead of
>  |characters, but it's harder to justify that without existing practice.
> 
> I see you are trying to use the word "character" for code points
> and reserve the term "grapheme" for user-perceived characters.
> This is in line with the GNU library, whose existing
> practice is to let wcwidth(3) return the value 1 for accents and
> other combining code points as well as so-called (Unicode)
> noncharacters.  And who would call wcwidth(3) on something that is
> not to be drawn onto the screen directly afterwards?  And, of
> course, which terminal will perform the composition of code points
> written via standard I/O into characters on its own?
> I think that for quite a while it will be up to the input methods to
> combine sequences into something precomposed in order to let POSIX
> programs finally work with it.

Many languages do not have precomposed forms for all the character
sequences they need, and for some it would not even be practical to
have precomposed forms. Requiring them would force the use of complex
input methods instead of simple keyboard maps.
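
Whether a given sequence has a precomposed form can be checked with
Unicode NFC normalization. A small Python sketch; has_precomposed_form is
a hypothetical helper, and "q" plus U+0301 serves only as an example of a
base-plus-accent sequence that NFC leaves as two code points:

```python
import unicodedata


def has_precomposed_form(seq):
    """Return True if NFC normalization collapses seq to a single code point."""
    return len(unicodedata.normalize("NFC", seq)) == 1


print(has_precomposed_form("a\u0301"))   # True: composes to U+00E1
print(has_precomposed_form("q\u0301"))   # False: stays as two code points
```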

Rich




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Steffen Nurpmeso <sdaoden@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Thu, 10 Apr 2014 16:17:04 +0000
Resent-Message-ID: <handler.17196.B17196.13971465933240 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Rich Felker <dalias@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, Eric Blake <eblake@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.13971465933240
          (code B ref 17196); Thu, 10 Apr 2014 16:17:04 +0000
Received: (at 17196) by debbugs.gnu.org; 10 Apr 2014 16:16:33 +0000
Received: from localhost ([127.0.0.1]:39942 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WYHeS-0000qA-6P
	for submit <at> debbugs.gnu.org; Thu, 10 Apr 2014 12:16:32 -0400
Received: from forward10l.mail.yandex.net ([84.201.143.143]:56157)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <sdaoden@HIDDEN>) id 1WYHeO-0000pt-BG
 for 17196 <at> debbugs.gnu.org; Thu, 10 Apr 2014 12:16:30 -0400
Received: from smtp4o.mail.yandex.net (smtp4o.mail.yandex.net [37.140.190.29])
 by forward10l.mail.yandex.net (Yandex) with ESMTP id 69B9EBA0CBD;
 Thu, 10 Apr 2014 20:16:21 +0400 (MSK)
Received: from smtp4o.mail.yandex.net (localhost [127.0.0.1])
 by smtp4o.mail.yandex.net (Yandex) with ESMTP id 9260123216B2;
 Thu, 10 Apr 2014 20:16:20 +0400 (MSK)
Received: from unknown (unknown [89.204.139.192])
 by smtp4o.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id r76UwoaVWs-GIC4GxRE; 
 Thu, 10 Apr 2014 20:16:19 +0400
 (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits))
 (Client certificate not present)
X-Yandex-Uniq: 8f680bf8-3e00-4234-9a76-8cb266ba010c
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.com; s=mail;
 t=1397146580; bh=/2u0mCxLCUU2rUNpki+IBTdyRqHu0/oZLifGzqhY8Uw=;
 h=Date:From:To:Cc:Subject:Message-ID:References:In-Reply-To:
 User-Agent:MIME-Version:Content-Type;
 b=bVCvIg9HuCudcBkgzpK3b/GVkA77j4sSkPZhCrjPHKqyP2QI7tzxBBk1vLIeaVLOY
 a738WypoLW5AAvhswwi8sgQasG2D7jxaRxcWmrgf/O0ErByQifzZ+WlUOJsLr7K9Ew
 rukIqnkwW0Se9MEtgLtjmdJ5jgZkXHypfa5U7Iho=
Authentication-Results: smtp4o.mail.yandex.net; dkim=pass header.i=@yandex.com
Date: Thu, 10 Apr 2014 18:16:24 +0200
From: Steffen Nurpmeso <sdaoden@HIDDEN>
Message-ID: <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
 <20140410075610.GO26358@HIDDEN>
In-Reply-To: <20140410075610.GO26358@HIDDEN>
User-Agent: s-nail v14.6.4-1-ga39836e
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="=_01397146584=-hLlRJmGE22qxLp6/BPA3cw5+rq+yWU=_"
X-Spam-Score: 0.0 (/)

This is a multi-part message in MIME format.

--=_01397146584=-hLlRJmGE22qxLp6/BPA3cw5+rq+yWU=_
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Rich Felker <dalias@HIDDEN> wrote:
 |On Wed, Apr 09, 2014 at 02:49:37PM +0200, Steffen Nurpmeso wrote:
 |> Eric Blake <eblake@HIDDEN> wrote:
 |>|Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
 |>|and currently states that %Ls is undefined.  But I would LOVE to have a
 |>|standardized spelling for counting characters instead of bytes.  The
 |>|extension %Ls looks like a good candidate for standardization, precisely
 |>|because counting characters when printing a multibyte string is more
 |>|useful than counting bytes (you do NOT want to end in the middle of a
 |>|multibyte character), and because ksh offers it as existing practice.
 |>|
 |>|Your idea for counting "cells" (by which I'm assuming you mean one or
 |>|more characters that all display within the same cell of the terminal,
 |>|as if the end user saw only one grapheme), on the other hand, does not
 |>|seem to have any precedence, and I would strongly object to having %s
 [.]
 |> I see you are trying to invent the word character for code points
 |> and reserve the term "graphem" for user-perceived characters.
 |> This goes in line with the GNU library which has the existing
 |> practice to let wcwidth(3) return the value 1 for accents and
 |> other combining code points as well as so-called (Unicode)
 |> noncharacters.  And who would call wcwidth(3) on something that is
 |> not to be drawn onto the screen directly afterwards.  And, of
 |> course, which terminal will perform the composition of code points
 |> written via STD I/O to characters on its own.
 |> I think for quite a while it is up to the input methods to combine
 |> into something precomposed in order to let POSIX programs finally
 |> work with it.
 |
 |Many languages do not have precomposed forms for all the character
 |sequences they need, and for some, it would not even be practical to
 |have precomposed forms, and would force the use of complex input
 |methods instead of simple keyboard maps.

And of course with UTF-8, decomposed forms of characters from an
immense number of languages can occur, at least in theory, in,
e.g., a text file.
The German U+00FC (LATIN SMALL LETTER U WITH DIAERESIS) could very
well be «ü» but also U+0075 U+0308 «u ̈», depending on where it
came from.  And note that my vim(1) composed U+00FC automatically
when i tried to input the latter string; i had to separate, enter
each, and join them together to get at «u» plus, actually non-,
combining diaeresis.  (In fact actually «combining with a space».)
Of course a wcwidth(3) of 1 for U+0308 is much better than 0 when
it really produces something visual.
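
To make the two spellings concrete, here is a minimal C sketch that
compares the precomposed and the decomposed form by byte count and by
code point count.  The helper name utf8_codepoints is made up for
illustration, and the sketch assumes well-formed UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count code points in a well-formed UTF-8 string
 * by counting every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Precomposed: U+00FC.  Decomposed: U+0075 followed by U+0308. */
static const char precomposed[] = "\xC3\xBC";
static const char decomposed[]  = "u\xCC\x88";
```

Both strings are meant to show up as a single user-perceived character,
yet strlen(3) sees 2 and 3 bytes and the helper sees 1 and 2 code
points, so neither count alone says how much the user perceives.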

Even better would nonetheless be the great picture with
a termios(4) IUTF8 flag, some extended xywidth(3) that returns
a tuple of {[EastAsianWidth indication,] is-combining,
width-if-non-combining} and best even some composition function.
I don't think that «user-perceived characters don't have any
precedence».  A whole lot of development in the past decade on the
winner side (that is, the other :) was exactly that -- making
software barrier-free.
If POSIX beams itself onto UTF-8 it should really consider to
offer a way to be able to act on what the user really deals with.
And that is, in the Unicode world -- and isn't that what the bug
report is about --, not necessarily a mbrlen(3)-division of bytes.
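
For contrast, a division by code points (roughly what stepping through
a string with mbrlen(3) yields in a UTF-8 locale) can be sketched in C
as below.  The function name utf8_prefix_bytes is hypothetical, and
well-formed UTF-8 input is assumed.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the longest prefix of the
 * well-formed UTF-8 string s that holds at most max_chars code points.
 * The cut never lands inside a multibyte sequence. */
static size_t utf8_prefix_bytes(const char *s, size_t max_chars)
{
    size_t i = 0, chars = 0;
    while (s[i] != '\0') {
        /* A non-continuation byte starts a new code point. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (chars == max_chars)
                break;
            chars++;
        }
        i++;
    }
    return i;
}
```

Passing the result as the precision of a C printf("%.*s") call prints
whole code points only.  It can still cut between a base letter and a
following combining mark, which is exactly the gap between code points
and what the user deals with.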

--steffen

--=_01397146584=-hLlRJmGE22qxLp6/BPA3cw5+rq+yWU=_
Content-Type: message/rfc822
Content-Disposition: inline
Content-Description: Original message content

Received: from mxfront3h.mail.yandex.net ([127.0.0.1])
	by mxfront3h.mail.yandex.net with LMTP id uF50cqbZ
	for <sdaoden@HIDDEN>; Thu, 10 Apr 2014 11:56:15 +0400
Received: from 216-12-86-13.cv.mvl.ntelos.net (216-12-86-13.cv.mvl.ntelos.net [216.12.86.13])
	by mxfront3h.mail.yandex.net (nwsmtp/Yandex) with SMTP id rYLYuCwMqF-uEAeEXKa;
	Thu, 10 Apr 2014 11:56:14 +0400
X-Yandex-Uniq: 655f0aa3-4efb-4152-ab26-2bb01fe7b98d
Received: from dalias by brightrain.aerifal.cx with local (Exim 3.15 #2)
	id 1WY9qE-0005Ha-00; Thu, 10 Apr 2014 07:56:10 +0000
Date: Thu, 10 Apr 2014 03:56:10 -0400
To: Steffen Nurpmeso <sdaoden@HIDDEN>
Cc: Eric Blake <eblake@HIDDEN>, 17196 <at> debbugs.gnu.org,
	Austin Group <austin-group-l@HIDDEN>,
	Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>,
	=?utf-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>
Subject: Re: bug#17196: UTF-8 printf string formating  problem
Message-ID: <20140410075610.GO26358@HIDDEN>
References: <53408EFF.7050601@HIDDEN>
 <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN>
 <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
From: Rich Felker <dalias@HIDDEN>
Return-Path: dalias@HIDDEN
X-Yandex-Forward: 1431d05c8f532bcc8fea61a74badcb33
Status: RO

On Wed, Apr 09, 2014 at 02:49:37PM +0200, Steffen Nurpmeso wrote:
> Eric Blake <eblake@HIDDEN> wrote:
>  |>>   Dan Douglas wrote:
>  |>>> ksh93 already has this feature using the "L" modifier:
>  |>>> 
>  |>>> ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>  |>>> ★★★
>  |>>
>  |>> At least there is prior art for it.
>  |> 
>  |> So we can count bytes, chars or cells (graphemes).
>  |> 
>  |> Thinking a bit more about it, I think shell level printf
>  |> should be dealing in text of the current encoding and counting cells.
>  |> In the edge case where you want to deal in bytes one can do:
>  |>   LC_ALL=C printf ...
>  |> 
>  |> I see that ksh behaves as I would expect and counts cells,
>  |> though requires the explicit %L enabler:
>  |>   $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★★
>  |>   $ ksh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
>  |>   A★
>  |>   $ ksh -c "printf '%.3Ls\n' $'AA\u2605\u2605\u2605'"
>  |>   A
>  |> 
>  |> zsh seems to just count characters:
>  |>   $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★
>  |>   $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
>  |>   á★
>  |>   $ zsh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
>  |>   A★★
>  |> 
>  |> I see that dash gives invalid directive for any of %ls %Ls %S.
>  |> 
>  |> Pity there is no consensus here.
>  |> Personally I would go for:
>  |>   printf '%3s' 'blah'  # count cells
>  |>   printf '%3Ls' 'blah' # count chars
>  |>   LANG=C '%3Ls' 'blah' # count bytes
>  |>   LANG=C '%3s' 'blah'  # count bytes
>  |
>  |Hmm.  POSIX requires support for %ls (aka %S) according to byte counts,
>  |and currently states that %Ls is undefined.  But I would LOVE to have a
>  |standardized spelling for counting characters instead of bytes.  The
>  |extension %Ls looks like a good candidate for standardization, precisely
>  |because counting characters when printing a multibyte string is more
>  |useful than counting bytes (you do NOT want to end in the middle of a
>  |multibyte character), and because ksh offers it as existing practice.
>  |
>  |Your idea for counting "cells" (by which I'm assuming you mean one or
>  |more characters that all display within the same cell of the terminal,
>  |as if the end user saw only one grapheme), on the other hand, does not
>  |seem to have any precedence, and I would strongly object to having %s
>  |count by cells because %s already has a standardized (if unfortunate)
>  |meaning of counting by bytes.  Maybe yet another extension is warranted
>  |(perhaps %LLs?) as a new notion for counting by cells instead of
>  |characters, but it's harder to justify that without existing practice.
> 
> I see you are trying to invent the word character for code points
> and reserve the term "graphem" for user-perceived characters.
> This goes in line with the GNU library which has the existing
> practice to let wcwidth(3) return the value 1 for accents and
> other combining code points as well as so-called (Unicode)
> noncharacters.  And who would call wcwidth(3) on something that is
> not to be drawn onto the screen directly afterwards.  And, of
> course, which terminal will perform the composition of code points
> written via STD I/O to characters on its own.
> I think for quite a while it is up to the input methods to combine
> into something precomposed in order to let POSIX programs finally
> work with it.

Many languages do not have precomposed forms for all the character
sequences they need, and for some, it would not even be practical to
have precomposed forms, and would force the use of complex input
methods instead of simple keyboard maps.

Rich


--=_01397146584=-hLlRJmGE22qxLp6/BPA3cw5+rq+yWU=_--




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Chet Ramey <chet.ramey@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Thu, 10 Apr 2014 18:12:01 +0000
Resent-Message-ID: <handler.17196.B17196.139715346711156 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Steffen Nurpmeso <sdaoden@HIDDEN>, Rich Felker <dalias@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, chet.ramey@HIDDEN, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, Eric Blake <eblake@HIDDEN>
Reply-To: chet.ramey@HIDDEN
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139715346711156
          (code B ref 17196); Thu, 10 Apr 2014 18:12:01 +0000
Received: (at 17196) by debbugs.gnu.org; 10 Apr 2014 18:11:07 +0000
Received: from localhost ([127.0.0.1]:44791 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WYJRJ-0002to-Fc
	for submit <at> debbugs.gnu.org; Thu, 10 Apr 2014 14:11:06 -0400
Received: from mpv2.tis.cwru.edu ([129.22.105.37]:11566)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <chet.ramey@HIDDEN>) id 1WYJRB-0002t1-3t
 for 17196 <at> debbugs.gnu.org; Thu, 10 Apr 2014 14:11:02 -0400
Received: from mpv6.tis.CWRU.Edu (EHLO mpv6.cwru.edu) ([129.22.104.221])
 by mpv2.tis.cwru.edu (MOS 4.3.5-GA FastPath queued)
 with ESMTP id BDG14791; Thu, 10 Apr 2014 14:10:39 -0400 (EDT)
Received: from caleb.INS.CWRU.Edu (EHLO caleb.ins.cwru.edu) ([129.22.8.211])
 by mpv6.cwru.edu (MOS 4.3.5-GA FastPath queued)
 with ESMTP id AJH10974 (AUTH cpr);
 Thu, 10 Apr 2014 14:10:29 -0400 (EDT)
Message-ID: <5346DE92.9020004@HIDDEN>
Date: Thu, 10 Apr 2014 14:10:26 -0400
From: Chet Ramey <chet.ramey@HIDDEN>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN> <5342A337.9000407@HIDDEN>
 <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
 <20140410075610.GO26358@HIDDEN>
 <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
In-Reply-To: <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Junkmail-Status: score=10/50, host=mpv6.cwru.edu
X-Junkmail-Whitelist: YES (by domain whitelist at mpv2.tis.cwru.edu)
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

On 4/10/14, 12:16 PM, Steffen Nurpmeso wrote:

> Even better would nonetheless be the great picture with
> a termios(4) IUTF8 flag, some extended xywidth(3) that returns
> a tuple of {[EastAsianWidth indication,] is-combining,
> width-if-non-combining} and best even some composition function.

But we have always been at war with EastAsia!

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@HIDDEN    http://cnswww.cns.cwru.edu/~chet/




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Steffen Nurpmeso <sdaoden@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Fri, 11 Apr 2014 10:17:01 +0000
Resent-Message-ID: <handler.17196.B17196.139721139414326 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: chet.ramey@HIDDEN
Cc: 17196 <at> debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Rich Felker <dalias@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, Eric Blake <eblake@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139721139414326
          (code B ref 17196); Fri, 11 Apr 2014 10:17:01 +0000
Received: (at 17196) by debbugs.gnu.org; 11 Apr 2014 10:16:34 +0000
Received: from localhost ([127.0.0.1]:45202 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WYYVd-0003iz-FJ
	for submit <at> debbugs.gnu.org; Fri, 11 Apr 2014 06:16:33 -0400
Received: from forward7l.mail.yandex.net ([84.201.143.140]:36493)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <sdaoden@HIDDEN>) id 1WYYVZ-0003iN-Cq
 for 17196 <at> debbugs.gnu.org; Fri, 11 Apr 2014 06:16:31 -0400
Received: from smtp3h.mail.yandex.net (smtp3h.mail.yandex.net [84.201.186.20])
 by forward7l.mail.yandex.net (Yandex) with ESMTP id 9B562BC0CD3;
 Fri, 11 Apr 2014 14:16:21 +0400 (MSK)
Received: from smtp3h.mail.yandex.net (localhost [127.0.0.1])
 by smtp3h.mail.yandex.net (Yandex) with ESMTP id 6A4581B42685;
 Fri, 11 Apr 2014 14:16:20 +0400 (MSK)
Received: from unknown (unknown [89.204.130.136])
 by smtp3h.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id aicjDkOG4s-GI5WWnkl; 
 Fri, 11 Apr 2014 14:16:19 +0400
 (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits))
 (Client certificate not present)
X-Yandex-Uniq: a0e620ee-628c-4bf3-a359-9abdf75c88a8
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.com; s=mail;
 t=1397211380; bh=nz2MoLGIO2JYvfrGDO7XeE09n04cOIPAGjaZG4ykzqc=;
 h=Date:From:To:Cc:Subject:Message-ID:References:In-Reply-To:
 User-Agent:MIME-Version:Content-Type:Content-Transfer-Encoding;
 b=Ti5Ca3FysnnwnBuY/Fd5aCWPAYU/inHj4IvhDMay0u9OuDmINGRbmApwNu+7Yblsl
 Jz6/mC7WuDqHXD6S4i9nZQy/Mqn8+2p1V8uVfZpvBiYQweQ/M5YGGR//LigMVwp5UY
 OUd4pA4JjRioqaUir/EATe5BUbXp/ToDisCRPjTk=
Authentication-Results: smtp3h.mail.yandex.net; dkim=pass header.i=@yandex.com
Date: Fri, 11 Apr 2014 12:16:15 +0200
From: Steffen Nurpmeso <sdaoden@HIDDEN>
Message-ID: <20140411111615.ho9kmtrCAOTLmdWnrbsIp1DI@HIDDEN>
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
 <20140410075610.GO26358@HIDDEN>
 <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
 <5346DE92.9020004@HIDDEN>
In-Reply-To: <5346DE92.9020004@HIDDEN>
User-Agent: s-nail v14.6.4-1-ga39836e
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

Hello,

Chet Ramey <chet.ramey@HIDDEN> wrote:
 |On 4/10/14, 12:16 PM, Steffen Nurpmeso wrote:
 |
 |> Even better would nonetheless be the great picture with
 |> a termios(4) IUTF8 flag, some extended xywidth(3) that returns
 |> a tuple of {[EastAsianWidth indication,] is-combining,
 |> width-if-non-combining} and best even some composition function.
 |
 |But we have always been at war with EastAsia!

I see you really would love to get a hand from POSIX too:

  ?0[steffen@sherwood bash-4.3]$ grep -r UNICODE_COMB .
  ./lib/readline/display.c:      if (t > 0 && UNICODE_COMBINING_CHAR (wc) && WCWIDTH (wc) == 0)
  ./lib/readline/rlmbutil.h:#define UNICODE_COMBINING_CHAR(x) ((x) >= 768 && (x) <= 879)
  ./lib/readline/rlmbutil.h:#  define WCWIDTH(wc) ((_rl_utf8locale && UNICODE_COMBINING_CHAR(wc)) ? 0 : wcwidth(wc))
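
The idea behind those macros can be sketched in standalone C as
follows.  This is not readline's code: the names IS_COMBINING,
utf8_decode and utf8_cells are invented for illustration, well-formed
UTF-8 is assumed, and every non-combining code point is simply taken
to be one cell wide where a real program would call wcwidth(3).

```c
#include <stddef.h>

/* Same range as UNICODE_COMBINING_CHAR in the grep output: 768..879,
 * i.e. U+0300..U+036F. */
#define IS_COMBINING(cp) ((cp) >= 0x300 && (cp) <= 0x36F)

/* Hypothetical helper: decode one code point from well-formed UTF-8
 * at s, store it in *cp and return the number of bytes consumed. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((unsigned long)(s[0] & 0x07) << 18)
        | ((unsigned long)(s[1] & 0x3F) << 12)
        | ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Assumption: each non-combining code point occupies one cell and each
 * combining one occupies none.  East Asian wide characters are ignored. */
static size_t utf8_cells(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t cells = 0;
    while (*s != '\0') {
        unsigned long cp;
        s += utf8_decode(s, &cp);
        if (!IS_COMBINING(cp))
            cells++;
    }
    return cells;
}
```

With this, a base letter followed by U+0308 counts as one cell, the
same as its precomposed form, regardless of what the C library's
wcwidth(3) reports for the combining code point.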

And sorry for not making this clear for those who never dealt with
the problem (which is probably not uncommon for filesystem or
other kernel hackers): `EastAsianWidth' refers to a property of
Unicode and ISO 10646:

  # EastAsianWidth-6.3.0.txt
  # Date: 2013-02-05, 20:09:00 GMT [KW, LI]
  #
  # East Asian Width Properties
  #
  # This file is an informative contributory data file in the
  # Unicode Character Database.
  #
  # Copyright (c) 1991-2013 Unicode, Inc.
  # For terms of use, see http://www.unicode.org/terms_of_use.html

--steffen

...
To be honest i must admit i first was pissed, so let me append the
original first part of this message, please:

  and so the landslide had brought it down.
  But i would quote Paul Vixie, who stated in a message today

    gentlemen and ladies, we have met the enemy, and they are our
    egos.

    vixie

  From my point of view it's the matter of culture and philosophy
  (including religion) how to deal with that very problem.
  And i can assure you that Jehovah's Witnesses, who have been
  visiting me regularly for some years now, like to drink a bit of
  my Buddhist tea.

Paul Vixie is correct.
I am stupid.
With greetings from someone who will undergo his 42nd birthday soon




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Chet Ramey <chet.ramey@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Fri, 11 Apr 2014 12:27:02 +0000
Resent-Message-ID: <handler.17196.B17196.13972191647410 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Steffen Nurpmeso <sdaoden@HIDDEN>
Cc: 17196 <at> debbugs.gnu.org, chet.ramey@HIDDEN, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Rich Felker <dalias@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, Eric Blake <eblake@HIDDEN>
Reply-To: chet.ramey@HIDDEN
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.13972191647410
          (code B ref 17196); Fri, 11 Apr 2014 12:27:02 +0000
Received: (at 17196) by debbugs.gnu.org; 11 Apr 2014 12:26:04 +0000
Received: from localhost ([127.0.0.1]:45241 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WYaWw-0001vM-Dy
	for submit <at> debbugs.gnu.org; Fri, 11 Apr 2014 08:26:04 -0400
Received: from mpv1.tis.cwru.edu ([129.22.105.36]:19616)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <chet.ramey@HIDDEN>) id 1WYaWp-0001uc-2I
 for 17196 <at> debbugs.gnu.org; Fri, 11 Apr 2014 08:25:59 -0400
Received: from mpv5.tis.CWRU.Edu (EHLO mpv5.cwru.edu) ([129.22.105.51])
 by mpv1.tis.cwru.edu (MOS 4.3.5-GA FastPath queued)
 with ESMTP id BFC56788; Fri, 11 Apr 2014 08:25:38 -0400 (EDT)
Received: from caleb.INS.CWRU.Edu (EHLO caleb.ins.cwru.edu) ([129.22.8.211])
 by mpv5.cwru.edu (MOS 4.3.5-GA FastPath queued)
 with ESMTP id ATQ66868 (AUTH cpr);
 Fri, 11 Apr 2014 08:25:22 -0400 (EDT)
Message-ID: <5347DF27.50702@HIDDEN>
Date: Fri, 11 Apr 2014 08:25:11 -0400
From: Chet Ramey <chet.ramey@HIDDEN>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN> <5342A337.9000407@HIDDEN>
 <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
 <20140410075610.GO26358@HIDDEN>
 <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
 <5346DE92.9020004@HIDDEN>
 <20140411111615.ho9kmtrCAOTLmdWnrbsIp1DI@HIDDEN>
In-Reply-To: <20140411111615.ho9kmtrCAOTLmdWnrbsIp1DI@HIDDEN>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Junkmail-Status: score=10/50, host=mpv5.cwru.edu
X-Junkmail-Whitelist: YES (by domain whitelist at mpv1.tis.cwru.edu)
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

On 4/11/14, 6:16 AM, Steffen Nurpmeso wrote:
> Hello,
> 
> Chet Ramey <chet.ramey@HIDDEN> wrote:
>  |On 4/10/14, 12:16 PM, Steffen Nurpmeso wrote:
>  |
>  |> Even better would nonetheless be the great picture with
>  |> a termios(4) IUTF8 flag, some extended xywidth(3) that returns
>  |> a tuple of {[EastAsianWidth indication,] is-combining,
>  |> width-if-non-combining} and best even some composition function.
>  |
>  |But we have always been at war with EastAsia!
> 
> I see you really would love to get a hand from POSIX too:

I'm sorry, I realize that was rather obscure.  It's from "1984", by George
Orwell.  It's a central theme to the book.  The quote was an attempt to
inject levity into the discussion.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@HIDDEN    http://cnswww.cns.cwru.edu/~chet/




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Steffen Nurpmeso <sdaoden@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Fri, 11 Apr 2014 13:42:01 +0000
Resent-Message-ID: <handler.17196.B17196.139722366819913 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: chet.ramey@HIDDEN
Cc: 17196 <at> debbugs.gnu.org, =?UTF-8?Q?P=C3=A1draig?= Brady <P@HIDDEN>, Rich Felker <dalias@HIDDEN>, Bob Proulx <bob@HIDDEN>, Jan Novak <jn@HIDDEN>, Austin Group <austin-group-l@HIDDEN>, Eric Blake <eblake@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139722366819913
          (code B ref 17196); Fri, 11 Apr 2014 13:42:01 +0000
Received: (at 17196) by debbugs.gnu.org; 11 Apr 2014 13:41:08 +0000
Received: from localhost ([127.0.0.1]:45289 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WYbha-0005B4-Nf
	for submit <at> debbugs.gnu.org; Fri, 11 Apr 2014 09:41:07 -0400
Received: from forward7l.mail.yandex.net ([84.201.143.140]:42991)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <sdaoden@HIDDEN>) id 1WYbhO-00059w-Gk
 for 17196 <at> debbugs.gnu.org; Fri, 11 Apr 2014 09:40:56 -0400
Received: from smtp4h.mail.yandex.net (smtp4h.mail.yandex.net [84.201.186.21])
 by forward7l.mail.yandex.net (Yandex) with ESMTP id 08AE8BC121D;
 Fri, 11 Apr 2014 17:40:46 +0400 (MSK)
Received: from smtp4h.mail.yandex.net (localhost [127.0.0.1])
 by smtp4h.mail.yandex.net (Yandex) with ESMTP id B3F682C372C;
 Fri, 11 Apr 2014 17:40:45 +0400 (MSK)
Received: from unknown (unknown [89.204.130.136])
 by smtp4h.mail.yandex.net (nwsmtp/Yandex) with ESMTPSA id 9aGtRyfnHc-ehhehFk0; 
 Fri, 11 Apr 2014 17:40:44 +0400
 (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits))
 (Client certificate not present)
X-Yandex-Uniq: fefe3f1e-da47-4b6e-9f5e-ec97a4eeadf9
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.com; s=mail;
 t=1397223645; bh=iKgWkZ1dAoJGcmbN1EZ9S055yr2JEhe3s9NOxejS4+w=;
 h=Date:From:To:Cc:Subject:Message-ID:References:In-Reply-To:
 User-Agent:MIME-Version:Content-Type:Content-Transfer-Encoding;
 b=QxxGnYULunGfb82f84Eqj3sRYGuu9/mlfCwoVCTo1DT2909HLr6mHoncI8ahkR9tp
 KkiM0cm2yyw2dEpdj7VboS0bMDd0L6uF6rRWocOTYmmTve5S8OOmr5pBMnts3PJ0+p
 enBLx5Nd7yyi7Hyhv+gEx2woEGJXU9NtJNekZLq4=
Authentication-Results: smtp4h.mail.yandex.net; dkim=pass header.i=@yandex.com
Date: Fri, 11 Apr 2014 15:40:41 +0200
From: Steffen Nurpmeso <sdaoden@HIDDEN>
Message-ID: <20140411144041.KmpeitNBK3J2xP1tlaqPyJ+P@HIDDEN>
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
 <20140406182447.GA1381@HIDDEN>
 <5342A337.9000407@HIDDEN> <53431F2F.8060701@HIDDEN>
 <20140409134937.G1Sjvh4wKZUfnofJrM0R7RoW@HIDDEN>
 <20140410075610.GO26358@HIDDEN>
 <20140410171624.an/caJUtgdHJiK1DmeoKZPSP@HIDDEN>
 <5346DE92.9020004@HIDDEN>
 <20140411111615.ho9kmtrCAOTLmdWnrbsIp1DI@HIDDEN>
 <5347DF27.50702@HIDDEN>
In-Reply-To: <5347DF27.50702@HIDDEN>
User-Agent: s-nail v14.6.4-1-ga39836e
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 0.0 (/)

Chet Ramey <chet.ramey@HIDDEN> wrote:
 |On 4/11/14, 6:16 AM, Steffen Nurpmeso wrote:
 |> Hello,
 |>
 |> Chet Ramey <chet.ramey@HIDDEN> wrote:
 |>|On 4/10/14, 12:16 PM, Steffen Nurpmeso wrote:
 |>|
 |>|> Even better would nonetheless be the great picture with
 |>|> a termios(4) IUTF8 flag, some extended xywidth(3) that returns
 |>|> a tuple of {[EastAsianWidth indication,] is-combining,
 |>|> width-if-non-combining} and best even some composition function.
 |>|
 |>|But we have always been at war with EastAsia!
 |>
 |> I see you really would love to get a hand from POSIX too:
 |
 |I'm sorry, I realize that was rather obscure.  It's from "1984", by George
 |Orwell.  It's a central theme to the book.  The quote was an attempt to

oh, ah, yes.  So.. i got it right without getting it right.

Interestingly, a retrospective work on Walter Benjamin started
yesterday (<http://www.eingedenken.de/enter.html> --
"remembrance"): an artist (Christoph Korn) walked Benjamin's last
trip from Banyuls-sur-Mer (France) to Portbou (Spain; where
Benjamin committed suicide due to the impossibility to reach the
U.S.), following a fixed time frame (monotonic tick, so to say)
after which he spoke theses of Benjamin (like, e.g., "There is no
document of civilization which is not at the same time a document
of barbarism."), followed by pausing and taking a (Steadicam)
video of the recent leg.  Association with Paul Klee's "Angelus
Novus" is desired (from both parties).

 |inject levity into the discussion.

That was easy.

--steffen




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#17196: UTF-8 printf string formating  problem
Resent-From: Leslie S Satenstein <lsatenstein@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Fri, 09 May 2014 02:17:02 +0000
Resent-Message-ID: <handler.17196.B17196.139960181515375 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 17196
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Jan Novak <jn@HIDDEN>
Cc: "17196 <at> debbugs.gnu.org" <17196 <at> debbugs.gnu.org>
Reply-To: Leslie S Satenstein <lsatenstein@HIDDEN>
Received: via spool by 17196-submit <at> debbugs.gnu.org id=B17196.139960181515375
          (code B ref 17196); Fri, 09 May 2014 02:17:02 +0000
Received: (at 17196) by debbugs.gnu.org; 9 May 2014 02:16:55 +0000
Received: from localhost ([127.0.0.1]:56448 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WiaMo-0003zq-NU
	for submit <at> debbugs.gnu.org; Thu, 08 May 2014 22:16:55 -0400
Received: from nm10-vm0.bullet.mail.bf1.yahoo.com ([98.139.213.147]:21407)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <lsatenstein@HIDDEN>) id 1WiaMk-0003zX-Sj
 for 17196 <at> debbugs.gnu.org; Thu, 08 May 2014 22:16:52 -0400
Received: from [98.139.212.153] by nm10.bullet.mail.bf1.yahoo.com with NNFMP;
 09 May 2014 02:16:45 -0000
Received: from [98.139.212.238] by tm10.bullet.mail.bf1.yahoo.com with NNFMP;
 09 May 2014 02:16:45 -0000
Received: from [127.0.0.1] by omp1047.mail.bf1.yahoo.com with NNFMP;
 09 May 2014 02:16:45 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 366228.35034.bm@HIDDEN
Received: (qmail 10977 invoked by uid 60001); 9 May 2014 02:16:45 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
 t=1399601805; bh=s2lLlN9FNGWVV+Ys4HgZtPKK5Yo/Je4MtFyMdDot+Cg=;
 h=References:Message-ID:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type;
 b=iUTBJytk3qfsoXNT1/ZssEUkHrFaV8vbdaD3BrOWlWSEaZcdIfWkvKw33wlp6LYbjB6PjUFZjJaxqcp+515qfQd2QIG07mPjaMihbKdq23Rdw/rJTd3xyHkYBhR1b/3Pv+QzQPifA60WRVr9BMOTQYAcDd5E9MYSn6+PLDaLPaQ=
X-YMail-OSG: kyO7YnUVM1lmX3HJKnrvaCeLXs.fewIl6ziBVKZGmLuxukr
 RZA5_MvUG4pBIByhIg_C0of8q8uVZCJJRsleHLSvMLYZ5702uNu9rfc.C9Ju
 ETVQK5NKkGjwFWi.o4uhyNEJtILw985MGWW3gdgjbMdLJQUJvrIuzCQdf2o0
 eygarJdIkhwccj_bRLQRzO40nj_NTKqM72f1naaaC0d.9GKOANyipWBwwmxE
 _aRwqxoH8wJZILB6suB08ILFQlU.MiU105or8kn9ctnRNup5Q6k06MzeMb8P
 _sMSWdM11vYHmQxxXjU_f_q6FGWxBQYDijzTc97bYX2gMiIeJvVCRwTyr7Fz
 MZJgFZVM7RY8M6YAT7QUtMbfDj7P42d_OGTk2e.YOEJjsl0NR4zyCNqVgIyz
 s8D3sQ__BaExDXnXflngUJsAiTP1lWfJLt51ITcYU1LjIBsO2jmgFoPcgbdL
 nZY1M54Hi7H3dSfT6q17iGrMsNw4ZXRxJojCGHb2lqc.RnWDGwlgYw1CjC6j
 FKBGtJdmt4HODujXKiG1nVx5yvlH51gVqqCqIGIzL9CSdFg--
Received: from [70.49.120.43] by web142606.mail.bf1.yahoo.com via HTTP;
 Thu, 08 May 2014 19:16:45 PDT
X-Rocket-MIMEInfo: 002.001,
 UGVyaGFwcyBwcmludGYoKSBuZWVkcyBzb21lIHdpZGUgY2hhcmFjdGVyIGV4dGVuc2lvbnMgdmlhICVuZXcgY2hhcmFjdGVycwoKwqAKUmVnYXJkcyAKCsKgTGVzbGllCgpNci4gTGVzbGllIFNhdGVuc3RlaW4KU0VOVCBGUk9NIE1ZIE9QRU4gU09VUkNFIExJTlVYIFNZU1RFTS4KCgoKCj5fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fXwo.IEZyb206IFDDoWRyYWlnIEJyYWR5IDxQQGRyYWlnQnJhZHkuY29tPgo.VG86IEphbiBOb3ZhayA8am5AdHVyYm8uc2s.IAo.Q2M6IDE3MTk2QGRlYmJ1Z3MuZ24BMAEBAQE-
X-Mailer: YahooMailWebService/0.8.188.663
References: <53408EFF.7050601@HIDDEN> <53412952.1040506@HIDDEN>
Message-ID: <1399601805.73330.YahooMailNeo@HIDDEN>
Date: Thu, 8 May 2014 19:16:45 -0700 (PDT)
From: Leslie S Satenstein <lsatenstein@HIDDEN>
In-Reply-To: <53412952.1040506@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="562241088-351124307-1399601805=:73330"
X-Spam-Score: -0.6 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.6 (/)

--562241088-351124307-1399601805=:73330
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Perhaps printf() needs some wide character extensions via %new characters

Regards

Leslie

Mr. Leslie Satenstein
SENT FROM MY OPEN SOURCE LINUX SYSTEM.


>________________________________
> From: Pádraig Brady <P@HIDDEN>
>To: Jan Novak <jn@HIDDEN>
>Cc: 17196 <at> debbugs.gnu.org
>Sent: Sunday, April 6, 2014 6:15 AM
>Subject: bug#17196: UTF-8 printf string formating  problem
>
>
>On 04/06/2014 12:17 AM, Jan Novak wrote:
>> Hello,
>>
>> printf string format counts bytes instead of chars, which leads to broken output ...
>> (the same problem occurs with bash built in printf)
>>
>>
>> just try this:
>>
>> $ echo $LANG
>> us_US.UTF-8
>>
>>
>> $ printf "|%3s|\n" "a"
>> |  a|
>>
>> $ printf "|%3s|\n" "á"     (char is a-acute)
>> | á|
>>
>> expected output:
>> |  á|
>>
>> Is there some easy solution ?
>>
>> TIA for the answer
>
>Yes printf follows the C standard which only considers bytes.
>awk does respect characters in width specifiers though:
>
>  $ awk 'BEGIN{printf "|%3s|\n", "á"}'
>  |  á|
>
>I don't think we'd be able to change the current operation of printf
>due to backwards compat reasons? Though we might be able to somehow leverage
>the existing multibyte character aware alignment/truncation code in:
>http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>
>thanks,
>Pádraig.
>
--562241088-351124307-1399601805=:73330--
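
Illustrative note on the message above: the report comes down to whether a field width such as %3s is measured in bytes or in characters. The following is a minimal Python sketch of that difference, not the coreutils or gnulib implementation. The helper names pad_bytes and pad_chars are hypothetical, UTF-8 is assumed as the encoding, and the character-based variant counts code points only, so it does not account for display width (wide or combining characters) the way dedicated alignment code would need to.

```python
def pad_bytes(text: str, width: int, encoding: str = "utf-8") -> bytes:
    """Right-align the way a byte-counting %<width>s does.

    The padding is computed from the length of the encoded bytes,
    so a 2-byte character already uses up two columns of the width.
    """
    raw = text.encode(encoding)
    return raw.rjust(width, b" ")


def pad_chars(text: str, width: int) -> str:
    """Right-align by counting code points instead of bytes."""
    return text.rjust(width)


if __name__ == "__main__":
    a_acute = "\u00e1"  # precomposed a-acute, 2 bytes in UTF-8

    # Byte-based padding: one space short for the 2-byte character.
    print("|" + pad_bytes("a", 3).decode("utf-8") + "|")
    print("|" + pad_bytes(a_acute, 3).decode("utf-8") + "|")

    # Character-based padding: both strings occupy three columns.
    print("|" + pad_chars("a", 3) + "|")
    print("|" + pad_chars(a_acute, 3) + "|")
```

With byte-based padding the a-acute string receives a single leading space, which matches the output shown in the report. With character-based padding it receives two, which matches the expected output.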




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 20 Oct 2018 03:19:16 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Oct 19 23:19:16 2018
Received: from localhost ([127.0.0.1]:60105 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gDhnH-0003pH-WF
	for submit <at> debbugs.gnu.org; Fri, 19 Oct 2018 23:19:16 -0400
Received: from mail-it1-f182.google.com ([209.85.166.182]:33924)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <assafgordon@HIDDEN>) id 1gDhnG-0003p5-BR
 for control <at> debbugs.gnu.org; Fri, 19 Oct 2018 23:19:14 -0400
Received: by mail-it1-f182.google.com with SMTP id l127-v6so5569590ith.1
 for <control <at> debbugs.gnu.org>; Fri, 19 Oct 2018 20:19:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=to:from:message-id:date:user-agent:mime-version:content-language
 :content-transfer-encoding;
 bh=TimM+Phc4YjonbIzGIX7g9mo2yjTgS0zyRUPx9auorU=;
 b=b6kaVXoKIAgDyYijS9eTIyeYL2ilYrVINVKnmDzEUp+5y3M9NZeAbD4J2n5LI71Bup
 wV54+GVUF95eI87Avvs0+zqQq1VSFXnzfun4YEoJ+xaivtwhdTjYSOPQdwUHrY8UWxTl
 dwckjSv79NXL9qDykk7hGNf/saPMVetdJChVlHqeVU8NUKROCl+7O/VBziMda5cEGZCc
 Azd6QjXWGV70BlLXNFaOY5Y8b9MusIeAX+BenOAadWCeaSnAwAsBMYbh4MYQFNWEXe4F
 BquWCCxwtyKjHt5Xw10ag90s2aW9+qMt1x+1Bb86t5rAZXQ+qQrA7aE+T4krvS3my/Sl
 1cLA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:to:from:message-id:date:user-agent:mime-version
 :content-language:content-transfer-encoding;
 bh=TimM+Phc4YjonbIzGIX7g9mo2yjTgS0zyRUPx9auorU=;
 b=QPd8PMib/7wpjoyc1HSvZ2JjQ5DYoqQYKXJRjlQ7cAjDBhZHW7mRTwTLsdDZpqCSFx
 WWebM5TQh1Xeojf+lgKzaeaBSzi25xj3qSeU7FMiuikPQTXmXspLT7YERb/zudvslIYF
 yQdyogIOOvDkHzpU4HsDXQKIU8hf3YXRnlKtp40GdkSsYJjTRVJPAi5gmzsxMm+FUw8A
 hbzIdjuWtlkiClf/tHzMAsqwMadf8rshd/EGNQUMupZcPVrTlteNM5rF5KtrtLOUJ5EQ
 3xnGTPaJlSIBsoIyL94YdOT6de2k+47OOQ7gN4oxBVSPPumYpQkqIU5VmoOQOz15EtDX
 8OdQ==
X-Gm-Message-State: ABuFfog0Z5ybvVzXNBbLfDf9p6kQdzBjA+NAIIPs0eNFoyGGUrwzC/GE
 NRmDOr5ezimRxxYB608PwMCCsXhv
X-Google-Smtp-Source: ACcGV63ftLEuamw0s0qVbf8WduzVb86d/GI4/bjaPSN+/UFsEdX0xdJxthDpkHSGId5rlgnKb0zwOg==
X-Received: by 2002:a02:212a:: with SMTP id
 e42-v6mr2774668jaa.59.1540005548167; 
 Fri, 19 Oct 2018 20:19:08 -0700 (PDT)
Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38])
 by smtp.googlemail.com with ESMTPSA id
 q205-v6sm2994062itc.2.2018.10.19.20.19.06
 for <control <at> debbugs.gnu.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 19 Oct 2018 20:19:06 -0700 (PDT)
To: control <at> debbugs.gnu.org
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <596e1d96-4e3e-029e-4823-583fcf40d915@HIDDEN>
Date: Fri, 19 Oct 2018 21:19:05 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: 2.0 (++)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 the administrator of that system for details.
 Content preview: severity 17196 wishlist retitle 17196 multibyte: printf: %s
 counts bytes instead of characters [...] 
 Content analysis details:   (2.0 points, 10.0 required)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
 trust [209.85.166.182 listed in list.dnswl.org]
 -0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
 [209.85.166.182 listed in wl.mailspike.net]
 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 (assafgordon[at]gmail.com)
 -0.0 SPF_PASS               SPF: sender matches SPF record
 1.8 MISSING_SUBJECT        Missing Subject: header
 0.2 NO_SUBJECT             Extra score for no subject
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 1.0 (+)

severity 17196 wishlist
retitle 17196 multibyte: printf: %s counts bytes instead of characters







Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.