GNU bug report logs - #12192
multibyte: tr: TR operates on bytes, not characters

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Michael Stummvoll <michael@HIDDEN>; merged with #9365, #9569, #10880, #13362; dated Mon, 13 Aug 2012 13:02:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: tr: TR operates on bytes, not characters' from 'tr - bytes vs characters'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Forcibly Merged 9365 9569 10880 12192 13362. Request was from Pádraig Brady <P@HIDDEN> to control <at> debbugs.gnu.org.
Forcibly Merged 9365 9569 10880 12192. Request was from Jim Meyering <jim@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 12192 <at> debbugs.gnu.org:


From: Jim Meyering <jim@HIDDEN>
To: Michael Stummvoll <michael@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
In-Reply-To: <20120813145222.0450a1a8@eddie> (Michael Stummvoll's message of
	"Mon, 13 Aug 2012 14:52:22 +0200")
References: <20120813145222.0450a1a8@eddie>
Date: Sat, 15 Sep 2012 12:28:54 +0200
Message-ID: <87wqzvvau1.fsf@HIDDEN>
Cc: 12192 <at> debbugs.gnu.org

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi GNU folks,
>
> as is already known, tr cannot handle multibyte encodings like UTF-8:
>
>> mst@eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the man page of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character on display, regardless of which encoding is used or how many
> bytes are needed to display it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it translates not
> "characters" but "bytes". So I see two ways:
>
> - add multibyte encoding support to tr
> or
> - change the man page and help text to say "bytes" instead of "characters"
>
> Since it doesn't seem that anybody wants to add this support to tr,
> updating the man page would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.




Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at 12192 <at> debbugs.gnu.org:


Date: Fri, 17 Aug 2012 14:03:42 +0200
From: Michael Stummvoll <michael@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
Message-ID: <20120817140342.03812f2b@eddie>
In-Reply-To: <5029BBE2.1030407@HIDDEN>
References: <20120813145222.0450a1a8@eddie> <502906FA.3040803@HIDDEN>
	<5029BBE2.1030407@HIDDEN>
Cc: 12192 <at> debbugs.gnu.org

Hi there,
> But yes, the main thing is for someone to contribute
> correct, easy-to-maintain, and efficient code.

Just for the record, in case somebody someday wants to tackle this:

I just noticed that the "tr" from 9base can handle UTF-8 correctly.
9base is a Unix port of the Plan 9 utils: http://tools.suckless.org/9base

I haven't yet taken a closer look at the sources of either GNU tr or
9base tr, but maybe somebody else could benefit from them.

Kind Regards,
Michael






Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at 12192 <at> debbugs.gnu.org:


Message-ID: <502A01D8.1080604@HIDDEN>
Date: Tue, 14 Aug 2012 00:44:24 -0700
From: Paul Eggert <eggert@HIDDEN>
To: Eric Blake <eblake@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
References: <20120813145222.0450a1a8@eddie> <502906FA.3040803@HIDDEN>
	<5029BBE2.1030407@HIDDEN> <5029E358.7000301@HIDDEN>
In-Reply-To: <5029E358.7000301@HIDDEN>
Cc: 12192 <at> debbugs.gnu.org, Michael Stummvoll <michael@HIDDEN>

On 08/13/2012 10:34 PM, Eric Blake wrote:
> But POSIX _does_ require that tr be
> locale-aware, and therefore if an implementation provides multibyte
> locales (which most desktop glibc-based GNU/Linux systems do), then tr
> should honor those locales, including multibyte character support.

All this is absolutely correct; but still, if the issue is merely POSIX
conformance, these glibc-based GNU/Linux systems do conform to POSIX,
since the POSIX-conformance document for these systems can state that
the supported locales are merely the single-byte locales.  Admittedly this
is legal hairsplitting, but if POSIX compliance is the issue
then one is in legal-hairsplitting mode already....




Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at 12192 <at> debbugs.gnu.org:


Message-ID: <5029E358.7000301@HIDDEN>
Date: Mon, 13 Aug 2012 23:34:16 -0600
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
References: <20120813145222.0450a1a8@eddie> <502906FA.3040803@HIDDEN>
	<5029BBE2.1030407@HIDDEN>
In-Reply-To: <5029BBE2.1030407@HIDDEN>
Cc: 12192 <at> debbugs.gnu.org, Michael Stummvoll <michael@HIDDEN>


On 08/13/2012 08:45 PM, Paul Eggert wrote:
> On 08/13/2012 06:54 AM, Eric Blake wrote:
>> POSIX _does_ require multi-byte support
>
> The last time I checked, POSIX did not require
> the implementation to provide any multibyte locales.
> Has this changed?

Fair enough - POSIX does not require the existence of a multibyte
locale; an embedded system that provides only single-byte encodings can
still be POSIX-compliant.  But POSIX _does_ require that tr be
locale-aware, and therefore if an implementation provides multibyte
locales (which most desktop glibc-based GNU/Linux systems do), then tr
should honor those locales, including multibyte character support.

>
> But yes, the main thing is for someone to contribute
> correct, easy-to-maintain, and efficient code.

We're in violent agreement on this point :)

--
Eric Blake   eblake@HIDDEN    +1-919-301-3266
Libvirt virtualization library http://libvirt.org






Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at 12192 <at> debbugs.gnu.org:


Message-ID: <5029BBE2.1030407@HIDDEN>
Date: Mon, 13 Aug 2012 19:45:54 -0700
From: Paul Eggert <eggert@HIDDEN>
To: Eric Blake <eblake@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
References: <20120813145222.0450a1a8@eddie> <502906FA.3040803@HIDDEN>
In-Reply-To: <502906FA.3040803@HIDDEN>
Cc: 12192 <at> debbugs.gnu.org, Michael Stummvoll <michael@HIDDEN>

On 08/13/2012 06:54 AM, Eric Blake wrote:
> POSIX _does_ require multi-byte support

The last time I checked, POSIX did not require
the implementation to provide any multibyte locales.
Has this changed?

But yes, the main thing is for someone to contribute
correct, easy-to-maintain, and efficient code.




Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at 12192 <at> debbugs.gnu.org:


Message-ID: <502906FA.3040803@HIDDEN>
Date: Mon, 13 Aug 2012 07:54:02 -0600
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat
To: Michael Stummvoll <michael@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
References: <20120813145222.0450a1a8@eddie>
In-Reply-To: <20120813145222.0450a1a8@eddie>
Cc: 12192 <at> debbugs.gnu.org


On 08/13/2012 06:52 AM, Michael Stummvoll wrote:
> Hi GNU folks,
>
> as is already known, tr cannot handle multibyte encodings like UTF-8:
>
>> mst@eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance,

Actually, POSIX _does_ require multi-byte support; it's just that no one
has yet contributed code for this upstream that is easy enough to
maintain and does not penalize single-byte locales.  Patches are welcome.

--
Eric Blake   eblake@HIDDEN    +1-919-301-3266
Libvirt virtualization library http://libvirt.org






Information forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


Date: Mon, 13 Aug 2012 14:52:22 +0200
From: Michael Stummvoll <michael@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: tr - bytes vs characters
Message-ID: <20120813145222.0450a1a8@eddie>

Hi GNU folks,

As is already known, tr cannot handle multibyte encodings like UTF-8:

> mst@eddie:~$ echo "foo" | tr o ö
> fÃÃ
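A minimal shell sketch makes the byte-level behaviour in the example above visible with od. It assumes upstream GNU tr and a POSIX shell; the variable name oe is only illustrative. UTF-8 encodes ö as the two bytes 0xC3 0xB6, and tr pairs SET1 with SET2 byte by byte, so the one-byte "o" is paired with 0xC3 and the surplus 0xB6 is ignored:

```shell
#!/bin/sh
# UTF-8 encodes "ö" (U+00F6) as two bytes: 0xC3 0xB6 (octal 303 266).
oe=$(printf '\303\266')

# tr pairs SET1 with SET2 byte by byte: "o" is mapped to the single
# byte 0xC3, and the extra byte 0xB6 at the end of SET2 is ignored.
printf 'foo\n' | tr o "$oe" | od -An -tx1
# Expected with upstream GNU tr: 66 c3 c3 0a
```

With upstream GNU tr this is expected to print the bytes 66 c3 c3 0a ("f", 0xC3, 0xC3, newline): two stray lead bytes instead of two complete "ö" sequences, which a terminal or mail client may then render as "ÃÃ".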

I know that multibyte encoding support is not needed for
POSIX compliance, BUT:

the man page of tr says the following:

> Translate, squeeze, and/or delete characters from standard input,
> writing to standard output.

and that's the inconsistency, IMHO.

The typical interpretation of "character" in such a context is one
character on display, regardless of which encoding is used or how many
bytes are needed to display it. So, if tr really translates "characters",
it should preserve the encoding. If it doesn't, it translates not
"characters" but "bytes". So I see two ways:

- add multibyte encoding support to tr
or
- change the man page and help text to say "bytes" instead of "characters"

Since it doesn't seem that anybody wants to add this support to tr,
updating the man page would be the easier way to ensure consistency.
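Until tr gains multibyte support, one possible interim workaround (not something proposed in this thread) is sed's s command. It copies the replacement string whole, so a multibyte character passes through intact in either direction. A minimal sketch, again using the illustrative variable oe:

```shell
#!/bin/sh
oe=$(printf '\303\266')   # UTF-8 bytes of "ö"

# s/// substitutes whole strings, so the two-byte "ö" is never split.
printf 'foo\n' | sed "s/o/$oe/g"               # o -> ö, giving "föö"
printf '%s\n' "f$oe$oe" | sed "s/$oe/o/g"      # ö -> o, giving "foo"
```

Unlike tr, successive s commands are applied one after another rather than simultaneously, so a swap such as tr ab ba needs a temporary placeholder character.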

Kind regards,
Michael




Acknowledgement sent to Michael Stummvoll <michael@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#12192; Package coreutils.
Last modified: Mon, 15 Oct 2018 14:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.