GNU logs - #10880, boring messages


Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#10880: instead of characters, tr works on bytes
Resent-From: "Marton Kadar" <marton.kadar@HIDDEN>
Original-Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Fri, 24 Feb 2012 17:31:02 +0000
Resent-Message-ID: <handler.10880.B.133010461716857 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 10880
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: 10880 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-coreutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.133010461716857
          (code B ref -1); Fri, 24 Feb 2012 17:31:02 +0000
Received: (at submit) by debbugs.gnu.org; 24 Feb 2012 17:30:17 +0000
Received: from localhost ([127.0.0.1]:54414 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S0yyG-0004No-VL
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 12:30:17 -0500
Received: from eggs.gnu.org ([208.118.235.92]:56312)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wMJ-0008HQ-Jf
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 09:42:57 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wJP-0002ff-Qc
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 09:40:19 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable version=3.3.2
Received: from lists.gnu.org ([140.186.70.17]:37480)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wJP-0002fH-Nb
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 09:39:55 -0500
Received: from eggs.gnu.org ([208.118.235.92]:38765)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wJG-00023V-3W
	for bug-coreutils@HIDDEN; Fri, 24 Feb 2012 09:39:51 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wJE-0002Vr-2b
	for bug-coreutils@HIDDEN; Fri, 24 Feb 2012 09:39:49 -0500
Received: from mailout-us.gmx.com ([74.208.5.67]:44151)
	by eggs.gnu.org with smtp (Exim 4.71)
	(envelope-from <marton.kadar@HIDDEN>) id 1S0wJD-0002VW-Rr
	for bug-coreutils@HIDDEN; Fri, 24 Feb 2012 09:39:44 -0500
Received: (qmail 28362 invoked by uid 0); 24 Feb 2012 14:29:13 -0000
Received: from 145.236.252.34 by rms-us010.v300.gmx.net with HTTP
Content-Type: text/plain; charset="utf-8"
Date: Fri, 24 Feb 2012 09:29:12 -0500
From: "Marton Kadar" <marton.kadar@HIDDEN>
Message-ID: <20120224142912.107150@HIDDEN>
MIME-Version: 1.0
X-Authenticated: #77717673
X-Flags: 0001
X-Mailer: GMX.com Web Mailer
x-registered: 0
Content-Transfer-Encoding: 8bit
X-GMX-UID: 7/s1b79I3zOlOMiDynAhP75+IGRvb8BK
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
	recognized.
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-Received-From: 140.186.70.17
X-Spam-Score: -1.9 (-)
X-Mailman-Approved-At: Fri, 24 Feb 2012 12:30:14 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

Don't know which is the official way to report a bug in 'tr'
so I will copy to this list too. CC me on replies as I am not
subscribing.

> ----- Original Message -----
> From: Marton Kadar
> Sent: 02/24/12 03:18 PM
> To: 9365 <at> debbugs.gnu.org
> Subject: Example
> 
> Environment for Hungarian, where á and í are proper lowercase letters
> (Spanish, for example, has these letters too):
> 
> $ set | grep ^L
> LANG=hu_HU.UTF-8
> LC_ALL=hu_HU.UTF-8
> LINES=73
> LOGNAME=kadar1marto518
> 
> Now let's see the bytestream for the following string
> (which means flood in Hungarian):
> 
> $ echo árvíz | od -c
> 0000000 303 241   r   v 303 255   z  \n
> 0000010
> 
> Let us try to delete a character and see if it worked:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 255   z  \n
> 0000005
> 
> Correct expected behavior would rather be:
> 
> $ echo árvíz | tr -d á | od -c
> 0000000   r   v 303 255   z  \n
> 0000006
> 
> I'll check the source for tr myself although never coded in C.
> This should be a trivial fix. The problem is especially annoying,
> as we currently have no simple, good, general-purpose case
> conversion tool (correct me if I'm wrong, but tr should be that
> tool).
> 
> Marton Kadar
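
The `od -c` dump above prints `á` as the octal bytes `303 241` and `í` as `303 255`, which is why a byte-oriented `tr -d á` also removes the lead byte of `í`. A minimal sketch of standard UTF-8 encoding shows where those bytes come from. `á` is U+00E1, so the lead byte is 0xC0 | (0xE1 >> 6) = 0xC3 (octal 303) and the continuation byte is 0x80 | (0xE1 & 0x3F) = 0xA1 (octal 241). `utf8_encode` is an illustrative helper, not coreutils code.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative helper (not from coreutils): encode one Unicode code
   point as UTF-8 into BUF.  Returns the number of bytes written, or 0
   for surrogates and values above U+10FFFF. */
static size_t utf8_encode (uint32_t cp, unsigned char buf[4])
{
  if (cp < 0x80)
    {
      buf[0] = (unsigned char) cp;                      /* 0xxxxxxx */
      return 1;
    }
  if (cp < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (cp >> 6));      /* 110xxxxx */
      buf[1] = (unsigned char) (0x80 | (cp & 0x3F));    /* 10xxxxxx */
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                        /* UTF-16 surrogates are invalid */
  if (cp < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (cp >> 12));     /* 1110xxxx */
      buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (cp >> 18));     /* 11110xxx */
      buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;                          /* beyond the Unicode range */
}
```

The same arithmetic for `í` (U+00ED) gives 0xC3 0xAD, the `303 255` in the dump: both letters share the lead byte 303.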





Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.428 (Entity 5.428)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: "Marton Kadar" <marton.kadar@HIDDEN>
Subject: bug#10880: Acknowledgement (instead of characters, tr works on bytes)
Message-ID: <handler.10880.B.133010461716857.ack <at> debbugs.gnu.org>
References: <20120224142912.107150@HIDDEN>
X-Gnu-PR-Message: ack 10880
X-Gnu-PR-Package: coreutils
Reply-To: 10880 <at> debbugs.gnu.org
Date: Fri, 24 Feb 2012 17:31:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-coreutils@HIDDEN

If you wish to submit further information on this problem, please
send it to 10880 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
10880: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10880
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message received at request <at> debbugs.gnu.org:


Received: (at request) by debbugs.gnu.org; 24 Feb 2012 18:32:15 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Feb 24 13:32:15 2012
Received: from localhost ([127.0.0.1]:54472 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S0zwA-0007Wc-SR
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 13:32:13 -0500
Received: from smtp.cs.ucla.edu ([131.179.128.62]:59984)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <eggert@HIDDEN>) id 1S0zw7-0007WL-Ot
	for request <at> debbugs.gnu.org; Fri, 24 Feb 2012 13:32:08 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
	by smtp.cs.ucla.edu (Postfix) with ESMTP id 791CEA60003;
	Fri, 24 Feb 2012 10:29:27 -0800 (PST)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
	by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id O4kyIogK4dig; Fri, 24 Feb 2012 10:29:27 -0800 (PST)
Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200])
	by smtp.cs.ucla.edu (Postfix) with ESMTPSA id E2FA4A60002;
	Fri, 24 Feb 2012 10:29:26 -0800 (PST)
Message-ID: <4F47D706.2010609@HIDDEN>
Date: Fri, 24 Feb 2012 10:29:26 -0800
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:10.0) Gecko/20120131 Thunderbird/10.0
MIME-Version: 1.0
To: request <at> debbugs.gnu.org
Subject: Re: bug#10880: instead of characters, tr works on bytes
References: <20120224142912.107150@HIDDEN>
In-Reply-To: <20120224142912.107150@HIDDEN>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.9 (-)
X-Debbugs-Envelope-To: request
Cc: Marton Kadar <marton.kadar@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

forcemerge 9365 10880
thanks

To add info to your existing bug report, you can send email
to <9365 <at> debbugs.gnu.org>; that way, the system doesn't
mistakenly open a new ticket for a new bug report.




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#10880: instead of characters, tr works on bytes
Resent-From: Eric Blake <eblake@HIDDEN>
Original-Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sat, 25 Feb 2012 03:32:01 +0000
Resent-Message-ID: <handler.10880.B10880.133014068523626 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 10880
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Marton Kadar <marton.kadar@HIDDEN>
Cc: 10880 <at> debbugs.gnu.org
Received: via spool by 10880-submit <at> debbugs.gnu.org id=B10880.133014068523626
          (code B ref 10880); Sat, 25 Feb 2012 03:32:01 +0000
Received: (at 10880) by debbugs.gnu.org; 25 Feb 2012 03:31:25 +0000
Received: from localhost ([127.0.0.1]:54859 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S18M0-000690-SP
	for submit <at> debbugs.gnu.org; Fri, 24 Feb 2012 22:31:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:48399)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <eblake@HIDDEN>) id 1S18Lx-00068r-KW
	for 10880 <at> debbugs.gnu.org; Fri, 24 Feb 2012 22:31:23 -0500
Received: from int-mx12.intmail.prod.int.phx2.redhat.com
	(int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q1P3Sh1X020233
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Fri, 24 Feb 2012 22:28:43 -0500
Received: from [10.3.113.113] (ovpn-113-113.phx2.redhat.com [10.3.113.113])
	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id q1P3Sgp5012220; Fri, 24 Feb 2012 22:28:42 -0500
Message-ID: <4F485569.2040002@HIDDEN>
Date: Fri, 24 Feb 2012 20:28:41 -0700
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:10.0.1) Gecko/20120209 Thunderbird/10.0.1
MIME-Version: 1.0
References: <20120224142912.107150@HIDDEN>
In-Reply-To: <20120224142912.107150@HIDDEN>
X-Enigmail-Version: 1.3.5
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256;
	protocol="application/pgp-signature";
	boundary="------------enig9A09A192FFF0EC7AC3962D02"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25
X-Spam-Score: -6.9 (------)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -6.9 (------)

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig9A09A192FFF0EC7AC3962D02
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 02/24/2012 07:29 AM, Marton Kadar wrote:
> Don't know which is the official way to report a bug in 'tr'
> so I will copy to this list too. CC me on replies as I am not
> subscribing.

Sending mail to coreutils@HIDDEN _is_ what creates a bug on
debbugs.gnu.org, so you have managed to create a duplicate.  Paul Eggert
has already merged 9365, 10880, and 9569, so now, replying to any one of
those three is merely adding information to the same report.

>>
>> Let us try to delete a character and see if it worked:
>>
>> $ echo árvíz | tr -d á | od -c
>> 0000000   r   v 255   z  \n
>> 0000005

Please keep in mind that upstream coreutils is not yet converted over to
multibyte support.  This is evidence of one of the places that multibyte
support is required, and therefore, where you cannot expect things to
work yet.  No one has yet contributed a maintainable patch that does not
penalize single-byte locales, at least not upstream.  Several distros
have their own UTF-8 patches that they apply, but then, this would be a
bug you report to your distro and not upstream.

>> I'll check the source for tr myself although never coded in C.
>> This should be a trivial fix.

Alas, dealing with multibyte characters without penalizing single-byte
locales is NOT trivial, or it would have been done long ago.

-- 
Eric Blake   eblake@HIDDEN    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--------------enig9A09A192FFF0EC7AC3962D02
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBCAAGBQJPSFVqAAoJEKeha0olJ0NqBiMH/17qYuhYpdzTtDsqUEQAYl2I
VFrnMIB7qMKkx1JoliWErIhNB9c2BPCo1fDcwpzpWPg6WF3MDicSVCprX4oFXqoP
ekzGlcfeQVQ1HOLigXjuegmc9+uHkCFmX/9GEYqUQzz54zklVDpQS8UZTRzaB8db
I/pVTsKVlnOLaN71f/CCALIbPx1428QXXfAslqF3vxqKGjOtXdNoSq6u96fuXocp
FS+9uKezPv8b7CgebMQnAU5hnY3f1N3HZM7+xXBEIuvjlPccqiI8DiS8N4hSb1Xi
02U3GbwBLcvnWjQyHqxHnf1/pfIQdJUirg/5/GgzqUHmwHEpm2DoftIBbHQyceg=
=1w/w
-----END PGP SIGNATURE-----

--------------enig9A09A192FFF0EC7AC3962D02--




Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#10880: instead of characters, tr works on bytes
References: <20120224142912.107150@HIDDEN>
In-Reply-To: <20120224142912.107150@HIDDEN>
Resent-From: "Marton Kadar" <marton.kadar@HIDDEN>
Original-Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sat, 25 Feb 2012 22:11:01 +0000
Resent-Message-ID: <handler.10880.B10880.133020781510934 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 10880
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: 10880 <at> debbugs.gnu.org
Received: via spool by 10880-submit <at> debbugs.gnu.org id=B10880.133020781510934
          (code B ref 10880); Sat, 25 Feb 2012 22:11:01 +0000
Received: (at 10880) by debbugs.gnu.org; 25 Feb 2012 22:10:15 +0000
Received: from localhost ([127.0.0.1]:56466 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S1Pok-0002qJ-KA
	for submit <at> debbugs.gnu.org; Sat, 25 Feb 2012 17:10:14 -0500
Received: from mailout-us.gmx.com ([74.208.5.67]:47702)
	by debbugs.gnu.org with smtp (Exim 4.72)
	(envelope-from <marton.kadar@HIDDEN>) id 1S1Poi-0002q9-Ea
	for 10880 <at> debbugs.gnu.org; Sat, 25 Feb 2012 17:10:13 -0500
Received: (qmail 10440 invoked by uid 0); 25 Feb 2012 22:07:29 -0000
Received: from 79.122.6.148 by rms-us009.v300.gmx.net with HTTP
Content-Type: text/plain; charset="utf-8"
Date: Sat, 25 Feb 2012 17:07:27 -0500
From: "Marton Kadar" <marton.kadar@HIDDEN>
Message-ID: <20120225220727.107140@HIDDEN>
MIME-Version: 1.0
X-Authenticated: #77717673
X-Flags: 0001
X-Mailer: GMX.com Web Mailer
x-registered: 0
Content-Transfer-Encoding: 8bit
X-GMX-UID: yD47b75I3zOlOMiDynAh9Pt+IGRvbwAE
X-Spam-Score: -1.9 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

> ----- Original Message -----
> From: Eric Blake
> Sent: 02/25/12 04:28 AM
> To: Marton Kadar
> Subject: Re: bug#10880: instead of characters, tr works on bytes
> 
> On 02/24/2012 07:29 AM, Marton Kadar wrote:
> > Don't know which is the official way to report a bug in 'tr'
> > so I will copy to this list too. CC me on replies as I am not
> > subscribing.
> 
> Sending mail to coreutils@HIDDEN _is_ what creates a bug on
> debbugs.gnu.org, so you have managed to create a duplicate. Paul Eggert
> has already merged 9365, 10880, and 9569, so now, replying to any one of
> those three is merely adding information to the same report.
> 
> >>
> >> Let us try to delete a character and see if it worked:
> >>
> >> $ echo árvíz | tr -d á | od -c
> >> 0000000   r   v 255   z  \n
> >> 0000005
> 
> Please keep in mind that upstream coreutils is not yet converted over to
> multibyte support. This is evidence of one of the places that multibyte
> support is required, and therefore, where you cannot expect things to
> work yet. No one has yet contributed a maintainable patch that does not
> penalize single-byte locales, at least not upstream. Several distros
> have their own UTF-8 patches that they apply, but then, this would be a
> bug you report to your distro and not upstream.
> 
> >> I'll check the source for tr myself although never coded in C.
> >> This should be a trivial fix.
> 
> Alas, dealing with multibyte characters without penalizing single-byte
> locales is NOT trivial, or it would have been done long ago.

"Penalizing" single-byte locales - did you mean in performance or in functionality?
I understand that a generalized algorithm would probably be slower than one tuned for the single byte case.

But I suspect that you are also referring to some functional implication, as avoiding a solely performance related penalty in text handling command line utilities can never be a justifiable reason for incorrect functionality.

Besides, the execution path (single-byte specific or generalized multibyte capable) can be determined at program startup, so in the worst case there can be a tr and a tr-slow-but-multibyte version, the former calling the latter when so directed by the locale settings.
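The startup dispatch suggested above can be sketched with the standard `MB_CUR_MAX` macro, which, after `setlocale`, gives the maximum number of bytes per character in the current `LC_CTYPE` locale. This is a hypothetical sketch: `tr_path` and `choose_tr_path` are illustrative names, not coreutils code, and a real `tr` would still need the multibyte implementation to route to.

```c
#include <locale.h>
#include <stdlib.h>

/* Which implementation to run; names are illustrative only. */
enum tr_path { TR_PATH_UNIBYTE, TR_PATH_MULTIBYTE };

/* Decide once, after setlocale (LC_ALL, ""), which code path to use.
   MB_CUR_MAX == 1 means every character in the current LC_CTYPE
   locale is a single byte, so byte-at-a-time processing is exact. */
static enum tr_path choose_tr_path (void)
{
  return MB_CUR_MAX == 1 ? TR_PATH_UNIBYTE : TR_PATH_MULTIBYTE;
}
```

In this arrangement `main` would call `setlocale (LC_ALL, "")` before `choose_tr_path ()`, so the check costs one comparison per invocation rather than one per byte.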

A minimal "solution" could also be to put a warning on each affected program's man page: "Multibyte locales currently unsupported!" It is not always immediately apparent what the problem is: in many special cases it happens to work as expected, and then in very similar cases it doesn't.

> 
> -- 
> Eric Blake eblake@HIDDEN +1-919-301-3266
> Libvirt virtualization library http://libvirt.org





Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#10880: instead of characters, tr works on bytes
Resent-From: Paul Eggert <eggert@HIDDEN>
Original-Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Sat, 25 Feb 2012 23:24:02 +0000
Resent-Message-ID: <handler.10880.B10880.133021220917366 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 10880
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: Marton Kadar <marton.kadar@HIDDEN>
Cc: 10880 <at> debbugs.gnu.org
Received: via spool by 10880-submit <at> debbugs.gnu.org id=B10880.133021220917366
          (code B ref 10880); Sat, 25 Feb 2012 23:24:02 +0000
Received: (at 10880) by debbugs.gnu.org; 25 Feb 2012 23:23:29 +0000
Received: from localhost ([127.0.0.1]:56518 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S1Qxc-0004W2-Tb
	for submit <at> debbugs.gnu.org; Sat, 25 Feb 2012 18:23:29 -0500
Received: from smtp.cs.ucla.edu ([131.179.128.62]:50255)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <eggert@HIDDEN>) id 1S1Qxa-0004Vr-1x
	for 10880 <at> debbugs.gnu.org; Sat, 25 Feb 2012 18:23:27 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
	by smtp.cs.ucla.edu (Postfix) with ESMTP id 58F5139E800E;
	Sat, 25 Feb 2012 15:20:43 -0800 (PST)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
	by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id vv629Xoi7A4R; Sat, 25 Feb 2012 15:20:43 -0800 (PST)
Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net
	[71.189.109.235])
	by smtp.cs.ucla.edu (Postfix) with ESMTPSA id E9BF739E800C;
	Sat, 25 Feb 2012 15:20:42 -0800 (PST)
Message-ID: <4F496CCC.5010408@HIDDEN>
Date: Sat, 25 Feb 2012 15:20:44 -0800
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
References: <20120224142912.107150@HIDDEN> <20120225220727.107140@HIDDEN>
In-Reply-To: <20120225220727.107140@HIDDEN>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.9 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -1.9 (-)

On 02/25/2012 02:07 PM, Marton Kadar wrote:

> the execution path (single-byte specific or generalized
> multibyte capable) can be determined at program startup, so in the
> worst case there can be a tr and a tr-slow-but-multibyte version,
> the former calling the latter when so directed by the locale settings.

Something like that should work, yes.  Unfortunately so far nobody has
volunteered to do it.  The task would not be trivial.  We don't want
to maintain two copies of the code, one for single-byte and one for
multibyte, as that'd be a maintenance problem.  Instead, we'd like to
have just one copy of the code, which is easy to read and which
compiles into either unibyte or multibyte versions.
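One way to picture "one copy of the code ... which compiles into either unibyte or multibyte versions" is to write the loop once as a macro and instantiate it with two character-length functions. This is a hypothetical illustration, not the approach coreutils adopted: `DEFINE_DELETE`, `unibyte_len` and `utf8_len` are invented names, `utf8_len` hard-codes UTF-8 and does not validate continuation bytes (a real implementation would go through the locale, e.g. with `mbrtowc`), and unlike `tr -d` it deletes one character rather than a set. In the unibyte instance the length is the constant 1, so an optimizing compiler can reduce the loop to a plain byte loop.

```c
#include <stddef.h>
#include <string.h>

/* Unibyte locales: every character is exactly one byte. */
static inline size_t unibyte_len (const unsigned char *p, size_t n)
{
  (void) p;
  (void) n;
  return 1;
}

/* UTF-8 (assumed): length from the lead byte.  Invalid lead bytes
   count as one byte; a sequence cut off at the end of the input is
   shortened to the bytes that remain. */
static inline size_t utf8_len (const unsigned char *p, size_t n)
{
  size_t len = p[0] < 0x80 ? 1
             : (p[0] & 0xE0) == 0xC0 ? 2
             : (p[0] & 0xF0) == 0xE0 ? 3
             : (p[0] & 0xF8) == 0xF0 ? 4
             : 1;
  return len <= n ? len : n;
}

/* One copy of the algorithm: copy IN (N bytes) to OUT (at least N
   bytes), dropping every character equal to DEL.  CHAR_LEN decides
   what "one character" means.  Returns the output length. */
#define DEFINE_DELETE(NAME, CHAR_LEN)                                 \
  static size_t NAME (const unsigned char *in, size_t n,              \
                      const unsigned char *del, size_t del_len,       \
                      unsigned char *out)                             \
  {                                                                   \
    size_t i = 0, o = 0;                                              \
    while (i < n)                                                     \
      {                                                               \
        size_t len = CHAR_LEN (in + i, n - i);                        \
        if (! (len == del_len && memcmp (in + i, del, len) == 0))     \
          {                                                           \
            memcpy (out + o, in + i, len);                            \
            o += len;                                                 \
          }                                                           \
        i += len;                                                     \
      }                                                               \
    return o;                                                         \
  }

DEFINE_DELETE (delete_unibyte, unibyte_len)
DEFINE_DELETE (delete_utf8, utf8_len)
```

Called on `árvíz\n` with `á`, `delete_utf8` returns the six bytes `r v 303 255 z \n` that the original report expected; `delete_unibyte` only ever matches one-byte characters.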

> avoiding a solely performance related penalty in text handling
> command line utilities can never be a justifiable reason for
> incorrect functionality.

As far as I know there is no requirement in POSIX that applications
must support multibyte locales, and there's no documentation claiming
that the utilities in question support multibyte locales, so this is
not a bug; it's a feature request.

My opinion about this may be colored by an experience I had yesterday
with the latest version of GNU sed.  Single-byte it worked fine;
multibyte it was so slow that I gave up.  We don't want this to
happen with the core utilities.





Message sent to bug-coreutils@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#10880: instead of characters, tr works on bytes
Resent-From: Chris Jones <cjns1989@HIDDEN>
Original-Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Resent-CC: bug-coreutils@HIDDEN
Resent-Date: Mon, 27 Feb 2012 06:16:01 +0000
Resent-Message-ID: <handler.10880.B.13303233172909 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 10880
X-GNU-PR-Package: coreutils
X-GNU-PR-Keywords: 
To: 10880 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-coreutils@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.13303233172909
          (code B ref -1); Mon, 27 Feb 2012 06:16:01 +0000
Received: (at submit) by debbugs.gnu.org; 27 Feb 2012 06:15:17 +0000
Received: from localhost ([127.0.0.1]:58761 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1S1trg-0000ks-KR
	for submit <at> debbugs.gnu.org; Mon, 27 Feb 2012 01:15:16 -0500
Received: from eggs.gnu.org ([208.118.235.92]:49075)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tre-0000kh-BE
	for submit <at> debbugs.gnu.org; Mon, 27 Feb 2012 01:15:15 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tot-0003Md-JF
	for submit <at> debbugs.gnu.org; Mon, 27 Feb 2012 01:12:24 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.1 required=5.0 tests=BAYES_00,BODY_8BITS,
	FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM autolearn=no version=3.3.2
Received: from lists.gnu.org ([208.118.235.17]:36939)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tot-0003Ln-FG
	for submit <at> debbugs.gnu.org; Mon, 27 Feb 2012 01:12:23 -0500
Received: from eggs.gnu.org ([208.118.235.92]:46282)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tor-0000nR-I6
	for bug-coreutils@HIDDEN; Mon, 27 Feb 2012 01:12:22 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tON-0005GB-KO
	for bug-coreutils@HIDDEN; Mon, 27 Feb 2012 00:45:00 -0500
Received: from mta1.srv.hcvlny.cv.net ([167.206.4.196]:55790)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cjns1989@HIDDEN>) id 1S1tON-0005FR-Gj
	for bug-coreutils@HIDDEN; Mon, 27 Feb 2012 00:44:59 -0500
Received: from pavo.local (ool-457112ca.dyn.optonline.net [69.113.18.202])
	by mta1.srv.hcvlny.cv.net
	(Sun Java System Messaging Server 6.2-8.04 (built Feb 28 2007))
	with ESMTP id <0M0100JJ5EMX5NB0@HIDDEN> for
	bug-coreutils@HIDDEN; Mon, 27 Feb 2012 00:44:58 -0500 (EST)
Received: from gavron by pavo.local with local (Exim 4.69)
	(envelope-from <cjns1989@HIDDEN>)
	id 1S1tOL-0004lo-1S	for bug-coreutils@HIDDEN;
	Mon, 27 Feb 2012 00:44:57 -0500
Date: Mon, 27 Feb 2012 00:44:56 -0500
From: Chris Jones <cjns1989@HIDDEN>
In-reply-to: <20120224142912.107150@HIDDEN>
Mail-followup-to: bug-coreutils@HIDDEN
Message-id: <20120227054456.GA3559@HIDDEN>
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-transfer-encoding: QUOTED-PRINTABLE
Content-disposition: inline
References: <20120224142912.107150@HIDDEN>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-detected-operating-system: by eggs.gnu.org: Solaris 10 (beta)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3)
X-Received-From: 208.118.235.17
X-Spam-Score: 1.8 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has
 identified this incoming email as possible spam.  The original message
 has been attached to this so you can view it (if it isn't spam) or label
 similar future email.  If you have any questions, see
 the administrator of that system for details.
 
 Content preview:  On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:
    [..] > > $ set | grep ^L > > LANG=hu_HU.UTF-8 > > LC_ALL=hu_HU.UTF-8 > >
   LINES=73 > > LOGNAME=kadar1marto518 > > > > Now let's see the bytestream for
    the following string > > (which means flood in Hungarian): > > > > $ echo
    árvíz | od -c > > 0000000 303 241 r v 303 255 z \n > > 0000010
   > > > > Let us try to delete a character and see if it worked: > > > > $ echo
    árvíz | tr -d á | od -c > > 0000000 r v 255 z \n > > 0000005 [...]
    
 
 Content analysis details:   (1.8 points, 10.0 required)
 
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                             (cjns1989[at]gmail.com)
  2.7 RCVD_IN_PSBL           RBL: Received via a relay in PSBL
                             [208.118.235.92 listed in psbl.surriel.com]
  0.8 SPF_NEUTRAL            SPF: sender does not match SPF record (neutral)
  0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in
                             digit (cjns1989[at]gmail.com)
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: 1.8 (+)

On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:

[..]

> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> >
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> >
> > $ echo árvíz | od -c
> > 0000000 303 241   r   v 303 255   z  \n
> > 0000010
> >
> > Let us try to delete a character and see if it worked:
> >
> > $ echo árvíz | tr -d á | od -c
> > 0000000   r   v 255   z  \n
> > 0000005

[..]
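The quoted `od -c` output can be reproduced with a small Python model. This is a sketch under stated assumptions, not coreutils code. `tr_delete_bytes` and `od_c` are hypothetical helpers. The model assumes, as the output above shows, that `tr -d` removes every byte of its argument individually.

Deleting the two bytes of "á" (0xC3 0xA1) also strips the 0xC3 lead byte of "í". That leaves the orphaned continuation byte 255 (0xAD). The trailing od offsets 0000010 and 0000005 are octal byte counts.

```python
def tr_delete_bytes(data: bytes, set1: str) -> bytes:
    """Hypothetical model of byte-wise `tr -d SET1` (not coreutils code).

    SET1 is encoded as UTF-8 and each of its bytes is deleted on its own,
    which is the behaviour the report observes.
    """
    doomed = set(set1.encode("utf-8"))  # iterating bytes yields ints
    return bytes(b for b in data if b not in doomed)


def od_c(data: bytes) -> str:
    """Rough imitation of one `od -c` row: ASCII as-is, other bytes in octal."""
    cells = []
    for b in data:
        if b == 0x0A:
            cells.append("\\n")
        elif 0x20 <= b < 0x7F:
            cells.append(chr(b))
        else:
            cells.append(f"{b:03o}")
    return " ".join(f"{c:>3}" for c in cells)


raw = "árvíz\n".encode("utf-8")
out = tr_delete_bytes(raw, "á")

print(len("árvíz\n"), len(raw))      # 6 8: six characters, eight bytes
print(f"{len(raw):07o}", od_c(raw))  # final od offset, in octal
print(f"{len(out):07o}", od_c(out))  # 255 (0xAD) is the orphaned half of "í"
```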

Try this for size...

$ echo árvíz | od -t x1z -w16
$ echo árvíz | tr -d é | od -t x1z -w16

$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt

And there is not even an ‘é’ in ‘árvíz’..

CJ

P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported:

http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html

-- 
Mooo Canada!!!!
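CJ's `tr -d é` test can be modelled the same way. This is a hypothetical sketch that again assumes purely byte-wise deletion. "é" encodes as 0xC3 0xA9, so every 0xC3 lead byte in "árvíz" disappears even though no "é" is present. `is_utf8` is a rough stand-in for the `isutf8` check, not that tool's implementation.

```python
def tr_delete_bytes(data: bytes, set1: str) -> bytes:
    """Hypothetical byte-wise model of `tr -d SET1` (not coreutils code)."""
    doomed = set(set1.encode("utf-8"))  # "é" -> {0xC3, 0xA9}
    return bytes(b for b in data if b not in doomed)


def is_utf8(data: bytes) -> bool:
    """Rough stand-in for the `isutf8` check: does data decode as UTF-8?"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


raw = "árvíz\n".encode("utf-8")
out = tr_delete_bytes(raw, "é")
print(raw.hex(" "))                # c3 a1 72 76 c3 ad 7a 0a
print(out.hex(" "))                # a1 72 76 ad 7a 0a
print(is_utf8(raw), is_utf8(out))  # True False
```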





Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 15 Sep 2012 10:30:00 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Sep 15 06:30:00 2012
Received: from localhost ([127.0.0.1]:34898 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1TCpdQ-0008V1-6j
	for submit <at> debbugs.gnu.org; Sat, 15 Sep 2012 06:30:00 -0400
Received: from mx.meyering.net ([88.168.87.75]:49067)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <jim@HIDDEN>)
	id 1TCpdN-0008Un-W7; Sat, 15 Sep 2012 06:29:58 -0400
Received: from rho.meyering.net (rho.meyering.net [127.0.0.1])
	by rho.meyering.net (Acme Bit-Twister) with ESMTP id E49B7601F7;
	Sat, 15 Sep 2012 12:28:54 +0200 (CEST)
From: Jim Meyering <jim@HIDDEN>
To: Michael Stummvoll <michael@HIDDEN>
Subject: Re: bug#12192: tr - bytes vs characters
In-Reply-To: <20120813145222.0450a1a8@eddie> (Michael Stummvoll's message of
	"Mon, 13 Aug 2012 14:52:22 +0200")
References: <20120813145222.0450a1a8@eddie>
Date: Sat, 15 Sep 2012 12:28:54 +0200
Message-ID: <87wqzvvau1.fsf@HIDDEN>
Lines: 37
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.4 (--)
X-Debbugs-Envelope-To: control
Cc: 12192 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -2.4 (--)

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi gnu folks,
>
> as already known, tr cannot handle multibyte-encodings like utf-8:
>
>> mst@eddie:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context means one
> character on display, regardless of which encoding is used or how many
> bytes are used to display it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it does not
> translate "characters" but "bytes". So I see two ways:
>
> - add multibyte-encoding support to tr
> or
> - change the manpage and helptext to not say "characters" but "bytes"
>
> Since it doesn't seem that anybody wants to add the support to tr, an
> update of the manpage would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
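Michael's `tr o ö` result follows from pairing SET1 and SET2 byte by byte. The sketch below is a hypothetical model, not coreutils code. It assumes, as the reported output suggests, that the surplus second byte of "ö" (0xC3 0xB6) is ignored. Each "o" therefore becomes a lone 0xC3 byte, which is not valid UTF-8 on its own.

```python
def tr_translate_bytes(data: bytes, set1: str, set2: str) -> bytes:
    """Hypothetical byte-wise model of `tr SET1 SET2` (not coreutils code)."""
    src = set1.encode("utf-8")              # "o" -> b"o"
    dst = set2.encode("utf-8")[: len(src)]  # b"\xc3\xb6" truncated to b"\xc3"
    if len(dst) != len(src):
        raise ValueError("sketch only covers SET2 at least as long as SET1 in bytes")
    return data.translate(bytes.maketrans(src, dst))


out = tr_translate_bytes(b"foo\n", "o", "ö")
print(out)                                    # b'f\xc3\xc3\n'
print(out.decode("utf-8", errors="replace"))  # lone lead bytes become U+FFFD
```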




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 6 Jan 2013 12:23:25 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Jan 06 07:23:25 2013
Received: from localhost ([127.0.0.1]:47129 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1TrpG9-0005xs-64
	for submit <at> debbugs.gnu.org; Sun, 06 Jan 2013 07:23:25 -0500
Received: from mx1.redhat.com ([209.132.183.28]:20124)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <P@HIDDEN>)
	id 1TrpG4-0005xb-KS; Sun, 06 Jan 2013 07:23:21 -0500
Received: from int-mx12.intmail.prod.int.phx2.redhat.com
	(int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r06CN0bp025753
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Sun, 6 Jan 2013 07:23:01 -0500
Received: from [10.36.116.39] (ovpn-116-39.ams2.redhat.com [10.36.116.39])
	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id r06CMvfH030424
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Sun, 6 Jan 2013 07:22:59 -0500
Message-ID: <50E96CA0.4030802@HIDDEN>
Date: Sun, 06 Jan 2013 12:22:56 +0000
From: Pádraig Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:13.0) Gecko/20120615 Thunderbird/13.0.1
MIME-Version: 1.0
To: Urs Thuermann <urs@HIDDEN>
Subject: Re: bug#13362: tr does not work with UTF-8 locales
References: <ygf38yfalsz.fsf@HIDDEN>
In-Reply-To: <ygf38yfalsz.fsf@HIDDEN>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id
	r06CN0bp025753
X-Spam-Score: -4.2 (----)
X-Debbugs-Envelope-To: control
Cc: 13362 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -5.0 (-----)

forcemerge 13362 9365
thanks

On 01/05/2013 11:53 AM, Urs Thuermann wrote:
> The tr utility from coreutils-8.20 does not handle multi-byte
> characters in UTF-8 correctly.  It seems the arguments and standard
> input are read byte-by-byte instead of character-by-character.

We all agree that this is an issue.
Someone just needs to get the time to implement it.

thanks,
Pádraig.
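For contrast, a character-oriented tr would decode its input and arguments first and then operate on code points, which is the behaviour Urs expected. The following is a minimal sketch under stated assumptions:

- `tr_chars` is a hypothetical name.
- Only literal characters are handled (no ranges, classes, escapes or `-s`).
- The encoding is passed in explicitly rather than derived from the locale.
- Padding a short SET2 with its last character mirrors GNU tr's documented behaviour.

```python
def tr_chars(data: bytes, set1: str, set2: str = "", *,
             delete: bool = False, encoding: str = "utf-8") -> bytes:
    """Hypothetical character-wise tr: decode, map code points, re-encode.

    Only literal characters are supported: no ranges, classes or escapes.
    """
    text = data.decode(encoding)  # raises UnicodeDecodeError on bad input
    if delete:
        # Third maketrans argument: characters mapped to None (deleted).
        table = str.maketrans("", "", set1)
    else:
        if not set2:
            raise ValueError("SET2 is required when translating")
        # Pad SET2 with its last character (or truncate it) to len(SET1).
        set2 = (set2 + set2[-1] * len(set1))[: len(set1)]
        table = str.maketrans(set1, set2)
    return text.translate(table).encode(encoding)


if __name__ == "__main__":
    print(tr_chars("árvíz\n".encode("utf-8"), "á", delete=True).decode("utf-8"), end="")
    print(tr_chars(b"foo\n", "o", "ö").decode("utf-8"), end="")
```

Here deleting "á" removes both of its bytes and leaves "í" intact. Translating "o" to "ö" yields valid UTF-8 "föö". A real implementation would also need to take the encoding from the locale (LC_CTYPE) and decide how to treat invalid input bytes.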




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 15 Oct 2018 14:06:07 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Oct 15 10:06:07 2018
Received: from localhost ([127.0.0.1]:50838 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1gC3VX-0006u9-Ec
	for submit <at> debbugs.gnu.org; Mon, 15 Oct 2018 10:06:07 -0400
Received: from mail-pf1-f181.google.com ([209.85.210.181]:45055)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <assafgordon@HIDDEN>) id 1gC3VV-0006tg-UX
 for control <at> debbugs.gnu.org; Mon, 15 Oct 2018 10:06:06 -0400
Received: by mail-pf1-f181.google.com with SMTP id r9-v6so9729297pff.11
 for <control <at> debbugs.gnu.org>; Mon, 15 Oct 2018 07:06:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=to:from:message-id:date:user-agent:mime-version:content-language
 :content-transfer-encoding;
 bh=tslEaETexaL5Plma/unGTsX6EP4DKD4xeOJEPvCMxto=;
 b=IgYFBuphZeO6ufK2R0ThohvlgQjuhoNFWS1LlcCSo1fBUKysCs/zlabxiMQtbAEcv9
 5lGZ2r6pLOmV6xqrPS+qDt2TeNf9PdqM87PuEWn/4x620EkidrGhTUDmK8QpzgVQzVAQ
 EbQlLs3N5jp0ConLqnjfIhq7aDCn8Xl50ynhkCdy0fANLzrY2EvRmD3m7uMjb5jb+xqH
 xS/hKhgbCyjDaI3jvxizDo6EABMqsRi+3b0sr7dxdtRTv6KUKh7gQ5PU0nHQE8YeQONT
 P9HPFS0vFx2QnpJxGkZmXHf+3XGL8HKXvVuDTlSNkqF/qZTLVqCRMbMCWIvHt6+ZEs5y
 n3hw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:to:from:message-id:date:user-agent:mime-version
 :content-language:content-transfer-encoding;
 bh=tslEaETexaL5Plma/unGTsX6EP4DKD4xeOJEPvCMxto=;
 b=BN3r7OXN8tmVFFClh/JaAHUbBcnAdSXiJ67IUn+im2fR1JlCkyNstmy7CewveTl3rI
 vnSXQ19OdA6Qp1kJFBuaGcoNBRzwMYw41tD8vZGNs6p+KVkRqseq9FvSPMt0jSjTDPMX
 mTl7/OUUicxSa42ymRGVWKwy+eldyLKFF2WgL6mhMSZZXaqgQHErZbsyshfA9Tm1AP9n
 m5S7nzWYoRABBjQeUfpMZUOTVocTfC2NLLWQ7t3voQlPJH4k+FeCZC+MqQN/p+8OPxzU
 kFK3LSse0N4xr5hs4MNgeUfpxgSUlJWXlkBAjD74LRDzK9UzaXZnfGEpez1iMWr9MNLu
 oS0A==
X-Gm-Message-State: ABuFfogF5FahDcLajf6++HIxsfi3VRUE8JU/09QQUb42o/IqgSydVRRq
 3Vo6uKUbR8M+YV87LQ8Js2Dg6zzjDqI=
X-Google-Smtp-Source: ACcGV630bCAJkl8c383Nvxde4mlHBbauHW61q0zd37Y2dEOjIhXh0u6sXrtKgN3rwQfCYT0iTqNkyg==
X-Received: by 2002:a62:67c3:: with SMTP id
 t64-v6mr10225425pfj.76.1539612359472; 
 Mon, 15 Oct 2018 07:05:59 -0700 (PDT)
Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38])
 by smtp.googlemail.com with ESMTPSA id
 h77-v6sm21227916pfh.13.2018.10.15.07.05.57
 for <control <at> debbugs.gnu.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 15 Oct 2018 07:05:57 -0700 (PDT)
To: control <at> debbugs.gnu.org
From: Assaf Gordon <assafgordon@HIDDEN>
Message-ID: <b1e0e40e-d722-270a-27d8-972ceaf1ec2b@HIDDEN>
Date: Mon, 15 Oct 2018 08:05:56 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Spam-Score: 2.0 (++)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org",
 has NOT identified this incoming email as spam.  The original
 message has been attached to this so you can view it or label
 similar future email.  If you have any questions, see
 the administrator of that system for details.
 Content preview: severity 9365 wishlist retitle 9365 multibyte: tr: TR
 operates
 on bytes, not characters retitle 9446 cp: acl preservation problem on FreeBSD
 8.1 severity 9472 wishlist [...] 
 Content analysis details:   (2.0 points, 10.0 required)
 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider
 (assafgordon[at]gmail.com)
 -0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 RCVD_IN_MSPIKE_H3      RBL: Good reputation (+3)
 [209.85.210.181 listed in wl.mailspike.net]
 -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
 trust [209.85.210.181 listed in list.dnswl.org]
 1.8 MISSING_SUBJECT        Missing Subject: header
 0.2 NO_SUBJECT             Extra score for no subject
 0.0 RCVD_IN_MSPIKE_WL      Mailspike good senders
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: 1.0 (+)

severity 9365 wishlist
retitle 9365 multibyte: tr: TR operates on bytes, not characters


retitle 9446 cp: acl preservation problem on FreeBSD 8.1

severity 9472 wishlist







Last modified: Mon, 15 Oct 2018 14:15:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.