GNU bug report logs - #13362
multibyte: tr: TR operates on bytes, not characters

Please note: This is a static page, with minimal formatting, updated once a day.

Package: coreutils; Severity: wishlist; Reported by: Urs Thuermann <urs@HIDDEN>; merged with #9365, #9569, #10880, #12192; dated Sat, 5 Jan 2013 17:28:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: tr: TR operates on bytes, not characters' from 'tr does not work with UTF-8 locales' Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.
Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 13362 <at> debbugs.gnu.org:


Received: (at 13362) by debbugs.gnu.org; 27 Jun 2014 17:06:58 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Jun 27 13:06:58 2014
Received: from localhost ([127.0.0.1]:35201 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1X0Zbx-0004fg-Fi
	for submit <at> debbugs.gnu.org; Fri, 27 Jun 2014 13:06:58 -0400
Received: from mout.gmx.com ([74.208.4.200]:56489)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <kubry@HIDDEN>) id 1X0ZWo-0004VL-2y
 for 13362 <at> debbugs.gnu.org; Fri, 27 Jun 2014 13:01:39 -0400
Received: from tp.localnet ([31.221.190.32]) by mail.gmx.com (mrgmxus001) with
 ESMTPSA (Nemesis) id 0LsTQU-1WXsac0FZo-011y6V for
 <13362 <at> debbugs.gnu.org>; Fri, 27 Jun 2014 19:01:27 +0200
From: Ganton <kubry@HIDDEN>
To: 13362 <at> debbugs.gnu.org
Subject: GNU bug report logs - #13362 tr does not work with UTF-8 locales
Date: Fri, 27 Jun 2014 19:01:14 +0200
User-Agent: KMail/1.13.7 (Linux/3.13.0-29-lowlatency; KDE/4.13.1; x86_64; ; )
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <201406271901.14362.kubry@HIDDEN>
X-Provags-ID: V03:K0:hSwShdd3h2O0KHYsv7jfc1s7kGh1+0QMoxkaVKynaSaxcLFeAJM
 R3R1ld+cZmYKw8DGxPibbWJHsp0+Q+dF7lGQHpmymdUX8icmu1fSxr4ABOBV+vnLiDstuBX
 Zyhp72PT2zm5bS2zgf6ilUypr1E1GFm+rK20ibdst6xFQPZqJmqotr9oDoHmwsmJ3sVlkfb
 +SzwFdYJprgdwPq2JQy3A==
X-Spam-Score: 1.7 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has
 identified this incoming email as possible spam.  The original message
 has been attached to this so you can view it (if it isn't spam) or label
 similar future email.  If you have any questions, see
 the administrator of that system for details.
 
 Content preview:  Dear sirs: This bugs is causing errors since many years ago
    (at least twelve (!) [https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861]),
    and let's face it, if we don't change the point of view it will never get
    solved. Meanwhile, the effects of this bug will keep on damaging the works
    of Linux users, and our reputation. [...] 
 
 Content analysis details:   (1.7 points, 10.0 required)
 
  pts rule name              description
 ---- ---------------------- --------------------------------------------------
  0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail provider
                             (kubry[at]gmx.com)
 -0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at http://www.dnswl.org/, no
                             trust
                             [74.208.4.200 listed in list.dnswl.org]
 -0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay
                             domain
 -0.0 SPF_PASS               SPF: sender matches SPF record
  1.7 DEAR_SOMETHING         BODY: Contains 'Dear (something)'
X-Debbugs-Envelope-To: 13362
X-Mailman-Approved-At: Fri, 27 Jun 2014 13:06:50 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>

Dear sirs:

This bug has been causing errors for many years (at least twelve (!)
[https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139861]), and let's face
it: if we don't change our point of view, it will never get solved. Meanwhile,
the effects of this bug will keep on damaging the work of Linux users, and our
reputation.

sed can work with UTF-8 correctly. What about asking the sed developers for
help? They could even refactor the tr code to reuse sed's code, so that at
least this bug would stop causing errors for Linux users. Moreover, the sed
developers may find a better solution than that one.

Thank you.




Information forwarded to bug-coreutils@HIDDEN:
bug#13362; Package coreutils. Full text available.
Forcibly Merged 9365 9569 10880 12192 13362. Request was from Pádraig Brady <P@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 13362 <at> debbugs.gnu.org:


Received: (at 13362) by debbugs.gnu.org; 6 Jan 2013 12:23:25 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Jan 06 07:23:25 2013
Received: from localhost ([127.0.0.1]:47127 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1TrpG8-0005xq-7T
	for submit <at> debbugs.gnu.org; Sun, 06 Jan 2013 07:23:24 -0500
Received: from mx1.redhat.com ([209.132.183.28]:20124)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <P@HIDDEN>)
	id 1TrpG4-0005xb-KS; Sun, 06 Jan 2013 07:23:21 -0500
Received: from int-mx12.intmail.prod.int.phx2.redhat.com
	(int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r06CN0bp025753
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Sun, 6 Jan 2013 07:23:01 -0500
Received: from [10.36.116.39] (ovpn-116-39.ams2.redhat.com [10.36.116.39])
	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
	id r06CMvfH030424
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Sun, 6 Jan 2013 07:22:59 -0500
Message-ID: <50E96CA0.4030802@HIDDEN>
Date: Sun, 06 Jan 2013 12:22:56 +0000
From: Pádraig Brady <P@HIDDEN>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:13.0) Gecko/20120615 Thunderbird/13.0.1
MIME-Version: 1.0
To: Urs Thuermann <urs@HIDDEN>
Subject: Re: bug#13362: tr does not work with UTF-8 locales
References: <ygf38yfalsz.fsf@HIDDEN>
In-Reply-To: <ygf38yfalsz.fsf@HIDDEN>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id
	r06CN0bp025753
X-Spam-Score: -4.2 (----)
X-Debbugs-Envelope-To: 13362
Cc: 13362 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -5.0 (-----)

forcemerge 13362 9365
thanks

On 01/05/2013 11:53 AM, Urs Thuermann wrote:
> The tr utility from coreutils-8.20 does not handle multi-byte
> characters in UTF-8 correctly.  It seems the arguments and standard
> input are read byte-by-byte instead of character-by-character.

We all agree that this is an issue.
Someone just needs to get the time to implement it.

thanks,
Pádraig.




Information forwarded to bug-coreutils@HIDDEN:
bug#13362; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 5 Jan 2013 17:27:22 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Jan 05 12:27:22 2013
Received: from localhost ([127.0.0.1]:45049 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1TrXWi-00078R-Vm
	for submit <at> debbugs.gnu.org; Sat, 05 Jan 2013 12:27:22 -0500
Received: from eggs.gnu.org ([208.118.235.92]:60369)
	by debbugs.gnu.org with esmtp (Exim 4.72)
	(envelope-from <urs@HIDDEN>) id 1TrSfT-00038w-QY
	for submit <at> debbugs.gnu.org; Sat, 05 Jan 2013 07:16:04 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <urs@HIDDEN>) id 1TrSfG-0003yk-9P
	for submit <at> debbugs.gnu.org; Sat, 05 Jan 2013 07:15:51 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-101.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,
	USER_IN_WHITELIST autolearn=unavailable version=3.3.2
Received: from lists.gnu.org ([208.118.235.17]:43509)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <urs@HIDDEN>) id 1TrSfG-0003yg-6r
	for submit <at> debbugs.gnu.org; Sat, 05 Jan 2013 07:15:50 -0500
Received: from eggs.gnu.org ([208.118.235.92]:43494)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <urs@HIDDEN>) id 1TrSfF-0001ei-AI
	for bug-coreutils@HIDDEN; Sat, 05 Jan 2013 07:15:50 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <urs@HIDDEN>) id 1TrSfE-0003yM-0f
	for bug-coreutils@HIDDEN; Sat, 05 Jan 2013 07:15:49 -0500
Received: from oker.escape.de ([2a00:1030:1004:107::2]:58315)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <urs@HIDDEN>) id 1TrSfD-0003y5-Kw
	for bug-coreutils@HIDDEN; Sat, 05 Jan 2013 07:15:47 -0500
Received: from oker.escape.de (localhost [127.0.0.1])
	(envelope-sender: urs@HIDDEN)
	by oker.escape.de (8.14.3/8.14.3/$Revision: 1.76 $) with ESMTP id
	r05Bt3uE024607
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <bug-coreutils@HIDDEN>; Sat, 5 Jan 2013 12:55:03 +0100
Received: (from uucp@localhost)
	by oker.escape.de (8.14.3/8.14.3/Submit) with UUCP id r05Bt3Cb024593
	for bug-coreutils@HIDDEN; Sat, 5 Jan 2013 12:55:03 +0100
Received: from janus.isnogud.escape.de (localhost [127.0.0.1])
	by janus.isnogud.escape.de (8.13.8/8.13.8) with ESMTP id r05Br0sh015271
	for <bug-coreutils@HIDDEN>; Sat, 5 Jan 2013 12:53:00 +0100
Received: (from urs@localhost)
	by janus.isnogud.escape.de (8.13.8/8.13.8/Submit) id r05Br0R1015268;
	Sat, 5 Jan 2013 12:53:00 +0100
X-Authentication-Warning: janus.isnogud.escape.de: urs set sender to
	urs@HIDDEN using -f
To: bug-coreutils@HIDDEN
Subject: tr does not work with UTF-8 locales
From: Urs Thuermann <urs@HIDDEN>
Date: 05 Jan 2013 12:53:00 +0100
Message-ID: <ygf38yfalsz.fsf@HIDDEN>
Lines: 44
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
X-MIME-Autoconverted: from 8bit to quoted-printable by oker.escape.de id
	r05Bt3uE024607
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
	(bad octet value).
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 208.118.235.17
X-Spam-Score: -4.2 (----)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Sat, 05 Jan 2013 12:27:20 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
	<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Sender: debbugs-submit-bounces <at> debbugs.gnu.org
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
X-Spam-Score: -5.0 (-----)

The tr utility from coreutils-8.20 does not handle multi-byte
characters in UTF-8 correctly.  It seems the arguments and standard
input are read byte-by-byte instead of character-by-character.
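The following short Python illustration was added to make the byte/character distinction concrete; it is not taken from the report or from coreutils. It prints the UTF-8 encoding of each character used in the examples, and shows that a one-character argument such as "ü" yields two separate set members when read byte-by-byte, but only one when read character-by-character. The names byte_members and char_members are hypothetical.

```python
# Illustration only (not coreutils code): UTF-8 encodings of the
# characters used in the report, printed as code point and bytes.
for ch in "äöü¼½":
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
# U+00E4 c3 a4
# U+00F6 c3 b6
# U+00FC c3 bc
# U+00BC c2 bc
# U+00BD c2 bd

# A one-character tr argument, viewed two ways.
arg = "ü"
byte_members = list(arg.encode("utf-8"))  # byte-by-byte: [0xc3, 0xbc]
char_members = list(arg)                  # character-by-character: ["ü"]
print(len(byte_members), len(char_members))  # 2 1
```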

Here are two examples, using the following UTF-8 characters (which are
also available in latin1, since this is what my mail software still
uses):

        ä (c3 a4), ö (c3 b6), ü (c3 bc), ¼ (c2 bc), ½ (c2 bd)

1. A call to tr -d ü does not delete that two-byte sequence from the
   input but deletes every occurrence of c3 or bc:

    urs@bit:~/coreutils-8.20$ locale
    LANG=C.UTF-8
    LANGUAGE=
    LC_CTYPE="C.UTF-8"
    LC_NUMERIC="C.UTF-8"
    LC_TIME="C.UTF-8"
    LC_COLLATE="C.UTF-8"
    LC_MONETARY="C.UTF-8"
    LC_MESSAGES="C.UTF-8"
    LC_PAPER="C.UTF-8"
    LC_NAME="C.UTF-8"
    LC_ADDRESS="C.UTF-8"
    LC_TELEPHONE="C.UTF-8"
    LC_MEASUREMENT="C.UTF-8"
    LC_IDENTIFICATION="C.UTF-8"
    LC_ALL=
    urs@bit:~/coreutils-8.20$ echo äöü¼|od -tx1
    0000000 c3 a4 c3 b6 c3 bc c2 bc 0a
    0000011
    urs@bit:~/coreutils-8.20$ echo äöü¼|tr -d ü|od -tx1
    0000000 a4 b6 c2 0a
    0000004

2. Replacing the single character ü (c3 bc) with the single character ½
   (c2 bd) instead replaces each c3 with c2 and each bc with bd:

    urs@bit:~/coreutils-8.20$ echo äöü¼|tr ü ½|od -tx1
    0000000 c2 a4 c2 b6 c2 bd c2 bd 0a
    0000011
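Both failures can be reproduced with a small Python model, added here as an illustration; it is a sketch, not the coreutils implementation. Applying tr-style deletion and one-to-one translation to the raw UTF-8 bytes gives exactly the od output shown in the two examples, while applying the same operations to decoded characters gives the result the reporter expected. The helper names tr_delete and tr_translate are hypothetical, and the model deliberately ignores the rest of tr (ranges, character classes, padding of a shorter SET2).

```python
# Illustrative model of `tr -d SET1` and `tr SET1 SET2`; NOT the coreutils
# source. Works on any sequence: bytes (iterates as ints) or str (characters).

def tr_delete(data, set1):
    """Drop every element of data that is a member of set1."""
    members = set(set1)
    return [x for x in data if x not in members]

def tr_translate(data, set1, set2):
    """Replace set1[i] with set2[i]; assumes len(set1) == len(set2)."""
    table = dict(zip(set1, set2))
    return [table.get(x, x) for x in data]

text = "äöü¼\n"  # the input produced by `echo äöü¼`

# Byte-oriented, as reported for coreutils-8.20: the sets are raw UTF-8 bytes.
raw = text.encode("utf-8")
print(bytes(tr_delete(raw, "ü".encode("utf-8"))).hex(" "))
# a4 b6 c2 0a
print(bytes(tr_translate(raw, "ü".encode("utf-8"), "½".encode("utf-8"))).hex(" "))
# c2 a4 c2 b6 c2 bd c2 bd 0a

# Character-oriented, as expected: decode, operate on characters, re-encode.
print("".join(tr_delete(text, "ü")).encode("utf-8").hex(" "))
# c3 a4 c3 b6 c2 bc 0a
print("".join(tr_translate(text, "ü", "½")).encode("utf-8").hex(" "))
# c3 a4 c3 b6 c2 bd c2 bc 0a
```

Because the byte-level SET1 for "ü" becomes the two members 0xc3 and 0xbc, the operation also damages ä and ö (which share the lead byte c3) and ¼ (which shares the trailing byte bc); decoding first keeps every multibyte character intact.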

urs




Acknowledgement sent to Urs Thuermann <urs@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#13362; Package coreutils. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.