GNU bug report logs - #20114
multibyte: tr: support multibyte chars in first argument

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Bruno Haible <bruno@HIDDEN>; dated Mon, 16 Mar 2015 02:31:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: tr: support multibyte chars in first argument' from 'tr does not support multibyte characters in the first argument'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 20114 <at> debbugs.gnu.org:


In-Reply-To: <5506C94B.5090607@HIDDEN>
References: <2777312.3PiQp2ULlP@HIDDEN>
 <5506C94B.5090607@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Tue, 17 Mar 2015 08:12:55 -0700
Message-ID: <CA+8g5KEGxMfqFUjRtPzBHob087dzZz4H32OK1n6iuvd7M=N-LQ@HIDDEN>
Subject: Re: bug#20114: tr does not support multibyte characters in the first
 argument
To: Pádraig Brady <P@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>, Assaf Gordon <assafgordon@HIDDEN>,
 Ondrej Oprala <ooprala@HIDDEN>, 20114 <at> debbugs.gnu.org,
 Daiki Ueno <ueno@HIDDEN>, Bruno Haible <bruno@HIDDEN>

On Mon, Mar 16, 2015 at 5:15 AM, Pádraig Brady <P@HIDDEN> wrote:
...
> Yes, you're right, Bruno.
> Multi-byte support in coreutils in general has languished,
> but we hope to start improving support in the next major release (9?)
> after the current imminent 8.24 stable release.
>
> To that end I've put together a plan:
> http://www.pixelbeat.org/docs/coreutils_i18n/

Very nice plan!




Information forwarded to bug-coreutils@HIDDEN:
bug#20114; Package coreutils.

Message received at 20114 <at> debbugs.gnu.org:


Message-ID: <5506C94B.5090607@HIDDEN>
Date: Mon, 16 Mar 2015 12:15:07 +0000
From: Pádraig Brady <P@HIDDEN>
To: Bruno Haible <bruno@HIDDEN>, 20114 <at> debbugs.gnu.org
Subject: Re: bug#20114: tr does not support multibyte characters in the first
 argument
References: <2777312.3PiQp2ULlP@HIDDEN>
In-Reply-To: <2777312.3PiQp2ULlP@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>, Daiki Ueno <ueno@HIDDEN>,
 Assaf Gordon <assafgordon@HIDDEN>, Ondrej Oprala <ooprala@HIDDEN>

On 16/03/15 02:30, Bruno Haible wrote:
> POSIX [1] specifies that the recognition of characters in 'tr' depends on
> the environment variables LANG, etc.
> 
> But trying to replace a multibyte character by another character does not
> work:
> 
> $ echo $LANG
> de_DE.UTF-8
> $ enspace=`printf '\u2002'`
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 20 20 59
> 0000005
> 
> Expected output would be:
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 59
> 0000003
> 
> With 'sed' it works:
> 
> $ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
> 0000000 58 20 59
> 0000003
> 
> Bruno
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html

Yes, you're right, Bruno.
Multi-byte support in coreutils in general has languished,
but we hope to start improving support in the next major release (9?)
after the current imminent 8.24 stable release.

To that end I've put together a plan:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig.




Information forwarded to bug-coreutils@HIDDEN:
bug#20114; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


From: Bruno Haible <bruno@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: tr does not support multibyte characters in the first argument
Date: Mon, 16 Mar 2015 03:30:25 +0100
Message-ID: <2777312.3PiQp2ULlP@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>

POSIX [1] specifies that the recognition of characters in 'tr' depends on
the environment variables LANG, etc.

But trying to replace a multibyte character by another character does not
work:

$ echo $LANG
de_DE.UTF-8
$ enspace=`printf '\u2002'`
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 20 20 59
0000005

Expected output would be:
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 59
0000003

With 'sed' it works:

$ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
0000000 58 20 59
0000003

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
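
For illustration only (not part of the original report): the output above is what a byte-wise 'tr' would be expected to produce, because U+2002 is encoded in UTF-8 as the three bytes E2 80 82 and each of them is translated separately. The sketch below reproduces that reading without depending on the locale. It assumes a 'tr' that pads a short second argument with its last character, as GNU tr is documented to do; the variable names and the trailing newline in the input are choices made for this sketch.

```shell
# U+2002 (EN SPACE) is E2 80 82 in UTF-8, written here as octal escapes
# so the script does not rely on printf supporting \u.
# A trailing newline is added so that any sed implementation handles the input.
input='X\342\200\202Y\n'

# Byte-wise translation: LC_ALL=C forces single-byte processing, so the
# first argument is a set of three one-byte characters. Each one is mapped
# to a space (assumption: the one-character second set is padded).
bytewise=$(printf "$input" | LC_ALL=C tr '\342\200\202' ' ' | od -An -t x1 | tr -d ' \n')

# Byte-sequence substitution: sed matches the three bytes as one unit,
# so a single space replaces the whole sequence even in the C locale.
enspace=$(printf '\342\200\202')
sequence=$(printf "$input" | LC_ALL=C sed "s/${enspace}/ /g" | od -An -t x1 | tr -d ' \n')

echo "tr:  $bytewise"
echo "sed: $sequence"
```

Under these assumptions the 'tr' line shows three 20 bytes between 58 and 59, matching the reported output, while the 'sed' line shows one. Until 'tr' handles multibyte characters, substituting the byte sequence with 'sed' in this way is one possible workaround.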





Acknowledgement sent to Bruno Haible <bruno@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#20114; Package coreutils.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.