Assaf Gordon <assafgordon@HIDDEN>
to control <at> debbugs.gnu.org.
From: Jim Meyering <jim@HIDDEN>
Date: Tue, 17 Mar 2015 08:12:55 -0700
Message-ID: <CA+8g5KEGxMfqFUjRtPzBHob087dzZz4H32OK1n6iuvd7M=N-LQ@HIDDEN>
Subject: Re: bug#20114: tr does not support multibyte characters in the first argument
To: Pádraig Brady <P@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>, Assaf Gordon <assafgordon@HIDDEN>,
Ondrej Oprala <ooprala@HIDDEN>, 20114 <at> debbugs.gnu.org,
Daiki Ueno <ueno@HIDDEN>, Bruno Haible <bruno@HIDDEN>
References: <2777312.3PiQp2ULlP@HIDDEN> <5506C94B.5090607@HIDDEN>
In-Reply-To: <5506C94B.5090607@HIDDEN>

On Mon, Mar 16, 2015 at 5:15 AM, Pádraig Brady <P@HIDDEN> wrote:
...
> Yes you're right Bruno.
> Multi-byte support in coreutils in general has languished,
> but we hope to start improving support in the next major release (9?)
> after the current imminent 8.24 stable release.
>
> To that end I've put together a plan:
> http://www.pixelbeat.org/docs/coreutils_i18n/

Very nice plan!
bug-coreutils@HIDDEN:bug#20114; Package coreutils.
Message-ID: <5506C94B.5090607@HIDDEN>
Date: Mon, 16 Mar 2015 12:15:07 +0000
From: Pádraig Brady <P@HIDDEN>
To: Bruno Haible <bruno@HIDDEN>, 20114 <at> debbugs.gnu.org
Subject: Re: bug#20114: tr does not support multibyte characters in the first
argument
References: <2777312.3PiQp2ULlP@HIDDEN>
In-Reply-To: <2777312.3PiQp2ULlP@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>, Daiki Ueno <ueno@HIDDEN>,
Assaf Gordon <assafgordon@HIDDEN>, Ondrej Oprala <ooprala@HIDDEN>
On 16/03/15 02:30, Bruno Haible wrote:
> POSIX [1] specifies that the recognition of characters in 'tr' depends on
> the environment variables LANG, etc.
>
> But trying to replace a multibyte character by another character does not
> work:
>
> $ echo $LANG
> de_DE.UTF-8
> $ enspace=`printf '\u2002'`
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 20 20 59
> 0000005
>
> Expected output would be:
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 59
> 0000003
>
> With 'sed' it works:
>
> $ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
> 0000000 58 20 59
> 0000003
>
> Bruno
>
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html

Yes you're right Bruno.
Multi-byte support in coreutils in general has languished,
but we hope to start improving support in the next major release (9?)
after the current imminent 8.24 stable release.

To that end I've put together a plan:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig.
bug-coreutils@HIDDEN:bug#20114; Package coreutils.
From: Bruno Haible <bruno@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: tr does not support multibyte characters in the first argument
Date: Mon, 16 Mar 2015 03:30:25 +0100
Message-ID: <2777312.3PiQp2ULlP@HIDDEN>
Cc: Bjoern Jacke <bjoern@HIDDEN>

POSIX [1] specifies that the recognition of characters in 'tr' depends on
the environment variables LANG, etc.

But trying to replace a multibyte character by another character does not
work:

$ echo $LANG
de_DE.UTF-8
$ enspace=`printf '\u2002'`
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 20 20 59
0000005

Expected output would be:
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 59
0000003

With 'sed' it works:

$ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
0000000 58 20 59
0000003

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
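
The byte-wise behaviour reported above can be reproduced without a UTF-8 locale. The following is a minimal sketch, not part of the original report, assuming a POSIX shell with od, tr, sed and xargs available. It forces LC_ALL=C so the result is deterministic (the report itself ran under de_DE.UTF-8), writes U+2002 as its UTF-8 bytes E2 80 82 with octal escapes instead of relying on '\u' support in printf, and uses a helper named hex that is purely illustrative.

```shell
#!/bin/sh
# U+2002 EN SPACE is the three bytes E2 80 82 in UTF-8. Octal escapes are
# used because not every printf implementation understands \u2002.
enspace=$(printf '\342\200\202')

# Illustrative helper (hypothetical name): print stdin as hex bytes
# separated by single spaces, independent of od's column padding.
hex() { od -An -t x1 | xargs echo; }

# A byte-oriented tr sees three separate bytes in SET1 and maps each of
# them to a space, so one character becomes three spaces.
tr_bytes=$(printf 'X%sY\n' "$enspace" | LC_ALL=C tr "$enspace" ' ' | hex)

# sed replaces the whole three-byte sequence with a single space.
sed_bytes=$(printf 'X%sY\n' "$enspace" | LC_ALL=C sed "s/${enspace}/ /g" | hex)

echo "tr:  $tr_bytes"    # tr:  58 20 20 20 59 0a
echo "sed: $sed_bytes"   # sed: 58 20 59 0a
```

Until tr handles multibyte characters, the sed substitution is one way to get the single-space result for a fixed character like this one.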
Bruno Haible <bruno@HIDDEN>:bug-coreutils@HIDDEN.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.