GNU bug report logs - #20114
multibyte: tr: support multibyte chars in first argument


Package: coreutils;

Reported by: Bruno Haible <bruno <at> clisp.org>

Date: Mon, 16 Mar 2015 02:31:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 20114 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#20114; Package coreutils. (Mon, 16 Mar 2015 02:31:02 GMT)

Acknowledgement sent to Bruno Haible <bruno <at> clisp.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 16 Mar 2015 02:31:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Bruno Haible <bruno <at> clisp.org>
To: bug-coreutils <at> gnu.org
Cc: Bjoern Jacke <bjoern <at> j3e.de>
Subject: tr does not support multibyte characters in the first argument
Date: Mon, 16 Mar 2015 03:30:25 +0100
POSIX [1] specifies that the recognition of characters in 'tr' depends on
the environment variables LANG, etc.

But trying to replace a multibyte character by another character does not
work:

$ echo $LANG
de_DE.UTF-8
$ enspace=`printf '\u2002'`
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 20 20 59
0000005

Expected output would be:
$ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
0000000 58 20 59
0000003

With 'sed' it works:

$ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
0000000 58 20 59
0000003

Bruno

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html
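The three 0x20 bytes in the observed output follow from tr working byte by byte. In UTF-8, U+2002 is encoded as the three bytes E2 80 82, so a byte-oriented tr sees SET1 as three separate one-byte characters. When SET2 is shorter than SET1, GNU tr pads SET2 by repeating its last character, so every one of those bytes maps to a space. The sketch below reproduces this. It assumes GNU coreutils and a POSIX sh, and reuses the report's `enspace` variable name. It builds the character from octal escapes instead of `printf '\u2002'` so it does not depend on the shell's printf understanding `\u`.

```shell
#!/bin/sh
# Sketch (assumes GNU coreutils): inspect U+2002 at the byte level and show
# how a byte-oriented tr handles it. Octal escapes are used instead of
# printf '\u2002' so the snippet does not rely on printf supporting \u.
enspace=$(printf '\342\200\202')

# In UTF-8, U+2002 is the three bytes e2 80 82.
printf '%s' "$enspace" | od -An -t x1

# A byte-oriented tr treats SET1 as three one-byte characters. GNU tr pads
# the shorter SET2 (' ') by repeating its last character, so each of the
# three bytes becomes a space: X, three spaces, Y.
printf 'X%sY' "$enspace" | tr "$enspace" ' ' | od -An -t x1

# Workaround from the report: sed's s///g matches the whole byte sequence
# as one pattern, so the character is replaced by a single space.
printf 'X%sY' "$enspace" | sed "s/$enspace/ /g" | od -An -t x1
```

The sed substitution only covers one-character-to-one-string replacements. It does not offer tr's set semantics (ranges, classes, `-d`, `-s`), which is what multibyte support in tr itself would provide.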





Information forwarded to bug-coreutils <at> gnu.org:
bug#20114; Package coreutils. (Mon, 16 Mar 2015 12:16:02 GMT)

Message #8 received at 20114 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Bruno Haible <bruno <at> clisp.org>, 20114 <at> debbugs.gnu.org
Cc: Bjoern Jacke <bjoern <at> j3e.de>, Daiki Ueno <ueno <at> gnu.org>,
 Assaf Gordon <assafgordon <at> gmail.com>, Ondrej Oprala <ooprala <at> redhat.com>
Subject: Re: bug#20114: tr does not support multibyte characters in the first
 argument
Date: Mon, 16 Mar 2015 12:15:07 +0000
On 16/03/15 02:30, Bruno Haible wrote:
> ...

Yes you're right Bruno.
Multi-byte support in coreutils in general has languished,
but we hope to start improving support in the next major release (9?)
after the current imminent 8.24 stable release.

To that end I've put together a plan:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#20114; Package coreutils. (Tue, 17 Mar 2015 15:14:02 GMT)

Message #11 received at 20114 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: Bjoern Jacke <bjoern <at> j3e.de>, Assaf Gordon <assafgordon <at> gmail.com>,
 Ondrej Oprala <ooprala <at> redhat.com>, 20114 <at> debbugs.gnu.org,
 Daiki Ueno <ueno <at> gnu.org>, Bruno Haible <bruno <at> clisp.org>
Subject: Re: bug#20114: tr does not support multibyte characters in the first
 argument
Date: Tue, 17 Mar 2015 08:12:55 -0700
On Mon, Mar 16, 2015 at 5:15 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
...
> Yes you're right Bruno.
> Multi-byte support in coreutils in general has languished,
> but we hope to start improving support in the next major release (9?)
> after the current imminent 8.24 stable release.
>
> To that end I've put together a plan:
> http://www.pixelbeat.org/docs/coreutils_i18n/

Very nice plan!




Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 23 Oct 2018 01:59:01 GMT)

Changed bug title to 'multibyte: tr: support multibyte chars in first argument' from 'tr does not support multibyte characters in the first argument'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 23 Oct 2018 01:59:01 GMT)

This bug report was last modified 5 years and 159 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.