GNU bug report logs - #69488
tr (question)

Package: coreutils;

Reported by: lacsaP Patatetom <patatetom <at> gmail.com>

Date: Fri, 1 Mar 2024 15:35:02 UTC

Severity: normal

To reply to this bug, email your comments to 69488 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#69488; Package coreutils. (Fri, 01 Mar 2024 15:35:02 GMT)

Acknowledgement sent to lacsaP Patatetom <patatetom <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 01 Mar 2024 15:35:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: lacsaP Patatetom <patatetom <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: tr (question)
Date: Fri, 1 Mar 2024 16:33:29 +0100
hi,

I did a few tests with tr and I'm surprised by the results...

$ echo éèçà
éèçà

these characters are encoded in UTF-8 as 2 bytes each:

$ echo éèçà | xxd
00000000: c3a9 c3a8 c3a7 c3a0 0a                   .........

now I use tr to remove non-printable characters:

$ echo éèçà | tr -cd '[:print:]'
$ echo éèçà | tr -cd '[:print:]' | wc
      0       0       0

all characters are deleted by tr, even the newline.
now I want to keep the "é" character:

$ echo éèçà | tr -cd '[:print:]é'
��

why do the "�" characters appear?

regards, lacsaP.

Information forwarded to bug-coreutils <at> gnu.org:
bug#69488; Package coreutils. (Fri, 01 Mar 2024 19:33:02 GMT)

Message #8 received at 69488 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: lacsaP Patatetom <patatetom <at> gmail.com>, 69488 <at> debbugs.gnu.org
Subject: Re: bug#69488: tr (question)
Date: Fri, 1 Mar 2024 19:30:33 +0000
On 01/03/2024 15:33, lacsaP Patatetom wrote:
> $ echo éèçà | tr -cd '[:print:]é'
> ��
>
> why do the "�" characters appear ?


It's a known issue that tr is currently not multi-byte aware.

thanks,
Pádraig
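A minimal shell sketch of what "not multi-byte aware" means for the last command, assuming GNU tr and od, and pinning LC_ALL=C so the result does not depend on the local locale (the report's first command deleting everything suggests the reporter's UTF-8 locale gives the same byte-level result). tr expands '[:print:]é' into a set of single bytes: the printable ASCII bytes plus 0xc3 and 0xa9, the two bytes of "é". With -c and -d it deletes every byte outside that set, so the continuation bytes 0xa8, 0xa7 and 0xa0 and the newline are dropped, while all four 0xc3 lead bytes survive. The function names reported_input and kept_bytes are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sketch: replay the report's last command byte by byte.
# Assumes GNU tr and od; reported_input and kept_bytes are hypothetical names.

# The report's input as octal escapes, matching the xxd dump exactly:
# c3a9 c3a8 c3a7 c3a0 0a, i.e. "éèçà" followed by a newline.
reported_input() {
    printf '\303\251\303\250\303\247\303\240\n'
}

# Same as tr -cd '[:print:]é', with "é" written as its two bytes
# (0xc3 = \303, 0xa9 = \251). tr builds a set of single bytes, so every
# 0xc3 or 0xa9 byte is kept wherever it occurs. LC_ALL=C pins byte
# semantics for [:print:]. od then shows the surviving bytes in hex.
kept_bytes() {
    reported_input | LC_ALL=C tr -cd '[:print:]\303\251' | od -An -tx1
}

kept_bytes
```

The surviving bytes are c3 a9 c3 c3 c3: one "é" followed by three lone 0xc3 lead bytes. Those are not valid UTF-8 on their own, which a UTF-8 terminal typically renders as replacement characters (�).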




Information forwarded to bug-coreutils <at> gnu.org:
bug#69488; Package coreutils. (Mon, 04 Mar 2024 08:28:01 GMT)

Message #11 received at 69488 <at> debbugs.gnu.org:

From: lacsaP Patatetom <patatetom <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 69488 <at> debbugs.gnu.org
Subject: Re: bug#69488: tr (question)
Date: Mon, 4 Mar 2024 09:25:24 +0100
On Fri, 1 Mar 2024 at 20:30, Pádraig Brady <P <at> draigbrady.com> wrote:

> It's a known issue that tr is currently not multi-byte aware.

hi,

thank you for this clarification.

what alternative to `tr` would you recommend for this kind of processing?

regards, lacsaP.

This bug report was last modified 264 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.