GNU bug report logs - #69488 tr (question)
To reply to this bug, email your comments to 69488 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#69488; Package coreutils. (Fri, 01 Mar 2024 15:35:02 GMT)
Acknowledgement sent to lacsaP Patatetom <patatetom <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 01 Mar 2024 15:35:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
hi,
I did a few tests with tr and I'm surprised by the results...
$ echo éèçà
éèçà
each of these characters is encoded in UTF-8 as 2 bytes:
$ echo éèçà | xxd
00000000: c3a9 c3a8 c3a7 c3a0 0a .........
now I use tr to remove non-printable characters:
$ echo éèçà | tr -cd '[:print:]'
$ echo éèçà | tr -cd '[:print:]' | wc
0 0 0
all characters are deleted by tr
now I want to keep the "é" character:
$ echo éèçà | tr -cd '[:print:]é'
��
why do the "�" characters appear?
regards, lacsaP.
Information forwarded to bug-coreutils <at> gnu.org: bug#69488; Package coreutils. (Fri, 01 Mar 2024 19:33:02 GMT)
Message #8 received at 69488 <at> debbugs.gnu.org:
It's a known issue that tr is currently not multi-byte aware.
thanks,
Pádraig
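
A minimal sketch of what "not multi-byte aware" means for the report above, assuming GNU tr and od are available. The helper name filter_bytes is hypothetical, and LC_ALL=C is set only to make the byte-oriented behaviour explicit and reproducible. Because tr works on single bytes, the "é" in the set is read as two separate bytes (c3 and a9), so every c3 lead byte in the input is kept while the second byte of "è", "ç" and "à" is deleted. The orphaned c3 bytes are invalid UTF-8 on their own, which a terminal typically renders as "�".

```shell
# Hypothetical helper: hex-dump the bytes that survive the tr filter.
# LC_ALL=C pins tr to byte-oriented behaviour regardless of the caller's locale.
filter_bytes() {
  # \303\251 is the two-byte UTF-8 encoding of "é" (c3 a9), written in octal.
  LC_ALL=C tr -cd '[:print:]\303\251' | od -An -tx1
}

# Input: "éèçà" plus newline as raw bytes c3a9 c3a8 c3a7 c3a0 0a.
printf '\303\251\303\250\303\247\303\240\n' | filter_bytes
```

The surviving bytes are c3 a9 c3 c3 c3: one complete "é" followed by three lone lead bytes. The trailing newline is also deleted, since it is not in [:print:].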
Information forwarded to bug-coreutils <at> gnu.org: bug#69488; Package coreutils. (Mon, 04 Mar 2024 08:28:01 GMT)
Message #11 received at 69488 <at> debbugs.gnu.org:
On Fri, 1 Mar 2024 at 20:30, Pádraig Brady <P <at> draigbrady.com> wrote:
> It's a known issue that tr is currently not multi-byte aware.
>
> thanks,
> Pádraig
>
hi,
thank you for this clarification.
what alternative to `tr` would you recommend for this kind of processing?
regards, lacsaP.