GNU bug report logs -
#73194
ls command converts utf-8 character into escape sequences
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 73194 in the body.
You can then email your comments to 73194 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#73194; Package coreutils. (Thu, 12 Sep 2024 10:18:02 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Simon Wolfe <sekaihenodoa <at> mutsuba.info>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 12 Sep 2024 10:18:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I have a file whose name uses the Unicode character U+318DF, which is in the Tertiary Ideographic Plane, more precisely in CJK Unified Ideographs Extension H.
touch 𱣟
ls
returns:
''$'\360\261\243\237'
Extension H was introduced in Unicode 15.0 in 2022.
I also notice that this bug occurs with any character in Extension I (introduced in 2023).
Extension G seems to work okay.
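The escaped form in the report is just the file name's raw UTF-8 bytes written in octal, so the name itself is stored correctly and only its display differs. A minimal sketch to confirm this in POSIX sh; `decode_utf8_4` is a hypothetical helper written for this illustration and only handles 4-byte sequences:

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): decode one 4-byte UTF-8
# sequence into its code point. Arguments are the four byte values, e.g. 0xF0.
decode_utf8_4() {
  code=$(( (($1 & 0x07) << 18) | (($2 & 0x3F) << 12) | (($3 & 0x3F) << 6) | ($4 & 0x3F) ))
  printf 'U+%04X\n' "$code"
}

# Dump the bytes behind ls's fallback form ''$'\360\261\243\237' in hex.
printf '\360\261\243\237' | od -An -tx1   # f0 b1 a3 9f

# Decode those bytes back to the character from the report.
decode_utf8_4 0xF0 0xB1 0xA3 0x9F         # U+318DF
```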
Information forwarded to bug-coreutils <at> gnu.org: bug#73194; Package coreutils. (Thu, 12 Sep 2024 10:37:01 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
On 12.09.2024 at 12:16, Simon Wolfe wrote:
> I have a file whose name uses the Unicode character U+318DF, which is in
> the Tertiary Ideographic Plane, more precisely in CJK Unified Ideographs Extension H.
>
> touch 𱣟
> ls
>
> returns:
>
> ''$'\360\261\243\237'
I use a wrapper with my favourite options and a pipe, which stops ls from
adapting its output to the terminal:
ls | cat
>
> Extension H was introduced in Unicode 15.0 in 2022.
>
> I also notice that this bug occurs with any character in Extension I
> (introduced in 2023).
>
> Extension G seems to work okay.
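The `ls | cat` workaround helps because GNU ls chooses its defaults based on whether standard output is a terminal: on a terminal it quotes or escapes names it considers non-printable, while into a pipe it writes names as raw bytes. A minimal sketch, assuming GNU coreutils `ls` (which provides `--literal` and `--show-control-chars`); the scratch directory exists only for the demonstration:

```shell
#!/bin/sh
# Sketch: print a file name's raw bytes instead of ls's escaped form.
# Assumes GNU coreutils ls; the scratch directory is for demonstration only.
dir=$(mktemp -d)
name=$(printf '\360\261\243\237')   # U+318DF encoded as UTF-8
touch "$dir/$name"

# stdout of ls is a pipe here, so it writes the name literally.
ls "$dir" | cat

# Explicit equivalent for terminal output: --literal (-N) turns off quoting,
# --show-control-chars keeps non-printable bytes instead of printing '?'.
ls --literal --show-control-chars "$dir"

rm -r "$dir"
```

Whether the raw bytes then render as 𱣟 still depends on the terminal and its fonts.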
Information forwarded to bug-coreutils <at> gnu.org: bug#73194; Package coreutils. (Thu, 12 Sep 2024 10:44:02 GMT)
Message #11 received at 73194 <at> debbugs.gnu.org (full text, mbox):
On 12/09/2024 11:16, Simon Wolfe wrote:
> I have a file whose name uses the Unicode character U+318DF, which is in the Tertiary Ideographic Plane, more precisely in CJK Unified Ideographs Extension H.
>
> touch 𱣟
> ls
>
> returns:
>
> ''$'\360\261\243\237'
>
> Extension H was introduced in Unicode 15.0 in 2022.
>
> I also notice that this bug occurs with any character in Extension I (introduced in 2023).
>
> Extension G seems to work okay.
ls 9.4 works as expected for me with glibc-2.39 in a UTF-8 locale,
i.e. that file name is displayed directly.
If I set a non-UTF-8 locale, it displays the form above instead
(a form that works in all locales, BTW).
$ touch ''$'\360\261\243\237'
$ ls ''$'\360\261\243\237'
𱣟
$ LC_ALL=C ls ''$'\360\261\243\237'
''$'\360\261\243\237'
So I suspect your system libraries are not new enough to recognize this character
as printable, hence the fallback format is used.
cheers,
Pádraig.
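Pádraig's diagnosis can be checked without ls by asking whether the current locale classifies the character as printable. A rough sketch; it rests on our assumption (not stated in the thread) that `grep`'s `[[:print:]]` class consults the same C-library locale data that makes ls print a name directly or escape it, and `getconf GNU_LIBC_VERSION` only answers on glibc systems. `is_printable` is a hypothetical helper:

```shell
#!/bin/sh
# Rough diagnostic: is U+318DF printable in the current locale?
# Assumption: grep's [[:print:]] uses the same C-library locale data as ls.
name=$(printf '\360\261\243\237')   # U+318DF as UTF-8

locale charmap                                   # UTF-8 in the reporter's setup
getconf GNU_LIBC_VERSION 2>/dev/null || echo "not glibc?"

is_printable() {
  # Succeeds only if every character of $1 is in the locale's [:print:] class.
  printf '%s\n' "$1" | grep -q '^[[:print:]]*$'
}

if is_printable "$name"; then
  echo "printable here: ls should show the name directly"
else
  echo "not printable here: ls falls back to the \$'...' form"
fi
```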
Information forwarded to bug-coreutils <at> gnu.org: bug#73194; Package coreutils. (Thu, 12 Sep 2024 13:15:02 GMT)
Message #14 received at 73194 <at> debbugs.gnu.org (full text, mbox):
On 2024/09/12 19:42, Pádraig Brady wrote:
> On 12/09/2024 11:16, Simon Wolfe wrote:
>> I have a file whose name uses the Unicode character U+318DF, which is in the Tertiary Ideographic Plane, more precisely in CJK Unified Ideographs Extension H.
>>
>> touch 𱣟
>> ls
>>
>> returns:
>>
>> ''$'\360\261\243\237'
>>
>> Extension H was introduced in Unicode 15.0 in 2022.
>>
>> I also notice that this bug occurs with any character in Extension I (introduced in 2023).
>>
>> Extension G seems to work okay.
>
> ls 9.4 works as expected for me with glibc-2.39 in a UTF-8 locale,
> i.e. that file name is displayed directly.
> If I set a non-UTF-8 locale, it displays the form above instead
> (a form that works in all locales, BTW).
>
> $ touch ''$'\360\261\243\237'
> $ ls ''$'\360\261\243\237'
> 𱣟
> $ LC_ALL=C ls ''$'\360\261\243\237'
> ''$'\360\261\243\237'
>
> So I suspect your system libraries are not new enough to recognize this character
> as printable, hence the fallback format is used.
>
> cheers,
> Pádraig.
>
I am using a UTF-8 locale (ja_JP.utf8), though with glibc-2.35. I am not sure I can upgrade glibc without breaking dependencies.
Thanks for checking, anyway.
Information forwarded to bug-coreutils <at> gnu.org: bug#73194; Package coreutils. (Fri, 13 Sep 2024 00:46:02 GMT)
Message #17 received at 73194 <at> debbugs.gnu.org (full text, mbox):
How does ls 9.4 handle code points that are not assigned yet?
I'm asking because it seems to take two years for such changes to reach distributions; it might be a good idea to support them ahead of time.
For example, if you take U+40500 ( ) and type
touch ''$'\361\200\224\200'
ls ''$'\361\200\224\200'
will it show the character itself or ''$'\361\200\224\200'?
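To repeat this experiment with other code points, the octal escapes can be generated rather than worked out by hand. A sketch that assumes only 4-byte sequences (U+10000 to U+10FFFF) are needed; `utf8_octal_4` is a hypothetical helper that prints the UTF-8 bytes of such a code point as backslash-octal escapes:

```shell
#!/bin/sh
# Hypothetical helper: print the UTF-8 bytes of a code point in
# U+10000..U+10FFFF as backslash-octal escapes, e.g. \360\261\243\237.
utf8_octal_4() {
  code=$(( $1 ))
  printf '\\%03o\\%03o\\%03o\\%03o\n' \
    "$(( 0xF0 | (code >> 18) ))" \
    "$(( 0x80 | ((code >> 12) & 0x3F) ))" \
    "$(( 0x80 | ((code >> 6) & 0x3F) ))" \
    "$(( 0x80 | (code & 0x3F) ))"
}

utf8_octal_4 0x318DF   # \360\261\243\237  (Extension H, from the report)
utf8_octal_4 0x40500   # \361\200\224\200  (unassigned, from the question)
```

Wrapping the output as `''$'…'` gives the same form used in the commands above.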
Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 16 Feb 2025 06:59:03 GMT)
bug closed, send any further explanations to 73194 <at> debbugs.gnu.org and Simon Wolfe <sekaihenodoa <at> mutsuba.info>
Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 16 Feb 2025 06:59:03 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 16 Mar 2025 11:24:32 GMT)
This bug report was last modified 116 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.