GNU bug report logs -
#79631
bug in the utility command "cut"
To reply to this bug, email your comments to 79631 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#79631; Package coreutils. (Wed, 15 Oct 2025 15:52:02 GMT)
Acknowledgement sent to Michael Cornelison <mkornelix <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 15 Oct 2025 15:52:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
The Linux shell command:
$ cut -c6- de.text > de2.text
outputs 2114 correct lines with the first 5 characters removed.
From line 2115 onward, the two bytes (hex 80, hex AF) are prepended to every output line.
The rest of each output line is correct.
I have attached the file "de.text" which triggers this bug.
I am using Ubuntu 25.04 in case that matters.
regards
Mike Cornelison
[de.text (text/plain, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#79631; Package coreutils. (Wed, 15 Oct 2025 16:54:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org:
From: Michael Cornelison
Subject: bug#79631: bug in the utility command "cut"
Date: Wed, 15 Oct 2025 14:34:40 +0200
>The Linux shell command: $ cut -c6- de.text > de2.text
>outputs 2114 correct lines with first 5 characters removed.
>Starting with line 2115, the two characters (hex 80, hex AF) are
>prepended to every output line. The rest of each output line is
>correct.
>I have attached the file "de.text" which triggers this bug.
>I am using Ubuntu 25.04 in case that matters.
Those characters are in your input file (starting at line 2115).
Not a bug.
=================================================================================
Please do not send me replies to my posts on the list.
I always read the replies via the web archive, so CC'ing to me is unnecessary.
When responding to my posts, please try to refrain from giving bureaucratic answers.
If you have nothing useful to say, then just click Next and go on.
Information forwarded to bug-coreutils <at> gnu.org: bug#79631; Package coreutils. (Wed, 15 Oct 2025 17:42:03 GMT)
Message #11 received at 79631 <at> debbugs.gnu.org:
On 15/10/2025 13:34, Michael Cornelison wrote:
> The Linux shell command: $ cut -c6- de.text > de2.text
> outputs 2114 correct lines with first 5 characters removed.
> From line 2115, the two characters (hex 80, hex AF) are prepended to every
> output line.
> The rest of each output line is correct.
>
> I have attached the file "de.text" which triggers this bug.
>
> I am using Ubuntu 25.04 in case that matters.
>
> regards
> Mike Cornelison
The issue is that cut(1) does not support multi-byte characters yet,
and treats -c like -b. This can cause cut(1) to
output a partial multi-byte character. In your case,
the following shows it starts outputting in the middle of the
UTF-8 narrow no-break space character (U+202F, bytes e2 80 af).
Byte 5 of line 2115 is e2, so -c6- starts output at 80 af:
LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
head -n2115 | tail -n1 | od -Ax -tx1z -v
000000 33 31 30 30 e2 80 af c3 9c 62 0a >3100.....b.<
This is already on our TODO list.
thank you,
Padraig
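A minimal reproduction of the byte split described above, assuming a synthetic line built from the bytes in the od output ("3100", then U+202F as e2 80 af, then "Üb" as c3 9c 62) rather than the attached de.text. It uses -b so the result does not depend on whether a particular cut build already has multibyte -c support; per the explanation above, -c currently behaves the same way.

```shell
# Synthetic input (an assumption, not the reporter's file):
# "3100" + U+202F NARROW NO-BREAK SPACE (e2 80 af) + "Üb" (c3 9c 62).
# printf octal escapes: \342=e2 \200=80 \257=af \303=c3 \234=9c
#
# -b counts bytes, which is what -c does while cut lacks multibyte support.
# Dropping the first 5 bytes removes only the lead byte e2 of U+202F,
# so the output starts with its continuation bytes 80 af.
printf '3100\342\200\257\303\234b\n' | cut -b6- | od -An -tx1
```

The hex dump starts with 80 af, matching the two stray bytes in the report.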
Information forwarded to bug-coreutils <at> gnu.org: bug#79631; Package coreutils. (Thu, 16 Oct 2025 04:50:02 GMT)
Message #14 received at 79631 <at> debbugs.gnu.org:
Pádraig Brady <P <at> draigBrady.com> writes:
> The issue is that cut(1) does not support multi-byte characters yet,
> and is treating -c like -b. This can cause cut(1) to
> output a partial multi-byte character. In your case,
> the following shows it starts outputting in the middle of the
> UTF-8 Narrow non-breaking space character:
>
> LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
> head -n2115 | tail -n1 | od -Ax -tx1z -v
> 000000 33 31 30 30 e2 80 af c3 9c 62 0a >3100.....b.<
>
> This is already on our TODO list.
I haven't yet thought of a decent interface for reading multibyte
characters that behaves like getndelim2, which 'cut' needs. Apart from
that, it should not be too difficult.
Collin
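Until cut(1) gains multibyte support, one possible stopgap, assuming the input is valid UTF-8, is to treat a character as one non-continuation byte followed by any continuation bytes (0x80-0xBF) and strip whole characters with sed. This is only a sketch: the helper name utf8_drop_chars is hypothetical and nothing like it ships with coreutils. Running sed with LC_ALL=C makes it match raw bytes, so it does not depend on which locales are installed.

```shell
# Hypothetical helper (not part of coreutils): drop the first N characters
# of each line of stdin, approximating `cut -c(N+1)-` for valid UTF-8.
# Note: plain sh has no `local`, so n and ch are ordinary shell variables.
utf8_drop_chars() {
  n=$1
  # One UTF-8 character: a non-continuation byte followed by zero or more
  # continuation bytes. \200-\277 is 0x80-0xBF as printf octal escapes.
  ch=$(printf '[^\200-\277][\200-\277]*')
  # \{1,N\} rather than \{N\}: lines shorter than N characters become
  # empty, which is what cut -c prints for them.
  LC_ALL=C sed "s/^\($ch\)\{1,$n\}//"
}

# Same synthetic line as in the byte-split example: "3100", U+202F, "Üb".
printf '3100\342\200\257\303\234b\n' | utf8_drop_chars 5 | od -An -tx1
```

Here U+202F counts as the 5th character and is removed whole, so the output starts at c3 9c ("Ü") instead of a stray 80 af.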
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.