GNU bug report logs - #79631
bug in the utility command "cut"


Package: coreutils

Reported by: Michael Cornelison <mkornelix <at> gmail.com>

Date: Wed, 15 Oct 2025 15:52:01 UTC

Severity: normal

To reply to this bug, email your comments to 79631 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#79631; Package coreutils. (Wed, 15 Oct 2025 15:52:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Michael Cornelison <mkornelix <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 15 Oct 2025 15:52:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Michael Cornelison <mkornelix <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: bug in the utility command "cut"
Date: Wed, 15 Oct 2025 14:34:40 +0200
The Linux shell command: $ cut -c6- de.text > de2.text
outputs 2114 correct lines with first 5 characters removed.
From line 2115, the two characters (hex 80, hex AF) are prepended to every
output line.
The rest of each output line is correct.

I have attached the file "de.text" which triggers this bug.

I am using Ubuntu 25.04 in case that matters.

regards
Mike Cornelison
[de.text (text/plain, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#79631; Package coreutils. (Wed, 15 Oct 2025 16:54:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stan Marsh <gazelle <at> xmission.com>
To: bug-coreutils <at> gnu.org
Subject: bug#79631: bug in the utility command "cut"
Date: Wed, 15 Oct 2025 10:52:41 -0600
From:	Michael Cornelison
Subject:	bug#79631: bug in the utility command "cut"
Date:	Wed, 15 Oct 2025 14:34:40 +0200

>The Linux shell command: $ cut -c6- de.text > de2.text
>outputs 2114 correct lines with first 5 characters removed.
>Starting with line 2115, the two characters (hex 80, hex AF) are
>prepended to every output line.  The rest of each output line is
>correct.

>I have attached the file "de.text" which triggers this bug.

>I am using Ubuntu 25.04 in case that matters.

Those characters are in your input file (starting at line 2115).

Not a bug.

=================================================================================
Please do not send me replies to my posts on the list.
I always read the replies via the web archive, so CC'ing to me is unnecessary.

When responding to my posts, please try to refrain from giving bureaucratic answers.
If you have nothing useful to say, then just click Next and go on.




Information forwarded to bug-coreutils <at> gnu.org:
bug#79631; Package coreutils. (Wed, 15 Oct 2025 17:42:03 GMT) Full text and rfc822 format available.

Message #11 received at 79631 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Michael Cornelison <mkornelix <at> gmail.com>, 79631 <at> debbugs.gnu.org
Subject: Re: bug#79631: bug in the utility command "cut"
Date: Wed, 15 Oct 2025 18:41:13 +0100
On 15/10/2025 13:34, Michael Cornelison wrote:
> The Linux shell command: $ cut -c6- de.text > de2.text
> outputs 2114 correct lines with first 5 characters removed.
> From line 2115, the two characters (hex 80, hex AF) are prepended to every
> output line.
> The rest of each output line is correct.
> 
> I have attached the file "de.text" which triggers this bug.
> 
> I am using Ubuntu 25.04 in case that matters.
> 
> regards
> Mike Cornelison

The issue is that cut(1) does not support multi-byte characters yet,
and is treating -c like -b.  This can cause cut(1) to
output a partial multi-byte character. In your case,
the following shows it starts outputting in the middle of the
UTF-8 Narrow non-breaking space character:

  LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
   head -n2115 | tail -n1 | od -Ax -tx1z -v
  000000 33 31 30 30 e2 80 af c3 9c 62 0a                 >3100.....b.<

This is already on our TODO list.
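
The effect described above can be reproduced outside of cut(1). The following is a minimal Python sketch, not coreutils code: it takes the ten bytes shown in the od dump and compares a byte-offset cut (what -c currently amounts to) with a character-offset cut. The helper names cut_bytes and cut_chars are hypothetical and exist only for this illustration.

```python
# First ten bytes of line 2115, copied from the od dump above (newline dropped).
line = bytes.fromhex("33 31 30 30 e2 80 af c3 9c 62")


def cut_bytes(data: bytes, start: int) -> bytes:
    """Keep everything from 1-based BYTE position `start` onward (like cut -b)."""
    return data[start - 1:]


def cut_chars(data: bytes, start: int, encoding: str = "utf-8") -> bytes:
    """Keep everything from 1-based CHARACTER position `start` onward.

    Hypothetical character-aware behaviour: decode first, slice, re-encode.
    """
    return data.decode(encoding)[start - 1:].encode(encoding)


# Byte 5 is 0xe2, the lead byte of the three-byte sequence e2 80 af,
# so a byte-wise cut at position 6 starts in the middle of that character.
print(cut_bytes(line, 6).hex(" "))  # 80 af c3 9c 62
print(cut_chars(line, 6).hex(" "))  # c3 9c 62
```

The byte-wise result begins with the stray 80 af reported in the original message; the character-wise result drops the whole five-character prefix, including the complete e2 80 af sequence.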

thank you,
Padraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#79631; Package coreutils. (Thu, 16 Oct 2025 04:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 79631 <at> debbugs.gnu.org (full text, mbox):

From: Collin Funk <collin.funk1 <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 79631 <at> debbugs.gnu.org, Michael Cornelison <mkornelix <at> gmail.com>
Subject: Re: bug#79631: bug in the utility command "cut"
Date: Wed, 15 Oct 2025 21:49:36 -0700
Pádraig Brady <P <at> draigBrady.com> writes:

> The issue is that cut(1) does not support multi-byte characters yet,
> and is treating -c like -b.  This can cause cut(1) to
> output a partial multi-byte character. In your case,
> the following shows it starts outputting in the middle of the
> UTF-8 Narrow non-breaking space character:
>
>   LC_ALL=de_DE.UTF-8 git/coreutils/src/cut -c1-10 de.text |
>    head -n2115 | tail -n1 | od -Ax -tx1z -v
>   000000 33 31 30 30 e2 80 af c3 9c 62 0a                 >3100.....b.<
>
> This is already on our TODO list.

I haven't thought of a decent interface for multibyte characters that
behaves like getndelim2 yet, which is needed for 'cut'. Outside of that,
it should not be too difficult.
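
To make the interface question concrete, here is one possible shape for a delimiter-bounded reader that works on characters rather than bytes. It is only a Python sketch under stated assumptions: read_records and its parameters are hypothetical names, it is not modeled on the actual getndelim2 signature, and it is not a proposal for the coreutils implementation. The point it illustrates is that a multi-byte sequence may straddle two reads, so decoding has to be incremental.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO, delim: str = "\n",
                 encoding: str = "utf-8",
                 chunk_size: int = 4096) -> Iterator[str]:
    """Yield delimiter-terminated records as decoded characters.

    Hypothetical helper: bytes are read in fixed-size chunks and fed to an
    incremental decoder, which holds back an incomplete multi-byte sequence
    until the remaining bytes arrive in the next chunk.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        # final=True on the last (empty) read flushes any dangling bytes.
        pending += decoder.decode(chunk, final=not chunk)
        while (idx := pending.find(delim)) != -1:
            yield pending[:idx]
            pending = pending[idx + len(delim):]
        if not chunk:
            break
    if pending:
        # Last record had no trailing delimiter.
        yield pending


# A chunk size of 5 deliberately splits the e2 80 af sequence across reads.
data = "3100\u202f\u00dcb\nabc\n".encode("utf-8")
for record in read_records(io.BytesIO(data), chunk_size=5):
    print(len(record), repr(record))
```

Once records arrive as character strings, a character range such as 6- becomes an ordinary slice of each record.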

Collin




This bug report was last modified 20 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.