GNU bug report logs - #35636
bug report sort command

Package: coreutils;

Reported by: Michele Liberi <mliberi <at> gmail.com>

Date: Wed, 8 May 2019 14:29:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 35636 in the body.
You can then email your comments to 35636 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#35636; Package coreutils. (Wed, 08 May 2019 14:29:02 GMT)

Acknowledgement sent to Michele Liberi <mliberi <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 08 May 2019 14:29:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Michele Liberi <mliberi <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: bug report sort command
Date: Wed, 8 May 2019 10:35:01 +0200
I verified the following bug is there in:

   - sort (GNU coreutils) 8.21
   - sort (GNU coreutils) 8.22
   - sort (GNU coreutils) 8.23

*Input file:*
# cat sort.in
1|a|x
2|b|x
3|aa|x
4|bb|x
5|c|x


*shell command and output:*
# sort -t'|' -k2 <sort.in
3|aa|x
1|a|x
4|bb|x
2|b|x
5|c|x

*I expected key "a" to come before key "aa" and key "b" to come before
key "bb".*

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 08 May 2019 14:43:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Wed, 08 May 2019 14:43:02 GMT)

Notification sent to Michele Liberi <mliberi <at> gmail.com>:
bug acknowledged by developer. (Wed, 08 May 2019 14:43:03 GMT)

Message #12 received at 35636-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Michele Liberi <mliberi <at> gmail.com>, 35636-done <at> debbugs.gnu.org
Subject: Re: bug#35636: bug report sort command
Date: Wed, 8 May 2019 09:41:58 -0500
tag 35636 notabug
thanks

On 5/8/19 3:35 AM, Michele Liberi wrote:
> I verified the following bug is there in:
> 
>    - sort (GNU coreutils) 8.21
>    - sort (GNU coreutils) 8.22
>    - sort (GNU coreutils) 8.23
> 
> *Input file:*
> # cat sort.in
> 1|a|x
> 2|b|x
> 3|aa|x
> 4|bb|x
> 5|c|x
> 
> 
> *shell command and output:*
> # sort -t'|' -k2 <sort.in
> 3|aa|x
> 1|a|x
> 4|bb|x
> 2|b|x
> 5|c|x

Let's use --debug to see what sort really did:

$ sort --debug -t'|' -k2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____


Since you did not specify an ending field, the key extends from field 2
to the end of the line, so you are comparing the string "aa|x" with
"a|x", and the string "a|x" with "bb|x". In the en_US.UTF-8 locale,
punctuation is ignored on the first-order pass through strcoll(), which
means you are effectively comparing "aax" with "ax" with "bbx", and the
sort is correct. Even in a locale that does not ignore punctuation:

$ LC_ALL=C sort --debug -t'|' -k2 <sort.in
sort: using simple byte comparison
3|aa|x
  ____
______
1|a|x
  ___
_____
4|bb|x
  ____
______
2|b|x
  ___
_____
5|c|x
  ___
_____

the sort is still correct, since ASCII '|' sorts after ASCII 'a', which
puts "aa|x" before "a|x". Your real problem is that you are sorting on
too much data; you need to try again with the key limited to exactly the
second field:

$ sort --debug -t'|' -k2,2 <sort.in
sort: using ‘en_US.UTF-8’ sorting rules
1|a|x
  _
_____
3|aa|x
  __
______
2|b|x
  _
_____
4|bb|x
  __
______
5|c|x
  _
_____

where sort can now see that "a" is a prefix of "aa", because the key no
longer bleeds into the rest of the line.
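
To reproduce the difference between the two key specifications without
depending on the system locale, a minimal sketch follows. It assumes a
POSIX shell and a sort that accepts -t and -k; the names sample_input,
open_key and closed_key are illustrative only, and LC_ALL=C is forced so
that the result matches the byte-comparison run shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: emit the five input lines from the report.
sample_input() {
    printf '%s\n' '1|a|x' '2|b|x' '3|aa|x' '4|bb|x' '5|c|x'
}

# -k2 with no end field: the key runs from field 2 to the end of the line,
# so the strings compared are "a|x", "b|x", "aa|x", "bb|x", "c|x".
open_key=$(sample_input | LC_ALL=C sort -t'|' -k2)

# -k2,2: the key is exactly field 2, so the strings compared are
# "a", "b", "aa", "bb", "c".
closed_key=$(sample_input | LC_ALL=C sort -t'|' -k2,2)

printf -- '-k2:\n%s\n\n-k2,2:\n%s\n' "$open_key" "$closed_key"
```

With the open-ended key, the line containing "aa" comes first; with the
key closed at field 2, the line containing "a" comes first.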


> 
> *I expected key "a" to come before key "aa" and key "b" to come before
> key "bb".*

Your expectations are at odds with your incomplete command line.  sort
is behaving as required; therefore, I'm closing this as not a bug. But
feel free to reply if you have further questions.
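
The byte-order claim made for the C locale ('|' sorts after 'a') can be
checked directly. This is only a sketch, assuming a POSIX shell whose
printf converts a quoted character to its numeric value; the variable
names are illustrative.

```shell
#!/bin/sh
# POSIX printf: an argument starting with a quote character yields the
# numeric value of the character that follows it.
a_code=$(printf '%d' "'a")
bar_code=$(printf '%d' "'|")
printf 'a=%s |=%s\n' "$a_code" "$bar_code"   # prints: a=97 |=124

# "aa|x" and "a|x" first differ at the second byte ('a' versus '|'),
# so plain byte comparison puts "aa|x" first.
first=$(printf '%s\n' 'a|x' 'aa|x' | LC_ALL=C sort | head -n 1)
printf 'first: %s\n' "$first"
```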

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Jun 2019 11:24:05 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.