GNU bug report logs - #37177
sort don't respect the ASCII order


Package: coreutils;

Reported by: Xavier Sanchez <xasa <at> 4js.com>

Date: Sat, 24 Aug 2019 23:17:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-coreutils <at> gnu.org:
bug#37177; Package coreutils. (Sat, 24 Aug 2019 23:17:01 GMT)

Acknowledgement sent to Xavier Sanchez <xasa <at> 4js.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 24 Aug 2019 23:17:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Xavier Sanchez <xasa <at> 4js.com>
To: bug-coreutils <at> gnu.org
Subject: sort don't respect the ASCII order
Date: Sat, 24 Aug 2019 23:15:17 +0000

Linux snake 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21) x86_64 GNU/Linux
coreutils: 8.30-3+b1 on Debian 10

Hello, here's the test I'm using for cross-platform verification:

#!/bin/sh
seq() (
	first=$1 incr=$2 last=$3
	echo "for (i = $first; i <= $last; i+=$incr) i" | bc -l
)

_sha256() {
	sha256sum 2>/dev/null || sha256
}

# shellcheck disable=SC2059
ascii_seq="$(
	for i in $(seq 33 1 126); do
		printf "\\$(printf %03o "$i")\\n"
	done
)"

echo ASCII
printf %s\\n "$ascii_seq" | _sha256

echo SORT
printf %s\\n "$ascii_seq" | sort | _sha256
### end

--- Results:

Coreutils's 8.30-3+b1:
ASCII
d39a8797c560b434fe58e910a31c4e5454a6626602b7114a41509fa12792c1a2  -
SORT
d4225db1191701c182feba79f67d3b4d824bc8e90164d7fe55bdb3d34b71406d  -

Busybox 1.30.1 (Alpine Linux):
ASCII
d39a8797c560b434fe58e910a31c4e5454a6626602b7114a41509fa12792c1a2  -
SORT
d39a8797c560b434fe58e910a31c4e5454a6626602b7114a41509fa12792c1a2  -

More explicit example:

Coreutils's 8.30-3+b1: find . | sort
.
./files
./files/$-e
./files/ascii1
./files/ascii2
./files/!-e
./files/#-e
./files/?-e
./files/empty1
./files/empty2
./files/filename with space
./files/subdir
./files/subdir/empty3
./files/subdir with space
./files/subdir with space/empty4
./tests.sh

Busybox 1.30.1 (Alpine Linux): find . | sort
.
./files
./files/!-e
./files/#-e
./files/$-e
./files/?-e
./files/ascii1
./files/ascii2
./files/empty1
./files/empty2
./files/filename with space
./files/subdir
./files/subdir with space
./files/subdir with space/empty4
./files/subdir/empty3
./tests.sh

I think something is broken in that version. I have not checked newer
versions, but I think it's worth reporting here.

-- 
Xavier




Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Sun, 25 Aug 2019 04:04:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Sun, 25 Aug 2019 04:04:02 GMT)

Notification sent to Xavier Sanchez <xasa <at> 4js.com>:
bug acknowledged by developer. (Sun, 25 Aug 2019 04:04:03 GMT)

Message #12 received at 37177-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Xavier Sanchez <xasa <at> 4js.com>, 37177-done <at> debbugs.gnu.org
Subject: Re: bug#37177: sort don't respect the ASCII order
Date: Sat, 24 Aug 2019 23:03:35 -0500
tag 37177 notabug
thanks

On 8/24/19 6:15 PM, Xavier Sanchez wrote:
> Linux snake 5.2.0-2-amd64 #1 SMP Debian 5.2.9-2 (2019-08-21) x86_64 GNU/Linux
> coreutils: 8.30-3+b1 on Debian 10
> 

> More explicit example:
> 
> Coreutils's 8.30-3+b1: find . | sort
> .
> ./files
> ./files/$-e
> ./files/ascii1

Most likely, your problem is not a bug in sort, but a difference in
locales.  A common scenario is that people are surprised to learn that
the en_US.UTF-8 locale sorts case-insensitively while ignoring
punctuation, rather than in ASCII order.

Run 'find . | sort --debug' for more information, as well as 'find . |
LC_ALL=C sort --debug' to see the difference the locale makes.
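
A minimal sketch of that comparison, assuming a POSIX shell and using five
of the file names from the report as sample input. The variable names are
illustrative only. The first result depends on whatever locale the
environment selects; the second is pinned to byte order by LC_ALL=C.

```shell
#!/bin/sh
# Sample input: five of the file names from the report's "find . | sort" listing.
names='./files/ascii1
./files/$-e
./files/?-e
./files/!-e
./files/#-e'

# Sort using the locale taken from the environment (LANG / LC_* variables).
# The resulting order varies from system to system.
env_sorted=$(printf '%s\n' "$names" | sort)

# Sort again with LC_ALL=C, which overrides every other locale variable
# and makes sort compare raw bytes, i.e. ASCII order for this input.
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

printf 'environment locale:\n%s\n\n' "$env_sorted"
printf 'LC_ALL=C:\n%s\n' "$c_sorted"
```

If the two listings differ, the locale is responsible for the ordering
rather than sort itself.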

Assuming that you are repeating the same non-bug of locale differences
as has been frequently reported by others, I'm marking this as not a
bug.  But we can reopen it if your further investigations find something
other than your locale affecting things.

It may also be that busybox sort does not properly honor locale
variables as required by POSIX, but that would be a problem in busybox,
and not something we can fix here.
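
If the aim is output that matches across implementations, one option is to
pin the locale for the sort step. The following is a sketch under stated
assumptions, not the reporter's original script: it builds the same 33..126
range with a plain shell loop instead of the bc-based seq helper, and it
compares the strings directly instead of hashing them.

```shell
#!/bin/sh
# Build one printable ASCII character per line, code points 33 through 126.
ascii_seq=$(
    i=33
    while [ "$i" -le 126 ]; do
        # Convert the code point to three octal digits, then let printf's
        # %b conversion expand the \0ddd escape into the character itself.
        printf '%b\n' "\\0$(printf '%03o' "$i")"
        i=$((i + 1))
    done
)

# LC_ALL=C makes sort compare bytes, independent of LANG or other LC_* settings.
sorted=$(printf '%s\n' "$ascii_seq" | LC_ALL=C sort)

# The generated sequence is already in byte order, so a byte-wise sort
# should leave it unchanged.
if [ "$ascii_seq" = "$sorted" ]; then
    result='same order'
else
    result='order differs'
fi
echo "$result"
```

With the locale pinned this way, the sorted and unsorted sequences should be
identical regardless of which sort implementation is installed.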

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 22 Sep 2019 11:24:07 GMT)

This bug report was last modified 4 years and 217 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.