GNU bug report logs - #36674
Sort Suggestion
Reported by: Marshall Lake <mlake <at> mlake.net>
Date: Mon, 15 Jul 2019 18:53:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36674 in the body.
You can then email your comments to 36674 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#36674; Package coreutils.
(Mon, 15 Jul 2019 18:53:02 GMT) Full text and rfc822 format available.
Acknowledgement sent to Marshall Lake <mlake <at> mlake.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org.
(Mon, 15 Jul 2019 18:53:02 GMT) Full text and rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
Even though this isn't a bug, I was asked to send the following to this
email address.
Re: SORT Command from GNU coreutils 8.25
A suggestion for an additional option to the SORT command is to ignore
non-alphanumeric characters.
As an example, in attempting to sort an index ...
Abbott, William 259
sorts before:
Abbot, William 099
If non-alphanumeric characters were ignored then the same two records
would sort as:
Abbot, William 099
Abbott, William 259
Thanks for reading.
--
Marshall Lake -- mlake <at> mlake.net -- http://www.mlake.net
Information forwarded to bug-coreutils <at> gnu.org: bug#36674; Package coreutils.
(Mon, 15 Jul 2019 19:25:02 GMT) Full text and rfc822 format available.
Message #8 received at 36674 <at> debbugs.gnu.org (full text, mbox):
tag 36674 notabug
close 36674
stop
Hello,
On Mon, Jul 15, 2019 at 11:42:01AM -0700, Marshall Lake wrote:
> Even though this isn't a bug, I was asked to send the following to this
> email address.
(General suggestions and discussions are better suited to the
coreutils <at> gnu.org mailing list; that way the system won't open a new
bug item.)
>
> Re: SORT Command from GNU coreutils 8.25
>
> A suggestion for an additional option to the SORT command is to ignore
> non-alphanumeric characters.
>
> As an example, in attempting to sort an index ...
>
> Abbott, William 259
>
> sorts before:
>
> Abbot, William 099
>
> If non-alphanumeric characters were ignored then the same two records
> would sort as:
>
> Abbot, William 099
> Abbott, William 259
>
>
There's actually something else at play here:
In your case, sort does ignore non-alphanumeric characters,
but it ALSO ignores white space.
That happens because your locale is set to some language
(for example, en_US.UTF8).
Using such a locale makes sort ignore all non-alphanumeric characters,
whitespace, and upper/lower case.
In essence, you are comparing "AbbottWilliam" (two 't's) to
"AbbotWilliam" (one 't'). The first five letters match, so the sixth
character decides: the second 't' of "Abbott" is compared to the 'W'
of "William" (case is ignored, so it counts as 'w'), and since 't'
comes before 'w', "Abbott" is sorted first.
If you force the POSIX/C locale, then all characters are considered,
and the result will be as you requested.
Observe the following:
$ printf "%s\n" AbbottWilliam AbbotWilliam | LC_ALL=en_CA.utf8 sort
AbbottWilliam
AbbotWilliam
$ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=en_CA.utf8 sort
Abbott William
Abbot William
$ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=C sort
Abbot William
Abbott William
$ printf "%s\n" "Abbott, William" "Abbot, William" | LC_ALL=C sort
Abbot, William
Abbott, William
Note that 'sort' already has an option for dictionary style sorting:
-d, --dictionary-order: consider only blanks and alphanumeric characters.
However, locale rules take precedence over it, so effectively it only
works in the "C" locale:
$ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort
Ab,,b,,ott William
Abbot William
$ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort -d
Abbot William
Ab,,b,,ott William
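As a sketch of how this applies to the index lines from the original report (GNU sort assumed), the commands below run them through "LC_ALL=C sort -d". The third, lowercase entry ("abbey, Ann 012") is hypothetical and only added to show one more C-locale detail: byte order puts every uppercase letter before every lowercase one, so an index with mixed case probably also wants -f (--ignore-case), which folds lowercase to uppercase before comparing.

```shell
# Two records from the report, plus one HYPOTHETICAL lowercase entry.
records='Abbott, William 259
Abbot, William 099
abbey, Ann 012'

# -d: compare only blanks and alphanumerics (the commas are skipped).
# LC_ALL=C: plain byte order, so uppercase 'A' sorts before lowercase 'a'.
printf '%s\n' "$records" | LC_ALL=C sort -d
# Abbot, William 099
# Abbott, William 259
# abbey, Ann 012

# -f additionally folds lowercase to uppercase before comparing,
# so "abbey" now sorts ahead of "Abbot".
printf '%s\n' "$records" | LC_ALL=C sort -df
# abbey, Ann 012
# Abbot, William 099
# Abbott, William 259
```

In both runs "Abbot, William 099" precedes "Abbott, William 259": once the comma is ignored, the blank after "Abbot" is compared with the second 't' of "Abbott", and a blank sorts before any letter in byte order.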
You can read past discussion about the confusion resulting from locale
sorting rules here:
https://debbugs.gnu.org/11621
https://debbugs.gnu.org/12783
As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.
-assaf
Added tag(s) notabug.
Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org.
(Mon, 15 Jul 2019 19:25:02 GMT) Full text and rfc822 format available.
bug closed, send any further explanations to
36674 <at> debbugs.gnu.org and Marshall Lake <mlake <at> mlake.net>
Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org.
(Mon, 15 Jul 2019 19:25:04 GMT) Full text and rfc822 format available.
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org.
(Tue, 13 Aug 2019 11:24:08 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.