GNU bug report logs - #36674
Sort Suggestion


Package: coreutils;

Reported by: Marshall Lake <mlake <at> mlake.net>

Date: Mon, 15 Jul 2019 18:53:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36674 in the body.
You can then email your comments to 36674 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#36674; Package coreutils. (Mon, 15 Jul 2019 18:53:02 GMT)

Acknowledgement sent to Marshall Lake <mlake <at> mlake.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 15 Jul 2019 18:53:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Marshall Lake <mlake <at> mlake.net>
To: bug-coreutils <at> gnu.org
Subject: Sort Suggestion
Date: Mon, 15 Jul 2019 11:42:01 -0700 (MST)
Hi,

Even though this isn't a bug, I was asked to send the following to this 
email address.


Re:  SORT Command from GNU coreutils 8.25

A suggestion for an additional option to the SORT command is to ignore 
non-alphanumeric characters.

As an example, in attempting to sort an index ...

Abbott, William                        259

sorts before:

Abbot, William                         099

If non-alphanumeric characters were ignored then the same two records
would sort as:

Abbot, William                         099
Abbott, William                        259


Thanks for reading.


-- 
Marshall Lake -- mlake <at> mlake.net -- http://www.mlake.net




Information forwarded to bug-coreutils <at> gnu.org:
bug#36674; Package coreutils. (Mon, 15 Jul 2019 19:25:02 GMT)

Message #8 received at 36674 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Marshall Lake <mlake <at> mlake.net>
Cc: 36674 <at> debbugs.gnu.org
Subject: Re: bug#36674: Sort Suggestion
Date: Mon, 15 Jul 2019 13:23:52 -0600
tag 36674 notabug
close 36674
stop

Hello,

On Mon, Jul 15, 2019 at 11:42:01AM -0700, Marshall Lake wrote:
> Even though this isn't a bug, I was asked to send the following to this
> email address.

(General suggestions and discussions are better suited to the
coreutils <at> gnu.org mailing list; that way the system won't open a new
bug item.)

> 
> Re:  SORT Command from GNU coreutils 8.25
> 
> A suggestion for an additional option to the SORT command is to ignore
> non-alphanumeric characters.
> 
> As an example, in attempting to sort an index ...
> 
> Abbott, William                        259
> 
> sorts before:
> 
> Abbot, William                         099
> 
> If non-alphanumeric characters were ignored then the same two records
> would sort as:
> 
> Abbot, William                         099
> Abbott, William                        259
> 
> 

There's actually something else at play here:
In your case, sort does ignore non-alphanumeric characters,
but it ALSO ignores white space.
That happens because your locale is set to some language
(for example, en_US.UTF8).

Using such a locale makes sort ignore all non-alphanumeric characters,
whitespace, and upper/lower case.

In essence, you are comparing "AbbottWilliam" (two 't's) to
"AbbotWilliam" (one 't'): the second 't' of the first name is compared
to the 'W' of the second, and 't' is determined to come first.
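
The comparison described above can be approximated by hand. This is only
a rough sketch, not what sort does internally: it strips everything
except letters and digits, folds case, and then sorts the reduced keys
bytewise. The function name approx_locale_key is made up for this
illustration, and real locale collation is more involved than this.

```shell
# Hypothetical helper: reduce each line to letters and digits only, with
# case folded. LC_ALL=C keeps tr's character classes limited to ASCII so
# the result is reproducible regardless of the caller's locale.
approx_locale_key() {
  LC_ALL=C tr -cd '[:alnum:]\n' | LC_ALL=C tr '[:upper:]' '[:lower:]'
}

# Sort the reduced keys bytewise.
printf '%s\n' "Abbott, William" "Abbot, William" | approx_locale_key | LC_ALL=C sort
```

With the two names from the report, the key abbottwilliam lands ahead of
abbotwilliam, because 't' sorts before 'w'.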

If you force a POSIX/C locale, then all characters are considered
(compared by byte value, so ',' sorts before 't'),
and the result will be as you requested.

Observe the following:

  $ printf "%s\n" AbbottWilliam AbbotWilliam | LC_ALL=en_CA.utf8 sort
  AbbottWilliam
  AbbotWilliam

  $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=en_CA.utf8 sort
  Abbott William
  Abbot William

  $ printf "%s\n" "Abbott William" "Abbot William" | LC_ALL=C sort
  Abbot William
  Abbott William

  $ printf "%s\n" "Abbott, William" "Abbot, William" | LC_ALL=C sort
  Abbot, William
  Abbott, William

Note that 'sort' already has an option for dictionary style sorting:
   -d, --dictionary-order: consider only blanks and alphanumeric characters.

However, locale rules take precedence over it, so effectively it only
works in the "C" locale:

  $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort
  Ab,,b,,ott William
  Abbot William

  $ printf "%s\n" "Ab,,b,,ott William" "Abbot William" | LC_ALL=C sort -d
  Abbot William
  Ab,,b,,ott William
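
Since the C locale also stops ignoring case, a dictionary-style sort that
still folds case can be sketched by adding -f (--ignore-case) to -d. The
input lines below are invented for illustration, and the wrapper name
dict_sort is hypothetical:

```shell
# Hypothetical wrapper: bytewise collation, but with punctuation skipped (-d)
# and lower case folded to upper case (-f) when comparing.
dict_sort() {
  LC_ALL=C sort -d -f "$@"
}

printf '%s\n' "Abbott, William" "abbot, William" | dict_sort
```

Here "abbot, William" is listed before "Abbott, William". A plain
LC_ALL=C sort would put the capitalized line first, since 'A' precedes
'a' in byte order.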


You can read past discussions about the confusion resulting from locale
sorting rules here:
   https://debbugs.gnu.org/11621
   https://debbugs.gnu.org/12783


As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.

-assaf





Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Jul 2019 19:25:02 GMT)

bug closed, send any further explanations to 36674 <at> debbugs.gnu.org and Marshall Lake <mlake <at> mlake.net>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 15 Jul 2019 19:25:04 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 13 Aug 2019 11:24:08 GMT)

This bug report was last modified 4 years and 258 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.