GNU bug report logs - #29044
sort --debug results improvement
Reported by: Dan Jacobson <jidanni <at> jidanni.org>
Date: Sat, 28 Oct 2017 17:31:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29044 in the body.
You can then email your comments to 29044 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#29044; Package coreutils. (Sat, 28 Oct 2017 17:31:02 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Dan Jacobson <jidanni <at> jidanni.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 28 Oct 2017 17:31:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
$ sort -k 2n -k 3n --debug file.txt
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
sort: key 2 is numeric and spans multiple fields
41 011 92.3 亞太
   ___
       ____
________________
41 011 97.1 大漢
   ___
       ____
OK, but they look like they only span one field.
Also, the user may be confused about whether
________________
is a "key 3" or just a separator.
Therefore, please print
": key 1" or "1" etc. at the end of each of them.
This is also important if there are many keys.
And add a separator bar made of -, =, etc., but not _.
Also, the Info documentation doesn't mention how to influence
"sort: using simple byte comparison",
which seems to always be printed when using --debug, no matter what.
Information forwarded to bug-coreutils <at> gnu.org: bug#29044; Package coreutils. (Sun, 29 Oct 2017 03:07:02 GMT)
Message #8 received at 29044 <at> debbugs.gnu.org (full text, mbox):
tag 29044 notabug
close 29044
thanks
Hello,
There are a few issues at hand. Answering out of order:
> $ sort -k 2n -k 3n --debug file.txt
[...]
> Also the user is confused if
> ________________
> is a "key 3", or just a separator.
>
> Therefore please say
> ": key 1" or "1" etc. at the end of each of them.
> This is also important if there are many keys.
>
> And add a separator bar, made of -, =, etc. but not _.
This is indeed a third key: it is the default 'last-resort'
comparison of the entire line.
It is not a separator.
It is used to order lines whose specified keys all compare equal.
It can be disabled with the "-s/--stable" option.
Consider the following:
Case 1: The first key is equal ("A" in both lines).
Sort then falls back to the last-resort comparison of the entire
lines, which puts "A B" first:
$ printf "%s\n" "A C" "A B" | sort --debug -k1,1
A B
_
___
A C
_
___
Case 2: Using "-s" disables the last-resort comparison, so lines with
equal keys are printed in the same order as in the input (hence "stable"):
$ printf "%s\n" "A C" "A B" | sort --debug -k1,1 -s
A C
_
A B
_
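As a rough cross-check of the last-resort explanation above, the implicit tie-break can be reproduced explicitly: disable it with -s and append a whole-line key (-k1, with no ordering options). The coreutils manual describes the last resort as comparing entire lines as if no ordering options other than -r were specified, so both commands below should produce the same order. This is a sketch assuming GNU coreutils sort and the C locale; the sample input is made up.

```sh
#!/bin/sh
# Sketch: reproduce sort's implicit last-resort comparison explicitly.
# Assumes GNU coreutils sort; LC_ALL=C forces plain byte comparison.
input='A C
A B
B A
A A'

# Default behaviour: ties on -k1,1 are broken by comparing whole lines.
implicit=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1)

# -s disables the last resort; the extra key -k1 (start of line to end of
# line, no ordering options) puts an equivalent whole-line comparison back.
explicit=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1,1 -k1)

printf '%s\n' "$implicit"
if [ "$implicit" = "$explicit" ]; then
  echo "same order"
else
  echo "orders differ"
fi
```

Without -s and without the extra key, lines whose first field ties would instead keep their input order, as in Case 2.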
On 2017-10-28 11:26 AM, Dan Jacobson wrote:
> $ sort -k 2n -k 3n --debug file.txt
> sort: using simple byte comparison
> sort: key 1 is numeric and spans multiple fields
> sort: key 2 is numeric and spans multiple fields
> 41 011 92.3 亞太
>    ___
>        ____
> ________________
> 41 011 97.1 大漢
>    ___
>        ____
>
> OK but they look like they only span one field.
'sort --debug' marks the *actual* characters
that were used in the comparison.
With "-n" (numeric sort), the conversion to a numeric value
stopped at the space character, and the underline shows exactly that.
This is unrelated to the warning, which is about the key specification:
"-k 2n" has no end field, so each numeric key extends to the end of
the line and thus spans multiple fields.
Consider the following cases (I'm using "-s" in all of them to
reduce clutter; it doesn't change the meaning):
Case 1: With the default (non-numeric) sorting order,
all the characters up to the first space are marked by "--debug":
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1,1 --debug -s
11A A
___
33 C
__
4e4D D
____
Case 2: With numeric sorting, only the leading digits are marked:
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -k1n,1 --debug -s
4e4D D
_
11A A
__
33 C
__
Case 3: With "-g" (general numeric sort, which can parse scientific
notation), "4e4" is parsed, and the conversion stops at the "D" character:
$ printf "%s\n" "11A A" "33 C" "4e4D D" | sort -s -k1g,1 --debug
11A A
__
33 C
__
4e4D D
___
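A minimal sketch of the point about the warning, assuming GNU coreutils sort (whose --debug warnings go to stderr) and invented input shaped like the report's file.txt: bounding each numeric key to a single field ("-k 2,2n -k 3,3n" instead of "-k 2n -k 3n") should make the "spans multiple fields" warnings disappear, while the underlined characters stay the same because the numeric conversion stops at the space either way. The helper name count_span_warnings is hypothetical.

```sh
#!/bin/sh
# Sketch: count GNU sort's "spans multiple fields" --debug warnings for
# open-ended vs. single-field numeric keys. Sample lines are invented.
sample='41 011 97.1 b
41 011 92.3 a'

# Hypothetical helper: pass key options through to sort --debug, keep only
# stderr (where the warnings go), and count the matching lines.
count_span_warnings() {
  printf '%s\n' "$sample" |
    LC_ALL=C sort --debug "$@" 2>&1 >/dev/null |
    grep -c 'spans multiple fields' || true   # grep -c exits 1 on zero matches
}

open_keys=$(count_span_warnings -k 2n -k 3n)        # keys run to end of line
bounded_keys=$(count_span_warnings -k 2,2n -k 3,3n) # one field per key
echo "open-ended keys: $open_keys warning(s)"
echo "single-field keys: $bounded_keys warning(s)"
```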
> Also the Info documentation doesn't mention how to influence
> "sort: using simple byte comparison"
> which seems to always be printed when using --debug no matter what.
This message indicates that you are sorting in the C/POSIX locale.
Perhaps it is the default locale on your system?
"sort --debug" always reports which sorting rules are in use, e.g.:
$ LC_ALL=en_CA.UTF-8 sort --debug < /dev/null
sort: using ‘en_CA.UTF-8’ sorting rules
$ LC_ALL=C sort --debug < /dev/null
sort: using simple byte comparison
As such, I'm marking this item as not-a-bug and closing it, but discussion can
continue by replying to this thread.
regards,
- assaf
Information forwarded to bug-coreutils <at> gnu.org: bug#29044; Package coreutils. (Sun, 29 Oct 2017 18:36:02 GMT)
Message #11 received at 29044 <at> debbugs.gnu.org (full text, mbox):
Your answer is absolutely pure gold for a new page linked from
  ‘--debug’
       Highlight the portion of each line used for sorting. Also issue
       warnings about questionable usage to stderr.
in the Info manual! Please don't let it go to waste sitting in the bug
tracker. Perhaps call it "Debugging examples". You can pretty much just
quote the entire exchange between you and me.
P.S. Yes, indeed I had LC_COLLATE=C, so maybe --debug should mention
where in the environment it made its choices from, too.
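Which variable decides the collation can be approximated in plain shell. The sketch below assumes the POSIX precedence rules for locale variables: a non-empty LC_ALL wins, then LC_COLLATE, then LANG, otherwise an implementation-defined default (the C locale on typical systems). The function name effective_collate is hypothetical, and it only inspects the environment: it does not check that the named locale is installed (if it isn't, sort cannot switch to it and stays with byte comparison).

```sh
#!/bin/sh
# Hypothetical helper: report which variable governs collation (LC_COLLATE)
# under POSIX precedence. Inspects the environment only; it does not verify
# that the named locale actually exists on this system.
effective_collate() {
  if [ -n "${LC_ALL-}" ]; then
    printf 'LC_ALL=%s\n' "$LC_ALL"
  elif [ -n "${LC_COLLATE-}" ]; then
    printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
  elif [ -n "${LANG-}" ]; then
    printf 'LANG=%s\n' "$LANG"
  else
    printf 'default (typically C)\n'
  fi
}

effective_collate
```

With LC_ALL unset and LC_COLLATE=C, as in the report, it prints LC_COLLATE=C. The POSIX `locale` utility, run without arguments, also lists the effective value of each category, including LC_COLLATE.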
Information forwarded to bug-coreutils <at> gnu.org: bug#29044; Package coreutils. (Sun, 29 Oct 2017 18:41:01 GMT)
Message #14 received at 29044 <at> debbugs.gnu.org (full text, mbox):
< P.S., Yes indeed I had LC_COLLATE=C so maybe --debug should mention
< where in the environment it made its choices from too.
Ah, like you said
$ LC_ALL=en_CA.UTF-8 sort --debug < /dev/null
sort: using ‘en_CA.UTF-8’ sorting rules
$ LC_ALL=C sort --debug < /dev/null
sort: using simple byte comparison
So the last line should be
sort: using 'C' sorting rules (simple byte comparison)
or maybe also say "effective LC_COLLATE value is ....".
Information forwarded to bug-coreutils <at> gnu.org: bug#29044; Package coreutils. (Sun, 29 Oct 2017 21:35:01 GMT)
Message #17 received at 29044 <at> debbugs.gnu.org (full text, mbox):
On 29/10/17 11:40, 積丹尼 Dan Jacobson wrote:
> < P.S., Yes indeed I had LC_COLLATE=C so maybe --debug should mention
> < where in the environment it made its choices from too.
>
> Ah, like you said
>
> $ LC_ALL=en_CA.UTF-8 sort --debug < /dev/null
> sort: using ‘en_CA.UTF-8’ sorting rules
>
> $ LC_ALL=C sort --debug < /dev/null
> sort: using simple byte comparison
>
> So the last line should be
> sort: using 'C' sorting rules (simple byte comparison)
>
> or maybe also say "effective LC_COLLATE value is ...."..
"C" sorting is badly named, assumes prior knowledge,
and is also ambiguous with C.UTF8 etc.
I thought "simple byte comparison" was the most appropriate.
I agree we might mention the locale env vars,
though the defaults and the significant env vars vary per system.
cheers,
Pádraig.
Added tag(s) notabug.
Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:46:04 GMT)
bug closed, send any further explanations to
29044 <at> debbugs.gnu.org and Dan Jacobson <jidanni <at> jidanni.org>
Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:46:04 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 Nov 2018 12:24:08 GMT)
This bug report was last modified 5 years and 123 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.