GNU bug report logs - #61696
Warn about sort --numeric-sort --unique data loss

Package: coreutils

Reported by: Dan Jacobson <jidanni <at> jidanni.org>

Date: Wed, 22 Feb 2023 02:05:02 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61696 in the body.
You can then email your comments to 61696 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#61696; Package coreutils. (Wed, 22 Feb 2023 02:05:02 GMT)

Acknowledgement sent to Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 22 Feb 2023 02:05:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Dan Jacobson <jidanni <at> jidanni.org>
To: bug-coreutils <at> gnu.org
Subject: Warn about sort --numeric-sort --unique data loss
Date: Wed, 22 Feb 2023 09:26:21 +0800
At (info "(coreutils) sort invocation") it says
  For example, ‘sort -n -u’ inspects only the value of the initial
  numeric string when checking for uniqueness, whereas ‘sort -n | uniq’
  inspects the entire line. *Note uniq invocation::.

OK, but you still need to add a warning about data loss.

Here's a shell script:

k="3 Billy
17 Villy
4 Nibblesberg
3 Philbert
3 Billy"
c=sort
echo "We sort the students [$c]"
# $c is left unquoted on purpose so it splits into a command and its options.
echo "$k" | $c
c="sort --numeric-sort"
echo "Oh my gosh, we must use [$c]"
echo "$k" | $c
c="sort --numeric-sort --unique"
echo "Yuck, let's eliminate the duplicates too [$c]"
echo "$k" | $c
echo 'Oops, we caused "data loss". Good thing we noticed it.'
c="sort --unique" d="sort --numeric-sort"
echo "Let's try it the right way: [$c | $d]"
echo "$k" | $c | $d

Running it shows:
We sort the students [sort]
17 Villy
3 Billy
3 Billy
3 Philbert
4 Nibblesberg
Oh my gosh, we must use [sort --numeric-sort]
3 Billy
3 Billy
3 Philbert
4 Nibblesberg
17 Villy
Yuck, let's eliminate the duplicates too [sort --numeric-sort --unique]
3 Billy
4 Nibblesberg
17 Villy
Oops, we caused "data loss". Good thing we noticed it.
Let's try it the right way: [sort --unique | sort --numeric-sort]
3 Billy
3 Philbert
4 Nibblesberg
17 Villy
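
The distinction drawn in the quoted manual paragraph can also be reproduced with the exact pair of commands it names. What follows is a minimal sketch, not part of the original report; it assumes GNU sort and uniq, forces the C locale so the ordering is predictable, and uses illustrative variable names (students, by_number, by_line).

```shell
# Same sample data as the script above.
students='3 Billy
17 Villy
4 Nibblesberg
3 Philbert
3 Billy'

# -n -u: uniqueness is decided by the leading number only,
# so one record per distinct number survives.
by_number=$(printf '%s\n' "$students" | LC_ALL=C sort -n -u)

# -n | uniq: lines are ordered numerically, then uniq compares
# whole adjacent lines, so only exact duplicates are removed.
by_line=$(printf '%s\n' "$students" | LC_ALL=C sort -n | uniq)

printf 'sort -n -u:\n%s\n\n' "$by_number"
printf 'sort -n | uniq:\n%s\n' "$by_line"
```

On this input the first form prints three lines and the second prints four, keeping both "3 Billy" and "3 Philbert".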

Sure, you might say, "That's already mentioned" (in the fine print). "The
reader just needs to put 2 + 2 together in their heads." Yes, but
anyway, the document needs to drive home the point more.

Maybe the man page should say so too.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61696; Package coreutils. (Wed, 22 Feb 2023 11:30:03 GMT)

Message #8 received at 61696 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Dan Jacobson <jidanni <at> jidanni.org>, 61696 <at> debbugs.gnu.org
Subject: Re: bug#61696: Warn about sort --numeric-sort --unique data loss
Date: Wed, 22 Feb 2023 11:29:13 +0000
tag 61696 notabug
close 61696
stop

On 22/02/2023 01:26, Dan Jacobson wrote:
> At (info "(coreutils) sort invocation") it says
>    For example, ‘sort -n -u’ inspects only the value of the initial
>    numeric string when checking for uniqueness, whereas ‘sort -n | uniq’
>    inspects the entire line. *Note uniq invocation::.
> 
> OK, but you still need to add a warning about data loss.

> Sure, you might say, "That's already mentioned" (in the fine print). "The
> reader just needs to put 2 + 2 together in their heads." Yes, but
> anyway, the document needs to drive home the point more.
> 
> Maybe the man page should say so too.

Honestly I don't think it's fine print.
That's essentially the point the quoted paragraph is making.
Being overly verbose in docs is a concern too.
Marking this as done for now.

cheers,
Pádraig
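
Neither message shows a way to check which records a given input would lose. One possible check, offered here as a hedged sketch rather than anything proposed in the thread, is to list the distinct lines that are missing from the 'sort -n -u' result. It assumes GNU sort and a POSIX grep; the variable names (students, kept, dropped) are illustrative.

```shell
students='3 Billy
17 Villy
4 Nibblesberg
3 Philbert
3 Billy'

# Records that survive numeric-key uniqueness (one per leading number).
kept=$(printf '%s\n' "$students" | LC_ALL=C sort -n -u)

# Distinct whole lines that are absent from $kept, i.e. the records
# that "sort -n -u" discarded. grep -F treats each line of $kept as a
# fixed string, -x requires a whole-line match, -v inverts the match.
dropped=$(printf '%s\n' "$students" | LC_ALL=C sort -u | grep -vxF -e "$kept")

printf 'dropped by sort -n -u:\n%s\n' "$dropped"
```

For the sample data the only dropped record is "3 Philbert". Note that grep exits with a non-zero status when nothing was dropped, which matters if the check runs under 'set -e'.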





Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 22 Feb 2023 11:30:04 GMT)

bug closed, send any further explanations to 61696 <at> debbugs.gnu.org and Dan Jacobson <jidanni <at> jidanni.org> Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 22 Feb 2023 11:30:04 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 23 Mar 2023 11:24:09 GMT)

This bug report was last modified 1 year and 33 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.