GNU bug report logs - #29802
"uniq -c" doesn't like counting lines with nulls

Package: coreutils

Reported by: "PD" <bug-bash.gnu.org <at> pkts.ca>

Date: Thu, 21 Dec 2017 16:29:02 UTC

Severity: normal

Tags: moreinfo

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29802 in the body.
You can then email your comments to 29802 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#29802; Package coreutils. (Thu, 21 Dec 2017 16:29:02 GMT)

Acknowledgement sent to "PD" <bug-bash.gnu.org <at> pkts.ca>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 21 Dec 2017 16:29:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "PD" <bug-bash.gnu.org <at> pkts.ca>
To: bug-coreutils <at> gnu.org
Subject: "uniq -c" doesn't like counting lines with nulls
Date: Thu, 21 Dec 2017 00:40:34 -0800
uniq -c *sometimes* fails to combine duplicate lines containing a null character:

# uniq --version
uniq (GNU coreutils) 8.4

##### Count duplicate text lines:
# printf "\n\x00\n\x00\n" | cat -e | uniq -c
      1 $
      2 ^@$

##### Count duplicate binary lines:
# printf "\x00\n\x00\n\n" | uniq -c | cat -e
      2 ^@$
      1 $

##### Whoops, fail to count duplicate binary lines:
# printf "\n\x00\n\x00\n" | uniq -c | cat -e
      1 $
      1 ^@$
      1 ^@$

This was the smallest test case I could find. The original file had
hundreds of lines containing null (\x00) and Ctrl-A (\x01) characters,
and it was quite a surprise when the output of
'sort testfile | uniq -c | cat -e' had many pages of '1 ^@$' followed by
'496 ^A$': it was counting the Ctrl-A lines correctly, but failing on
the null-character lines.

For automated testing with 'delta' or 'git bisect', this works:
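---
#!/bin/bash
# Usage: pass the test file as $1.
# Exits 0 when the bug is present, 1 when it is absent.
# With cat -e first, uniq sees "^@" text; with cat -e last, uniq sees raw NULs.
a=$(sort "$1" | cat -e | uniq -c | md5sum -)
b=$(sort "$1" | uniq -c | cat -e | md5sum -)
if [[ "$a" != "$b" ]]; then
  echo "PASS (bug present)"; exit 0
else
  echo "FAIL (bug absent)"; exit 1
fi
---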
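
To try the comparison above without the original file, a synthetic input of the same shape can be generated. This is only a sketch under stated assumptions: the name 'testfile' and the count of 496 null-character lines are made up for illustration (the report gives 496 only for the Ctrl-A lines), '\000' and '\001' are the octal spellings of \x00 and \x01, and cksum stands in for md5sum so that only POSIX tools are needed.

```shell
# Build a synthetic input: 496 lines holding one NUL byte and 496 lines
# holding one Ctrl-A byte (the NUL count is an assumption, see above).
testfile=$(mktemp)
i=0
while [ "$i" -lt 496 ]; do
  printf '\000\n\001\n'
  i=$((i + 1))
done > "$testfile"

# Same comparison as the script in the report, with cksum instead of md5sum.
# If uniq groups the raw NUL lines correctly, both pipelines print the same
# text and the checksums match.
a=$(sort "$testfile" | cat -e | uniq -c | cksum)
b=$(sort "$testfile" | uniq -c | cat -e | cksum)
if [ "$a" != "$b" ]; then
  echo "bug present: the two pipelines disagree"
else
  echo "bug absent: the two pipelines agree"
fi
rm -f "$testfile"
```

A mismatch between the two checksums corresponds to the "PASS (bug present)" branch of the script above.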

I regret not having the time to test this with coreutils 8.28, but I
couldn't see anything in the git log to suggest this has been fixed:
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=history;f=src/uniq.c;h=d1dac93c010d7333ced4b54fccbd965cbd5729c2;hb=HEAD

Cheers,
PD




Information forwarded to bug-coreutils <at> gnu.org:
bug#29802; Package coreutils. (Thu, 21 Dec 2017 16:34:02 GMT)

Message #8 received at 29802 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: PD <bug-bash.gnu.org <at> pkts.ca>, 29802 <at> debbugs.gnu.org
Subject: Re: bug#29802: "uniq -c" doesn't like counting lines with nulls
Date: Thu, 21 Dec 2017 16:33:52 +0000
On 21/12/17 08:40, PD wrote:
> ##### Whoops, fail to count duplicate binary lines:
> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
>       1 $
>       1 ^@$
>       1 ^@$

Not reproducible on recent versions.
Might this have been specific to the i18n patch?
I.E. can you reproduce with LC_ALL=C set in the env?

thanks,
Pádraig
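
The LC_ALL=C check requested above can be sketched as follows. This is a minimal sketch, not output from any tested system: 'nul_input' is a hypothetical helper name, and '\000' is used as the portable octal spelling of the reporter's '\x00'. It runs the smallest failing input once with the C locale forced and once with whatever locale the environment already sets, then reports whether the two outputs agree.

```shell
# Smallest failing input from the report: an empty line, then two lines
# that each hold a single NUL byte.
nul_input() { printf '\n\000\n\000\n'; }

# Output with the C locale forced.
c_locale_out=$(nul_input | LC_ALL=C uniq -c | cat -e)

# Output with the locale taken from the environment.
env_locale_out=$(nul_input | uniq -c | cat -e)

printf 'LC_ALL=C:\n%s\n' "$c_locale_out"
printf 'environment locale:\n%s\n' "$env_locale_out"

if [ "$c_locale_out" = "$env_locale_out" ]; then
  echo "same output in both locales"
else
  echo "output differs by locale"
fi
```

If the outputs differ only when the environment locale is in effect, that would be consistent with the locale-specific (i18n patch) explanation suggested above; it would not by itself confirm it.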




Information forwarded to bug-coreutils <at> gnu.org:
bug#29802; Package coreutils. (Tue, 30 Oct 2018 02:21:02 GMT)

Message #11 received at 29802 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 29802 <at> debbugs.gnu.org
Subject: Re: bug#29802: "uniq -c" doesn't like counting lines with nulls
Date: Mon, 29 Oct 2018 20:20:38 -0600
tags 29802 moreinfo
close 29802
stop

(triaging old bugs)

On 2017-12-21 9:33 a.m., Pádraig Brady wrote:
> On 21/12/17 08:40, PD wrote:
>> # printf "\n\x00\n\x00\n" | uniq -c | cat -e
>>        1 $
>>        1 ^@$
>>        1 ^@$
> 
> Not reproducible on recent versions.
> Might this have been specific to the i18n patch?
> I.E. can you reproduce with LC_ALL=C set in the env?
> 

With no further comments in almost a year, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf






Added tag(s) moreinfo. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 02:21:02 GMT)

bug closed, send any further explanations to 29802 <at> debbugs.gnu.org and "PD" <bug-bash.gnu.org <at> pkts.ca> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 02:21:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 Nov 2018 12:24:03 GMT)

This bug report was last modified 5 years and 123 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.