GNU bug report logs -
#11967
Bug in "uniq"
Reported by: Jaime Gaspar <mail <at> jaimegaspar.com>
Date: Tue, 17 Jul 2012 21:30:02 UTC
Severity: normal
Tags: notabug
Merged with 11968
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 11967 in the body.
You can then email your comments to 11967 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org:
bug#11967; Package coreutils. (Tue, 17 Jul 2012 21:30:02 GMT)
Acknowledgement sent to Jaime Gaspar <mail <at> jaimegaspar.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 17 Jul 2012 21:30:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear Sir or Madam,
I think that there is a bug in "uniq" (version 8.13).
The file "bug.txt" attached consists of two lines:
- the first one containing a character that
looks like a "v" and a line break;
- the second one containing a character that
looks like an upside-down "v" and a line break.
In hex:
E2 88 A8 0A
E2 88 A7 0A
When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.
Regards,
Jaime Gaspar
_____________________________
Homepage: www.jaimegaspar.com
E-mail: mail <at> jaimegaspar.com
Forcibly Merged 11967 11968.
Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 17 Jul 2012 21:56:01 GMT)
Added tag(s) notabug.
Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Tue, 17 Jul 2012 21:56:01 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Tue, 17 Jul 2012 21:56:02 GMT)
Notification sent to Jaime Gaspar <mail <at> jaimegaspar.com>:
bug acknowledged by developer. (Tue, 17 Jul 2012 21:56:02 GMT)
Message #14 received at 11967-done <at> debbugs.gnu.org (full text, mbox):
forcemerge 11967 11968
tag 11967 notabug
thanks
On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).
Is this your distro's build? Either way, I reproduced your claim with the
latest coreutils.git (post-8.17), so this is not likely to be a bug in
a distro-specific multibyte patch.
>
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
> looks like a "v" and a line break;
> - the second one containing a character that
> looks like an upside-down "v" and a line break.
> In hex:
>
> E2 88 A8 0A
> E2 88 A7 0A
Those glyphs that you describe line up with Unicode characters. I bet
you are using a locale with UTF-8 character encoding.
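For anyone following along without the attachment, the two lines can be recreated from the hex dump quoted above. This is only a sketch: the temporary file and the variable names are assumptions for illustration, and the octal escapes \342\210\250 and \342\210\247 are the bytes E2 88 A8 and E2 88 A7, which decode as UTF-8 to U+2228 (LOGICAL OR) and U+2227 (LOGICAL AND).

```shell
# Recreate the reported file in a temporary location (hypothetical name).
# Octal escapes are used because not every printf understands \xHH.
bug_file=$(mktemp)
printf '\342\210\250\n\342\210\247\n' > "$bug_file"

# Dump the bytes so they can be checked against the report.
dump=$(od -An -tx1 "$bug_file")
echo "$dump"

rm -f "$bug_file"
```

The dump should show the same eight bytes as the report: two three-byte sequences, each followed by a newline.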
>
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.
I can reproduce your symptoms, but only when I fudge my locale:
$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$
Remember, 'uniq' is required by POSIX to use the same line comparison
techniques as 'sort'; and 'sort' is required to use strcoll() (not
strcmp) to compare lines. And in your particular choice of locale,
strcoll() happens to state that '∨' and '∧' collate identically; hence
uniq is correct in stating that you have a duplicated line according to
your current locale.
$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$
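When byte-for-byte comparison is what is wanted, the locale can be overridden for a single command, as in the LC_ALL=C run above. The sketch below counts the lines uniq keeps in each case; the temporary file and variable names are assumptions, and the count under the current locale is deliberately not predicted, since it depends on that locale's collation tables and the libc version.

```shell
# Recreate the two-line file from the report (hypothetical temp file).
bug_file=$(mktemp)
printf '\342\210\250\n\342\210\247\n' > "$bug_file"

# In the C locale, strcoll() compares like strcmp(), so the lines differ.
c_count=$(LC_ALL=C uniq "$bug_file" | wc -l | tr -d ' ')
echo "lines kept with LC_ALL=C: $c_count"

# With the inherited locale, the result follows that locale's collation rules.
cur_count=$(uniq "$bug_file" | wc -l | tr -d ' ')
echo "lines kept with current locale: $cur_count"

rm -f "$bug_file"
```

Setting LC_ALL on the command line affects only that one invocation, so the rest of the session keeps its usual locale.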
So I'm closing this as not a bug, along with a final pointer to our FAQ:
https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021
--
Eric Blake eblake <at> redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 15 Aug 2012 11:24:05 GMT)