GNU bug report logs - #25550
multibyte: uniq: special characters comparison


Package: coreutils

Reported by: David Loyall <david.loyall <at> the-good-guys.net>

Date: Thu, 26 Jan 2017 23:14:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 25550 AT debbugs.gnu.org.



Report forwarded to bug-coreutils <at> gnu.org:
bug#25550; Package coreutils. (Thu, 26 Jan 2017 23:14:02 GMT)

Acknowledgement sent to David Loyall <david.loyall <at> the-good-guys.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 26 Jan 2017 23:14:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: David Loyall <david.loyall <at> the-good-guys.net>
To: bug-coreutils <at> gnu.org
Subject: Apparent unicode bug in uniq 8.26
Date: Thu, 26 Jan 2017 16:45:37 -0600
Hello.  I think I found a bug in uniq 8.26.

Here's a demo:

hobbes@metalbaby:~/e2-scratch$ cat faces_mre.txt
(◕‿◕)
(︺︹︺)

hobbes@metalbaby:~/e2-scratch$ uniq -c faces_mre.txt
2 (◕‿◕)

Here's some background info:

hobbes@metalbaby:~/e2-scratch$ od -x faces_mre.txt
0000000 e228 9597 80e2 e2bf 9597 0a29 ef28 bab8
0000020 b8ef efb9 bab8 0a29
0000030

hobbes@metalbaby:~/e2-scratch$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

hobbes@metalbaby:~/e2-scratch$ uniq --version
uniq (GNU coreutils) 8.26
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Richard M. Stallman and David MacKenzie.

The bug disappears in the C locale.

hobbes@metalbaby:~/e2-scratch$ LC_COLLATE=c uniq -c faces_mre.txt
1 (◕‿◕)
1 (︺︹︺)

I hope this helps.

Cheers,

--Dave Loyall
Omaha, Nebraska, USA
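
Editor's note: the od -x dump in the message above can be turned back into text to confirm that the two lines really differ at the byte level. The following is a minimal sketch in Python, assuming the dump was produced on a little-endian machine (od -x prints 2-byte units in host byte order, so each unit has to be byte-swapped). The variable names are illustrative only and are not part of the report.

```python
import struct

# The 16-bit units exactly as printed by `od -x faces_mre.txt` in the report.
WORDS = "e228 9597 80e2 e2bf 9597 0a29 ef28 bab8 b8ef efb9 bab8 0a29".split()

# Assumption: little-endian host, so the low byte of each unit comes first in the file.
raw = b"".join(struct.pack("<H", int(word, 16)) for word in WORDS)

# The file is UTF-8 (per the reported locale); split it into its two lines.
lines = raw.decode("utf-8").splitlines()

for line in lines:
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in line)
    print(line, "->", codepoints)
```

The decoded file is 24 bytes long, which agrees with the final offset 0000030 (octal) in the dump, and the two lines share only the ASCII parentheses.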




Information forwarded to bug-coreutils <at> gnu.org:
bug#25550; Package coreutils. (Tue, 14 Mar 2017 07:03:02 GMT)

Message #8 received at 25550 <at> debbugs.gnu.org (full text, mbox):

From: Mike Frysinger <vapier <at> gentoo.org>
To: David Loyall <david.loyall <at> the-good-guys.net>
Cc: 25550 <at> debbugs.gnu.org
Subject: Re: bug#25550: Apparent unicode bug in uniq 8.26
Date: Tue, 14 Mar 2017 03:02:09 -0400
On 26 Jan 2017 16:45, David Loyall wrote:
> Hello.  I think I found a bug in uniq 8.26.

while it is a bug, i'm pretty sure it's a bug in glibc.
coreutils relies on data glibc provides in cases like this.
-mike
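
Editor's note: the reply above attributes the behaviour to the collation data supplied by the C library. A rough way to check that outside of uniq is to call the C library's locale-aware comparison directly. The sketch below uses Python's locale.strcoll, which wraps that comparison. It assumes that uniq's equality test goes through the same collation routine, which the reply suggests but this log does not confirm. The helper name collate is hypothetical. The result under en_US.UTF-8 depends on the installed C library and its locale data, so no particular output is claimed for it.

```python
import locale

# The two lines from faces_mre.txt in the original report.
FACE_A = "(◕‿◕)"
FACE_B = "(︺︹︺)"


def collate(a, b, locale_name):
    """Return strcoll(a, b) under locale_name, or None if that locale is unavailable.

    Hypothetical helper for illustration; a result of 0 means the C library
    treats the two strings as equal for collation purposes.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the previous setting


if __name__ == "__main__":
    for name in ("C", "en_US.UTF-8"):
        print(name, collate(FACE_A, FACE_B, name))
```

A result of 0 under en_US.UTF-8 alongside a non-zero result under C would be consistent with the report: the strings differ, but the locale's collation rules do not distinguish them.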

Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 28 Oct 2018 07:58:02 GMT)

Changed bug title to 'multibyte: uniq: special characters comparison' from 'Apparent unicode bug in uniq 8.26'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 28 Oct 2018 07:58:02 GMT)

This bug report was last modified 5 years and 180 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.