GNU bug report logs - #25550
multibyte: uniq: special characters comparison

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: David Loyall <david.loyall@HIDDEN>; dated Thu, 26 Jan 2017 23:14:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: uniq: special characters comparison' from 'Apparent unicode bug in uniq 8.26'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 25550 <at> debbugs.gnu.org:


Date: Tue, 14 Mar 2017 03:02:09 -0400
From: Mike Frysinger <vapier@HIDDEN>
To: David Loyall <david.loyall@HIDDEN>
Subject: Re: bug#25550: Apparent unicode bug in uniq 8.26
Message-ID: <20170314070209.GM24205@vapier>
Mail-Followup-To: David Loyall <david.loyall@HIDDEN>,
 25550 <at> debbugs.gnu.org
References: <CA+4fW6nVUnw3CHMTXY9NRaewJ0v8y-Jwvm7BHBAt8fBrU6NyDg@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature"; boundary="wAI/bQb0EMvlZCHl"
Content-Disposition: inline
In-Reply-To: <CA+4fW6nVUnw3CHMTXY9NRaewJ0v8y-Jwvm7BHBAt8fBrU6NyDg@HIDDEN>


--wAI/bQb0EMvlZCHl
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On 26 Jan 2017 16:45, David Loyall wrote:
> Hello.  I think I found a bug in uniq 8.26.

while it is a bug, i'm pretty sure it's a bug in glibc.
coreutils relies on data glibc provides in cases like this.
-mike
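
To illustrate the point above, here is a minimal Python sketch (not taken from coreutils or glibc) that compares the two sample lines from the original report twice: once by code point, and once with `locale.strcoll`, which defers to the C library's collation data. It assumes that uniq decides equality through locale-aware collation, which the reporter's LC_COLLATE workaround suggests but this thread does not confirm. The helper name `collation_equal` is hypothetical, and the en_US.UTF-8 result depends on the installed C library, so no output is shown.

```python
import locale

# The two lines from faces_mre.txt in the original report.
FACE_A = "(\u25d5\u203f\u25d5)"   # (◕‿◕)
FACE_B = "(\ufe3a\ufe39\ufe3a)"   # (︺︹︺)


def collation_equal(a, b, loc):
    """Return whether strcoll() under locale `loc` treats a and b as equal.

    Returns None when the locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore


if __name__ == "__main__":
    print("identical code points:", FACE_A == FACE_B)
    for loc in ("C", "en_US.UTF-8"):
        print(loc, "-> collation says equal:",
              collation_equal(FACE_A, FACE_B, loc))
```

If the en_US.UTF-8 line reports the strings as equal while the C line does not, the discrepancy comes from the collation data rather than from the caller, which is consistent with the diagnosis above.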

--wAI/bQb0EMvlZCHl--




Information forwarded to bug-coreutils@HIDDEN:
bug#25550; Package coreutils. Full text available.

Message received at submit <at> debbugs.gnu.org:


From: David Loyall <david.loyall@HIDDEN>
Date: Thu, 26 Jan 2017 16:45:37 -0600
Message-ID: <CA+4fW6nVUnw3CHMTXY9NRaewJ0v8y-Jwvm7BHBAt8fBrU6NyDg@HIDDEN>
Subject: Apparent unicode bug in uniq 8.26
To: bug-coreutils@HIDDEN
Content-Type: text/plain; charset=UTF-8

Hello.  I think I found a bug in uniq 8.26.

Here's a demo:

hobbes@metalbaby:~/e2-scratch$ cat faces_mre.txt
(◕‿◕)
(︺︹︺)

hobbes@metalbaby:~/e2-scratch$ uniq -c faces_mre.txt
2 (◕‿◕)

Here's some background info:

hobbes@metalbaby:~/e2-scratch$ od -x faces_mre.txt
0000000 e228 9597 80e2 e2bf 9597 0a29 ef28 bab8
0000020 b8ef efb9 bab8 0a29
0000030
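
The dump is easier to read once the 16-bit words are turned back into bytes. The Python sketch below assumes `od -x` ran on a little-endian host, so each word shows its two bytes swapped; that is typical for x86 but is not stated in the report. The helper name `od_words_to_bytes` is hypothetical.

```python
def od_words_to_bytes(words):
    """Rebuild the byte stream from `od -x` words, assuming little-endian order."""
    return b"".join(int(w, 16).to_bytes(2, "little") for w in words)


# Words copied from the od output above (offsets omitted).
WORDS = ("e228 9597 80e2 e2bf 9597 0a29 ef28 bab8 "
         "b8ef efb9 bab8 0a29").split()

data = od_words_to_bytes(WORDS)
lines = data.decode("utf-8").splitlines()

print(len(data), "bytes")
for line in lines:
    # Print the code point of every character on the line.
    print(" ".join(f"U+{ord(ch):04X}" for ch in line))
```

Under that assumption the file holds 24 bytes (octal 30, matching the final offset), and the two lines share only their parentheses: the first contains U+25D5 U+203F U+25D5, the second U+FE3A U+FE39 U+FE3A.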

hobbes@metalbaby:~/e2-scratch$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

hobbes@metalbaby:~/e2-scratch$ uniq --version
uniq (GNU coreutils) 8.26
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Richard M. Stallman and David MacKenzie.

The bug disappears in the C locale.

hobbes@metalbaby:~/e2-scratch$ LC_COLLATE=c uniq -c faces_mre.txt
1 (◕‿◕)
1 (︺︹︺)
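
The C-locale result matches what a plain byte-wise comparison gives. As a rough illustration (not how uniq is implemented), this Python sketch counts runs of adjacent identical lines by comparing raw bytes. `count_runs` is a hypothetical helper, and the input bytes are those of faces_mre.txt as shown in the od dump.

```python
from itertools import groupby


def count_runs(lines):
    """Collapse adjacent equal items into (count, item) pairs, like `uniq -c`."""
    return [(sum(1 for _ in group), key) for key, group in groupby(lines)]


# Raw UTF-8 bytes of the two lines in faces_mre.txt.
RAW = (b"(\xe2\x97\x95\xe2\x80\xbf\xe2\x97\x95)\n"
       b"(\xef\xb8\xba\xef\xb8\xb9\xef\xb8\xba)\n")

for count, line in count_runs(RAW.splitlines()):
    print(count, line)
```

Both lines come out with a count of 1, as in the LC_COLLATE=c run.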

I hope this helps.

Cheers,

--Dave Loyall
Omaha, Nebraska, USA




Acknowledgement sent to David Loyall <david.loyall@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#25550; Package coreutils. Full text available.
Last modified: Sun, 28 Oct 2018 08:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.