GNU bug report logs - #36718
uniq treats distinct Korean characters equal

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Reported by: Felix Hamme <fhamme@HIDDEN>; dated Thu, 18 Jul 2019 14:49:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 18 Jul 2019 14:48:58 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jul 18 10:48:58 2019
Received: from localhost ([127.0.0.1]:54473 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ho7iM-0002gX-0N
	for submit <at> debbugs.gnu.org; Thu, 18 Jul 2019 10:48:58 -0400
Received: from lists.gnu.org ([209.51.188.17]:37545)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <fhamme@HIDDEN>) id 1ho75l-0001jP-K7
 for submit <at> debbugs.gnu.org; Thu, 18 Jul 2019 10:09:06 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:46336)
 by lists.gnu.org with esmtp (Exim 4.86_2)
 (envelope-from <fhamme@HIDDEN>) id 1ho75k-00077Q-Os
 for bug-coreutils@HIDDEN; Thu, 18 Jul 2019 10:09:05 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40,RCVD_IN_DNSWL_NONE
 autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <fhamme@HIDDEN>) id 1ho75j-0002dL-Rv
 for bug-coreutils@HIDDEN; Thu, 18 Jul 2019 10:09:04 -0400
Received: from moint.1and1.com ([212.227.15.8]:49344)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
 (Exim 4.71) (envelope-from <fhamme@HIDDEN>)
 id 1ho75j-0002cV-K3
 for bug-coreutils@HIDDEN; Thu, 18 Jul 2019 10:09:03 -0400
Received: from [82.165.232.198] (helo=[10.21.18.246])
 by mrint.1and1.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.84_2) (envelope-from <fhamme@HIDDEN>)
 id 1ho75d-0006pm-U6; Thu, 18 Jul 2019 16:08:58 +0200
From: Felix Hamme <fhamme@HIDDEN>
Openpgp: preference=signencrypt
To: bug-coreutils@HIDDEN
Subject: uniq treats distinct Korean characters equal
Message-ID: <b63c29f8-dbed-b445-6ab9-ad0d0872481d@HIDDEN>
Date: Thu, 18 Jul 2019 16:08:57 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------D989C0C3F5A783E7E6CBFD03"
Content-Language: en-US
X-Virus-Scanned: ClamAV@mvs-ha-bs
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 212.227.15.8
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Thu, 18 Jul 2019 10:48:55 -0400
Cc: Gerhard Dittes <gerhard.dittes@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

This is a multi-part message in MIME format.
--------------D989C0C3F5A783E7E6CBFD03
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Dear all,

I found that uniq treats some distinct Korean characters as equal and
counts them as duplicates. To be precise, it happened to me with the
characters 프 (U+D504) and 틀 (U+D2C0).

An example (input, expected output, actual output) can be found in the
attachment.
I tested this with uniq (GNU coreutils) 8.30.
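
A minimal sketch of the comparison at issue, written in Python rather than
the C of coreutils. It assumes, which this report does not confirm, that the
duplicates come from comparing adjacent lines by the locale's collation
order instead of by code point. The names `uniq`, `same` and
`collation_equal` are illustrative only and are not taken from coreutils.

```python
import locale


def uniq(lines, same=lambda a, b: a == b):
    """Drop adjacent lines that `same` considers equal (like uniq with no options)."""
    out = []
    for line in lines:
        if not out or not same(out[-1], line):
            out.append(line)
    return out


def collation_equal(a, b):
    """Compare two strings by the collation order of the environment's locale.

    Assumption: this approximates a locale-aware comparison. The result
    depends on the active locale and the C library's locale data.
    """
    try:
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        pass  # keep the current locale if the environment's one is unavailable
    return locale.strcoll(a, b) == 0


# The two characters from the report: 프 (U+D504) and 틀 (U+D2C0).
lines = ["\ud504", "\ud2c0"]

# Code-point comparison: the characters are distinct, so both lines survive.
by_codepoint = uniq(lines)

if __name__ == "__main__":
    print(by_codepoint)
    # Locale-dependent: may print one line or two, depending on the system.
    print(uniq(lines, same=collation_equal))
```

If the second result has only one line on the affected system, the locale's
collation data is the likely cause, and rerunning uniq with LC_ALL=C would be
a reasonable cross-check.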

Greetings
Felix Hamme

--------------D989C0C3F5A783E7E6CBFD03
Content-Type: application/gzip;
 name="uniq-korean-characters-bug.tar.gz"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename="uniq-korean-characters-bug.tar.gz"

H4sIAA59MF0AA+3Wv07CQADH8c48xb3A0bvrXevrVCxKRIqlTRgdiiZi4uKIiSzMTYyRwfgu
XfnzDrY1EgylloTUiL/PwDW9ElqOb2nQaV3Sc9dz7A5tnNme3fAdr0ePg1Nd2xeWsJRKR24p
tj5+0bihuGSWaViGxrgSnGlE7e0MCgQ93/YI0ZpOu9UvOO6n+T8q2L7+m3uTIbDb1A38buDT
pude0Oz9DXpUN1jd7/u5n5EusCnl1vXnyW/j2/oL07SkRlglX8D/Xn+SEWT5EM7Ho8XT4/J2
uri+m0UjvfY5x8lyeDWfRLPX6XzytojeZ9NIz9m1OjwOh/FgHIf32Wuy/RKHN/HgWa/99sXC
hr30X1h/if7Fqn9lSDPtXybT6L8Ced2X7B2dH4Cd+nf6XSfZOtm4AxTEr5X5/+er/oUQWf9M
oP8qlOwf94QDtVP/rU5WveuViH5N+f5NxUX6/C+FwPN/JdA/AAAAAAAAAAAAAMDh+QC5PrXF
ACgAAA==
--------------D989C0C3F5A783E7E6CBFD03--




Acknowledgement sent to Felix Hamme <fhamme@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#36718; Package coreutils. Full text available.
Last modified: Thu, 18 Jul 2019 15:00:03 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.