From: Felix Hamme <fhamme@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>, 36718 <at> debbugs.gnu.org
Cc: Gerhard Dittes <gerhard.dittes@HIDDEN>
Subject: Re: bug#36718: uniq treats distinct Korean characters equal
Date: Fri, 19 Jul 2019 12:18:32 +0200

Thanks, Paul Eggert. It seems this isn't a bug at all. My locale (de_DE.utf8) appears to lack collation definitions for the Korean characters in question, so strcoll treats them as equal. After setting my system language to Korean (ko_KR.utf8), uniq produces the expected output.
For my purposes, I'll set LC_COLLATE=C in my environment. That forces byte-wise comparison, so any two different characters compare as different, whatever the language. Admittedly, I could have found this by searching: https://unix.stackexchange.com/questions/373848/why-does-uniq-think-%E3%81%82%E3%81%84-and-%E3%81%84%E3%81%82-are-the-same
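The LC_COLLATE=C workaround relies on the C locale, where strcoll orders strings the same way strcmp does (byte by byte), so two different strings never compare as equal. A minimal Python sketch for checking how a given locale collates the two characters follows. It goes through Python's standard `locale` module, which calls the C library's collation (on glibc, `wcscoll`). The helper name `strcoll_equal` and the locale names in the demo are our own assumptions; what de_DE.UTF-8 or ko_KR.UTF-8 report depends on the installed C library and locale data, so no output is claimed for them.

```python
import locale
from typing import Optional


def strcoll_equal(a: str, b: str, loc: str) -> Optional[bool]:
    """Return True if collation under locale `loc` says a == b.

    Returns None when `loc` is not installed on this system.
    The previous LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        return None  # locale not available here; setting is unchanged
    try:
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    peu, teul = "\uD504", "\uD2C0"  # the two characters from the report
    # Hypothetical locale names; adjust to what `locale -a` lists.
    for loc in ("de_DE.UTF-8", "ko_KR.UTF-8", "C"):
        print(loc, strcoll_equal(peu, teul, loc))
```

If LC_ALL is set in the environment, it overrides LC_COLLATE, so `LC_ALL=C uniq` is the more robust way to force byte-wise comparison from the shell.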
bug-coreutils@HIDDEN: bug#36718; Package coreutils.
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
To: Felix Hamme <fhamme@HIDDEN>, 36718 <at> debbugs.gnu.org
Cc: Gerhard Dittes <gerhard.dittes@HIDDEN>
Subject: Re: bug#36718: uniq treats distinct Korean characters equal
Date: Thu, 18 Jul 2019 15:43:48 -0700

uniq just calls strcoll, and if strcoll (A, B) returns 0, uniq assumes the lines are equal. So my guess is that your problem has something to do with strcoll, not with coreutils per se.
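The rule described in the reply above is simple: a line counts as a duplicate whenever the collation function returns 0 when compared with the previously kept line. The Python sketch below models that rule with a pluggable comparison function. It is an illustration, not coreutils' source; `uniq_lines` and `codepoint_cmp` are hypothetical names, and the `lambda a, b: 0` stand-in only mimics a locale whose collation gives both characters equal weight.

```python
import locale
from typing import Callable, Iterable, Iterator, Optional


def uniq_lines(lines: Iterable[str],
               compare: Callable[[str, str], int] = locale.strcoll) -> Iterator[str]:
    """Yield lines, dropping any line that compare() reports as equal (0)
    to the last line kept, in the spirit of uniq without options."""
    kept: Optional[str] = None
    for line in lines:
        if kept is not None and compare(kept, line) == 0:
            continue  # collation says "same", so the line is dropped
        kept = line
        yield line


def codepoint_cmp(a: str, b: str) -> int:
    """strcmp-style result; for UTF-8 text, code-point order equals byte order."""
    return (a > b) - (a < b)


if __name__ == "__main__":
    lines = ["\uD504", "\uD2C0", "\uD2C0"]  # 프, 틀, 틀
    print(len(list(uniq_lines(lines, codepoint_cmp))))     # 2
    print(len(list(uniq_lines(lines, lambda a, b: 0))))    # 1
```

Python starts with LC_COLLATE set to "C" and only adopts the environment's collation after `locale.setlocale(locale.LC_COLLATE, "")`, so the default `locale.strcoll` comparator above behaves byte-wise unless that call is made.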
From: Felix Hamme <fhamme@HIDDEN>
To: bug-coreutils@HIDDEN
Cc: Gerhard Dittes <gerhard.dittes@HIDDEN>
Subject: uniq treats distinct Korean characters equal
Date: Thu, 18 Jul 2019 16:08:57 +0200

Dear all,

I found that when I run uniq on some Korean characters, it treats them as equal (counting them as duplicates) even though the characters differ. Specifically, this happened with the characters 프 (U+D504) and 틀 (U+D2C0). An example (input, expected output, actual output) is in the attachment. I tried this with uniq (GNU coreutils) 8.30.

Greetings
Felix Hamme

[Attachment: uniq-korean-characters-bug.tar.gz (application/gzip)]
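To confirm that the two characters from this report really differ at the byte level, which is all a byte-wise comparison looks at, the short Python sketch below prints each one's code point and UTF-8 encoding. `describe` is an illustrative helper, not part of any tool mentioned in the report.

```python
def describe(ch: str) -> str:
    """Return the code point and UTF-8 bytes of one character as ASCII text."""
    utf8 = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in utf8)
    return f"U+{ord(ch):04X} UTF-8: {hex_bytes}"


PEU = "\uD504"   # 프, as named in the report
TEUL = "\uD2C0"  # 틀, as named in the report

if __name__ == "__main__":
    print(describe(PEU))   # U+D504 UTF-8: ED 94 84
    print(describe(TEUL))  # U+D2C0 UTF-8: ED 8B 80
    # Different bytes: a byte-wise comparison can never call these equal.
    print(PEU.encode("utf-8") == TEUL.encode("utf-8"))  # False
```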
Felix Hamme <fhamme@HIDDEN>: bug-coreutils@HIDDEN.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.