Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Received: (at submit) by debbugs.gnu.org; 2 Apr 2018 23:16:45 +0000
From: Eric Fischer <enf@HIDDEN>
Date: Mon, 2 Apr 2018 15:18:14 -0700
Message-ID: <CAMJYP_PUqFfUEyBZt+U1hnAMLb0+LjPzm5w5yYCbJozzEP8=VA@HIDDEN>
Subject: [PATCH] Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand
To: bug-coreutils@HIDDEN

As previously discussed on the coreutils mailing list, beginning with

  http://lists.gnu.org/archive/html/coreutils/2017-12/msg00074.html

most of the coreutils text processing commands process bytes instead of
characters, regardless of the user's locale, so they do not handle UTF-8
text or options properly.

I propose the changes in

  https://github.com/ericfischer/coreutils/compare/multibyte-squash

to convert sort, uniq, join, tr, cut, paste, expand, and unexpand to
process characters instead of bytes, allowing them to work correctly on
non-ASCII text, as specified by POSIX.

Eric Fischer
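
The byte-versus-character distinction described in the message can be illustrated with a minimal Python sketch. It is not taken from the proposed patch and does not call any coreutils code. The helper names cut_bytes and cut_chars and the sample string are assumptions made for illustration only. The sketch shows how truncating UTF-8 text at a byte offset can split a multibyte character, while truncating at a character offset cannot.

```python
# Illustration only: hypothetical helpers, not part of coreutils or the patch.

TEXT = "naïve café"            # sample string; 'ï' and 'é' are 2 bytes each in UTF-8
DATA = TEXT.encode("utf-8")    # the byte sequence a byte-oriented tool would see


def cut_bytes(s: str, n: int) -> bytes:
    """Keep the first n bytes of the UTF-8 encoding (byte-oriented behaviour)."""
    return s.encode("utf-8")[:n]


def cut_chars(s: str, n: int) -> str:
    """Keep the first n characters (character-oriented behaviour)."""
    return s[:n]


if __name__ == "__main__":
    print(len(DATA), "bytes;", len(TEXT), "characters")
    # The byte slice ends in the middle of 'ï', so it is not valid UTF-8.
    print(cut_bytes(TEXT, 3))
    # The character slice keeps 'ï' intact.
    print(cut_chars(TEXT, 3))
```

Decoding the result of cut_bytes(TEXT, 3) as strict UTF-8 raises UnicodeDecodeError, which is the kind of breakage a tool produces when it treats multibyte text as a plain sequence of bytes.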
Eric Fischer <enf@HIDDEN>: bug-coreutils@HIDDEN.
bug-coreutils@HIDDEN: bug#31033; Package coreutils.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.