GNU bug report logs - #31033
multibyte: sort,uniq,join,tr,cut,paste,expand,unexpand patch

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: coreutils; Severity: wishlist; Reported by: Eric Fischer <enf@HIDDEN>; Keywords: patch; dated Mon, 2 Apr 2018 23:17:01 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'multibyte: sort,uniq,join,tr,cut,paste,expand,unexpand patch' from '[PATCH] Multibyte support for sort, uniq, join, tr, cut, paste, expand, unexpand'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 2 Apr 2018 23:16:45 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Apr 02 19:16:45 2018
From: Eric Fischer <enf@HIDDEN>
Date: Mon, 2 Apr 2018 15:18:14 -0700
X-Google-Sender-Auth: G07IwCKFBNMiwiltoNYFJF9rrig
Message-ID: <CAMJYP_PUqFfUEyBZt+U1hnAMLb0+LjPzm5w5yYCbJozzEP8=VA@HIDDEN>
Subject: [PATCH] Multibyte support for sort, uniq, join, tr, cut, paste,
 expand, unexpand
To: bug-coreutils@HIDDEN
Content-Type: multipart/alternative; boundary="001a114492e45a8a170568e4f70a"

--001a114492e45a8a170568e4f70a
Content-Type: text/plain; charset="UTF-8"

As previously discussed on the coreutils mailing list, beginning with

  http://lists.gnu.org/archive/html/coreutils/2017-12/msg00074.html

most of the coreutils text processing commands process bytes instead of
characters, regardless of the user's locale, so they do not properly handle
UTF-8 in input text or in option arguments.

I propose the changes in

  https://github.com/ericfischer/coreutils/compare/multibyte-squash

to convert sort, uniq, join, tr, cut, paste, expand, and unexpand to
process characters instead of bytes, allowing them to work correctly on
non-ASCII text, as specified by POSIX.

Eric Fischer

--001a114492e45a8a170568e4f70a--
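
The byte-versus-character distinction described in the report can be illustrated with a minimal sketch. This is not taken from the proposed patch and does not reproduce any coreutils code; the helper names `cut_bytes` and `cut_chars` are hypothetical, and UTF-8 input is assumed. It only shows why truncating a line at a byte offset can split a multibyte character, while truncating at a character offset cannot.

```python
def cut_bytes(line: bytes, n: int) -> bytes:
    """Keep the first n bytes of the line (byte-oriented behaviour)."""
    return line[:n]


def cut_chars(line: bytes, n: int, encoding: str = "utf-8") -> bytes:
    """Keep the first n characters of the line (character-oriented behaviour).

    Assumes the input is valid text in the given encoding.
    """
    return line.decode(encoding)[:n].encode(encoding)


def is_valid(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # "ï" occupies two bytes (0xC3 0xAF) in UTF-8, so the line is 6 bytes long.
    line = "naïve".encode("utf-8")

    by_bytes = cut_bytes(line, 3)   # ends in the middle of "ï"
    by_chars = cut_chars(line, 3)   # ends after the whole "ï"

    print(by_bytes, is_valid(by_bytes))
    print(by_chars, is_valid(by_chars))
```

In this sketch the byte-oriented cut leaves a dangling lead byte that is no longer valid UTF-8, whereas the character-oriented cut yields the three characters a user in a UTF-8 locale would likely expect.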




Acknowledgement sent to Eric Fischer <enf@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to bug-coreutils@HIDDEN:
bug#31033; Package coreutils.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.