Assaf Gordon <assafgordon@HIDDEN>
to control <at> debbugs.gnu.org.
Full text available.Assaf Gordon <assafgordon@HIDDEN>
to control <at> debbugs.gnu.org.
Full text available.Received: (at 32267) by debbugs.gnu.org; 26 Jul 2018 09:21:47 +0000 From debbugs-submit-bounces <at> debbugs.gnu.org Thu Jul 26 05:21:47 2018 Received: from localhost ([127.0.0.1]:58297 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>) id 1ficSv-0004yU-Ai for submit <at> debbugs.gnu.org; Thu, 26 Jul 2018 05:21:45 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55466) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <eggert@HIDDEN>) id 1ficSs-0004yE-Mh for 32267 <at> debbugs.gnu.org; Thu, 26 Jul 2018 05:21:43 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 1D588160656; Thu, 26 Jul 2018 02:21:37 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id NHOqJjFT8Yqo; Thu, 26 Jul 2018 02:21:36 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6B605160657; Thu, 26 Jul 2018 02:21:36 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id azCQAtdBuJjj; Thu, 26 Jul 2018 02:21:36 -0700 (PDT) Received: from [192.168.1.9] (unknown [47.154.30.119]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 28D42160656; Thu, 26 Jul 2018 02:21:36 -0700 (PDT) Subject: Re: bug#32267: dd's ucase and lcase and LC_CTYPE. To: Ralph Corderoy <ralph@HIDDEN>, 32267 <at> debbugs.gnu.org References: <20180725081111.984911FBFB@HIDDEN> From: Paul Eggert <eggert@HIDDEN> Openpgp: preference=signencrypt Autocrypt: addr=eggert@HIDDEN; prefer-encrypt=mutual; keydata= xsFNBEyAcmQBEADAAyH2xoTu7ppG5D3a8FMZEon74dCvc4+q1XA2J2tBy2pwaTqfhpxxdGA9 Jj50UJ3PD4bSUEgN8tLZ0san47l5XTAFLi2456ciSl5m8sKaHlGdt9XmAAtmXqeZVIYX/UFS 96fDzf4xhEmm/y7LbYEPQdUdxu47xA5KhTYp5bltF3WYDz1Ygd7gx07Auwp7iw7eNvnoDTAl KAl8KYDZzbDNCQGEbpY3efZIvPdeI+FWQN4W+kghy+P6au6PrIIhYraeua7XDdb2LS1en3Ss mE3QjqfRqI/A2ue8JMwsvXe/WK38Ezs6x74iTaqI3AFH6ilAhDqpMnd/msSESNFt76DiO1ZK QMr9amVPknjfPmJISqdhgB1DlEdw34sROf6V8mZw0xfqT6PKE46LcFefzs0kbg4GORf8vjG2 Sf1tk5eU8MBiyN/bZ03bKNjNYMpODDQQwuP84kYLkX2wBxxMAhBxwbDVZudzxDZJ1C2VXujC OJVxq2kljBM9ETYuUGqd75AW2LXrLw6+MuIsHFAYAgRr7+KcwDgBAfwhPBYX34nSSiHlmLC+ KaHLeCLF5ZI2vKm3HEeCTtlOg7xZEONgwzL+fdKo+D6SoC8RRxJKs8a3sVfI4t6CnrQzvJbB n6gxdgCu5i29J1QCYrCYvql2UyFPAK+do99/1jOXT4m2836j1wARAQABzSBQYXVsIEVnZ2Vy dCA8ZWdnZXJ0QGNzLnVjbGEuZWR1PsLBfgQTAQIAKAUCTIByZAIbAwUJEswDAAYLCQgHAwIG FQgCCQoLBBYCAwECHgECF4AACgkQ7ZfpDmKqfjRRGw/+Ij03dhYfYl/gXVRiuzV1gGrbHk+t nfrI/C7fAeoFzQ5tVgVinShaPkZo0HTPf18x6IDEdAiO8Mqo1yp0CtHmzGMCJ50o4Grgfjlr 6g/+vtEOKbhleszN2XpJvpwM2QgGvn/laTLUu8PH9aRWTs7qJJZKKKAb4sxYc92FehPu6FOD 0dDiyhlDAq4lOV2mdBpzQbiojoZzQLMQwjpgCTK2572eK9EOEQySUThXrSIz6ASenp4NYTFH s9tuJQvXk9gZDdPSl3bp+47dGxlxEWLpBIM7zIONw4ks4azgT8nvDZxA5IZHtvqBlJLBObYY 0Le61Wp0y3TlBDh2qdK8eYL426W4scEMSuig5gb8OAtQiBW6k2sGUxxeiv8ovWu8YAZgKJfu oWI+uRnMEddruY8JsoM54KaKvZikkKs2bg1ndtLVzHpJ6qFZC7QVjeHUh6/BmgvdjWPZYFTt N+KA9CWX3GQKKgN3uu988yznD7LnB98T4EUH1HA/GnfBqMV1gpzTvPc4qVQinCmIkEFp83zl +G5fCjJJ3W7ivzCnYo4KhKLpFUm97okTKR2LW3xZzEW4cLSWO387MTK3CzDOx5qe6s4a91Zu ZM/j/TQdTLDaqNn83kA4Hq48UHXYxcIh+Nd8k/3w6lFuoK0wrOFiywjLx+0ur5jmmbecBGHc 1xdhAFHOwU0ETIByZAEQAKaF678T9wyH4wjTrV1Pz3cDEoSnV/0ZUrOT37p1dcGyj/IXq1x6 70HRVahAmk0sZpYc25PF9D5GPYHFWlNjuPU96rDndXB3hedmBRhLdC4bAXjI4DV+bmdVe+q/ IMnlZRaVlm9EiMCVAR6w13sReu7qXkW9r3RwY2AzXskp/tAe4BRKr1Zmbvi2nbnQ6epEC42r Rbx0B1EhjbIQZ5JHGk24iPT7LdBgnNmos5wYjzwNlkMQD5T0Ydzhk7J+UxwA5m46mOhRDC2r FV/A0gm5TLy8DXjv/Esc4gYnYai6SQqnUEVh5LuV8YCJBnijs+Tiw71x1icmn6xGI45EugJO gec+rLypYgpVp4x0HI5T88qBRYCkxH3Kg8Qo+EWNA9A4LRQ9DX8njona0gf0s03tocK8kBN6 6UoqqPtHBnc4eMgBymCflK12eKfd2YYxnyg9cZazWA5VslvTxpm76hbg5oiAEH/Vg/8MxHyA nPhfrgwyPrmJEcVBafdspJnYQxBYNco2LFPIhlOvWh8r4at+s+M3Lb26oUTczlgdW1Sf3SDA 77BMRnF0FQyE+7AzV79MBN4ykiqaezQxtaF1Fy/tvkhffSo8u+dwG0EgJh+te38gTcISVr0G IPplLz6YhjrbHrPRF1CN5UuL9DBGjxuN35RLNVEfta6RUFlR6NctTjvrABEBAAHCwWUEGAEC AA8FAkyAcmQCGwwFCRLMAwAACgkQ7ZfpDmKqfjSrHA/+KzAKvTxRhA9MWNLxIyJ7S5uJ16gs T3oCjZrBKGEhKMOGX4O0GA6VOEryO7QRCCYah3oxSG38IAnNeiwJXgU9Bzkk85UGbPEd7HGF /VSeHCQwWou6jqUDTSDvn9YhNTdG0KXPM74aC+xr2Zow1O2mhXihgWKD0Dw+0LYPnUOsQ0KO FxHXXYHmRrS1OZPU59BLvc+TRhIhafSHKLwbXK+6ckkxBx6h8z5ccpG0Qs4bFhdFYnFrEieD LoGmnE2YLhdV6swJ9VNCS6pLiEohT3fm7aXm15tZOIyzMZhHRSAPblXxQ0ZSWjq8oRrcYNFx c4W1URpAkBCOYJoXvQfD5L3lqAl8TCqDUzYxhH/tJhbDdHrqHH767jaDaTB1+Talp/2AMKwc XNOdiklGxbmHVG6YGl6g8Lrbsu9NZEI4yLlHzuikthJWgz+3vZhVGyNlt+HNIoF6CjDL2omu 5cEq4RDHM44QqPk6l7O0pUvN1mT4B+S1b08RKpqm/ff015E37HNV/piIvJlxGAYz8PSfuGCB 1thMYqlmgdhd9/BabGFbGGYHA6U4/T5zqU+f6xHy1SsAQZ1MSKlLwekBIT+4/cLRGqCHjnV0 q5H/T6a7t5mPkbzSrOLSo4puj+IToNjYyYIDBWzhlA19avOa+rvUjmHtD3sFN7cXWtkGoi8b uNcby4U= Organization: UCLA Computer Science Department Message-ID: <e062523e-4c7b-2f8e-ad51-586b9c45706e@HIDDEN> Date: Thu, 26 Jul 2018 02:21:35 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180725081111.984911FBFB@HIDDEN> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 32267 X-BeenThere: debbugs-submit <at> debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: <debbugs-submit.debbugs.gnu.org> List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe> List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/> List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org> List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help> List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe> Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org> X-Spam-Score: -3.3 (---) Yes, this is a known issue with dd as with many other coreutils programs. Strictly speaking as I understand it, it is not a deviation from POSIX, since POSIX does not require support for locales with multibyte encodings. Still, it would be nice to fix dd at some point, although it'd be a pain to do correctly and efficiently and it's long been low priority since hardly anybody needs or uses this feature on any platform.
bug-coreutils@HIDDEN:bug#32267; Package coreutils.
Full text available.
Received: (at submit) by debbugs.gnu.org; 25 Jul 2018 08:11:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Jul 25 04:11:30 2018
Received: from localhost ([127.0.0.1]:56100 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1fiEtO-0003KO-A7
for submit <at> debbugs.gnu.org; Wed, 25 Jul 2018 04:11:30 -0400
Received: from eggs.gnu.org ([208.118.235.92]:50763)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <ralph@HIDDEN>) id 1fiEtM-0003K9-Bl
for submit <at> debbugs.gnu.org; Wed, 25 Jul 2018 04:11:29 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <ralph@HIDDEN>) id 1fiEtD-0001wQ-Ur
for submit <at> debbugs.gnu.org; Wed, 25 Jul 2018 04:11:22 -0400
Received: from lists.gnu.org ([2001:4830:134:3::11]:47303)
by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from <ralph@HIDDEN>)
id 1fiEtD-0001wK-Qp
for submit <at> debbugs.gnu.org; Wed, 25 Jul 2018 04:11:19 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41694)
by lists.gnu.org with esmtp (Exim 4.71)
(envelope-from <ralph@HIDDEN>) id 1fiEtA-0003VW-Jh
for bug-coreutils@HIDDEN; Wed, 25 Jul 2018 04:11:19 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <ralph@HIDDEN>) id 1fiEt7-0001mv-IX
for bug-coreutils@HIDDEN; Wed, 25 Jul 2018 04:11:16 -0400
Received: from relay01.pair.com ([209.68.5.15]:49964)
by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
(Exim 4.71) (envelope-from <ralph@HIDDEN>)
id 1fiEt7-0001mR-D4
for bug-coreutils@HIDDEN; Wed, 25 Jul 2018 04:11:13 -0400
Received: from orac.inputplus.co.uk (unknown [81.174.201.153])
by relay01.pair.com (Postfix) with ESMTP id 9D1B4D010E4
for <bug-coreutils@HIDDEN>; Wed, 25 Jul 2018 04:11:12 -0400 (EDT)
Received: from orac.inputplus.co.uk (orac.inputplus.co.uk [IPv6:::1])
by orac.inputplus.co.uk (Postfix) with ESMTP id 984911FBFB;
Wed, 25 Jul 2018 09:11:11 +0100 (BST)
To: bug-coreutils@HIDDEN
From: Ralph Corderoy <ralph@HIDDEN>
Subject: dd's ucase and lcase and LC_CTYPE.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Date: Wed, 25 Jul 2018 09:11:11 +0100
Message-Id: <20180725081111.984911FBFB@HIDDEN>
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.1 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>,
<mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.1 (-----)
Hi,
Of dd(1), POSIX says
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
lcase
Map uppercase characters specified by the LC_CTYPE keyword
tolower to the corresponding lowercase character. Characters
for which no mapping is specified shall not be modified by this
conversion.=20
and similarly for `ucase'.
But dd in coreutils 8.29-1 on Arch Linux just has a simple 256-byte
translation table that's mapped through tolower(3) or toupper(3).
http://pubs.opengroup.org/onlinepubs/9699919799/functions/tolower.html
describes tolower(3) as handling only `unsigned char' or EOF, and being
the identity function on all values where there isn't a lowercase letter
for the uppercase value.
This deviation isn't documented AFAICS. It means ASCII and ISO-8859-1
are re-cased just fine. UTF-8 has its ASCII subset altered, and other
bytes left alone, so the end result is valid UTF-8, but not fully
re-cased. But charmaps like /usr/share/i18n/charmaps/CP949.gz,
https://en.wikipedia.org/wiki/Unified_Hangul_Code, have variable-length
byte sequences where 0x41, for example, isn't always an ASCII `A' and
thus shouldn't become 0x61, `a'.
Aside from improving the documentation, actually fixing dd to match
POSIX will need to handle the re-cased character being a different
number of bytes; particularly noticeable if the output file is the input
file with `conv=3Dnotrunc'.
$ locale | grep LC_CTYPE
LC_CTYPE=3D"en_GB.utf8"
$
$ sed 'l; s/./\u&/; l' <<<=C8=BF
\310\277$
\342\261\276$
=E2=B1=BE
$ sed 'l; s/./\l&/; l' <<<=E2=B1=BE
\342\261\276$
\310\277$
=C8=BF
$
--=20
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Ralph Corderoy <ralph@HIDDEN>:bug-coreutils@HIDDEN.
Full text available.bug-coreutils@HIDDEN:bug#32267; Package coreutils.
Full text available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.