Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
From: Paul Eggert <eggert@HIDDEN>
To: Ralph Corderoy <ralph@HIDDEN>, 32267 <at> debbugs.gnu.org
Subject: Re: bug#32267: dd's ucase and lcase and LC_CTYPE.
Date: Thu, 26 Jul 2018 02:21:35 -0700
Organization: UCLA Computer Science Department
Message-ID: <e062523e-4c7b-2f8e-ad51-586b9c45706e@HIDDEN>
In-Reply-To: <20180725081111.984911FBFB@HIDDEN>

Yes, this is a known issue with dd as with many other coreutils programs.
Strictly speaking as I understand it, it is not a deviation from POSIX, since
POSIX does not require support for locales with multibyte encodings. Still, it
would be nice to fix dd at some point, although it'd be a pain to do correctly
and efficiently and it's long been low priority since hardly anybody needs or
uses this feature on any platform.

bug-coreutils@HIDDEN: bug#32267; Package coreutils.
From: Ralph Corderoy <ralph@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: dd's ucase and lcase and LC_CTYPE.
Date: Wed, 25 Jul 2018 09:11:11 +0100
Message-Id: <20180725081111.984911FBFB@HIDDEN>

Hi,

Of dd(1), POSIX says
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

    lcase
        Map uppercase characters specified by the LC_CTYPE keyword
        tolower to the corresponding lowercase character.  Characters
        for which no mapping is specified shall not be modified by this
        conversion.

and similarly for `ucase'.

But dd in coreutils 8.29-1 on Arch Linux just has a simple 256-byte
translation table that's mapped through tolower(3) or toupper(3).
http://pubs.opengroup.org/onlinepubs/9699919799/functions/tolower.html
describes tolower(3) as handling only `unsigned char' or EOF, and being
the identity function on all values where there isn't a lowercase letter
for the uppercase value.

This deviation isn't documented AFAICS.  It means ASCII and ISO-8859-1
are re-cased just fine.  UTF-8 has its ASCII subset altered, and other
bytes left alone, so the end result is valid UTF-8, but not fully
re-cased.  But charmaps like /usr/share/i18n/charmaps/CP949.gz,
https://en.wikipedia.org/wiki/Unified_Hangul_Code, have variable-length
byte sequences where 0x41, for example, isn't always an ASCII `A' and
thus shouldn't become 0x61, `a'.

Aside from improving the documentation, actually fixing dd to match
POSIX will need to handle the re-cased character being a different
number of bytes; particularly noticeable if the output file is the input
file with `conv=notrunc'.

    $ locale | grep LC_CTYPE
    LC_CTYPE="en_GB.utf8"
    $
    $ sed 'l; s/./\u&/; l' <<<ȿ
    \310\277$
    \342\261\276$
    Ȿ
    $ sed 'l; s/./\l&/; l' <<<Ȿ
    \342\261\276$
    \310\277$
    ȿ
    $

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

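The behaviour described in the report above can be reproduced outside dd with a short Python sketch. This is not dd's source: the table here maps only ASCII letters (an assumption standing in for tolower(3)/toupper(3) in a locale where bytes >= 0x80 have no single-byte case mapping), the helper names are hypothetical, and Python's str.upper() follows Unicode case mappings rather than LC_CTYPE. It illustrates three of the report's points: UTF-8 bytes outside ASCII are left alone, a character-aware re-casing can change the byte length, and in CP949 a 0x41 trail byte is altered even though it is not an ASCII `A'.

```python
# Byte-wise tables in the spirit of the report: each of the 256 byte values
# is mapped on its own, with no knowledge of multibyte sequences.
# Assumption: only ASCII letters have a case mapping.
LOWER_TABLE = bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in range(256))
UPPER_TABLE = bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in range(256))


def bytewise_lcase(data: bytes) -> bytes:
    """Lowercase every byte independently through the 256-entry table."""
    return data.translate(LOWER_TABLE)


def bytewise_ucase(data: bytes) -> bytes:
    """Uppercase every byte independently through the 256-entry table."""
    return data.translate(UPPER_TABLE)


def charwise_ucase(data: bytes, encoding: str) -> bytes:
    """Hypothetical character-aware variant: decode, re-case, encode.

    Uses Unicode case mappings via str.upper(), not the LC_CTYPE locale.
    """
    return data.decode(encoding).upper().encode(encoding)


if __name__ == "__main__":
    # U+023F, the character used in the sed transcript: two bytes in UTF-8.
    small = "\u023f".encode("utf-8")
    print(small, "->", bytewise_ucase(small))          # bytes left alone
    print(small, "->", charwise_ucase(small, "utf-8"))  # two bytes become three

    # A CP949 two-byte sequence whose trail byte is 0x41.
    uhc = b"\x81\x41"
    mangled = bytewise_lcase(uhc)                       # trail byte becomes 0x61
    print(uhc.decode("cp949"), "->", mangled.decode("cp949"))
```

The two-to-three byte growth in the UTF-8 case is the length change that makes in-place conversion with `conv=notrunc' awkward.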
Ralph Corderoy <ralph@HIDDEN>
:bug-coreutils@HIDDEN
.