Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
era eriksson <era@HIDDEN> to control <at> debbugs.gnu.org.

From: Ulf Zibis <Ulf.Zibis@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: bug-coreutils <bug-coreutils@HIDDEN>, cygwin <cygwin@HIDDEN>, bug-gnulib@HIDDEN, Bruno Haible <bruno@HIDDEN>, Eric Blake <eblake@HIDDEN>
Subject: Re: bug#7948: 16-bit wchar_t on Windows and Cygwin
Date: Thu, 03 Feb 2011 13:57:30 +0100

Hi,

I think a similar bug is under discussion on GNU:
bug#7960: [PATCH] fmt: fix formatting multibyte text (bug #7372)

-Ulf

On 02.02.2011 18:51, Paul Eggert wrote:
> On 02/02/11 03:29, Bruno Haible wrote:
>> - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
>>   on Windows platforms and to 'wchar_t' otherwise.
> As a minor point, would it be OK to call this type
> 'xchar_t' instead?  'x' is the successor to 'w', after all,
> and it can be thought of as an abbreviation for 'eXtended'.
>
> A problem with the 'ww' prefix is that mentally I start thinking
> "World Wide ..."
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN: bug#7948; Package coreutils.

From: Andy Koppe <andy.koppe@HIDDEN>
To: cygwin@HIDDEN
Cc: bug-gnulib@HIDDEN, Paul Eggert <eggert@HIDDEN>, Eric Blake <eblake@HIDDEN>, bug-coreutils <bug-coreutils@HIDDEN>
Subject: Re: bug#7948: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 20:43:17 +0000

On 2 February 2011 18:57, Bruno Haible wrote:
> Hi Paul,
>
>> >   - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
>> >     on Windows platforms and to 'wchar_t' otherwise.
>>
>> As a minor point, would it be OK to call this type
>> 'xchar_t' instead?  'x' is the successor to 'w', after all,
>> and it can be thought of as an abbreviation for 'eXtended'.
>
> 'wwchar_t' means "wide wide character".
>
> In fact it's not really an "extended" character or "complex character".
> It's just what POSIX calls a 'wchar_t'.

It's extended in the sense that the original Unicode was only 16 bits
wide (which of course is why wchar_t on Windows is 16 bits). Also, I
think 'xchar_t' is less prone to typos, in particular forgetting one
of the dubyas.

Andy

From: Bruno Haible <bruno@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Cc: bug-gnulib@HIDDEN, cygwin <cygwin@HIDDEN>, bug-coreutils <bug-coreutils@HIDDEN>, Eric Blake <eblake@HIDDEN>
Subject: Re: bug#7948: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 19:57:06 +0100

Hi Paul,

> > - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
> >   on Windows platforms and to 'wchar_t' otherwise.
>
> As a minor point, would it be OK to call this type
> 'xchar_t' instead?  'x' is the successor to 'w', after all,
> and it can be thought of as an abbreviation for 'eXtended'.

'wwchar_t' means "wide wide character".

In fact it's not really an "extended" character or "complex character".
It's just what POSIX calls a 'wchar_t'.

I like the analogy between strtol and strtoll. In the beginning, people
thought a 'long int' would be enough for everything. Then they discovered
a 'long long int' is needed. The same story repeats itself here with the
"wide characters" which turn out to be not wide enough, and "wide wide
characters" are needed.

> A problem with the 'ww' prefix is that mentally I start thinking
> "World Wide ..."

Indeed this meaning can come to mind, but I think it's not dangerous since
the term "world wide" has no meaning in a programming language.

Bruno
--
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>

From: Paul Eggert <eggert@HIDDEN>
To: Bruno Haible <bruno@HIDDEN>
Cc: bug-gnulib@HIDDEN, cygwin <cygwin@HIDDEN>, bug-coreutils <bug-coreutils@HIDDEN>, Eric Blake <eblake@HIDDEN>
Subject: Re: bug#7948: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 02 Feb 2011 09:51:54 -0800

On 02/02/11 03:29, Bruno Haible wrote:
> - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
>   on Windows platforms and to 'wchar_t' otherwise.

As a minor point, would it be OK to call this type
'xchar_t' instead?  'x' is the successor to 'w', after all,
and it can be thought of as an abbreviation for 'eXtended'.

A problem with the 'ww' prefix is that mentally I start thinking
"World Wide ..."

From: Bruno Haible <bruno@HIDDEN>
To: Eric Blake <eblake@HIDDEN>
Cc: bug-coreutils <bug-coreutils@HIDDEN>, cygwin <cygwin@HIDDEN>, bug-gnulib@HIDDEN
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 12:29:03 +0100

Hello Eric,

> ... POSIX requires that 1 wchar_t corresponds to 1 character
> ...
> > What consequences does this have?
> >
> > 1) All code that uses the functions from <wctype.h> (wide character
> >    classification and mapping) or wcwidth() malfunctions on strings that
> >    contains Unicode characters outside the BMP, i.e. outside the range
> >    U+0000..U+FFFF.
>
> Not necessarily.  Such code falls outside of POSIX, but it may still be
> a well-behaved extension if given sane behavior for how to deal with
> surrogates.

No. Code that uses <wctype.h> and wcwidth() is written precisely according
to POSIX. The problem is that this code cannot work correctly when wchar_t[]
is in UTF-16 encoding. There simply is no way to define these functions in
a reasonable way for surrogates. For example:

  U+1031E = 0xD800 0xDF1E  is a letter      (iswalpha should be true)
  U+10320 = 0xD800 0xDF20  is not a letter  (iswalpha should be false)
  U+1D31E = 0xD834 0xDF1E  is not a letter  (iswalpha should be false)
  U+1D320 = 0xD834 0xDF20  is not a letter  (iswalpha should be false)
  U+1D71E = 0xD835 0xDF1E  is a letter      (iswalpha should be true)
  U+1D720 = 0xD835 0xDF20  is a letter      (iswalpha should be true)

There is no way that a system can provide this information through a function
'iswalpha' that takes a single wchar_t argument. It would be possible to
provide this information
  - either through a function iswalpha2 (wchar_t wc1, wchar_t wc2) that takes
    two wchar_t arguments,
  - or through a function uc_is_alpha (ucs4_t uc),
but that is not POSIX, and it would require rewriting each and every piece
of code that currently uses <wctype.h> in the POSIX way.

> we can (try) to make the various wc* functions try to
> behave as smartly as possible (as is the case with Cygwin); where those
> smarts are only needed when you use surrogate pairs.

The point is that this approach can work fine for mbrtowc() and wcrtomb(),
but it cannot yield a working definition for the <wctype.h> functions and
wcwidth().

> > 2) Code that uses mbrtowc() or wcrtomb() is also likely to malfunction.
> >    On Cygwin >= 1.7 mbrtowc() and wcrtomb() is implemented in an intelligent
> >    but somewhat surprising way: wcrtomb() may return 0, that is, produce no
> >    output bytes when it consumes a wchar_t.
> >
> > Now with a chinese character outside the BMP:
> > $
> > 1 4
> > $ printf 'a \xf0\xa1\x88\xb4 b\n' | wc -w -m
> > 3 6
> >
> > On Cygwin 1.7.5 (with LANG=C.UTF-8 and 'wc' from GNU coreutils 8.5):
> >
> > $ printf 'a\xf0\xa1\x88\xb4b\n' | wc -w -m
> > 1 5
> > $ printf 'a \xf0\xa1\x88\xb4 b\n' | wc -w -m
> > 2 7
> >
> > So both the number of characters and the number of words are counted
> > wrong as soon as non-BMP characters occur.
>
> Does this represent a bug in cygwin's mbrtowc routines that could be
> fixed by cygwin?
>
> Or, does this represent a bug in coreutils for using mbrtowc one
> character at a time instead of something like mbsrtowcs to do bulk
> conversions?

We agree that it is a bug. And it is caused by
  - the fact that Cygwin's wchar_t[] encoding is UTF-16, and
  - there is no way to define the <wctype.h> POSIX functions sanely in this
    setting, and
  - coreutils and gnulib make use of the POSIX functions.

Even if coreutils were to use mbsrtowcs instead of repeated use of mbrtowc,
there would be no way for it to produce the correct result without combining
surrogates into entire characters.

> And if we decide that cygwin's extensions are sane, how much harder is
> it to characterize what a program must do to be portable to both 16-bit
> and 32-bit wchar_t if they are guaranteed the same behavior for all
> hosts of the same-size wchar_t?  In other words, would it really require
> that many #ifdefs in coreutils to portably and simultaneously support
> both sizes of wchar_t?

It would require
  1. to change the conversions that use mbrtowc to either convert an entire
     string at once (use mbsrtowcs), or make a second call to mbrtowc once
     the first call to mbrtowc has returned a high (leading) surrogate.
  2. to change all uses of <wctype.h> and wcwidth() to use different functions,
     either functions that take 2 wchar_t arguments, or functions that
     require the caller to combine the surrogates.

This means lots of logic that goes against the spirit of wchar_t in
ANSI C Amd. 1 and POSIX.

> > I'm more in favour of overriding wchar_t and all functions that depend on it -
> > like we did successfully for the socket functions.
> >
> > In practice, this would mean that on Windows (both native Windows and
> > Cygwin >= 1.7) the use of a 'wchar_t' module will
> >   - override wchar_t to be 32 bits, like in glibc,
> >   - cause functions from mbrtowc() to wcwidth() to be overridden. Since the
> >     corresponding system functions are unusable, the replacements will use the
> >     modules from libunistring (such as unictype/ctype-alnum and uniwidth/width).
> ...
> compiler primitives, like L"xyz", which result in 16-bit wchar_t
> arrays, will be unusable

Good point. I agree then that overriding wchar_t should better not be done.
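
A minimal sketch, in C, of the surrogate arithmetic behind the examples above. The helper names utf16_encode and utf16_combine are hypothetical; they are not taken from gnulib, Cygwin, or libunistring. The sketch only illustrates why the trailing 16-bit unit alone cannot answer iswalpha, and what "combining the surrogates" would ask of a caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point in U+10000..U+10FFFF into a
   UTF-16 surrogate pair (leading unit in *hi, trailing unit in *lo). */
static void utf16_encode(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;               /* 20 significant bits */
    *hi = (uint16_t)(0xD800 + (v >> 10));    /* upper 10 bits */
    *lo = (uint16_t)(0xDC00 + (v & 0x3FF));  /* lower 10 bits */
}

/* Hypothetical helper: recombine a surrogate pair into one code point.
   A caller on a 16-bit wchar_t platform would need this step before
   asking a classification function about the character. */
static uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + ((uint32_t)(hi - 0xD800) << 10) + (uint32_t)(lo - 0xDC00);
}
```

Encoding U+1031E and U+1D31E with this helper gives 0xD800 0xDF1E and 0xD834 0xDF1E: the same trailing unit for a letter and a non-letter, in line with the table in the message.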

> C1x will be adding compiler support for mandatory char16_t and char32_t
> types for UTF-16 and UTF-32 data, independently of whether wchar_t is
> 16-bit or 32-bit; maybe the better thing is to proactively start
> providing the new interfaces in <uchar.h> that will result from C1x
> adoption (and convert GNU programs to use this rather than wchar_t for
> character operations)
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1516.pdf lists:

A newer draft is at
https://www.opengroup.org/platform/single_unix_specification/uploads/40/23495/n1548.pdf

This is a good point, but would have two drawbacks:
  - It throws out the use of a POSIX API for a not-yet-standard API,
  - Performance: For the non-UTF-8 locales (ISO-8859-15, EUC-JP, and similar)
    on platforms like MacOS X, FreeBSD, Solaris, the 'wchar_t' representation
    is essentially a packed multibyte representation. Which makes mbrtowc()
    fast, because it does not have to do a table lookup for the conversion
    from/to Unicode. If you use mbrtoc32 instead of mbrtowc, you add extra
    runtime overhead for a conversion to Unicode, that would not be necessary
    when using mbrtowc().

In other words, your proposal would solve the Windows wchar_t problem, but at
the price of a performance penalty on traditional Unix systems.

Here's a new proposal:
  - Define a type 'wwchar_t' on all platforms, equivalent to uint32_t
    on Windows platforms and to 'wchar_t' otherwise.
  - Define functions 'mbrtowwc', 'iswwalpha', 'wwcwidth', and similar.
    Their definition will be a trivial redirection to 'mbrtowc', 'iswalpha',
    'wcwidth' on most platforms, and a use of libunistring modules on
    Windows platforms.

With this proposal,
  - The code that uses <wctype.h> has to be changed, but in a trivial way
    that introduces no complicated logic: Just change 'w' to 'ww'. Not more
    difficult than, say, using strtoll() instead of strtol().
  - The runtime penalty on non-Windows systems is minimal.
  - On Windows platforms, surrogates are handled correctly, and code that
    uses wchar_t or <windows.h> is left alone.

How does that sound? Comments?

Bruno
--
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>
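
One way the proposal above might look in a header, sketched in C under stated assumptions. The names wwchar_t, mbrtowwc, and iswwalpha come from the message; everything else is an assumption made for illustration and is not gnulib's actual code. That includes the platform test, the size check, and the choice to leave the Windows side as bare declarations. A 'wwcwidth' wrapper would presumably follow the same pattern.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

#if defined _WIN32 || defined __CYGWIN__

/* wchar_t is 16 bits here, so use a separate 32-bit type. */
typedef uint32_t wwchar_t;

/* Assumed to be implemented elsewhere, for example on top of
   libunistring as the message suggests; only declared in this sketch. */
size_t mbrtowwc(wwchar_t *pwc, const char *s, size_t n, mbstate_t *ps);
int iswwalpha(wwchar_t wc);

#else

/* wchar_t already holds a whole character: plain redirections. */
typedef wchar_t wwchar_t;

static inline size_t mbrtowwc(wwchar_t *pwc, const char *s, size_t n,
                              mbstate_t *ps)
{
    return mbrtowc(pwc, s, n, ps);
}

static inline int iswwalpha(wwchar_t wc)
{
    return iswalpha((wint_t) wc);
}

#endif

/* Assumption of this sketch: a wwchar_t is wide enough for any
   Unicode code point. */
static_assert(sizeof(wwchar_t) >= 4, "wwchar_t must be at least 32 bits");
```

Calling code would then change only in spelling, for example iswwalpha (wc) where it previously wrote iswalpha (wc), which is the 'w' to 'ww' substitution the message describes.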
Received: (at submit) by debbugs.gnu.org; 31 Jan 2011 16:50:23 +0000
Message-ID: <4D46EA2B.1010307@HIDDEN>
Date: Mon, 31 Jan 2011 09:58:19 -0700
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat
To: Bruno Haible <bruno@HIDDEN>
Subject: Re: 16-bit wchar_t on Windows and Cygwin
References: <201101310304.42975.bruno@HIDDEN>
In-Reply-To: <201101310304.42975.bruno@HIDDEN>
Cc: bug-coreutils <bug-coreutils@HIDDEN>, cygwin <cygwin@HIDDEN>, bug-gnulib@HIDDEN

[adding cygwin and coreutils for a wc issue]

On 01/30/2011 07:04 PM, Bruno Haible wrote:
> Hi,
>
> It is known for a long time that on native Windows, the wchar_t[] encoding on
> strings is UTF-16. [1] Now, Corinna Vinschen has confirmed that it is the same
> for Cygwin >= 1.7. [2]

POSIX requires that 1 wchar_t corresponds to 1 character; so any use of
surrogates to get the full benefit of UTF-16 falls outside the bounds of
POSIX. At which point, the POSIX definition of those functions no longer
applies, and we can (try to) make the various wc* functions behave as
smartly as possible (as is the case with Cygwin); those smarts are only
needed when you use surrogate pairs.
If cygwin's approach is correct, then maybe the thing to do is codify
those smarts for all implementations with 16-bit wchar_t as an extension
to POSIX that all gnulib clients can rely on, and thus minimize the
#ifdefs in such clients.

> What consequences does this have?
>
> 1) All code that uses the functions from <wctype.h> (wide character
>    classification and mapping) or wcwidth() malfunctions on strings that
>    contains Unicode characters outside the BMP, i.e. outside the range
>    U+0000..U+FFFF.

Not necessarily. Such code falls outside of POSIX, but it may still be a
well-behaved extension if given sane behavior for how to deal with
surrogates.

> 2) Code that uses mbrtowc() or wcrtomb() is also likely to malfunction.
>    On Cygwin >= 1.7 mbrtowc() and wcrtomb() is implemented in an intelligent
>    but somewhat surprising way: wcrtomb() may return 0, that is, produce no
>    output bytes when it consumes a wchar_t.

> Now with a chinese character outside the BMP:
> $
>       1       4
> $ printf 'a \xf0\xa1\x88\xb4 b\n' | wc -w -m
>       3       6
>
> On Cygwin 1.7.5 (with LANG=C.UTF-8 and 'wc' from GNU coreutils 8.5):
>
> $ printf 'a\xf0\xa1\x88\xb4b\n' | wc -w -m
>       1       5
> $ printf 'a \xf0\xa1\x88\xb4 b\n' | wc -w -m
>       2       7
>
> So both the number of characters and the number of words are counted
> wrong as soon as non-BMP characters occur.
>

Does this represent a bug in cygwin's mbrtowc routines that could be
fixed by cygwin? Or, does this represent a bug in coreutils for using
mbrtowc one character at a time instead of something like mbsrtowcs to
do bulk conversions?

And if we decide that cygwin's extensions are sane, how much harder is
it to characterize what a program must do to be portable to both 16-bit
and 32-bit wchar_t if they are guaranteed the same behavior for all
hosts of the same-size wchar_t? In other words, would it really require
that many #ifdefs in coreutils to portably and simultaneously support
both sizes of wchar_t?
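To make the character-count discrepancy above concrete, here is a minimal sketch of surrogate-aware counting over UTF-16 code units. It is an illustration only, not code from coreutils, gnulib, or Cygwin: the function names are hypothetical, and uint16_t is used as a stand-in for a 16-bit wchar_t so that the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; uint16_t stands in for a 16-bit wchar_t. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a high/low surrogate pair into the code point it encodes. */
uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Count characters in n UTF-16 code units, treating a well-formed
   surrogate pair as one character. An unpaired surrogate is counted
   as one character on its own (an assumption of this sketch). */
size_t utf16_count_chars(const uint16_t *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_high_surrogate(s[i]) && i + 1 < n && is_low_surrogate(s[i + 1]))
            i++;            /* skip the low half of the pair */
        count++;
    }
    return count;
}
```

The bytes \xf0\xa1\x88\xb4 in the wc example encode U+21234, which UTF-16 represents as the pair 0xD844 0xDE34. Counting code units for 'a', that pair, 'b' and the newline gives 5, the figure reported on Cygwin; counting with the pair folded into one character gives 4, the figure in the first example.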
> I'm more in favour of overriding wchar_t and all functions that depend on it -
> like we did successfully for the socket functions.
>
> In practice, this would mean that on Windows (both native Windows and
> Cygwin >= 1.7) the use of a 'wchar_t' module will
>   - override wchar_t to be 32 bits, like in glibc,
>   - cause functions from mbrtowc() to wcwidth() to be overridden. Since the
>     corresponding system functions are unusable, the replacements will use the
>     modules from libunistring (such as unictype/ctype-alnum and uniwidth/width).

That's a lot of overriding, for anything that uses wchar_t in its API,
and throws out a lot of what cygwin already provides. It also means
that compiler primitives, like L"xyz", which result in 16-bit wchar_t
arrays, will be unusable with your 32-bit wchar_t override. In other
words, I don't think it's a good idea to be doing that.

C1x will be adding compiler support for mandatory char16_t and char32_t
types for UTF-16 and UTF-32 data, independently of whether wchar_t is
16-bit or 32-bit; maybe the better thing is to proactively start
providing the new interfaces in <uchar.h> that will result from C1x
adoption (and convert GNU programs to use this rather than wchar_t for
character operations), although without compiler support for u"" and U""
(and even u8""), we are no better than ditching compiler support for L""
if you force a wchar_t size override.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1516.pdf lists:

7.27 Unicode utilities <uchar.h>
1 The header <uchar.h> declares types and functions for manipulating Unicode
characters.
2 The types declared are mbstate_t (described in 7.29.1) and size_t
(described in 7.19);
char16_t
which is an unsigned integer type used for 16-bit characters and is the
same type as
uint_least16_t (described in 7.20.1.2); and
char32_t
which is an unsigned integer type used for 32-bit characters and is the
same type as
uint_least32_t (also described in 7.20.1.2).
mbrtoc16
c16rtomb
mbrtoc32
c32rtomb

but no variants for replacing wprintf and friends (convert to multibyte
and use printf and friends instead).

-- 
Eric Blake   eblake@HIDDEN    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
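As an illustration of why a 32-bit unit sidesteps the surrogate problem, the following sketch decodes UTF-8 into 32-bit code points and counts them. It is a hand-written stand-in for what a conversion such as mbrtoc32 would be expected to yield in a UTF-8 locale, written this way only so that it does not depend on <uchar.h> or on the host's locale setup. The function names are hypothetical, uint32_t is used in place of char32_t, and the decoder deliberately omits the overlong-form and surrogate-range checks that a production decoder needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *out.
   Returns the number of bytes consumed, or 0 if the input is empty,
   truncated, or malformed. Hypothetical helper, minimal validation. */
size_t utf8_to_c32(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (n == 0)
        return 0;

    size_t len;
    uint32_t cp;
    if (s[0] < 0x80)                { len = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* sequence is cut off */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Count code points in n bytes of UTF-8; returns (size_t)-1 on bad input. */
size_t utf8_count_chars(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (n > 0) {
        uint32_t cp;
        size_t used = utf8_to_c32(p, n, &cp);
        if (used == 0)
            return (size_t)-1;
        p += used;
        n -= used;
        count++;
    }
    return count;
}
```

With one 32-bit unit per code point, the 7-byte input from the wc example (a, the four bytes of U+21234, b, newline) counts as 4 characters with no special casing. When writing that input as a C string literal, the literal has to be split after \xb4 (for instance "a\xf0\xa1\x88\xb4" "b\n"), because a following b would otherwise be read as part of the hex escape.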
Eric Blake <eblake@HIDDEN>
:bug-coreutils@HIDDEN
.
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN
:bug#7948
; Package coreutils
.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.