From: Warren Young <warren@HIDDEN>
To: cygwin@HIDDEN, bug-gnulib@HIDDEN, bug-coreutils@HIDDEN
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Fri, 04 Feb 2011 15:46:24 -0700

On 2/2/2011 9:35 AM, Corinna Vinschen wrote:
> If only the ones who decided that wchar_t in Cygwin should have the
> same size as WCHAR_T in the underlying Windows would have thought twice
> about the implications...
Cygwin 1.9? Or maybe 2.0, if it breaks ABIs?
owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:bug#7963; Package coreutils.
From: Andy Koppe <andy.koppe@HIDDEN>
To: cygwin@HIDDEN, bug-gnulib@HIDDEN, bug-coreutils@HIDDEN
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 20:28:17 +0000

On 2 February 2011 16:35, Corinna Vinschen wrote:
> On Feb  2 17:28, Corinna Vinschen wrote:
>> On Feb  2 17:02, Bruno Haible wrote:
>> > But if you say that the application should convert UTF-16 surrogates
>> > to UTF-32 before calling iswalpha: That's certainly a requirement
>> > for Cygwin 1.7.x applications that want to support the entire Unicode
>> > character set. But it's outside of POSIX, and many GNU programs will
>> > not want to include this added complexity. Just try to apply this
>> > suggestion to gnulib's quotearg.c, then estimate the time someone
>> > would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
>> > coreutils/src/wc.c, and so on.
>>
>> Cygwin's regcomp is taken from FreeBSD and is UTF-16 capable, including
>> surrogate handling.  It only required two changes in the code.
>
> Btw., I would be sure glad if Cygwin would use a wchar_t of 4 bytes as
> well.  The problem is that this requires too many changes at once to
> work right, and it would introduce a lot of backward compatibility
> problems which would have to be handled.

Cygwin 1.7 might have been a good point for that change, because the
lack of proper locale and charset support in previous versions meant
that backward compatibility was much less of a concern than it is now.
But it's a difficult change indeed, and it's not entirely clear that
it's worthwhile. I guess 64-bit Cygwin (if or when it happens) might
be the next opportunity.

> If only the ones who decided that wchar_t in Cygwin should have the
> same size as WCHAR_T in the underlying Windows would have thought twice
> about the implications...

Windows Unicode support was introduced with Windows NT in 1993,
whereas Unicode was only extended beyond 16 bits with version 2.0 in
1996. Cygwin was first released the year before. If the Unicode
extension was a consideration at all (which I'd doubt), wchar_t !=
WCHAR probably seemed far more daunting than having to deal with
surrogates at some point down the line.

Andy
From: Bruno Haible <bruno@HIDDEN>
To: bug-gnulib@HIDDEN, cygwin <cygwin@HIDDEN>,
"bug-coreutils" <bug-coreutils@HIDDEN>, Eric Blake <eblake@HIDDEN>
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 17:02:56 +0100
Hello Corinna,
> And, please note the wording in SUSv4, for instance in
> http://calimero.vinschen.de/susv4/functions/iswalpha.html
Likewise in POSIX:2008, at the URL
http://www.opengroup.org/onlinepubs/9699919799/functions/iswalpha.html
> The wc argument is a wint_t, the value of which the application shall
> ^^^^^^ ^^^^^^^^^^^
> ensure is a wide-character code corresponding to a valid character in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> the current locale, or equal to the value of the macro WEOF. If the
> argument has any other value, the behavior is undefined.
Expressed as a formula, this sentence means that when an application passes
a 'wint_t x' to iswalpha(), it has to satisfy
x == (wint_t) (wchar_t) x || x == WEOF
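The round-trip condition can be sketched in C roughly as follows. This is only an illustration: it models a 16-bit wchar_t with uint16_t and a 32-bit wint_t with uint32_t, and every name in it (wchar16_model_t, wint_model_t, WEOF_MODEL, is_valid_wctype_arg) is made up for the example rather than taken from any library. It checks only the round trip, not whether the value is a valid character in the current locale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, NOT real library types:
   a 16-bit wchar_t and a 32-bit wint_t, as on Cygwin 1.7.x. */
typedef uint16_t wchar16_model_t;
typedef uint32_t wint_model_t;
#define WEOF_MODEL ((wint_model_t) -1)   /* stand-in for WEOF */

/* True if x survives a round trip through the narrow type,
   or is the end-of-file marker. */
static bool is_valid_wctype_arg(wint_model_t x)
{
    return x == (wint_model_t) (wchar16_model_t) x || x == WEOF_MODEL;
}
```

Under this model, U+0041 and a lone surrogate such as 0xD801 satisfy the condition, while a UTF-32 value such as 0x10400 does not, because it is truncated when converted to the 16-bit type.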
> iswalpha takes wint_t, not wchar_t. Since sizeof (wint_t) is 4 byte,
> the function can return the correct value, provided that the application
> converts the UTF-16 surrogate to UTF-32 before calling iswalpha.
When an application does this, it passes an invalid wint_t value to
iswalpha(), according to the spec paragraph that you have just cited.
So the application uses an extension to POSIX functionality, not
POSIX itself.
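For reference, the conversion under discussion is the standard UTF-16 surrogate-pair arithmetic. A minimal sketch, with a function name (utf16_pair_to_utf32) and an error value chosen purely for illustration:

```c
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into one UTF-32 code point.
   Returns 0xFFFFFFFF if hi is not a high surrogate (D800..DBFF)
   or lo is not a low surrogate (DC00..DFFF). */
static uint32_t utf16_pair_to_utf32(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0xFFFFFFFFu;
    return 0x10000u
           + (((uint32_t) hi - 0xD800u) << 10)
           + ((uint32_t) lo - 0xDC00u);
}
```

The pair (0xD801, 0xDC00) yields 0x10400. Passing such a result to iswalpha() is the step that relies on the Cygwin 1.7.x behavior rather than on POSIX.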
I see that Cygwin 1.7.x iswalpha() works in the way you describe (but
mingw's iswalpha() doesn't). So this means that gnulib's proposed
iswwalpha(wwchar_t) function could be implemented using iswalpha()
on Cygwin 1.7.x and will not cause the Unicode based tables to be
included in the executable. This is good and nice.
But if you say that the application should convert UTF-16 surrogates
to UTF-32 before calling iswalpha: That's certainly a requirement
for Cygwin 1.7.x applications that want to support the entire Unicode
character set. But it's outside of POSIX, and many GNU programs will
not want to include this added complexity. Just try to apply this
suggestion to gnulib's quotearg.c, then estimate the time someone
would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
coreutils/src/wc.c, and so on.
For this reason I propose the wwchar_t type with an API that is similar
to POSIX <wctype.h> but includes the surrogate handling, rather than
pushing it into each application's code.
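To make the idea concrete, here is a rough sketch of what such a layer might look like. Only the name wwchar_t comes from the proposal; its definition as a 32-bit integer, the helpers wwc_next and wwc_count, and the choice to pass unpaired surrogates through unchanged are all assumptions made for illustration, not an actual gnulib interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed definition: wide enough for any Unicode code point. */
typedef uint32_t wwchar_t;

/* Hypothetical helper: read one wwchar_t from a buffer of n UTF-16 units.
   Stores the character in *out and returns the number of units consumed
   (1 or 2), or 0 if the buffer is empty. Unpaired surrogates are passed
   through unchanged. */
static size_t wwc_next(const uint16_t *s, size_t n, wwchar_t *out)
{
    if (n == 0)
        return 0;
    if (s[0] >= 0xD800 && s[0] <= 0xDBFF
        && n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *out = 0x10000u + (((wwchar_t) s[0] - 0xD800u) << 10)
               + ((wwchar_t) s[1] - 0xDC00u);
        return 2;
    }
    *out = s[0];
    return 1;
}

/* Hypothetical caller: count characters rather than 16-bit units,
   the kind of loop a program like wc would need. */
static size_t wwc_count(const uint16_t *s, size_t n)
{
    size_t count = 0;
    size_t used;
    wwchar_t c;

    while ((used = wwc_next(s, n, &c)) > 0) {
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

With helpers of this kind the surrogate logic lives in one place, and a caller only ever sees whole characters that it could hand to a function such as the proposed iswwalpha().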
Bruno
--
In memoriam Carl Friedrich Goerdeler <http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdeler>