GNU logs - #38235, boring messages


Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#38235: string-foldcase bug for trailing sigma
Resent-From: Andy Wingo <wingo@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Sat, 16 Nov 2019 20:42:02 +0000
Resent-Message-ID: <handler.38235.B.157393689619608 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 38235
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 38235 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.157393689619608
          (code B ref -1); Sat, 16 Nov 2019 20:42:02 +0000
Received: (at submit) by debbugs.gnu.org; 16 Nov 2019 20:41:36 +0000
Received: from localhost ([127.0.0.1]:39759 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1iW4sx-00056B-Rm
	for submit <at> debbugs.gnu.org; Sat, 16 Nov 2019 15:41:36 -0500
Received: from lists.gnu.org ([209.51.188.17]:46925)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <wingo@HIDDEN>) id 1iW4sw-000561-7G
 for submit <at> debbugs.gnu.org; Sat, 16 Nov 2019 15:41:34 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:53452)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <wingo@HIDDEN>) id 1iW4su-0002b4-9E
 for bug-guile@HIDDEN; Sat, 16 Nov 2019 15:41:33 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,URIBL_BLOCKED
 autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <wingo@HIDDEN>) id 1iW4sr-0007NT-9K
 for bug-guile@HIDDEN; Sat, 16 Nov 2019 15:41:32 -0500
Received: from fanzine.igalia.com ([178.60.130.6]:57537)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
 (Exim 4.71) (envelope-from <wingo@HIDDEN>) id 1iW4sq-0007Ll-Lv
 for bug-guile@HIDDEN; Sat, 16 Nov 2019 15:41:29 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com;
 s=20170329; 
 h=Content-Transfer-Encoding:Content-Type:MIME-Version:Message-ID:Date:Subject:To:From;
 bh=jzoxyQO6pGsm6bcRZioqIYrJlu1taKjghWuJf8dUNwU=; 
 b=gxFrQgio1GjZZUcvre91Sb6TKau+XZ7SHXJdQxbZgYdmXe0e4h2g0asuGpXKpsI4Ewh9lkvspMq1A7VcbVDxjloaCPZhRNh4JDOoxO0PK7PI0pi39DIGeOm5ZQWsAHmpmyYinRQ3rFbPF9YNnWyrK0U4Nll2FCexnGrFOlyRegHNYL1d1Js5P1Xdcw+QeGna3uupzSoO3479zmVYLMTE93u4X6oC3mHuozMeUb/ESd3i5M/QU+VpZlQPEA8z5kvPOfDHnEae1ABnO2yGUHMKz5aIwhnu8yhowGV+Ddly4zIQN4Dle8MCc3GusxssQG8UG1UMYP58gXWC+BsBkURvSA==;
Received: from cha74-2-88-160-189-213.fbx.proxad.net ([88.160.189.213]
 helo=sparrow) by fanzine.igalia.com with esmtpsa 
 (Cipher TLS1.0:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim)
 id 1iW4sm-0005oK-M7
 for <bug-guile@HIDDEN>; Sat, 16 Nov 2019 21:41:24 +0100
From: Andy Wingo <wingo@HIDDEN>
Date: Sat, 16 Nov 2019 21:41:05 +0100
Message-ID: <87tv73mu5a.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no
 timestamps) [generic] [fuzzy]
X-Received-From: 178.60.130.6
X-Spam-Score: -1.6 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.6 (--)

Given the following example, using (rnrs unicode):

  (string-foldcase "ΜΈΛΟΣ")

The expected result is "μέλοσ"; see R6RS libraries section 1.2.  However
instead Guile's result is "μέλος".  Note that although Σ usually
downcases to σ, at the end of a string it's ς.  This test shows a
limitation of defining string-foldcase as simply (string-downcase
(string-upcase str)).
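
To make the difference concrete, here is a minimal sketch in Python rather than Guile. This is an assumption made purely for illustration: Python's `str.upper`, `str.lower` and `str.casefold` stand in for string-upcase, string-downcase and string-foldcase, and `naive_foldcase` is a hypothetical name. It compares the upcase-then-downcase definition with a true case fold on the example above.

```python
# Python stand-ins for the Scheme procedures; naive_foldcase is a
# hypothetical name for the (string-downcase (string-upcase str)) definition.
def naive_foldcase(s: str) -> str:
    return s.upper().lower()

word = "ΜΈΛΟΣ"
naive = naive_foldcase(word)   # lower() applies the final-sigma rule
folded = word.casefold()       # case folding maps Σ to σ in every position

print(naive, folded, naive == folded)
```

The two results differ only in the last character, which is why a fold built from downcasing cannot be used for caseless comparison of Greek text.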




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Andy Wingo <wingo@HIDDEN>
Subject: bug#38235: Acknowledgement (string-foldcase bug for trailing sigma)
Message-ID: <handler.38235.B.157393689619608.ack <at> debbugs.gnu.org>
References: <87tv73mu5a.fsf@HIDDEN>
X-Gnu-PR-Message: ack 38235
X-Gnu-PR-Package: guile
Reply-To: 38235 <at> debbugs.gnu.org
Date: Sat, 16 Nov 2019 20:42:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 38235 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
38235: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=38235
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#38235: string-foldcase bug for trailing sigma
Resent-From: <tomas@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Sun, 17 Nov 2019 11:20:02 +0000
Resent-Message-ID: <handler.38235.B.157398957427471 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 38235
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: 38235 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.157398957427471
          (code B ref -1); Sun, 17 Nov 2019 11:20:02 +0000
Received: (at submit) by debbugs.gnu.org; 17 Nov 2019 11:19:34 +0000
Received: from localhost ([127.0.0.1]:40408 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1iWIac-000791-6E
	for submit <at> debbugs.gnu.org; Sun, 17 Nov 2019 06:19:34 -0500
Received: from lists.gnu.org ([209.51.188.17]:47880)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <tomas@HIDDEN>) id 1iWIaZ-00078t-Tj
 for submit <at> debbugs.gnu.org; Sun, 17 Nov 2019 06:19:32 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:39143)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <tomas@HIDDEN>) id 1iWIaY-0002Py-F5
 for bug-guile@HIDDEN; Sun, 17 Nov 2019 06:19:31 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,RCVD_IN_DNSWL_NONE,
 URIBL_BLOCKED autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <tomas@HIDDEN>) id 1iWIaW-0006bt-7Z
 for bug-guile@HIDDEN; Sun, 17 Nov 2019 06:19:30 -0500
Received: from mail.tuxteam.de ([5.199.139.25]:42889)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)
 (Exim 4.71) (envelope-from <tomas@HIDDEN>) id 1iWIaV-0006Zk-OG
 for bug-guile@HIDDEN; Sun, 17 Nov 2019 06:19:28 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tuxteam.de;
 s=mail; 
 h=From:In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:To:Date;
 bh=msB2wk72Ipco0Jiy8Z17Oc8cATQjFcONYPHBcZOZDGE=; 
 b=O1/H3sNMjWzpgH/xnFngawsvmH15eWlJtjz8JA9wprWzAfrLKArra7DhPLRVU2ivjoNV/oBPo4WbLzcdxlBo/giyOU9mIzmMpIdG78kYruiacp5T0vkoiqlX7+ZAGHtR2LJUQU5BibjcDbReJ6FYXCnvG1fzD+DsIQ/+Ozq7h+qi4eRU4zmLf2oV6dzVy/0HYiAtRIlM08MmTR2zs6halhKAfAVSMulhgqN/k6DaQKin8vkJybJfUCE0SeufYZwKl+1rc/nRykQiiXM43Wu94o7YDBIMxb5OLl/vV8g5H/ktLQS+eaA46OI+0LGI5qI8Bjt1M06ZvCa3//WbfSJQBQ==;
Received: from tomas by mail.tuxteam.de with local (Exim 4.80)
 (envelope-from <tomas@HIDDEN>) id 1iWIaM-00040t-TV
 for bug-guile@HIDDEN; Sun, 17 Nov 2019 12:19:18 +0100
Date: Sun, 17 Nov 2019 12:19:18 +0100
Message-ID: <20191117111918.GA15143@HIDDEN>
References: <87tv73mu5a.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="FL5UXtIhxfXey3p5"
Content-Disposition: inline
In-Reply-To: <87tv73mu5a.fsf@HIDDEN>
User-Agent: Mutt/1.5.21 (2010-09-15)
From: <tomas@HIDDEN>
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy]
X-Received-From: 5.199.139.25
X-Spam-Score: -1.3 (-)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.3 (--)


--FL5UXtIhxfXey3p5
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Nov 16, 2019 at 09:41:05PM +0100, Andy Wingo wrote:
> Given the following example, using (rnrs unicode):
>
>   (string-foldcase "ΜΈΛΟΣ")

Good catch. I think there's even a worse example: dotless
and dotted I [1]. Here it seems even impossible to do
up- and downcase correctly without knowing the language
context.

Cheers
[1] https://en.wikipedia.org/wiki/%C4%B0
-- tomás

--FL5UXtIhxfXey3p5
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAl3RLLYACgkQBcgs9XrR2kYLqgCffjW+xLAhkMeLqP/gR3wG79yN
96QAn1uNFevak0LtvUhdghbeuvbVGHPH
=MB7J
-----END PGP SIGNATURE-----

--FL5UXtIhxfXey3p5--




Message sent to bug-guile@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#38235: string-foldcase bug for trailing sigma
Resent-From: John Cowan <cowan@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-guile@HIDDEN
Resent-Date: Sun, 17 Nov 2019 18:14:02 +0000
Resent-Message-ID: <handler.38235.B38235.157401444015599 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 38235
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: 
To: Andy Wingo <wingo@HIDDEN>, tomas@HIDDEN
Cc: 38235 <at> debbugs.gnu.org
Received: via spool by 38235-submit <at> debbugs.gnu.org id=B38235.157401444015599
          (code B ref 38235); Sun, 17 Nov 2019 18:14:02 +0000
Received: (at 38235) by debbugs.gnu.org; 17 Nov 2019 18:14:00 +0000
Received: from localhost ([127.0.0.1]:42546 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1iWP3f-00043W-K1
	for submit <at> debbugs.gnu.org; Sun, 17 Nov 2019 13:13:59 -0500
Received: from mail-qk1-f169.google.com ([209.85.222.169]:46796)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <cowan@HIDDEN>) id 1iWP3d-00043J-2A
 for 38235 <at> debbugs.gnu.org; Sun, 17 Nov 2019 13:13:57 -0500
Received: by mail-qk1-f169.google.com with SMTP id h15so12454750qka.13
 for <38235 <at> debbugs.gnu.org>; Sun, 17 Nov 2019 10:13:57 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=ccil-org.20150623.gappssmtp.com; s=20150623;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=5O4jivPCu71/wLIoE6lphqgFQVhFrvc9FNEwY7Qi6m0=;
 b=ohtCiNbgMD+BonVSGT+DmkmXqu1PGip5smzXw7MfMivKpDswINtwaMGbMf5ZDSKFY+
 Gw73X8TwuVKgINcZbr3lU1+yVX279toGNvJ+07QiU2n2IpHg8jpnrpKk9s4xIEUYG4Ib
 WSnaHuzZyMfsuzLmpDzaTsTN5sYDlV3BPFgg8B0ooSS0JcOLc6hGDLyXMpND3KFNoi2Y
 YBGyUabOdPQhyiCtLf9vUZhwedK6e5Ydp52uvijlNW6Z7BoYnwOaXxEmHrQyhBuEob2b
 Ip0TrzK4OSLfqYTSKkz8rgocN7nvVwownPGrjLjb4ydOJ1+TKtA+XbGglHz0XJaKB1lc
 lUfg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=5O4jivPCu71/wLIoE6lphqgFQVhFrvc9FNEwY7Qi6m0=;
 b=AL6f67juLxmZVm7JwtGhNafCGHxlCB6Gh72gcrg217NU6bUOcDWH9UlJjHDtDI7yuF
 KWD95Ds0ASpolu0h5r5H2xHp8QcXM8Lt7NpaqtnRKpQ220UhnkDGv/sRYBUgZbv86dsB
 tuZr+cdpmapNRzOhz0Jkf78SXZyO7e2uBS4RZmMGwLlrwIJmNd2kNMLq2yukns2glwJQ
 hsyjUgnPai5lIiWW1zfy+/eN6qHNCtFabCgigjz3Mq+rVQC7DYp2+ZreGElkWXSMWYkC
 HQLhVpCyyJ+LSOB5K1NWd/t3555fk1Wqr9T670rNxZ+ygO9zxwIzT9087tT3Jw7jMiTM
 6TZw==
X-Gm-Message-State: APjAAAX/3BqV7Jxorig4IvPPtRVpnUZYEKEbXOVgClLXE98w3TUmrk/I
 G0Isn/rC4d7qRNQ8614Y4jVxsrwrp7Ia+ibeU8SNaw==
X-Google-Smtp-Source: APXvYqzevOt4yftRzavDgY0JiBDqSqC69uYhVpME30aRTSWD96uqBZb8I0kCfdoRtctuvnoML5wdfCHidW5fUgjbwWU=
X-Received: by 2002:a37:6f07:: with SMTP id k7mr20484077qkc.118.1574014431508; 
 Sun, 17 Nov 2019 10:13:51 -0800 (PST)
MIME-Version: 1.0
References: <87tv73mu5a.fsf@HIDDEN>
In-Reply-To: <87tv73mu5a.fsf@HIDDEN>
From: John Cowan <cowan@HIDDEN>
Date: Sun, 17 Nov 2019 13:13:42 -0500
Message-ID: <CAD2gp_TC2tmAJ7kj4v_3VYvqM_7g-0TBCC8U58KHxMDBzCdOWQ@HIDDEN>
Content-Type: multipart/alternative; boundary="0000000000000fcec405978ecbde"
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--0000000000000fcec405978ecbde
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Nov 16, 2019 at 3:42 PM Andy Wingo <wingo@HIDDEN> wrote:


> The expected result is "μέλοσ"; see R6RS libraries section 1.2.  However
> instead Guile's result is "μέλος".  Note that although Σ usually
> downcases to σ, at the end of a string it's ς.


More precisely, it downcases to σ if a letter follows and to ς if not
(being at the end of a string is a particular case).  However, this is not
actually always Greekly correct:  the string "ΦΙΛΟΣ." with a period at the
end downcases to "φιλος." if it is the word φίλος 'friend' (without its
proper accent) at the end of a sentence, but to "φιλοσ." if it is an
abbreviation for φιλοσοφία 'philosophy'.  For this reason, R7RS does not
require mapping to ς in this situation as R6RS does.
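
As a rough illustration of why no algorithm can settle this, the following Python sketch shows that the rule looks only at neighbouring characters. Python's `str.lower` is assumed here as a stand-in for a downcasing procedure that implements the context rule; it is not Guile's implementation, and the variable names are hypothetical.

```python
# str.lower() applies the context rule: Σ becomes ς when no letter follows
# it, and σ otherwise.
final = "ΦΙΛΟΣ.".lower()       # Σ before a period: treated as word-final
medial = "ΦΙΛΟΣΟΦΊΑ".lower()   # Σ followed by a letter: medial form

print(final, medial)
```

Because the input "ΦΙΛΟΣ." is identical for the word and for the abbreviation, a rule based on context alone necessarily produces the same output for both.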

> This test shows a
> limitation of defining string-foldcase as simply (string-downcase
> (string-upcase str)).
>

As explained in Unicode section 5.18, the foldcase mappings (in
<https://www.unicode.org/Public/UNIDATA/CaseFolding.txt>, the lines with
status C and F) actually create a set of equivalence classes that are
closed under {upper,lower,title}case mapping, and then choose a single
character to represent each class.  This is usually the unique lowercase
character, but not always: in Cherokee it is the uppercase character, and
in the set {Σ, σ, ς} it is σ.
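
The equivalence classes can be observed directly. This is a small Python sketch, not Guile code, and it assumes a Python whose Unicode database includes the Cherokee lowercase letters (Unicode 8.0 or later); `str.casefold` is used as a stand-in for string-foldcase and the variable names are hypothetical.

```python
# Every member of the class {Σ, σ, ς} folds to the same representative, σ.
sigma_class = ["Σ", "σ", "ς"]
sigma_folds = {ch.casefold() for ch in sigma_class}

# Cherokee: the representative of the class is the uppercase letter.
cherokee_upper = "\u13a0"  # CHEROKEE LETTER A
cherokee_lower = "\uab70"  # CHEROKEE SMALL LETTER A
cherokee_folds = {cherokee_upper.casefold(), cherokee_lower.casefold()}

print(sigma_folds, cherokee_folds == {cherokee_upper})
```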

On Sun, Nov 17, 2019 at 6:20 AM <tomas@HIDDEN> wrote:

> Good catch. I think there's even a worse example: dotless
> and dotted I [1]. Here it seems even impossible to do
> up- and downcase correctly without knowing the language
> context.
>

Language-specific case mappings are explicitly out of Scheme's remit: they
have to be performed by specialized libraries.  There is an additional
situation in Lithuanian dictionaries (but not running text): an "i" with a
tone accent is represented as "i" + dot above + accent, like this:  "i̇́".
However, this dot above must be dropped when uppercasing, producing
ordinary "Í".
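
What language-neutral mappings do with these cases can be sketched as follows. Python is assumed here only because its `str.upper` and `str.lower` are locale-independent; this is an illustration, not a statement about Guile's behaviour, and the variable names are hypothetical.

```python
# Turkish/Azeri dotted and dotless I: the default (language-neutral)
# mappings send both lowercase forms to plain "I".
plain_upper = "i".upper()
dotless_upper = "\u0131".upper()      # ı is U+0131

# İ (U+0130) lowercases to "i" followed by COMBINING DOT ABOVE (U+0307).
dotted_lower = "\u0130".lower()

# Lithuanian: the dot above is kept when uppercasing, because dropping it
# is a language-specific rule that the default mapping does not apply.
lithuanian_upper = "i\u0307\u0301".upper()

print(plain_upper, dotless_upper, ascii(dotted_lower), ascii(lithuanian_upper))
```

Getting the Turkish or Lithuanian results right requires passing the language to the mapping, which is the job of a specialized library.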

--0000000000000fcec405978ecbde--





Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.