Subject: bug#38235: string-foldcase bug for trailing sigma
From: Andy Wingo <wingo@HIDDEN>
To: 38235 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Date: Sat, 16 Nov 2019 21:41:05 +0100
Message-ID: <87tv73mu5a.fsf@HIDDEN>

Given the following example, using (rnrs unicode):

  (string-foldcase "ΜΈΛΟΣ")

The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
instead Guile's result is "μέλος". Note that although Σ usually
downcases to σ, at the end of a string it's ς.

This test shows a limitation of defining string-foldcase as simply
(string-downcase (string-upcase str)).

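The difference between downcasing and case folding reported above can be reproduced outside Guile. The following is a minimal sketch in Python, chosen only for illustration; it is not Guile's implementation, and the variable names are hypothetical. Python's str.lower() applies the final-sigma rule, while str.casefold() maps every sigma to σ, so the downcase-after-upcase strategy ends in ς where folding ends in σ.

```python
# Illustration only: Python string methods stand in for the Scheme procedures.
word = "\u039c\u0388\u039b\u039f\u03a3"   # ΜΈΛΟΣ

lowered = word.lower()              # context-sensitive: word-final Σ becomes ς
folded = word.casefold()            # case folding: every sigma maps to σ
round_trip = word.upper().lower()   # the (string-downcase (string-upcase s)) strategy

print(lowered)      # μέλος
print(folded)       # μέλοσ
print(round_trip)   # μέλος, which differs from the folded form
```

The round trip agrees with plain downcasing, not with folding, which is the limitation the report describes.
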
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: Andy Wingo <wingo@HIDDEN>
Subject: bug#38235: Acknowledgement (string-foldcase bug for trailing sigma)
Date: Sat, 16 Nov 2019 20:42:02 +0000
Message-ID: <handler.38235.B.157393689619608.ack <at> debbugs.gnu.org>
References: <87tv73mu5a.fsf@HIDDEN>
Reply-To: 38235 <at> debbugs.gnu.org

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-guile@HIDDEN

If you wish to submit further information on this problem, please
send it to 38235 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
38235: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=38235
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems

Subject: bug#38235: string-foldcase bug for trailing sigma
From: <tomas@HIDDEN>
To: 38235 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-guile@HIDDEN
Date: Sun, 17 Nov 2019 12:19:18 +0100
Message-ID: <20191117111918.GA15143@HIDDEN>
In-Reply-To: <87tv73mu5a.fsf@HIDDEN>

On Sat, Nov 16, 2019 at 09:41:05PM +0100, Andy Wingo wrote:
> Given the following example, using (rnrs unicode):
>
>   (string-foldcase "ΜΈΛΟΣ")

Good catch. I think there's even a worse example: dotless
and dotted I [1]. Here it seems even impossible to do
up- and downcase correctly without knowing the language
context.

Cheers

[1] https://en.wikipedia.org/wiki/%C4%B0

-- tomás

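The dotted and dotless I problem mentioned above shows up with any language-neutral case mapping. The sketch below uses Python purely as an illustration (it is not Guile code, and the constant names are hypothetical): the default mappings pair I with i, so a Turkish dotless ı does not survive an upcase/downcase round trip, and İ lowercases to "i" followed by a combining dot above.

```python
# Illustration only: Python's locale-independent mappings, not Guile's.
DOTTED_CAPITAL_I = "\u0130"   # İ
DOTLESS_SMALL_I = "\u0131"    # ı

# Default mappings treat I and i as a pair, as in English.
upper_of_dotless = DOTLESS_SMALL_I.upper()   # "I"
round_trip = upper_of_dotless.lower()        # "i", not the "ı" Turkish expects

# İ lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.
lower_of_dotted = DOTTED_CAPITAL_I.lower()

print(round_trip == DOTLESS_SMALL_I)   # False
print(len(lower_of_dotted))            # 2
```

Getting the Turkish result requires knowing the language, which these default mappings do not take into account.
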
Subject: bug#38235: string-foldcase bug for trailing sigma
From: John Cowan <cowan@HIDDEN>
To: Andy Wingo <wingo@HIDDEN>, tomas@HIDDEN
Cc: 38235 <at> debbugs.gnu.org
Date: Sun, 17 Nov 2019 13:13:42 -0500
Message-ID: <CAD2gp_TC2tmAJ7kj4v_3VYvqM_7g-0TBCC8U58KHxMDBzCdOWQ@HIDDEN>
In-Reply-To: <87tv73mu5a.fsf@HIDDEN>

On Sat, Nov 16, 2019 at 3:42 PM Andy Wingo <wingo@HIDDEN> wrote:

> The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
> instead Guile's result is "μέλος". Note that although Σ usually
> downcases to σ, at the end of a string it's ς.

More precisely, it downcases to σ if a letter follows and to ς if not
(being at the end of a string is a particular case). However, this is not
actually always Greekly correct: the string "ΦΙΛΟΣ." with a period at the
end downcases to "φιλος." if it is the word φίλος 'friend' (without its
proper accent) at the end of a sentence, but to "φιλοσ." if it is an
abbreviation for φιλοσοφία 'philosophy'. For this reason, R7RS does not
require mapping to ς in this situation as R6RS does.

> This test shows a
> limitation of defining string-foldcase as simply (string-downcase
> (string-upcase str)).

As explained in Unicode section 5.18, the foldcase mappings (in
<https://www.unicode.org/Public/UNIDATA/CaseFolding.txt>, the lines with
status C and F) actually create a set of equivalence classes that are
closed under {upper,lower,title}case mapping, and then choose a single
character to represent each class. This is usually the unique lowercase
character, but not always: in Cherokee it is the uppercase character, and
in the set {Σ, σ, ς} it is σ.

On Sun, Nov 17, 2019 at 6:20 AM <tomas@HIDDEN> wrote:

> Good catch. I think there's even a worse example: dotless
> and dotted I [1]. Here it seems even impossible to do
> up- and downcase correctly without knowing the language
> context.

Language-specific case mappings are explicitly out of Scheme's remit: they
have to be performed by specialized libraries. There is an additional
situation in Lithuanian dictionaries (but not running text): an "i" with a
tone accent is represented as "i" + dot above + accent, like this: "i̇́".
However, this dot above must be dropped when uppercasing, producing
ordinary "Í".

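The equivalence classes and representatives described in the message above can be inspected directly. This is a hedged sketch in Python, used only as an illustration and not as Guile's implementation; it assumes an interpreter whose Unicode data includes lowercase Cherokee (Unicode 8.0 or later, for example Python 3.5+), and the helper same_ignoring_case is hypothetical.

```python
# Illustration only: str.casefold() picks one representative per case class.
sigmas = ["\u03a3", "\u03c3", "\u03c2"]            # Σ, σ, ς
sigma_representatives = {ch.casefold() for ch in sigmas}

CHEROKEE_CAPITAL_A = "\u13a0"   # CHEROKEE LETTER A
CHEROKEE_SMALL_A = "\uab70"     # CHEROKEE SMALL LETTER A
cherokee_representatives = {
    CHEROKEE_CAPITAL_A.casefold(),
    CHEROKEE_SMALL_A.casefold(),
}

def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings by their case-folded forms (hypothetical helper)."""
    return a.casefold() == b.casefold()

print(sigma_representatives == {"\u03c3"})                   # True: σ represents the class
print(cherokee_representatives == {CHEROKEE_CAPITAL_A})      # True: the uppercase letter
print(same_ignoring_case("\u03bc\u03ad\u03bb\u03bf\u03c2",   # μέλος
                         "\u039c\u0388\u039b\u039f\u03a3"))  # ΜΈΛΟΣ -> True
```

Comparing folded forms, not downcased ones, is what makes "μέλος" and "ΜΈΛΟΣ" match regardless of which sigma appears at the end.
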
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.