Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org
.
Full text available.

From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
To: 16812 <at> debbugs.gnu.org
Date: Sat, 08 Mar 2014 10:52:49 -0800
Subject: Re: Eszett handling
Message-ID: <531B6701.5030802@HIDDEN>

'grep' is conforming to its specification, even though it's not as
useful as it might be when searching German text. The situation with
'ß'/'SS' is different than the situation with 'lj'/'Lj'/'LJ' because in
the latter case 'grep' is dealing only with individual characters.

There's a related issue with 'ß' versus the recently-introduced capital
sharp-S 'ẞ'. These do not match each other with 'grep --ignore-case' in
the current savannah git master. This is an unfortunate property of how
the glibc regex code behaves: the regex code uppercases both pattern and
data before comparing, but in the standard German locale 'ß' is
unchanged by uppercasing.

I'll leave this bug open as it is an awkward situation. Fixing it would
require changing the glibc regex code, which is a big deal -- it would
have some performance implications in a lot of programs. So I'm not
optimistic about fixing it any time soon.
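The "transform both pattern and data, then compare" approach described above can be imitated outside grep with a fold that is allowed to change length. The following is only a sketch of a user-level workaround, not what grep or glibc does: the helper names `fold_eszett` and `eszett_match` are hypothetical, and the fold deliberately maps both 'ß' and 'ẞ' to "ss", so it also conflates words that differ only in that respect (for example "Maße" and "Masse").

```sh
#!/bin/sh
# Byte sequences for the two characters, written as octal escapes so that
# the script itself stays pure ASCII.
ss=$(printf '\303\237')         # U+00DF, lowercase eszett
CAP=$(printf '\341\272\236')    # U+1E9E, capital sharp s

# Hypothetical helper: rewrite both eszett forms to plain "ss".
# This is a 1:2 mapping, which a per-character case change cannot express.
fold_eszett() {
  sed -e "s/$ss/ss/g" -e "s/$CAP/ss/g"
}

# Hypothetical helper: succeed if TEXT contains PATTERN once both sides
# have been folded; plain ASCII case is left to grep -i.
# Usage: eszett_match PATTERN TEXT
eszett_match() {
  pat=$(printf '%s\n' "$1" | fold_eszett)
  printf '%s\n' "$2" | fold_eszett | grep -F -i -q -- "$pat"
}

if eszett_match "STRASSE" "Die Stra${ss}e ist lang"; then
  echo "STRASSE found"
fi
if eszett_match "stra${ss}e" "HAUPTSTRA${CAP}E"; then
  echo "lowercase pattern found in uppercase text"
fi
```

Because the fold happens before grep sees either string, the result does not depend on how the regex library changes case internally.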
bug-grep@HIDDEN
:bug#16812
; Package grep
.
Full text available.

From: Johannes Meixner <jsmeix@HIDDEN>
To: bug-grep@HIDDEN
Cc: mathstuf@HIDDEN
Date: Thu, 20 Feb 2014 11:07:34 +0100 (CET)
Subject: Re: bug#16812: Eszett handling
In-Reply-To: <20140219185918.GA2438@HIDDEN>
Message-ID: <alpine.LNX.2.00.1402201051240.8941@HIDDEN>

Hello,

On Feb 19 13:59 Ben Boeckel wrote (excerpt):
> [ I am not subscribed; please keep me on the CC. ]
...
> I had a thought about how the German eszett was handled
...
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context

As far as I understand it, you are talking about "Unicode case folding".

As far as I know, grep does not support "Unicode case folding".

Currently grep works on a purely character-by-character basis, where
each character may be UTF-8 encoded (UTF-8 being one possible encoding
for Unicode characters). So grep supports the UTF-8 encoding, which
could be misunderstood as grep supporting Unicode, but that is not the
case.

For more details, see the various (usually very long) mail threads
regarding "grep -i", in particular together with UTF-8.

For example, on
http://lists.gnu.org/archive/html/bug-grep/2012-06/threads.html#00011
see mail threads like
"Ignore case handling of special unicode characters (case folding)",
which is
http://savannah.gnu.org/bugs/?36682
or the mail thread
"grep -i (case-insensitive) is broken with UTF8"

Kind Regards
Johannes Meixner
-- 
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer
bug-grep@HIDDEN
:bug#16812
; Package grep
.
Full text available.

From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat, Inc.
To: mathstuf@HIDDEN, 16812 <at> debbugs.gnu.org
Date: Wed, 19 Feb 2014 13:27:58 -0700
Subject: Re: bug#16812: Eszett handling
In-Reply-To: <20140219185918.GA2438@HIDDEN>
Message-ID: <530513CE.8000507@HIDDEN>

On 02/19/2014 11:59 AM, Ben Boeckel wrote:
> [ I am not subscribed; please keep me on the CC. ]
>
> Hi,
>
> From the new grep announcement on LWN[1], I had a thought about how the
> German eszett was handled. It seems that it hasn't been handled at all.
> This may fall to the same resolution as the recent LJ/Lj thread[2]
> though.
>
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context[3].

Alas, in terms of POSIX functionality, we can only change case between
single-character entities. Changing ß to SS is a
single->multi-character change; it is DIFFERENT than the Turkish i
situation (there, although we change between single-byte and multi-byte,
the changes are still always single character). Similar problems apply
to Greek trailing sigma, which is also a context-sensitive change
operation.

As long as we are stuck using the POSIX definition of case changes on a
character-by-character basis, where the input and output are 1:1
character mappings, we cannot handle the German eszett case specially.
For PROPER handling of locale-sensitive case rules, we'd need full
Unicode rules that operate on words, rather than characters, which
quickly gets out of scope of what we can do in POSIX regex.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
bug-grep@HIDDEN
:bug#16812
; Package grep
.
Full text available.

From: Ben Boeckel <mathstuf@HIDDEN>
Reply-To: mathstuf@HIDDEN
To: bug-grep@HIDDEN
Date: Wed, 19 Feb 2014 13:59:18 -0500
Subject: Eszett handling
Message-ID: <20140219185918.GA2438@HIDDEN>

[ I am not subscribed; please keep me on the CC. ]

Hi,

From the new grep announcement on LWN[1], I had a thought about how the
German eszett was handled. It seems that it hasn't been handled at all.
This may fall to the same resolution as the recent LJ/Lj thread[2]
though.

Basically, it seems that grep doesn't support alternates when changing
case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
context[3]. From some poking, only the latter is supported. My
thought[4] was that the code would generate '[ßSS]' which would be wrong
when matching and would instead need to do '(ß|SS)'. It now seems that
'(ß|SS|ẞ)' or even '(ß|[sS][sS]|ẞ)' would need to be generated instead
using the new code.

I've attached a test case I wrote based on 'turkish-eyes'. I release it
to the public domain.

Thanks,

--Ben

[1]https://lwn.net/Articles/586899/
[2]https://lists.gnu.org/archive/html/bug-grep/2014-02/msg00004.html
[3]https://en.wikipedia.org/wiki/Capital_%C3%9F
[4]https://lwn.net/Articles/587010/

[Attachment: german-eszett]

#!/bin/sh
# Ensure that case-insensitive matching works with German eszett

. "${srcdir=.}/init.sh"; path_prepend_ ../src

# Note: this checks only for an English UTF-8 locale; the test below
# additionally assumes that de_DE.UTF-8 is installed.
require_en_utf8_locale_
require_compiled_in_MB_support

fail=0

L=de_DE.UTF-8
ss=$(printf '\303\237')      # lowercase eszett
SS=$(printf '\341\272\236')  # uppercase eszett

# Ensure that this matches:
# printf 'ß:SS ß:ẞ\n'|LC_ALL=de_DE.UTF-8 grep -i 'SS:ß ẞ:ß'
data="$ss:SS $ss:$SS"
search_str="SS:$ss $SS:$ss"   # no trailing space, as in the comment above
printf '%s\n' "$data" > in || framework_failure_

for opt in -E -F -G; do
  LC_ALL=$L grep $opt -i "$search_str" in > out || fail=1
  compare out in || fail=1
done

Exit $fail
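The difference between the bracket form '[ßSS]' and the alternation form can be checked by hand. The sketch below is an illustration only: the sample words and the helper name `whole_line_match` are assumptions, and it uses 'ss' with `-i` in place of '[sS][sS]'. A bracket expression always consumes exactly one character, so it can never cover the two-letter spelling, whereas a user-written alternation lists every spelling explicitly and does not rely on grep's case conversion for the eszett at all.

```sh
#!/bin/sh
ss=$(printf '\303\237')         # U+00DF, lowercase eszett
CAP=$(printf '\341\272\236')    # U+1E9E, capital sharp s

bracket="stra[${ss}SS]e"                 # one character from the set
alternation="stra(${ss}|ss|${CAP})e"     # any of the three spellings

# Hypothetical helper: succeed if TEXT as a whole matches PATTERN,
# ignoring case. Usage: whole_line_match PATTERN TEXT
whole_line_match() {
  printf '%s\n' "$2" | grep -E -i -x -q -- "$1"
}

for word in "Stra${ss}e" "STRASSE" "STRA${CAP}E"; do
  for pat in "$bracket" "$alternation"; do
    if whole_line_match "$pat" "$word"; then
      verdict="matches"
    else
      verdict="does not match"
    fi
    printf '%s %s %s\n' "$pat" "$verdict" "$word"
  done
done
```

The alternation matches all three words. The bracket form cannot match "STRASSE"; whether it matches the other two depends on the locale and on how the grep version in use folds case.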
mathstuf@HIDDEN
:bug-grep@HIDDEN
.
Full text available.
bug-grep@HIDDEN
:bug#16812
; Package grep
.
Full text available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.