GNU logs - #16812, boring messages


Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#16812: Eszett handling
Resent-From: Ben Boeckel <mathstuf@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Wed, 19 Feb 2014 19:04:01 +0000
Resent-Message-ID: <handler.16812.B.139283658421125 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 16812
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: 16812 <at> debbugs.gnu.org
X-Debbugs-Original-To: bug-grep@HIDDEN
Reply-To: mathstuf@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.139283658421125
          (code B ref -1); Wed, 19 Feb 2014 19:04:01 +0000
Received: (at submit) by debbugs.gnu.org; 19 Feb 2014 19:03:04 +0000
Received: from localhost ([127.0.0.1]:60564 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WGCQB-0005Ue-5i
	for submit <at> debbugs.gnu.org; Wed, 19 Feb 2014 14:03:03 -0500
Received: from eggs.gnu.org ([208.118.235.92]:57326)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMx-0005NO-Kc
 for submit <at> debbugs.gnu.org; Wed, 19 Feb 2014 13:59:44 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMo-0007CX-AL
 for submit <at> debbugs.gnu.org; Wed, 19 Feb 2014 13:59:38 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM,
 T_DKIM_INVALID autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:59875)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMo-0007CS-7H
 for submit <at> debbugs.gnu.org; Wed, 19 Feb 2014 13:59:34 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59005)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMk-0001TW-4N
 for bug-grep@HIDDEN; Wed, 19 Feb 2014 13:59:34 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMd-0006vQ-9l
 for bug-grep@HIDDEN; Wed, 19 Feb 2014 13:59:30 -0500
Received: from mail-ie0-x22a.google.com ([2607:f8b0:4001:c03::22a]:65032)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <mathstuf@HIDDEN>) id 1WGCMd-0006tA-4X
 for bug-grep@HIDDEN; Wed, 19 Feb 2014 13:59:23 -0500
Received: by mail-ie0-f170.google.com with SMTP id rl12so550487iec.1
 for <bug-grep@HIDDEN>; Wed, 19 Feb 2014 10:59:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=date:from:to:subject:message-id:reply-to:mime-version:content-type
 :content-disposition:content-transfer-encoding:user-agent;
 bh=7rwUelcuoIcLsbj7/WCGd8jsx9JpY7rCdki13aA48Jg=;
 b=SzclpPVSUoUgNqRLDRSF6iBv/6y4meCkNiBA8X2Ri1fZxVsGrh6lDWLiwr7T1316Li
 aboUfSYVnicR/vl/dHgtDoV5QV+UlD37rR8v1xIR33X+BcNDyEmcYTaV5fl0iTPvoyZ/
 r+NE4jPw6Xayf1CMOUdL3vJF44FQW5hdng5qJPyjqsyNZtDtwUEt3k2dM+irs8r3n1vh
 BgSFeR3y0UYOdtOhRjWMlsCSgCXRecsBazqvu4Gw56xz7+qntTE0Anr51mK0/Al9CSwZ
 PSnLkMh81fWUhajWV3vxMbpG7qJhEayc+bxOjfKhpENKbN7e2YoL6F6p1M96Xelr6BII
 4YiQ==
X-Received: by 10.43.129.70 with SMTP id hh6mr1984902icc.68.1392836361742;
 Wed, 19 Feb 2014 10:59:21 -0800 (PST)
Received: from erythro (tripoint.kitware.com. [66.194.253.20])
 by mx.google.com with ESMTPSA id ai4sm52247382igd.3.2014.02.19.10.59.19
 for <bug-grep@HIDDEN>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Wed, 19 Feb 2014 10:59:19 -0800 (PST)
Date: Wed, 19 Feb 2014 13:59:18 -0500
From: Ben Boeckel <mathstuf@HIDDEN>
Message-ID: <20140219185918.GA2438@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="r5Pyd7+fXNt84Ff3"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Mailman-Approved-At: Wed, 19 Feb 2014 14:03:01 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.0 (----)


--r5Pyd7+fXNt84Ff3
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

[ I am not subscribed; please keep me on the CC. ]

Hi,

From the new grep announcement on LWN[1], I had a thought about how the
German eszett was handled. It seems that it hasn't been handled at all.
This may fall to the same resolution as the recent LJ/Lj thread[2]
though.

Basically, it seems that grep doesn't support alternates when changing
case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
context[3]. From some poking, only the latter is supported. My
thought[4] was that the code would generate '[ßSS]' which would be wrong
when matching and would instead need to do '(ß|SS)'. It now seems that
'(ß|SS|ẞ)' or even '(ß|[sS][sS]|ẞ)' would need to be generated instead
using the new code.
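
A minimal sketch of the difference between the bracket expression and the alternation described above. It uses Python's `re` module rather than grep's own matcher, purely as an assumption for illustration; `CLASS`, `ALTS`, and `whole` are hypothetical names that do not come from grep.

```python
import re

# Bracket expression: matches exactly ONE character out of {ß, S}.
CLASS = re.compile(r"[ßSS]")
# Alternation: matches ß, "ss" in any case, or capital ẞ as whole units.
ALTS = re.compile(r"(?:ß|[sS][sS]|ẞ)")


def whole(pattern, text):
    """Return True when the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


# Print how each candidate fares against both patterns.
for candidate in ["ß", "SS", "ss", "ẞ", "S"]:
    print(candidate, whole(CLASS, candidate), whole(ALTS, candidate))
```

The bracket form accepts a lone 'S' and rejects 'SS', which is the opposite of what a case-insensitive match for 'ß' would want.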

I've attached a test case I wrote based on 'turkish-eyes'. I release it
to the public domain.

Thanks,

--Ben

[1]https://lwn.net/Articles/586899/
[2]https://lists.gnu.org/archive/html/bug-grep/2014-02/msg00004.html
[3]https://en.wikipedia.org/wiki/Capital_%C3%9F
[4]https://lwn.net/Articles/587010/

--r5Pyd7+fXNt84Ff3
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename=german-eszett
Content-Transfer-Encoding: 8bit

#!/bin/sh
# Ensure that case-insensitive matching works with German eszett

. "${srcdir=.}/init.sh"; path_prepend_ ../src

require_en_utf8_locale_
require_compiled_in_MB_support

fail=0

L=de_DE.UTF-8

ss=$(printf '\303\237')     # lowercase eszett
SS=$(printf '\341\272\236') # uppercase eszett

# Ensure that this matches:
# printf 'ß:SS ß:ẞ\n'|LC_ALL=de_DE.UTF-8 grep -i 'SS:ß ẞ:ß'

      data="$ss:SS $ss:$SS"
search_str="SS:$ss $SS:$ss"
printf '%s\n' "$data" > in || framework_failure_

for opt in -E -F -G; do
  LC_ALL=$L grep $opt -i "$search_str" in > out || fail=1
  compare out in || fail=1
done

Exit $fail

--r5Pyd7+fXNt84Ff3--




Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: mathstuf@HIDDEN
Subject: bug#16812: Acknowledgement (Eszett handling)
Message-ID: <handler.16812.B.139283658421125.ack <at> debbugs.gnu.org>
References: <20140219185918.GA2438@HIDDEN>
X-Gnu-PR-Message: ack 16812
X-Gnu-PR-Package: grep
Reply-To: 16812 <at> debbugs.gnu.org
Date: Wed, 19 Feb 2014 19:04:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-grep@HIDDEN

If you wish to submit further information on this problem, please
send it to 16812 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
16812: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16812
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#16812: Eszett handling
Resent-From: Eric Blake <eblake@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Wed, 19 Feb 2014 20:29:01 +0000
Resent-Message-ID: <handler.16812.B16812.139284168530195 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 16812
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: mathstuf@HIDDEN, 16812 <at> debbugs.gnu.org
Received: via spool by 16812-submit <at> debbugs.gnu.org id=B16812.139284168530195
          (code B ref 16812); Wed, 19 Feb 2014 20:29:01 +0000
Received: (at 16812) by debbugs.gnu.org; 19 Feb 2014 20:28:05 +0000
Received: from localhost ([127.0.0.1]:60662 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WGDkS-0007qx-Hw
	for submit <at> debbugs.gnu.org; Wed, 19 Feb 2014 15:28:04 -0500
Received: from mx1.redhat.com ([209.132.183.28]:39641)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eblake@HIDDEN>) id 1WGDkO-0007qW-PP
 for 16812 <at> debbugs.gnu.org; Wed, 19 Feb 2014 15:28:02 -0500
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
 (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s1JKRxBh008257
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Wed, 19 Feb 2014 15:27:59 -0500
Received: from [10.3.113.83] (ovpn-113-83.phx2.redhat.com [10.3.113.83])
 by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id
 s1JKRwvc027936; Wed, 19 Feb 2014 15:27:58 -0500
Message-ID: <530513CE.8000507@HIDDEN>
Date: Wed, 19 Feb 2014 13:27:58 -0700
From: Eric Blake <eblake@HIDDEN>
Organization: Red Hat, Inc.
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
References: <20140219185918.GA2438@HIDDEN>
In-Reply-To: <20140219185918.GA2438@HIDDEN>
X-Enigmail-Version: 1.6
OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg
Content-Type: multipart/signed; micalg=pgp-sha256;
 protocol="application/pgp-signature";
 boundary="WtaembBnr02bTo7sIQVqagCHHB66qIFew"
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
X-Spam-Score: -5.6 (-----)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.6 (-----)

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--WtaembBnr02bTo7sIQVqagCHHB66qIFew
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 02/19/2014 11:59 AM, Ben Boeckel wrote:
> [ I am not subscribed; please keep me on the CC. ]
>
> Hi,
>
> From the new grep announcement on LWN[1], I had a thought about how the
> German eszett was handled. It seems that it hasn't been handled at all.
> This may fall to the same resolution as the recent LJ/Lj thread[2]
> though.
>
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context[3].

Alas, in terms of POSIX functionality, we can only change case between
single-character entities.  Changing ß to SS is a
single->multi-character change; it is DIFFERENT than the Turkish i
situation (there, although we change between single-byte and multi-byte,
the changes are still always single character).  Similar problems apply
to Greek trailing sigma, which is also a context-sensitive change operation.

As long as we are stuck using the POSIX definition of case changes on a
character-by-character basis, where the input and output are 1:1
character mappings, we cannot handle the German eszett case specially.
For PROPER handling of locale-sensitive case rules, we'd need full
Unicode rules that operate on words, rather than characters, which
quickly gets out of scope of what we can do in POSIX regex.
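
To make the 1:1 restriction concrete, here is a small sketch using Python's built-in string methods, which apply full Unicode case mappings. This is an assumption chosen for illustration and is not how grep or the POSIX regex functions work; `is_one_to_one` is a hypothetical helper.

```python
def is_one_to_one(ch):
    """True when uppercasing a single character yields a single character."""
    return len(ch.upper()) == 1


# Dotless i uppercases to a single character, so a 1:1 mapping suffices.
print(is_one_to_one("ı"))
# Eszett expands to two characters under full Unicode case mapping.
print(is_one_to_one("ß"), "ß".upper())
# Sigma is context-sensitive: the lowercase form depends on word position.
print("Σ".lower(), "ΟΔΟΣ".lower())
```

A character-by-character mapping has no way to express either the expansion or the dependence on position.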


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--WtaembBnr02bTo7sIQVqagCHHB66qIFew
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBCAAGBQJTBRPOAAoJEKeha0olJ0NqvBoH/Ajj45Eh9kCjUd9zRkmv2nGv
uWx+WtHH4ICbLSM9s+cTzGvqBn+U+n4K1IUpwgCsnGLFnjQhYxh2rxBktuxsbWd0
D0s0EAjNooB7drhah7uLT91qOcxOOkPqeed0LlkphMmCazwro/qgdp5HaBluxBPJ
NyC9EpzE/L0aOkrKtd0el9bcVOrcEhslPo3bpBFuINVgb3YRPSs0FQlHKG85tmyG
YyeoiB0/rBr5qI4oqPxabwsjeQkj0uA1GxB2t02BM4yWoN5w1yEPGjepDGiNOU1u
gdAVSXRkq1UJ3gkVc1vHV5qG4YplFrV/gsfCKsmxHIEufuEv44X6951C5t83XxM=
=Iay+
-----END PGP SIGNATURE-----

--WtaembBnr02bTo7sIQVqagCHHB66qIFew--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#16812: Eszett handling
Resent-From: Johannes Meixner <jsmeix@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Thu, 20 Feb 2014 16:55:02 +0000
Resent-Message-ID: <handler.16812.B.13929152905482 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 16812
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: 16812 <at> debbugs.gnu.org
Cc: mathstuf@HIDDEN
X-Debbugs-Original-To: bug-grep@HIDDEN
Received: via spool by submit <at> debbugs.gnu.org id=B.13929152905482
          (code B ref -1); Thu, 20 Feb 2014 16:55:02 +0000
Received: (at submit) by debbugs.gnu.org; 20 Feb 2014 16:54:50 +0000
Received: from localhost ([127.0.0.1]:33824 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WGWtd-0001QJ-Cg
	for submit <at> debbugs.gnu.org; Thu, 20 Feb 2014 11:54:50 -0500
Received: from eggs.gnu.org ([208.118.235.92]:41130)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXw-0005gR-Du
 for submit <at> debbugs.gnu.org; Thu, 20 Feb 2014 05:08:00 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXk-0007iq-Nt
 for submit <at> debbugs.gnu.org; Thu, 20 Feb 2014 05:07:55 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled
 version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:55898)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXk-0007im-Ll
 for submit <at> debbugs.gnu.org; Thu, 20 Feb 2014 05:07:48 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:42773)
 by lists.gnu.org with esmtp (Exim 4.71)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXe-0004am-N6
 for bug-grep@HIDDEN; Thu, 20 Feb 2014 05:07:48 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXY-0007gN-Sn
 for bug-grep@HIDDEN; Thu, 20 Feb 2014 05:07:42 -0500
Received: from cantor2.suse.de ([195.135.220.15]:39138 helo=mx2.suse.de)
 by eggs.gnu.org with esmtp (Exim 4.71)
 (envelope-from <jsmeix@HIDDEN>) id 1WGQXY-0007fF-KT
 for bug-grep@HIDDEN; Thu, 20 Feb 2014 05:07:36 -0500
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])
 by mx2.suse.de (Postfix) with ESMTP id 2C7B7AD11;
 Thu, 20 Feb 2014 10:07:34 +0000 (UTC)
Date: Thu, 20 Feb 2014 11:07:34 +0100 (CET)
From: Johannes Meixner <jsmeix@HIDDEN>
In-Reply-To: <20140219185918.GA2438@HIDDEN>
Message-ID: <alpine.LNX.2.00.1402201051240.8941@HIDDEN>
References: <20140219185918.GA2438@HIDDEN>
User-Agent: Alpine 2.00 (LNX 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED;
 BOUNDARY="2013985540-1468786226-1392890854=:8941"
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address
 (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -5.0 (-----)
X-Mailman-Approved-At: Thu, 20 Feb 2014 11:54:47 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -5.0 (-----)

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--2013985540-1468786226-1392890854=:8941
Content-Type: TEXT/PLAIN; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable


Hello,

On Feb 19 13:59 Ben Boeckel wrote (excerpt):
> [ I am not subscribed; please keep me on the CC. ]
...
> I had a thought about how the German eszett was handled
...
> Basically, it seems that grep doesn't support alternates when changing
> case. The uppercase of 'ß' is either 'SS' or 'ẞ' depending on the
> context

As far as I understand it you are talking about
"Unicode case folding".

As far as I know grep does not support "Unicode case folding".

Currently grep works on a pure "character by character" basis.
Each character may be in UTF-8 encoding (one possible encoding
for Unicode characters), so grep supports the UTF-8 encoding.
This could be misunderstood to mean that grep supports Unicode,
but it does not.

For more details see the various (usually very long) mail threads
regarding "grep -i", in particular together with UTF-8.

For example, at

http://lists.gnu.org/archive/html/bug-grep/2012-06/threads.html#00011

see mail threads like
"Ignore case handling of special unicode characters (case folding)",
which is
http://savannah.gnu.org/bugs/?36682
or the mail thread
"grep -i (case-insensitive) is broken with UTF8".


Kind Regards
Johannes Meixner
-- 
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer
--2013985540-1468786226-1392890854=:8941--




Message sent to bug-grep@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#16812: Eszett handling
References: <20140219185918.GA2438@HIDDEN>
In-Reply-To: <20140219185918.GA2438@HIDDEN>
Resent-From: Paul Eggert <eggert@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-grep@HIDDEN
Resent-Date: Sat, 08 Mar 2014 18:53:01 +0000
Resent-Message-ID: <handler.16812.B16812.139430477325858 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 16812
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: 
To: 16812 <at> debbugs.gnu.org
Received: via spool by 16812-submit <at> debbugs.gnu.org id=B16812.139430477325858
          (code B ref 16812); Sat, 08 Mar 2014 18:53:01 +0000
Received: (at 16812) by debbugs.gnu.org; 8 Mar 2014 18:52:53 +0000
Received: from localhost ([127.0.0.1]:56910 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WMMMe-0006iz-Vf
	for submit <at> debbugs.gnu.org; Sat, 08 Mar 2014 13:52:53 -0500
Received: from smtp.cs.ucla.edu ([131.179.128.62]:40608)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eggert@HIDDEN>) id 1WMMMd-0006is-0e
 for 16812 <at> debbugs.gnu.org; Sat, 08 Mar 2014 13:52:51 -0500
Received: from localhost (localhost.localdomain [127.0.0.1])
 by smtp.cs.ucla.edu (Postfix) with ESMTP id 6D8D739E8013
 for <16812 <at> debbugs.gnu.org>; Sat,  8 Mar 2014 10:52:50 -0800 (PST)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
 by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id JmwC5-dD73+5 for <16812 <at> debbugs.gnu.org>;
 Sat,  8 Mar 2014 10:52:49 -0800 (PST)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net
 [108.0.233.62])
 by smtp.cs.ucla.edu (Postfix) with ESMTPSA id BA6BC39E8008
 for <16812 <at> debbugs.gnu.org>; Sat,  8 Mar 2014 10:52:49 -0800 (PST)
Message-ID: <531B6701.5030802@HIDDEN>
Date: Sat, 08 Mar 2014 10:52:49 -0800
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.3 (--)

'grep' is conforming to its specification, even though it's not as 
useful as it might be when searching German text.  The situation with 
'ß'/'SS' is different than the situation with 'lj'/'Lj'/'LJ' because in the 
latter case 'grep' is dealing only with individual characters.

There's a related issue with 'ß' versus the recently-introduced capital 
sharp-S 'ẞ'.  These do not match each other with 'grep --ignore-case' in 
the current savannah git master.  This is an unfortunate property of how 
the glibc regex code behaves: the regex code uppercases both pattern and 
data before comparing, but in the standard German locale 'ß' is 
unchanged by uppercasing.
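
The effect described above can be imitated with a rough simulation. This is not the glibc regex code: `simple_upper` and `upcase_match` are hypothetical helpers, and they use Python's Unicode tables rather than a German locale, on the assumption that both leave 'ß' unchanged under a 1:1 uppercase mapping.

```python
def simple_upper(ch):
    """Imitate a 1:1 uppercase mapping: keep the character unchanged
    when its full uppercase form is not a single character."""
    up = ch.upper()
    return up if len(up) == 1 else ch


def upcase_match(pattern, data):
    """Compare two strings after uppercasing each character 1:1."""
    return [simple_upper(c) for c in pattern] == [simple_upper(c) for c in data]


print(upcase_match("grep", "GREP"))  # ordinary letters fold together
print(upcase_match("ß", "ẞ"))        # neither character changes, so they differ
print("ẞ".lower() == "ß")            # lowercasing would relate the two
```

Under these assumptions, comparing after lowercasing would pair 'ß' with 'ẞ', whereas comparing after uppercasing does not.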

I'll leave this bug open as it is an awkward situation.  Fixing it would 
require changing the glibc regex code, which is a big deal -- it would 
have some performance implications in a lot of programs.  So I'm not 
optimistic about fixing it any time soon.




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 27 Apr 2014 01:14:50 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Apr 26 21:14:50 2014
Received: from localhost ([127.0.0.1]:59414 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.80)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1WeDg9-0008Gf-Qp
	for submit <at> debbugs.gnu.org; Sat, 26 Apr 2014 21:14:50 -0400
Received: from smtp.cs.ucla.edu ([131.179.128.62]:60988)
 by debbugs.gnu.org with esmtp (Exim 4.80)
 (envelope-from <eggert@HIDDEN>) id 1WeDg6-0008GR-8N
 for control <at> debbugs.gnu.org; Sat, 26 Apr 2014 21:14:47 -0400
Received: from localhost (localhost.localdomain [127.0.0.1])
 by smtp.cs.ucla.edu (Postfix) with ESMTP id B84C8A60008
 for <control <at> debbugs.gnu.org>; Sat, 26 Apr 2014 18:14:45 -0700 (PDT)
X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu
Received: from smtp.cs.ucla.edu ([127.0.0.1])
 by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id neeB468xqJ29 for <control <at> debbugs.gnu.org>;
 Sat, 26 Apr 2014 18:14:37 -0700 (PDT)
Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net
 [108.0.233.62])
 by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 3F41F39E801A
 for <control <at> debbugs.gnu.org>; Sat, 26 Apr 2014 18:14:37 -0700 (PDT)
Message-ID: <535C59FC.70903@HIDDEN>
Date: Sat, 26 Apr 2014 18:14:36 -0700
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:24.0) Gecko/20100101 Thunderbird/24.4.0
MIME-Version: 1.0
To: control <at> debbugs.gnu.org
Subject: grep buglist maintenance
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -3.0 (---)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <http://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <http://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <http://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.0 (---)

severity 16812 wishlist
severity 17280 wishlist
tags 15199 + moreinfo
tags 16444 + moreinfo
thanks





Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.