GNU bug report logs - #16911
[PATCH] grep: fix bugs with -i and titlecase
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Sat, 1 Mar 2014 06:54:02 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16911 in the body.
You can then email your comments to 16911 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sat, 01 Mar 2014 06:54:02 GMT)
Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 01 Mar 2014 06:54:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Tags: patch
The attached patch, which I've pushed, fixes a problem with grep -i and
titlecase that's been bugging me ever since someone pointed out some
titlecase issues on the grep mailing list a few weeks ago. It affects
dfa.c, so I expect it'll fix a similar problem with gawk.
[0001-grep-fix-bugs-with-i-and-titlecase.patch (text/plain, attachment)]
bug closed, send any further explanations to 16911 <at> debbugs.gnu.org and Paul Eggert <eggert <at> cs.ucla.edu>
Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sat, 01 Mar 2014 06:56:03 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sat, 01 Mar 2014 13:32:02 GMT)
Message #10 received at 16911 <at> debbugs.gnu.org (full text, mbox):
On 02/28/2014 11:53 PM, Paul Eggert wrote:
> Tags: patch
>
> The attached patch, which I've pushed, fixes a problem with grep -i and
> titlecase that's been bugging me ever since someone pointed out some
> titlecase issues on the grep mailing list a few weeks ago. It affects
> dfa.c, so I expect it'll fix a similar problem with gawk.
>
>
> + grep -i no longer mishandles patterns containing titlecase characters.
> + For example, in a locale containing the titlecase character
> + 'Lj' (U+01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J),
> + 'grep -i Lj' now matches 'LJ' (U+01C7 LATIN CAPITAL LETTER LJ).
Does it also match the lower case version? In other words, are all
three cases of this character treated as equivalent? It might help to
mention all three characters in the NEWS blurb.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
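For reference, the three characters in question can be inspected with Python's standard unicodedata module. This is only a sketch of what the Unicode character database says about them; it says nothing about how any particular C library locale classifies them. The FORMS table and the describe() helper are names made up for this illustration.

```python
import unicodedata

# The three case forms of the LJ digraph discussed in the NEWS entry and reply.
FORMS = {"upper": "\u01c7", "title": "\u01c8", "lower": "\u01c9"}


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for label, ch in FORMS.items():
    print(label, *describe(ch))

# Python's case mappings connect the three forms to one another.
print(FORMS["title"].upper() == FORMS["upper"])
print(FORMS["title"].lower() == FORMS["lower"])
print(FORMS["upper"].title() == FORMS["title"])
```

The general category of the middle form is Lt (titlecase letter), which is distinct from both Lu (uppercase) and Ll (lowercase).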
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sat, 01 Mar 2014 23:08:02 GMT)
Message #13 received at 16911 <at> debbugs.gnu.org (full text, mbox):
Eric Blake wrote:
> It might help to mention all three characters in the NEWS blurb.
Thanks, I pushed the attached patch.
[0001-doc-describe-titlecase-fix-better.patch (text/plain, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sun, 02 Mar 2014 00:50:02 GMT)
Message #16 received at 16911 <at> debbugs.gnu.org (full text, mbox):
Thanks for those patches.
I'm seeing that new test fail on OS X 10.8.5 and don't have
time to pursue it right away, so in case someone else does, ...
[using the same "in" file created by the test]
$ src/grep -Ei '(Lj)\1' in
LjLj
$ src/grep -Ei '(Lj)' in
ljlj
LjLj
LJLJ
$ src/grep -Ei '(Lj)\1' in
LjLj
$ src/grep -Ei '(lj)\1' in
ljlj
LJLJ
$ src/grep -Ei '(LJ)\1' in
ljlj
LJLJ
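As a point of comparison only, the same three lines can be run through a different regex engine. The sketch below uses Python's re module with re.IGNORECASE and the titlecase pattern with a backreference; it illustrates the behavior the test expects (all three lines selected) and is not a model of grep's DFA or regex code. The names lines, pattern and matched are invented for the example.

```python
import re

# Same content as the test's "in" file: lowercase, titlecase, uppercase pairs.
lines = ["\u01c9\u01c9", "\u01c8\u01c8", "\u01c7\u01c7"]

# Titlecase pattern followed by a backreference, matched case-insensitively.
pattern = re.compile("(\u01c8)\\1", re.IGNORECASE)

matched = [s for s in lines if pattern.search(s)]
for s in matched:
    print(s)
```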
Here's the relevant part of the test-suite.log file:
+ LC_ALL=en_US.UTF-8
+ export LC_ALL
+ fail=0
+ LJ='\307\207'
+ Lj='\307\210'
+ lj='\307\211'
++ printf '\307\210\n'
+ pattern=$'\307\210'
+ printf '\307\211\307\211\n\307\210\307\210\n\307\207\307\207\n'
+ grep -i $'\307\210' in
+ compare in out
+ compare_dev_null_ in out
+ test 2 = 2
+ test xin = x/dev/null
+ test xout = x/dev/null
+ return 2
+ case $? in
+ compare_ in out
+ diff -u in out
+ pattern='(Lj)\1'
+ grep -Ei '(Lj)\1' in
+ compare in out
+ compare_dev_null_ in out
+ test 2 = 2
+ test xin = x/dev/null
+ test xout = x/dev/null
+ return 2
+ case $? in
+ compare_ in out
+ diff -u in out
--- in 2014-03-01 16:22:38.000000000 -0800
+++ out 2014-03-01 16:22:38.000000000 -0800
@@ -1,3 +1 @@
-ljlj
LjLj
-LJLJ
+ fail=1
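The octal escapes in the log above are the UTF-8 encodings of the three characters. A small sketch, assuming nothing beyond the escapes shown in the log, decodes them back to code points; ESCAPES and decode_octal() are names invented for this illustration.

```python
# Octal escape strings copied from the test-suite.log trace.
ESCAPES = {"LJ": r"\307\207", "Lj": r"\307\210", "lj": r"\307\211"}


def decode_octal(escaped):
    """Turn a string like r'\\307\\210' into the character it encodes in UTF-8."""
    raw = bytes(int(tok, 8) for tok in escaped.split("\\") if tok)
    return raw.decode("utf-8")


for name, escaped in ESCAPES.items():
    ch = decode_octal(escaped)
    print(name, escaped, f"U+{ord(ch):04X}")
```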
Did not alter fixed versions and reopened.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 03 Mar 2014 07:11:01 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Mon, 03 Mar 2014 07:28:01 GMT)
Message #21 received at 16911 <at> debbugs.gnu.org (full text, mbox):
[I've reopened 16911 since the bug's not fixed on OS X.]
Here's my guess. In glibc's en_US locale, 'Lj' is considered to be both
uppercase and lowercase; but in OS X's en_US locale, it's considered to
be neither uppercase nor lowercase. If so, the attached gnulib patch
should fix the problem (though I can't easily test this). Could you
please give it a try?
By the way, I'd like to remove the need for grep's local differences
from the glibc regex code. I assume it's there only to pacify GCC's
warnings flags, and we can do that with pragmas in gnulib. One fix at a
time, though.
[regex-osx.diff (text/plain, attachment)]
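The guess in this message, that a platform may classify the titlecase character as neither uppercase nor lowercase, can be illustrated with a hypothetical sketch. It is not the gnulib patch and not grep's code. It only contrasts a case-variant lookup that is guarded by a classification test with one that applies the mappings unconditionally. Python's str.isupper() and str.islower() happen to report False for the titlecase form, which mirrors the "neither" situation described for OS X. Both function names are invented.

```python
def variants_guarded(ch):
    """Hypothetical: map case only when the character is classified upper or lower."""
    out = {ch}
    if ch.isupper():
        out.add(ch.lower())
    elif ch.islower():
        out.add(ch.upper())
    return out


def variants_unconditional(ch):
    """Hypothetical: collect every case mapping regardless of classification."""
    return {ch, ch.upper(), ch.lower(), ch.title()}


title = "\u01c8"  # the titlecase form, 'Lj'
print(sorted(f"U+{ord(c):04X}" for c in variants_guarded(title)))
print(sorted(f"U+{ord(c):04X}" for c in variants_unconditional(title)))
```

In this sketch the guarded lookup returns only the titlecase character itself, while the unconditional lookup returns all three forms.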
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Tue, 04 Mar 2014 03:19:02 GMT)
Message #24 received at 16911 <at> debbugs.gnu.org (full text, mbox):
On Sun, Mar 2, 2014 at 11:27 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> [I've reopened 16911 since the bug's not fixed on OS X.]
>
> Here's my guess. In glibc's en_US locale, 'Lj' is considered to be both
> uppercase and lowercase; but in OS X's en_US locale, it's considered to be
> neither uppercase nor lowercase. If so, the attached gnulib patch should
> fix the problem (though I can't easily test this). Could you please give it
> a try?
Hi Paul,
That patch does indeed solve the problem.
> By the way, I'd like to remove the need for grep's local differences from
> the glibc regex code. I assume it's there only to pacify GCC's warnings
> flags, and we can do that with pragmas in gnulib. One fix at a time,
> though.
You're right. It was only to avoid warnings from gcc, and using #pragmas
is a better approach, in a project like grep where we rarely modify that code.
Thanks!
Jim
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Wed, 05 Mar 2014 19:38:02 GMT)
Message #27 received at 16911 <at> debbugs.gnu.org (full text, mbox):
On 03/03/2014 07:18 PM, Jim Meyering wrote:
> You're right. It was only to avoid warnings from gcc, and using #pragmas
> is a better approach, in a project like grep where we rarely modify that code.
I just now checked, and without the grep diffs there are no warnings
when I configure with grep's 'configure --enable-gcc-warnings' on Fedora
20 (gcc (GCC) 4.8.2 20131212 (Red Hat 4.8.2-7)). Possibly GCC got
smarter, or possibly the pragmas in gnulib regex now suffice. So I've
removed the grep diffs with the attached patch for now; if warnings come
back (older compilers maybe?) we can add more pragmas to the gnulib copy.
[0001-maint-remove-differences-from-gnulib-regex-code.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Thu, 06 Mar 2014 21:21:02 GMT)
Message #30 received at 16911 <at> debbugs.gnu.org (full text, mbox):
On 03/01/2014 03:07 PM, Paul Eggert wrote:
> Eric Blake wrote:
>> It might help to mention all three characters in the NEWS blurb.
>
> Thanks, I pushed the attached patch.
I see now that my documentation fix went too far, as it promised
behavior that the regex code does not in fact implement. The plan is to
fix the DFA code to match what the regex code does, and the first step
is to remove the promises that aren't being kept now (when the regex
code is used). I pushed the attached documentation patch.
[0001-doc-do-not-overpromise-ignore-case-s-behavior.patch (text/x-patch, attachment)]
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Fri, 07 Mar 2014 05:58:02 GMT)
Message #33 received at 16911 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering wrote:
> That patch does indeed solve the problem.
OK, thanks. I think only part of the patch is actually needed and I see
potential problems with the other part, so I installed the former into
gnulib (see attached) and will leave the latter for later.
[0001-regex-port-to-OS-X-10.8.5-en_US.UTF-8-locale.patch (text/plain, attachment)]
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Sat, 08 Mar 2014 02:43:01 GMT)
Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>: bug acknowledged by developer. (Sat, 08 Mar 2014 02:43:02 GMT)
Message #38 received at 16911-done <at> debbugs.gnu.org (full text, mbox):
I think this bug should be fixed on OS X now, so I'm marking it as done.
We can reopen it later if I'm wrong.
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sat, 08 Mar 2014 03:12:01 GMT)
Message #41 received at 16911 <at> debbugs.gnu.org (full text, mbox):
On Fri, Mar 7, 2014 at 6:42 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> I think this bug should be fixed on OS X now, so I'm marking it as done. We
> can reopen it later if I'm wrong.
Confirmed: it's still fixed. Thanks again.
Information forwarded to bug-grep <at> gnu.org: bug#16911; Package grep. (Sat, 08 Mar 2014 03:12:02 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 05 Apr 2014 11:24:03 GMT)
This bug report was last modified 11 years and 33 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.