GNU bug report logs - #18580
[PATCH] dfa: check end of an input buffer after a transition in non-UTF8 multibyte locales

Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 29 Sep 2014 00:13:02 UTC

Severity: normal

Tags: patch

Done: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18580 in the body.
You can then email your comments to 18580 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 29 Sep 2014 00:13:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 29 Sep 2014 00:13:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] dfa: check end of an input buffer after a transition in
 non-UTF8 multibyte locales
Date: Mon, 29 Sep 2014 09:11:17 +0900
[Message part 1 (text/plain, inline)]
If a state has neither ANYCHAR nor MBCSET and the next character is
eolbyte, the next state is -1.  The loop then exits, and the code checks
whether the position is at the end of the buffer.

However, if a state has either ANYCHAR or MBCSET, the next state may not
be -1 even when the next character is eolbyte.  So we must check whether
the position is at the end of the buffer after such a transition;
otherwise the scan may run past the end of the buffer.
[0001-dfa-check-end-of-an-input-buffer-after-a-transition-.patch (text/plain, attachment)]
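
To make the failure mode above concrete, here is a minimal toy sketch in C.
It is not the code from dfa.c: `toy_transit' and `toy_scan' are hypothetical
names, positions are array indexes rather than pointers, and the transition
simply consumes one byte without ever returning -1, which is the property
the report attributes to states containing ANYCHAR or MBCSET.

```c
#include <assert.h>
#include <stddef.h>

enum { EOLBYTE = '\n' };

/* Toy transition for a state that contains ANYCHAR: it consumes one
   byte and never yields -1, even when that byte is EOLBYTE.  */
static int
toy_transit (int state, size_t *pos)
{
  ++*pos;
  return state + 1;
}

/* Run up to NSTATES transitions over BUF, which holds LEN data bytes
   followed by an EOLBYTE sentinel at BUF[LEN].  Return the final
   position.  When CHECK_END is nonzero, stop as soon as a transition
   has consumed the sentinel.  */
static size_t
toy_scan (const char *buf, size_t len, int nstates, int check_end)
{
  size_t pos = 0;
  int s = 0;

  assert (buf[len] == EOLBYTE);
  while (s < nstates)
    {
      s = toy_transit (s, &pos);
      /* The post-transition check: the previous byte was the sentinel
         and the position is now beyond the data.  */
      if (check_end && buf[pos - 1] == EOLBYTE && pos > len)
        break;
    }
  return pos;
}
```

With an empty line and ten states (as a pattern of ten dots might need),
the unchecked scan ends at position 10 although only one byte exists; the
checked scan stops at position 1.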

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 29 Sep 2014 00:34:02 GMT) Full text and rfc822 format available.

Message #8 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 28 Sep 2014 17:32:45 -0700
Thanks, can you provide a test case that illustrates the problem?  We could add 
it to the test suite.




Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 29 Sep 2014 00:53:02 GMT) Full text and rfc822 format available.

Message #11 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 28 Sep 2014 17:52:09 -0700
Also, there are two calls to transit_state but that patch affects only one of 
them.  Why shouldn't both calls be patched?




Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 29 Sep 2014 05:59:01 GMT) Full text and rfc822 format available.

Message #14 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Mon, 29 Sep 2014 14:58:38 +0900
[Message part 1 (text/plain, inline)]
Thanks for the review.

> Thanks, can you provide a test case that illustrates the problem?  We
> could add it to the test suite.

I checked the values of both `p' and `end' with GDB after the first
transit_state call in the following operation, but I have not yet been
able to produce a core dump with this bug.

  $ echo | env LC_ALL=ja_JP.eucJP src/grep '..........'

> Also, there are two calls to transit_state but that patch affects only
> one of them.  Why shouldn't both calls be patched?

I only tested the first call of transit_state (because the above test
case doesn't reach the second call), but I agree that we should also fix
the second call, as you say.

[0001-dfa-check-end-of-an-input-buffer-after-a-transition-.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Wed, 01 Oct 2014 15:16:01 GMT) Full text and rfc822 format available.

Message #17 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Thu, 02 Oct 2014 00:15:44 +0900
[Message part 1 (text/plain, inline)]
I haven't found a clear test case yet.  However, I found another bug
during the investigation.  It is also reproducible with grep-2.19 and
grep-2.20, and the patch for this bug fixes it.

  $ printf 'a\naa\n' | env LC_ALL=zh_CN src/grep ..
  a
  aa

I added a test case to the previous patch and changed its title.
[0001-dfa-fix-behavior-after-a-transition-for-ANYCHAR-or-M.patch (text/plain, attachment)]
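
For reference, the expected result of the `..' example can be worked out
with a small sketch.  This is only an illustration under a loud assumption:
it treats every character as one byte, which holds for the ASCII input
`a\naa\n' but not for multibyte text in general.  The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count the lines in BUF[0..LEN-1] that contain at least MIN_CHARS
   bytes before their terminating EOL.  Assumes one byte per character;
   a final line without EOL is not counted.  */
static size_t
count_lines_at_least (const char *buf, size_t len, size_t min_chars, char eol)
{
  size_t count = 0;
  size_t line_len = 0;

  assert (buf != NULL);
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] == eol)
        {
          if (line_len >= min_chars)
            count++;
          line_len = 0;
        }
      else
        line_len++;
    }
  return count;
}
```

For the input above with MIN_CHARS of 2, only the line `aa' qualifies, so
a correct grep should print one line rather than two.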

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Wed, 01 Oct 2014 15:43:02 GMT) Full text and rfc822 format available.

Message #20 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 18580 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Thu, 02 Oct 2014 00:42:37 +0900
Norihiro Tanaka wrote:
> I haven't found a clear test case yet.

I haven't found a clear test case yet, but I have found that the
following command runs past the end of the input buffer.

  $ printf '' | env LC_ALL=zh_CN src/grep -z .





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Thu, 02 Oct 2014 14:54:01 GMT) Full text and rfc822 format available.

Message #23 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18580 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Thu, 2 Oct 2014 07:53:28 -0700
On Wed, Oct 1, 2014 at 8:42 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Norihiro Tanaka wrote:
>> I haven't found a clear test case yet.
>
> I haven't found a clear test case yet, but I have already found to run
> over the end of the input buffer with below.
>
>   $ printf '' | env LC_ALL=zh_CN src/grep -z .

Thanks. This will work, if nothing better comes up, since
when running ASAN-enabled binaries, this evokes an abort:

   LC_ALL=zh_CN src/grep -z . < /dev/null

[note that I dropped the "env" and using input redirection
is slightly better for debugging than using a pipe]




Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Thu, 02 Oct 2014 15:36:02 GMT) Full text and rfc822 format available.

Message #26 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18580 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Fri, 03 Oct 2014 00:35:14 +0900
Thanks.

Jim Meyering wrote:
> Thanks. This will work, if nothing better comes up, since
> when running ASAN-enabled binaries, this evokes an abort:
> 
>    LC_ALL=zh_CN src/grep -z . < /dev/null
> 
> [note that I dropped the "env" and using input redirection
> is slightly better for debugging than using a pipe]

This is reproducible only in current master.  To test it easily, I
changed dfa.c as follows, then compiled and ran it.

--
diff --git a/src/dfa.c b/src/dfa.c
index 4f45fff..51d5879 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3351,6 +3351,7 @@ dfaexec_main (struct dfa *d, char const *begin, char *end,
               /* Can match with a multibyte character (and multi character
                  collating element).  Transition table might be updated.  */
               s = transit_state (d, s, &p, (unsigned char *) end);
+printf ("p = %x, end = %x\n", p, end);
               mbp = p;
               trans = d->trans;
             }
--

The result is below.

$ env LC_ALL=zh_CN src/grep -z . </dev/null
p = 80821a5, end = 80821a5
p = 80821a6, end = 80821a5
p = 80821a7, end = 80821a5
p = 80821a8, end = 80821a5
p = 80821a9, end = 80821a5
p = 80821aa, end = 80821a5
p = 80821ab, end = 80821a5
p = 80821ac, end = 80821a5
p = 80821ad, end = 80821a5
p = 80821ae, end = 80821a5
p = 80821af, end = 80821a5
p = 80821b0, end = 80821a5
p = 80821b1, end = 80821a5
p = 80821b2, end = 80821a5
p = 80821b3, end = 80821a5
p = 80821b4, end = 80821a5
p = 80821b5, end = 80821a5
p = 80821b6, end = 80821a5
p = 80821b7, end = 80821a5
p = 80821b8, end = 80821a5
p = 80821b9, end = 80821a5
p = 80821ba, end = 80821a5
p = 80821bb, end = 80821a5
p = 80821bc, end = 80821a5
p = 80821bd, end = 80821a5
p = 80821be, end = 80821a5
p = 80821bf, end = 80821a5
p = 80821c0, end = 80821a5
p = 80821c1, end = 80821a5
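
In the output above, `p' ends 0x1c (28) bytes beyond `end'.  The debugging
printf passes pointers to `%x', which happens to work on that 32-bit build
but is not portable.  A hedged sketch of a portable variant follows; the
helper names are hypothetical, and the subtraction is only well defined
when both pointers refer to the same array (or one past its end).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Distance from END to P in bytes.  Both pointers must point into the
   same array, or one past its end.  */
static ptrdiff_t
overrun_bytes (const char *p, const char *end)
{
  return p - end;
}

/* Format both positions and their distance into OUT.  Returns the
   snprintf result.  */
static int
format_positions (char *out, size_t outsize, const char *p, const char *end)
{
  return snprintf (out, outsize, "p = %p, end = %p, overrun = %td",
                   (void *) p, (void *) end, overrun_bytes (p, end));
}
```

The exact text produced by `%p' is implementation-defined, so only the
distance is stable across platforms.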






Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Thu, 02 Oct 2014 15:48:02 GMT) Full text and rfc822 format available.

Message #29 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18580 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Fri, 03 Oct 2014 00:46:55 +0900
Norihiro Tanaka wrote:
> This is reproduced in current master only.  I changed dfa.c to test it
> easily, compile and run.

I tried it on grep-2.20, but there it did not even pass through
EGexecute.  So I think there may be yet another bug in master, before
that point.





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Thu, 02 Oct 2014 23:14:02 GMT) Full text and rfc822 format available.

Message #32 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18580 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Fri, 03 Oct 2014 08:13:45 +0900
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> I tried it on grep-2.20, but there it did not even pass through
> EGexecute.  So I think there may be yet another bug in master, before
> that point.

In current master, grep tests whether the pattern matches eolbyte, to
speed up matching against binary files, and `execute' is called in this
process.

I recognize that an input buffer passed to `execute' must have an
eolbyte at its start as a sentinel, although I don't think that this bug
is caused by that.
[0001-grep-testing-matching-to-with-eolbyte-put-sentimenta.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Thu, 02 Oct 2014 23:58:01 GMT) Full text and rfc822 format available.

Message #35 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18580 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Fri, 03 Oct 2014 08:56:53 +0900
[Message part 1 (text/plain, inline)]
Norihiro Tanaka wrote:
> In current master, grep tests whether the pattern matches eolbyte, to
> speed up matching against binary files, and `execute' is called in this
> process.
> 
> I recognize that an input buffer passed to `execute' must have an
> eolbyte at its start as a sentinel, although I don't think that this bug
> is caused by that.

Sorry, the previous patch was wrong.  I corrected it.
[0001-grep-testing-matching-with-an-eolbyte-put-a-sentinel.patch (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sat, 04 Oct 2014 12:13:02 GMT) Full text and rfc822 format available.

Message #38 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: 18580 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sat, 04 Oct 2014 21:12:50 +0900
[Message part 1 (text/plain, inline)]
I confirmed that an additional byte isn't required at the start of an
input buffer.  However, I also confirmed that an additional byte is
required at the end of the input buffer: dfaexec temporarily replaces it
with eolbyte as a sentinel.

Sorry, I changed the patch again.
[0001-grep-testing-matching-with-an-eolbyte-add-a-byte-to-.patch (text/plain, attachment)]
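
The sentinel technique described above can be sketched as follows.  This
is a generic illustration, not the dfaexec code: the function name is
hypothetical, and it assumes the caller guarantees one writable byte
directly after the data, which is the requirement the message states.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first EOL byte in BUF[0..LEN-1], or LEN if
   there is none.  BUF[LEN] must be a writable byte: it is overwritten
   with the sentinel for the duration of the scan and then restored.  */
static size_t
find_eol_with_sentinel (char *buf, size_t len, char eol)
{
  char saved = buf[len];
  size_t i = 0;

  buf[len] = eol;               /* the loop below cannot pass BUF[LEN] */
  while (buf[i] != eol)
    i++;
  buf[len] = saved;             /* undo the temporary replacement */
  assert (i <= len);
  return i;
}
```

The inner loop needs no separate bounds test, which is the point of the
sentinel, but the caller must allocate LEN + 1 bytes even for empty input.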

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sat, 04 Oct 2014 16:39:02 GMT) Full text and rfc822 format available.

Message #41 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 18580 <at> debbugs.gnu.org
Cc: Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sat, 04 Oct 2014 09:37:56 -0700
Norihiro Tanaka wrote:
> However, I also confirmed an additional byte is required at the
> end of the input buffer.  dfaexec will temporarily replace it with
> eolbyte as sentinel.

Thanks, I pushed that after adjusting the checkin log message.  I will try to 
get to the other patches in this bug report soon.





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sun, 05 Oct 2014 04:57:01 GMT) Full text and rfc822 format available.

Message #44 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18580 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sat, 4 Oct 2014 21:55:56 -0700
[Message part 1 (text/plain, inline)]
On Sat, Oct 4, 2014 at 9:37 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Norihiro Tanaka wrote:
>>
>> However, I also confirmed an additional byte is required at the
>> end of the input buffer.  dfaexec will temporarily replace it with
>> eolbyte as sentinel.
>
>
> Thanks, I pushed that after adjusting the checkin log message.  I will try
> to get to the other patches in this bug report soon.

Actually, we need yet another byte at the end, and one more prior:

When I built with ASAN and rawhide's gcc version 4.9.1 20140930
(Red Hat 4.9.1-11) (GCC), using this command:

  make CFLAGS=-ggdb3 AM_CFLAGS=-fsanitize=address \
    AM_LDFLAGS='-fsanitize=address -static-libasan' check

I saw two test failures. You can see that the first test triggers an access
one past the end, and all others trigger an access one prior to the beginning.
Here is a summary of the problems:

  $ grep Memory tests/inconsistent-range.log tests/empty.log
  tests/inconsistent-range.log:    [32, 34) 'eolbytes' <== Memory access at offset 34 overflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
  tests/empty.log:    [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable

Here are the first two, in more detail:

  ==25556==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff1d5f8432 at pc 0x0000004b63e5 bp 0x7fff1d5f7e20 sp 0x7fff1d5f7e18
  READ of size 1 at 0x7fff1d5f8432 thread T0
      #0 0x4b63e4 in mbs_to_wchar /home/j/w/co/grep/src/dfa.c:482
      #1 0x4c5c81 in transit_state /home/j/w/co/grep/src/dfa.c:3184
      #2 0x4c6d03 in dfaexec_main /home/j/w/co/grep/src/dfa.c:3353
      #3 0x4c782c in dfaexec_mb /home/j/w/co/grep/src/dfa.c:3449
      #4 0x4c78ea in dfaexec /home/j/w/co/grep/src/dfa.c:3466
      #5 0x4ce116 in EGexecute /home/j/w/co/grep/src/dfasearch.c:310
      #6 0x4b4ac9 in main /home/j/w/co/grep/src/grep.c:2518
      #7 0x7f97de4c90df in __libc_start_main (/lib64/libc.so.6+0x200df)
      #8 0x406dd6 (/home/j/w/co/grep/src/grep+0x406dd6)

  Address 0x7fff1d5f8432 is located in stack of thread T0 at offset 34 in frame
      #0 0x4b32fe in main /home/j/w/co/grep/src/grep.c:2099

    This frame has 6 object(s):
      [32, 34) 'eolbytes' <== Memory access at offset 34 overflows this variable
      [96, 104) 'keyalloc'


  ==25501==ERROR: AddressSanitizer: stack-buffer-underflow on address 0x7fff1faadb4f at pc 0x0000004d3a87 bp 0x7fff1faad6e0 sp 0x7fff1faad6d8
  READ of size 1 at 0x7fff1faadb4f thread T0
      #0 0x4d3a86 in bm_delta2_search /home/j/w/co/grep/src/kwset.c:534
      #1 0x4d4e3c in bmexec_trans /home/j/w/co/grep/src/kwset.c:663
      #2 0x4d4f49 in bmexec /home/j/w/co/grep/src/kwset.c:678
      #3 0x4d5d9e in kwsexec /home/j/w/co/grep/src/kwset.c:848
      #4 0x4d691d in Fexecute /home/j/w/co/grep/src/kwsearch.c:128
      #5 0x4b4ac9 in main /home/j/w/co/grep/src/grep.c:2518
      #6 0x7f40210110df in __libc_start_main (/lib64/libc.so.6+0x200df)
      #7 0x406dd6 (/home/j/w/co/grep/src/grep+0x406dd6)

  Address 0x7fff1faadb4f is located in stack of thread T0 at offset 31 in frame
      #0 0x4b32fe in main /home/j/w/co/grep/src/grep.c:2099

    This frame has 6 object(s):
      [32, 34) 'eolbytes' <== Memory access at offset 31 underflows this variable
      [96, 104) 'keyalloc'

I've attached the patch I am about to push:
[0001-grep-avoid-stack-buffer-read-underrun-and-overrun.patch (application/octet-stream, attachment)]
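
The two ASAN traces show reads one byte before and one byte after the
two-byte `eolbytes' object.  A minimal sketch of the padding idea follows.
It is an assumption about the shape of such a fix, not the contents of the
attached patch: the matcher here is a hypothetical stand-in that merely
touches both neighboring bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical matcher with the access pattern seen in the traces: it
   reads the byte before BUF and the byte after BUF[SIZE-1].  Returns 1
   if both neighbors are zero.  */
static int
toy_touches_neighbors (const char *buf, size_t size)
{
  return buf[-1] == 0 && buf[size] == 0;
}

/* Probe a one-byte buffer holding EOLBYTE.  The storage has one pad
   byte on each side so that both neighbor reads stay inside the
   object.  */
static int
probe_with_padding (char eolbyte)
{
  char padded[3] = { 0, eolbyte, 0 };

  assert (sizeof padded == 3);
  return toy_touches_neighbors (padded + 1, 1);
}
```

Passing `padded + 1' rather than `padded' is what keeps the read at
offset -1 inside the array.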

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sun, 05 Oct 2014 05:55:02 GMT) Full text and rfc822 format available.

Message #47 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18580 <at> debbugs.gnu.org, Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sat, 04 Oct 2014 22:54:04 -0700
Thanks for catching that.  Obviously the patch should go in, but I'm mystified 
as to why we need two bytes' worth of sentinels after the buffer.  I suspect 
there's another bug lurking in there, related to the bugs earlier in this report.

That is, the two-byte trailing sentinel seems to be related to the problem that 
the code that calls transit_state_singlebyte can jump over two bytes when it 
should jump past just one.  The whole area is a bit of a mess.  (For example, 
transit_state_singlebyte always returns the same value -- what's up with that!?)




Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sun, 05 Oct 2014 07:28:02 GMT) Full text and rfc822 format available.

Message #50 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 05 Oct 2014 16:27:22 +0900
Jim Meyering wrote:
> Actually, we need yet another byte at the end, and one more prior:

> When I built with ASAN and rawhide's gcc version 4.9.1 20140930
> (Red Hat 4.9.1-11) (GCC), using this command:
> 
>   make CFLAGS=-ggdb3 AM_CFLAGS=-fsanitize=address \
>     AM_LDFLAGS='-fsanitize=address -static-libasan' check
> 
> I saw two test failures. You can see that the first test triggers an access
> one past the end, and all others trigger an access one prior to the beginning.
> Here is a summary of the problems:

Thanks.  If the begline option (-x) is set, one byte before the buffer
is also accessed.  OTOH, for the access one past the end, I believe the
other patch attached to this bug fixes it: dfaexec for non-UTF8 locales
doesn't check the end of the input buffer and doesn't count newlines
correctly.





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Sun, 05 Oct 2014 16:51:02 GMT) Full text and rfc822 format available.

Message #53 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 5 Oct 2014 09:49:59 -0700
[Message part 1 (text/plain, inline)]
On Sun, Oct 5, 2014 at 12:27 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering wrote:
>> Actually, we need yet another byte at the end, and one more prior:
>
>> When I built with ASAN and rawhide's gcc version 4.9.1 20140930
>> (Red Hat 4.9.1-11) (GCC), using this command:
>>
>>   make CFLAGS=-ggdb3 AM_CFLAGS=-fsanitize=address \
>>     AM_LDFLAGS='-fsanitize=address -static-libasan' check
>>
>> I saw two test failures. You can see that the first test triggers an access
>> one past the end, and all others trigger an access one prior to the beginning.
>> Here is a summary of the problems:
>
> Thanks, if begline option (-x) is set, one more prior is used.  OTOH,
> for an access one past the end, I believe the another patch attached
> with this bug fixes it.  dfaexec for non-UTF8 locales doesn't check the
> end of an input buffer and doesn't count newline correctly.

Thank you.
I have added a test case to your post-transit_state buffer-length
checking patch, and shrank the eolbytes buffer by one byte, now
that I see this patch prevents the overrun. Here's the patch I'll
push later today:
[0001-dfa-check-end-of-input-buffer-after-transition-in-no.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 00:06:02 GMT) Full text and rfc822 format available.

Message #56 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Mon, 06 Oct 2014 09:05:39 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> Thank you.
> I have added a test case to your post-transit_state buffer-length
> checking patch, and shrank the eolbytes buffer by one byte, now
> that I see this patch prevents the overrun. Here's the patch I'll
> push later today:

Thanks.  Could you also add the test case from the patch attached at the
following URL?  Its result is wrong without the fix, so it should be
tested.

http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 03:08:02 GMT) Full text and rfc822 format available.

Message #59 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 5 Oct 2014 20:07:30 -0700
[Message part 1 (text/plain, inline)]
On Sun, Oct 5, 2014 at 5:05 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> Thank you.
>> I have added a test case to your post-transit_state buffer-length
>> checking patch, and shrank the eolbytes buffer by one byte, now
>> that I see this patch prevents the overrun. Here's the patch I'll
>> push later today:
>
> Thanks.  Could you also add a test case in a patch attached with a
> following URL to it?  As it's result wrong, should be tested.
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580

Sorry I missed that message.
The commit log text is a good explanation, and deserves to be
a comment in the code, so I have written a patch in your name
to factor out those two duplicate blocks, also adding your commit
log message as a comment.  Since this commit is in your name,
I would appreciate a careful review and an explicit "ACK" (or
suggestion for correction or improvement) from you before I push it.

I will also add your test in a separate upcoming commit.
[0001-dfa-factor-out-a-new-nontrivial-block-of-duplicated-.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 04:02:01 GMT) Full text and rfc822 format available.

Message #62 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Sun, 5 Oct 2014 21:00:38 -0700
[Message part 1 (text/plain, inline)]
On Sun, Oct 5, 2014 at 5:05 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> Thank you.
>> I have added a test case to your post-transit_state buffer-length
>> checking patch, and shrank the eolbytes buffer by one byte, now
>> that I see this patch prevents the overrun. Here's the patch I'll
>> push later today:
>
> Thanks.  Could you also add a test case in a patch attached with a
> following URL to it?  As it's result wrong, should be tested.
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?msg=17;bug=18580

Here is another patch in your name, adding that test.
This time, I've added a NEWS entry.
As before, please read it carefully and let me know if you
have any suggestion before I push.
[0001-dfa-test-for-just-fixed-bug.patch (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 15:42:01 GMT) Full text and rfc822 format available.

Message #65 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Tue, 07 Oct 2014 00:41:29 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> Here is another patch in your name, adding that test.
> This time, I've added a NEWS entry.
> As before, please read it carefully and let me know if you
> have any suggestion before I push.

Thanks for the review and the addition of the NEWS entry.  I don't have
any suggestions.  Please push it.





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 15:53:02 GMT) Full text and rfc822 format available.

Message #68 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Tue, 07 Oct 2014 00:52:32 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> Sorry I missed that message.
> The commit log text is a good explanation, and deserves to be
> a comment in the code, so I have written a patch in your name
> to factor out those two duplicate blocks, also adding your commit
> log message as a comment.  Since this commit is in your name,
> I would appreciate a careful review and an explicit "ACK" (or
> suggestion for correction or improvement) from you before I push it.
> 
> I will also add your test in a separate upcoming commit.

Thanks for the review and the suggestion.  Sorry, I would like to replace
the macro definition with an inline function, but I have no idea how...





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Mon, 06 Oct 2014 19:28:01 GMT) Full text and rfc822 format available.

Message #71 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Mon, 6 Oct 2014 12:27:02 -0700
On Mon, Oct 6, 2014 at 8:52 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> Sorry I missed that message.
>> The commit log text is a good explanation, and deserves to be
>> a comment in the code, so I have written a patch in your name
>> to factor out those two duplicate blocks, also adding your commit
>> log message as a comment.  Since this commit is in your name,
>> I would appreciate a careful review and an explicit "ACK" (or
>> suggestion for correction or improvement) from you before I push it.
>>
>> I will also add your test in a separate upcoming commit.
>
> Thanks for the review and the suggestion.  Sorry, I would like to replace
> the macro definition with an inline function, but I have no idea how...

If you don't want your name on it, let me know.
I listed you as the author mainly for the comment.
I too would have preferred an inline function, but feel that the number
of parameters would be too large. Think of this as a stopgap
measure to avoid risk of divergence in those two blocks of code.
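
To illustrate the trade-off being discussed, here is a toy comparison of
the two forms.  All names are hypothetical and the block body is a
simplified stand-in, not the code that was factored out of dfa.c.  The
macro works on the caller's locals by name, while the function needs one
parameter per local it reads or updates; the real block touches more
state than this toy does.

```c
#include <assert.h>
#include <stddef.h>

/* Macro form: works directly on the caller's locals
   buf, len, eol, pos, nlcount and done.  */
#define ADVANCE_AND_CHECK()             \
  do                                    \
    {                                   \
      pos++;                            \
      if (buf[pos - 1] == eol)          \
        {                               \
          if (pos > len)                \
            done = 1;                   \
          else                          \
            nlcount++;                  \
        }                               \
    }                                   \
  while (0)

/* Function form: every local the block touches becomes a parameter.  */
static inline void
advance_and_check (const char *buf, size_t len, char eol,
                   size_t *pos, size_t *nlcount, int *done)
{
  ++*pos;
  if (buf[*pos - 1] == eol)
    {
      if (*pos > len)
        *done = 1;
      else
        ++*nlcount;
    }
}

/* Count EOL bytes in BUF[0..LEN-1]; BUF[LEN] must hold EOL as a
   sentinel.  Uses the macro form.  */
static size_t
count_with_macro (const char *buf, size_t len, char eol)
{
  size_t pos = 0, nlcount = 0;
  int done = 0;

  assert (buf[len] == eol);
  while (!done)
    ADVANCE_AND_CHECK ();
  return nlcount;
}

/* Same as count_with_macro, using the function form.  */
static size_t
count_with_function (const char *buf, size_t len, char eol)
{
  size_t pos = 0, nlcount = 0;
  int done = 0;

  assert (buf[len] == eol);
  while (!done)
    advance_and_check (buf, len, eol, &pos, &nlcount, &done);
  return nlcount;
}
```

Both callers compute the same result; the difference is only in how much
state has to be threaded through the call.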




Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Tue, 07 Oct 2014 14:43:02 GMT) Full text and rfc822 format available.

Message #74 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Tue, 07 Oct 2014 23:42:17 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> If you don't want your name on it, let me know.
> I listed you as the author mainly for the comment.
> I too would have preferred an inline function, but feel that the number
> of parameters would be too large. Think of this as a stopgap
> measure to avoid risk of divergence in those two blocks of code.

I agree.  I also think that the number of parameters would be too large
if we forced this into an inline function.  As a result, I think your fix
is better.  Could you push it?





Information forwarded to bug-grep <at> gnu.org:
bug#18580; Package grep. (Tue, 07 Oct 2014 15:40:01 GMT) Full text and rfc822 format available.

Message #77 received at 18580 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580 <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Tue, 7 Oct 2014 08:39:03 -0700
On Mon, Oct 6, 2014 at 8:41 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> Here is another patch in your name, adding that test.
>> This time, I've added a NEWS entry.
>> As before, please read it carefully and let me know if you
>> have any suggestion before I push.
>
> Thanks for the review and the addition of the NEWS entry.  I don't have
> any suggestions.  Please push it.

I used git bisect to determine precisely where this happened:

  git bisect start v2.19 v2.18
  git bisect run sh -c 'make WERROR_CFLAGS= && test $(printf "a\naa\n" | LC_ALL=zh_CN src/grep ..|wc -l) = 1'

It found v2.18-123-geb3292b, so I've updated the commit log message
accordingly, and will push shortly.




Reply sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
You have taken responsibility. (Wed, 08 Oct 2014 22:37:02 GMT) Full text and rfc822 format available.

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Wed, 08 Oct 2014 22:37:02 GMT) Full text and rfc822 format available.

Message #82 received at 18580-done <at> debbugs.gnu.org (full text, mbox):

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Jim Meyering <jim <at> meyering.net>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18580-done <at> debbugs.gnu.org
Subject: Re: bug#18580: [PATCH] dfa: check end of an input buffer after a
 transition in non-UTF8 multibyte locales
Date: Thu, 09 Oct 2014 07:36:30 +0900
Jim Meyering <jim <at> meyering.net> wrote:
> I used git bisect to determine precisely where this happened:
> 
>   git bisect start v2.19 v2.18
>   git bisect run sh -c 'make WERROR_CFLAGS= && test $(printf "a\naa\n" | LC_ALL=zh_CN src/grep ..|wc -l) = 1'
> 
> It found v2.18-123-geb3292b, so I've updated the commit log message
> accordingly, and will push shortly.

Thanks for the analysis and the push.  Closed.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Nov 2014 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 172 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.