GNU bug report logs - #34525
replace-regexp missing some matches

Previous Next

Packages: emacs, cc-mode;

Reported by: Daniel Lopez <daniel.lopez999 <at> gmail.com>

Date: Mon, 18 Feb 2019 08:31:01 UTC

Severity: normal

Done: Alan Mackenzie <acm <at> muc.de>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34525 in the body.
You can then email your comments to 34525 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#34525; Package emacs. (Mon, 18 Feb 2019 08:31:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniel Lopez <daniel.lopez999 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 18 Feb 2019 08:31:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: replace-regexp missing some matches
Date: Mon, 18 Feb 2019 08:28:35 +0000
[Message part 1 (text/plain, inline)]
Reproduce:

- Start "emacs -Q" and open the file BitmapFontFace.h
- Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
- The text "Replaced 8 occurrences" appears in the echo area.

Problem:

There were actually 12 occurrences (ie. of the word "Bitmap" surrounded 
by word boundaries) in the file that should have been replaced. If I now 
move point back to the start of the buffer and evaluate the expression 
again, it says "Replaced 4 occurrences".

The exact number of incorrect replacements perhaps varies over time. 
That is, I can test it five times in a row and get 8 initial replacments 
each time, but after trying some other search terms, messing with the 
file, restarting Emacs etc, I try my initial test again and then maybe 
it consistently replaces 10 the first time, for a while. So your exact 
numbers may vary.

I debugged the Lisp as far as I could and it appears to be wrong answers 
coming out of the re-search-forward C call that is in 
isearch-search-fun-default.

The bug filters up to a number of string replacement user actions - I 
first noticed it when trying to do this replacement interactively with 
query-replace on word boundaries (C-u M-%), entering "Bitmap" as search 
string, then "SharedBitmap" as replacement string. Trying now, as I 
press space repeatedly about once a second to confirm each one, I see 
the pink highlight skip valid matches to ask me about one that is 
further down even while I see the skipped one highlighted in blue a few 
lines above, and in the end it may have replaced only 6-8 of the 
occurrences. Though, if I press 'n' instead of space to skip without 
making any replacements, it does visit all of the occurrences.

I see from the Lisp that plain (non-regexp) query-replace on word 
boundaries gets preprocessed into the equivalent regexp search as in my 
initial example. I don't think there are any problems with plain string 
search and replacement.

Some more experimental observations:

- The replacement text can be any string instead of "SharedBitmap", eg. 
"qwertyasdfgh", "qwer", etc, and the bug still happens. The number of 
matches seems to be related to the length of the replacement string. 
Currently 12 character replacement strings are causing replace-regexp to 
make 8 replacements on the first call for me, while 4 character strings 
cause 7 replacements. 6 character replacement strings - ie. same length 
as "Bitmap" - always work, replacing all 12 occurrences.

- The bug doesn't happen in fundamental-mode, nor c-mode, js-mode, 
text-mode or any other major modes I tried.

- I've seen this happen in other of my C++ files where I was making the 
same replacement, so the problem's not precisely unique to this one. 
I've been trying to simplify this one but haven't found anything much 
more revealing so far. For example if I delete all the comments and 
blank lines, then the first replacement finds 9 occurrences out of 10. 
If I cut the file in half by deleting line 140 onwards, the first 
replacement finds 3 occurrences out of 6. But if I do something very 
simple like just pasting "Bitmap<PixelType>" on 100 consecutive lines, 
it's not fooled and it replaces them all.

I've tried this in GNU Emacs 26.1 on Arch Linux and 25.2.1 on Windows 7 
and am seeing the same behaviour in both.

Thanks,
Daniel
[BitmapFontFace.h (text/x-chdr, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34525; Package emacs. (Mon, 18 Feb 2019 08:40:02 GMT) Full text and rfc822 format available.

Message #8 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
To: 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: Acknowledgement (replace-regexp missing some matches)
Date: Mon, 18 Feb 2019 08:37:04 +0000
> - The bug doesn't happen in fundamental-mode, nor c-mode, js-mode,
text-mode or any other major modes I tried.

Whoops, I should just clarify this - it doesn't happen in any of those 
modes, but it is happening in c++-mode.

Daniel




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34525; Package emacs. (Mon, 18 Feb 2019 15:51:01 GMT) Full text and rfc822 format available.

Message #11 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Lopez <daniel.lopez999 <at> gmail.com>, Alan Mackenzie <acm <at> muc.de>
Cc: 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Mon, 18 Feb 2019 17:50:16 +0200
> From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
> Date: Mon, 18 Feb 2019 08:28:35 +0000
> 
> - Start "emacs -Q" and open the file BitmapFontFace.h
> - Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
> - The text "Replaced 8 occurrences" appears in the echo area.
> 
> Problem:
> 
> There were actually 12 occurrences (ie. of the word "Bitmap" surrounded 
> by word boundaries) in the file that should have been replaced. If I now 
> move point back to the start of the buffer and evaluate the expression 
> again, it says "Replaced 4 occurrences".
> 
> The exact number of incorrect replacements perhaps varies over time. 
> That is, I can test it five times in a row and get 8 initial replacments 
> each time, but after trying some other search terms, messing with the 
> file, restarting Emacs etc, I try my initial test again and then maybe 
> it consistently replaces 10 the first time, for a while. So your exact 
> numbers may vary.

C++ Mode plays some funky games with the syntax of '<', so maybe this
is the consequence.  CC'ing Alan, in the hope that he might have
something interesting to say about this.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34525; Package emacs. (Mon, 18 Feb 2019 16:51:02 GMT) Full text and rfc822 format available.

Message #14 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Mon, 18 Feb 2019 16:46:38 +0000
Hello, Eli.

On Mon, Feb 18, 2019 at 17:50:16 +0200, Eli Zaretskii wrote:
> > From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
> > Date: Mon, 18 Feb 2019 08:28:35 +0000

> > - Start "emacs -Q" and open the file BitmapFontFace.h
> > - Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
> > - The text "Replaced 8 occurrences" appears in the echo area.

> > Problem:

> > There were actually 12 occurrences (ie. of the word "Bitmap" surrounded 
> > by word boundaries) in the file that should have been replaced. If I now 
> > move point back to the start of the buffer and evaluate the expression 
> > again, it says "Replaced 4 occurrences".

> > The exact number of incorrect replacements perhaps varies over time. 
> > That is, I can test it five times in a row and get 8 initial replacments 
> > each time, but after trying some other search terms, messing with the 
> > file, restarting Emacs etc, I try my initial test again and then maybe 
> > it consistently replaces 10 the first time, for a while. So your exact 
> > numbers may vary.

> C++ Mode plays some funky games with the syntax of '<', so maybe this
> is the consequence.  CC'ing Alan, in the hope that he might have
> something interesting to say about this.

Acknowledged.  I can reproduce the error, but as yet have no idea of the
cause.  I'm looking into it.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Mon, 18 Feb 2019 21:15:01 GMT) Full text and rfc822 format available.

Message #17 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Mon, 18 Feb 2019 21:10:26 +0000
Hello, again.

On Mon, Feb 18, 2019 at 17:50:16 +0200, Eli Zaretskii wrote:
> > From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
> > Date: Mon, 18 Feb 2019 08:28:35 +0000

> > - Start "emacs -Q" and open the file BitmapFontFace.h
> > - Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
> > - The text "Replaced 8 occurrences" appears in the echo area.

> > Problem:

> > There were actually 12 occurrences (ie. of the word "Bitmap" surrounded 
> > by word boundaries) in the file that should have been replaced. If I now 
> > move point back to the start of the buffer and evaluate the expression 
> > again, it says "Replaced 4 occurrences".

> > The exact number of incorrect replacements perhaps varies over time. 
> > That is, I can test it five times in a row and get 8 initial replacments 
> > each time, but after trying some other search terms, messing with the 
> > file, restarting Emacs etc, I try my initial test again and then maybe 
> > it consistently replaces 10 the first time, for a while. So your exact 
> > numbers may vary.

> C++ Mode plays some funky games with the syntax of '<', so maybe this
> is the consequence.  CC'ing Alan, in the hope that he might have
> something interesting to say about this.

I built a version of CC Mode with the use of the category properties on
< and > disabled.  This version used straight syntax-table text
properties instead.  The bug still happened.

So, although I should probably get rid of this too "clever" use of
category properties, they seem not to be the cause of this bug.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 17:12:02 GMT) Full text and rfc822 format available.

Message #20 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 17:07:22 +0000
Hello again, Eli.

On Mon, Feb 18, 2019 at 17:50:16 +0200, Eli Zaretskii wrote:
> > From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
> > Date: Mon, 18 Feb 2019 08:28:35 +0000

> > - Start "emacs -Q" and open the file BitmapFontFace.h
> > - Evaluate the expression (replace-regexp "\\<Bitmap\\>" "SharedBitmap")
> > - The text "Replaced 8 occurrences" appears in the echo area.

> > Problem:

> > There were actually 12 occurrences (ie. of the word "Bitmap" surrounded 
> > by word boundaries) in the file that should have been replaced. If I now 
> > move point back to the start of the buffer and evaluate the expression 
> > again, it says "Replaced 4 occurrences".

> > The exact number of incorrect replacements perhaps varies over time. 
> > That is, I can test it five times in a row and get 8 initial replacments 
> > each time, but after trying some other search terms, messing with the 
> > file, restarting Emacs etc, I try my initial test again and then maybe 
> > it consistently replaces 10 the first time, for a while. So your exact 
> > numbers may vary.

> C++ Mode plays some funky games with the syntax of '<', so maybe this
> is the consequence.  CC'ing Alan, in the hope that he might have
> something interesting to say about this.

I have a lot interesting to say about this.

Firstly, it is a difficult bug.

I see the bug in master and Emacs 26.1 but not on Emacs 25.3.  For the
latter two, I tested with both -Q and --no-desktop in the command line.
The bug would thus appear to be independent of the CC Mode version in
use.

The bug is definitely some sort of interaction between the regexp
"\\<Bitmap\\>" and the opening template delimiter < in code lines such
as:
    
    std::vector< Bitmap<PixelType> > m_bitmaps;
                       ^

.  If a space is inserted before the marked <, the bug doesn't happen.
Neither does it happen if the < is removed altogether.  That < has a
category property whose symbol contains the syntax-table property "(>
" (i.e. "open paren" syntax with matching paren being ">").

I have checked, with a throwaway testing command, that all the <s
following "Bitmap" do indeed have this syntax-table property.  I also
checked the arguments supplied to each call of re-search-forward, and
they are also always correct.

Here is a list of things which DON'T affect the bug:
(i) Building CC Mode so as not to use category text properties.
(ii) Disabling query-replace-lazy-highlight and/or
isearch-lazy-highlight.
(iii) Disabling font-locking.
(iv) Building replace.el and isearch.el without lexical binding.
(v) syntax-propertize-function, which is nil.

However, on instrumenting the two crucial functions replace-search and
isearch-search-fun-default for edebug, the bug was no longer seen.
Setting Edebug's "default mode" to "continue" (with C-x C-a C-m c) to
prevent edebug breaking at the beginning of a function, caused the bug to
be seen again.  So edebug is not of much help for this bug.

I put an invocation of `redisplay' at the top of replace-search.  This
increased the number of matches found from 6, 7, or 8 to about 10 (from a
total of 12).  On adding, additionally, an invocation of
(recenter-top-bottom 0) at the end of replace-search, all 12 matches were
found and replaced.  On removing the `redisplay' call, leaving the
recenter-top-bottom, the number of matches was only 7 or 8.

Thus, if the next occurence of Bitmap is visible on the screen when Emacs
searches for it, it seems to get found.  Sort of...  Most of the time.

Gradually increasing the argument to the above recenter-top-bottom
reduced the number of replaced matches.  With the `redisplay' there, with
(recenter-top-bottom 60), there were just 7 matches.

I think there's some unwanted interaction between redisplay, the syntax
functionality (in particular, the syntax-table text property), and the
regexp machinery.

I don't have any great ideas as how to make further progress here.
Suggestions would be welcome.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 18:03:02 GMT) Full text and rfc822 format available.

Message #23 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 20:02:45 +0200
> Date: Wed, 20 Feb 2019 17:07:22 +0000
> Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> I think there's some unwanted interaction between redisplay, the syntax
> functionality (in particular, the syntax-table text property), and the
> regexp machinery.

If you enlarge the dimensions of the window and/or decrease the size
of the font, so that most or all of the text is visible at once, does
that affect the number of matches found?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 19:03:02 GMT) Full text and rfc822 format available.

Message #26 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 18:58:50 +0000
Hello, Eli.

On Wed, Feb 20, 2019 at 20:02:45 +0200, Eli Zaretskii wrote:
> > Date: Wed, 20 Feb 2019 17:07:22 +0000
> > Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > I think there's some unwanted interaction between redisplay, the syntax
> > functionality (in particular, the syntax-table text property), and the
> > regexp machinery.

> If you enlarge the dimensions of the window and/or decrease the size
> of the font, so that most or all of the text is visible at once, does
> that affect the number of matches found?

It appears not to.  I reduced the font size from 11 to 6, and enlarged
the frame to compensate, giving a window displaying 95 lines.

C-u M-% Bitmap -> SharedBitmap, on pressing <space>s, visibly skipped
lazily highlighted occurrences of Bitmap.  In the end, just 6 matches
(out of 12) got replaced.

Just as a matter of interest, my initial observations were on a Linux
virtual terminal (with a fixed window size of 65 lines), the above was
in X-Windows.

It would seem there is more to it than redisplay.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 19:28:01 GMT) Full text and rfc822 format available.

Message #29 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 21:27:24 +0200
> Date: Wed, 20 Feb 2019 18:58:50 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> It would seem there is more to it than redisplay.

Maybe look at this from a different angle: what do we have in C++ mode
that isn't present in C mode, and could potentially affect this use
case?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 21:28:02 GMT) Full text and rfc822 format available.

Message #32 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
To: Alan Mackenzie <acm <at> muc.de>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 21:25:27 +0000
[Message part 1 (text/plain, inline)]
Hi Eli and Alan,

Thanks for investigating.

Don't know if this is of much help, but here's a simpler test file 
(BitmapFontFace4.h).

If I delete all the "YBitmapZ" lines so that only the full-word 
occurrences of "Bitmap" are left in the file, then C-u M-% replaces 
everything but replace-regexp doesn't still.

Daniel
[BitmapFontFace4.h (text/x-chdr, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 20 Feb 2019 21:34:02 GMT) Full text and rfc822 format available.

Message #35 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 20 Feb 2019 21:30:03 +0000
Hello, Eli.

On Wed, Feb 20, 2019 at 21:27:24 +0200, Eli Zaretskii wrote:
> > Date: Wed, 20 Feb 2019 18:58:50 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > It would seem there is more to it than redisplay.

> Maybe look at this from a different angle: what do we have in C++ mode
> that isn't present in C mode, and could potentially affect this use
> case?

Well, the most obvious thing is the category text property whose value
is the symbol c-<-as-paren-syntax.  This symbol's plist is

    (risky-local-variable t syntax-table (4 . 62))

.  I can't think of anything else at the moment.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Thu, 21 Feb 2019 03:42:01 GMT) Full text and rfc822 format available.

Message #38 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Thu, 21 Feb 2019 05:40:47 +0200
> Date: Wed, 20 Feb 2019 21:30:03 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> > Maybe look at this from a different angle: what do we have in C++ mode
> > that isn't present in C mode, and could potentially affect this use
> > case?
> 
> Well, the most obvious thing is the category text property whose value
> is the symbol c-<-as-paren-syntax.  This symbol's plist is
> 
>     (risky-local-variable t syntax-table (4 . 62))
> 
> .  I can't think of anything else at the moment.

If you remove that, does the problem go away?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Fri, 22 Feb 2019 16:31:01 GMT) Full text and rfc822 format available.

Message #41 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Daniel Lopez <daniel.lopez999 <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Fri, 22 Feb 2019 16:26:03 +0000
Hello, Daniel.

On Wed, Feb 20, 2019 at 21:25:27 +0000, Daniel Lopez wrote:
> Hi Eli and Alan,

> Thanks for investigating.

> Don't know if this is of much help, but here's a simpler test file 
> (BitmapFontFace4.h).

Thanks for this.  I will have a look at it later.

> If I delete all the "YBitmapZ" lines so that only the full-word 
> occurrences of "Bitmap" are left in the file, then C-u M-% replaces 
> everything but replace-regexp doesn't still.

What I have established so far is that it is the < template delimiter
interacting in some unknown way with Bitmap, which is causing the bug.
For example, when I replace "Bitmap<" with "Bitmap <", the bug doesn't
happen.  The bug doesn't happen either if these <s are deleted.

I suspect there is some sort of bug in the regexp matching engine, but I
haven't been able to pin this down, yet.

Thanks for reporting this interesting bug!

> Daniel

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Sun, 24 Feb 2019 17:43:02 GMT) Full text and rfc822 format available.

Message #44 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Sun, 24 Feb 2019 17:37:46 +0000
Hello, everybody.

On Thu, Feb 21, 2019 at 05:40:47 +0200, Eli Zaretskii wrote:
> > Date: Wed, 20 Feb 2019 21:30:03 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > > Maybe look at this from a different angle: what do we have in C++ mode
> > > that isn't present in C mode, and could potentially affect this use
> > > case?

> > Well, the most obvious thing is the category text property whose value
> > is the symbol c-<-as-paren-syntax.  This symbol's plist is

> >     (risky-local-variable t syntax-table (4 . 62))

> > .  I can't think of anything else at the moment.

> If you remove that, does the problem go away?

I'm afraid I didn't get around to trying that.

But I've been busy with GDB.

The query-replace word ends up calling re-search-forward.
Fre_search_forward ends up calling re_search_2 (which is called
rpl_re_search_2 in gdb.  :-( ).

This calls re_match_2_internal, which scans through the compiled regexp,
"\<Bitmap\>".

Up till now, we have said yes to replace the first Bitmap with
SharedBitmap in query-replace.  Emacs is now seeking out the second
occurrence of Bitmap, which is on L69 of the OP's test file, and looks
like "Bitmap<", where the < has a syntax-table text property of (4 . 62),
an opening paren which matches ">".

re_natch_2_internal finds its way to case wordbeg: to handle the "\<" of
the regexp.  It invokes UPDATE_SYNTAX_TABLE (charpos) to get the syntax
for the "B" it has already found.

Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
the current contents of position 1948, but the contents of 1948 before
the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
So it picks up the syntax for the "<" rather than the "B".

Since this syntax, (4 . 62) is not the start of a word,
re_match_2_internal returns a failure result.

I think the glitch is in the text property interval handling code.  It is
as though after the replacement of Bitmap by SharedBitmap, the interval
starting positions have not been adjusted for the extra six characters.

I tested this theory by putting a space between the Bitmap and <, and
attempting a query-replace of Bitmap with 1234567Bitmap.  The error still
occurred.  In this buffer, the original replacement then appears to work.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Sun, 24 Feb 2019 17:57:01 GMT) Full text and rfc822 format available.

Message #47 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Sun, 24 Feb 2019 19:56:13 +0200
> Date: Sun, 24 Feb 2019 17:37:46 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org,
>   Stefan Monnier <monnier <at> iro.umontreal.ca>
> From: Alan Mackenzie <acm <at> muc.de>
> 
> The query-replace word ends up calling re-search-forward.
> Fre_search_forward ends up calling re_search_2 (which is called
> rpl_re_search_2 in gdb.  :-( ).
> 
> This calls re_match_2_internal, which scans through the compiled regexp,
> "\<Bitmap\>".
> 
> Up till now, we have said yes to replace the first Bitmap with
> SharedBitmap in query-replace.  Emacs is now seeking out the second
> occurrence of Bitmap, which is on L69 of the OP's test file, and looks
> like "Bitmap<", where the < has a syntax-table text property of (4 . 62),
> an opening paren which matches ">".
> 
> re_natch_2_internal finds its way to case wordbeg: to handle the "\<" of
> the regexp.  It invokes UPDATE_SYNTAX_TABLE (charpos) to get the syntax
> for the "B" it has already found.
> 
> Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> the current contents of position 1948, but the contents of 1948 before
> the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> So it picks up the syntax for the "<" rather than the "B".

Are you saying that we've modified buffer text, but
re_match_2_internal still holds to a C pointer to buffer text before
the change?  If so, it's a simple manner of recomputing the C pointer
using the buffer position after the change, right?  We do such things
in a few places, like coding.c, by recording the offset of the text
before the change and reapplying it after the change.

> I think the glitch is in the text property interval handling code.  It is
> as though after the replacement of Bitmap by SharedBitmap, the interval
> starting positions have not been adjusted for the extra six characters.

If the code has variables that record C pointers to buffer text, those
need to be updated after every change, of else they will become
invalid.

But I'm surprised we have such blatant bugs in such veteran code, so
I'm probably missing something.  Can you describe the above again,
this time showing the relevant code fragments and variables involved
in this?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Sun, 24 Feb 2019 21:06:02 GMT) Full text and rfc822 format available.

Message #50 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Sun, 24 Feb 2019 21:00:58 +0000
Hello, Eli.

On Sun, Feb 24, 2019 at 19:56:13 +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Feb 2019 17:37:46 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org,
> >   Stefan Monnier <monnier <at> iro.umontreal.ca>
> > From: Alan Mackenzie <acm <at> muc.de>

> > The query-replace word ends up calling re-search-forward.
> > Fre_search_forward ends up calling re_search_2 (which is called
> > rpl_re_search_2 in gdb.  :-( ).

> > This calls re_match_2_internal, which scans through the compiled regexp,
> > "\<Bitmap\>".

> > Up till now, we have said yes to replace the first Bitmap with
> > SharedBitmap in query-replace.  Emacs is now seeking out the second
> > occurrence of Bitmap, which is on L69 of the OP's test file, and looks
> > like "Bitmap<", where the < has a syntax-table text property of (4 . 62),
> > an opening paren which matches ">".

> > re_natch_2_internal finds its way to case wordbeg: to handle the "\<" of
> > the regexp.  It invokes UPDATE_SYNTAX_TABLE (charpos) to get the syntax
> > for the "B" it has already found.

> > Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> > the current contents of position 1948, but the contents of 1948 before
> > the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> > So it picks up the syntax for the "<" rather than the "B".

> Are you saying that we've modified buffer text, but
> re_match_2_internal still holds to a C pointer to buffer text before
> the change?

I don't think that's the case.  The relevant buffer pointers/sizes are
calculated (in search_buffer_re) as

    p1 = BEGV_ADDR;
    s1 = GPT_BYTE - BEGV_BYTE;
    p2 = GAP_END_ADDR;
    s2 = ZV_BYTE - GPT_BYTE;

each time before a search.

> If so, it's a simple manner of recomputing the C pointer using the
> buffer position after the change, right?  We do such things in a few
> places, like coding.c, by recording the offset of the text before the
> change and reapplying it after the change.

> > I think the glitch is in the text property interval handling code.
> > It is as though after the replacement of Bitmap by SharedBitmap, the
> > interval starting positions have not been adjusted for the extra six
> > characters.

> If the code has variables that record C pointers to buffer text, those
> need to be updated after every change, of else they will become
> invalid.

> But I'm surprised we have such blatant bugs in such veteran code, ....

The bug was introduced sometime between 25.3 and 26.1.  I tried to bisect
the commits between 25.2 and 26.1, but couldn't, because autogen.sh was
broken in lots of the pertinent commits, so I couldn't build these Emacs
versions.

> .... so I'm probably missing something.  Can you describe the above
> again, this time showing the relevant code fragments and variables
> involved in this?

I'm afraid my gdb session is too long and chaotic to extract anything
meaningful out of.  I'll have to recreate it more purposefully, to get
these results.  Not tonight!

We'll get this sorted out.

> Thanks.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Mon, 25 Feb 2019 20:13:02 GMT) Full text and rfc822 format available.

Message #53 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Mon, 25 Feb 2019 22:11:57 +0200
> Date: Sun, 24 Feb 2019 21:00:58 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Alan Mackenzie <acm <at> muc.de>
> 
> > > Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> > > the current contents of position 1948, but the contents of 1948 before
> > > the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> > > So it picks up the syntax for the "<" rather than the "B".
> 
> > Are you saying that we've modified buffer text, but
> > re_match_2_internal still holds to a C pointer to buffer text before
> > the change?
> 
> I don't think that's the case.  The relevant buffer pointers/sizes are
> calculated (in search_buffer_re) as
> 
>     p1 = BEGV_ADDR;
>     s1 = GPT_BYTE - BEGV_BYTE;
>     p2 = GAP_END_ADDR;
>     s2 = ZV_BYTE - GPT_BYTE;
> 
> each time before a search.

So you are saying that gl_state uses a stale offset, which should have
been updated due to the previous replacements?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Mon, 25 Feb 2019 20:54:01 GMT) Full text and rfc822 format available.

Message #56 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Mon, 25 Feb 2019 20:48:58 +0000
Hello, Eli.

On Mon, Feb 25, 2019 at 22:11:57 +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Feb 2019 21:00:58 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> > From: Alan Mackenzie <acm <at> muc.de>

> > > > Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> > > > the current contents of position 1948, but the contents of 1948 before
> > > > the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> > > > So it picks up the syntax for the "<" rather than the "B".

> > > Are you saying that we've modified buffer text, but
> > > re_match_2_internal still holds to a C pointer to buffer text before
> > > the change?

> > I don't think that's the case.  The relevant buffer pointers/sizes are
> > calculated (in search_buffer_re) as

> >     p1 = BEGV_ADDR;
> >     s1 = GPT_BYTE - BEGV_BYTE;
> >     p2 = GAP_END_ADDR;
> >     s2 = ZV_BYTE - GPT_BYTE;

> > each time before a search.

> So you are saying that gl_state uses a stale offset, which should have
> been updated due to the previous replacements?

That's what I think is happening, yes.  I need to gather more evidence,
though.

I've spent a lot of today fighting git and 2-year-old Emacs's build
systems trying to bisect the repository to find what introduced the bug.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 13:56:01 GMT) Full text and rfc822 format available.

Message #59 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 13:50:48 +0000
Hello, Eli.

On Mon, Feb 25, 2019 at 22:11:57 +0200, Eli Zaretskii wrote:
> > Date: Sun, 24 Feb 2019 21:00:58 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> > From: Alan Mackenzie <acm <at> muc.de>

> > > > Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> > > > the current contents of position 1948, but the contents of 1948 before
> > > > the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> > > > So it picks up the syntax for the "<" rather than the "B".

> > > Are you saying that we've modified buffer text, but
> > > re_match_2_internal still holds to a C pointer to buffer text before
> > > the change?

> > I don't think that's the case.  The relevant buffer pointers/sizes are
> > calculated (in search_buffer_re) as

> >     p1 = BEGV_ADDR;
> >     s1 = GPT_BYTE - BEGV_BYTE;
> >     p2 = GAP_END_ADDR;
> >     s2 = ZV_BYTE - GPT_BYTE;

> > each time before a search.

> So you are saying that gl_state uses a stale offset, which should have
> been updated due to the previous replacements?

More precisely, I think that the interval containing "Bitmap<" has not
been adjusted after the replacement of "Bitmap.h" by "SharedBitmap.h"
early in the .h file.

After this buffer change, adjust_intervals_for_insertion gets called.
This adds 6 onto the ->position field of each interval "adjusting all of
its ancestors by adding LENGTH to them", according to the comment at the
head of adjust_intervals_for_insertion.

Note this only adjusts the ancestors of that interval early in the .h
file, not all intervals in the tree.

gl_state contains a cached interval, gl_state->backward_i, and there is
no guarantee that its ->position will have been updated by
adjust_intervals_for_insertion.  In the current bug, I believe it hasn't
been adjusted.

The function update_syntax_table uses gl_state->backward_i to manoevre
its way to the current interval using update_interval.  If
gl_state->backward_i->position hasn't already been adjusted for the
insertion, the interval update_interval returns won't have been adjusted
either.

I'm reasonably sure this is what's happening:
adjust_intervals_for_insertion is failing to adjust the cached intervals
in gl_state.  It's a nasty cache invalidation problem.

I don't know how best to fix this.  Maybe a_i_f_insertion/deletion could
set a global flag which would signal to update_syntax_table that its
intervals are not reliable.  But that's horribly ugly.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 15:06:01 GMT) Full text and rfc822 format available.

Message #62 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 15:00:28 +0000
Hello, again, Eli.

On Tue, Feb 26, 2019 at 13:50:48 +0000, Alan Mackenzie wrote:
> On Mon, Feb 25, 2019 at 22:11:57 +0200, Eli Zaretskii wrote:
> > > Date: Sun, 24 Feb 2019 21:00:58 +0000
> > > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> > > From: Alan Mackenzie <acm <at> muc.de>

> > > > > Sadly, UPDATE_SYNTAX_TABLE sets its internal structure gl_state not for
> > > > > the current contents of position 1948, but the contents of 1948 before
> > > > > the change at the top of the buffer (Bitmap -> SharedBitmap) was made.
> > > > > So it picks up the syntax for the "<" rather than the "B".

> > > > Are you saying that we've modified buffer text, but
> > > > re_match_2_internal still holds to a C pointer to buffer text before
> > > > the change?

> > > I don't think that's the case.  The relevant buffer pointers/sizes are
> > > calculated (in search_buffer_re) as

> > >     p1 = BEGV_ADDR;
> > >     s1 = GPT_BYTE - BEGV_BYTE;
> > >     p2 = GAP_END_ADDR;
> > >     s2 = ZV_BYTE - GPT_BYTE;

> > > each time before a search.

> > So you are saying that gl_state uses a stale offset, which should have
> > been updated due to the previous replacements?

> More precisely, I think that the interval containing "Bitmap<" has not
> been adjusted after the replacement of "Bitmap.h" by "SharedBitmap.h"
> early in the .h file.

> After this buffer change, adjust_intervals_for_insertion gets called.
> This adds 6 onto the ->position field of each interval "adjusting all of
> its ancestors by adding LENGTH to them", according to the comment at the
> head of adjust_intervals_for_insertion.

> Note this only adjusts the ancestors of that interval early in the .h
> file, not all intervals in the tree.

> gl_state contains a cached interval, gl_state->backward_i, and there is
> no guarantee that its ->position will have been updated by
> adjust_intervals_for_insertion.  In the current bug, I believe it hasn't
> been adjusted.

> The function update_syntax_table uses gl_state->backward_i to manoevre
> its way to the current interval using update_interval.  If
> gl_state->backward_i->position hasn't already been adjusted for the
> insertion, the interval update_interval returns won't have been adjusted
> either.

> I'm reasonably sure this is what's happening:
> adjust_intervals_for_insertion is failing to adjust the cached intervals
> in gl_state.  It's a nasty cache invalidation problem.

> I don't know how best to fix this.  Maybe a_i_f_insertion/deletion could
> set a global flag which would signal to update_syntax_table that its
> intervals are not reliable.  But that's horribly ugly.

How about the following idea:
(i) We introduce a new boolean flag `adjusted' into struct interval.
(ii) When we adjust ->position in an interval in
  adjust_intervals_for_insertion/deletion, we set `adjusted' there.
(iii) At the end of a_i_f_insertion/deletion, we adjust gl_state's
  intervals, going to the parent as long as `adjusted' is not yet true.
(iv) We clear all the set `adjusted' flags.

A simpler, but slower, alternative would be to set gl_state's intervals
to NULL on any buffer change earlier in the buffer.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 15:37:02 GMT) Full text and rfc822 format available.

Message #65 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 17:36:33 +0200
> Date: Tue, 26 Feb 2019 13:50:48 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Alan Mackenzie <acm <at> muc.de>
> 
> > So you are saying that gl_state uses a stale offset, which should have
> > been updated due to the previous replacements?
> 
> More precisely, I think that the interval containing "Bitmap<" has not
> been adjusted after the replacement of "Bitmap.h" by "SharedBitmap.h"
> early in the .h file.

In general, I don't believe this can happen, because otherwise we
would see the faces in the wrong places.  The interval tree does get
updated after each buffer change.

However, if some interval data is cached, as you seem to imply:

> gl_state contains a cached interval, gl_state->backward_i, and there is
> no guarantee that its ->position will have been updated by
> adjust_intervals_for_insertion.  In the current bug, I believe it hasn't
> been adjusted.

then yes, that cached interval data might not be updated.  But I
wonder why you say "there is no guarantee" -- don't you actually see
stale data there in this scenario?

> The function update_syntax_table uses gl_state->backward_i to manoevre
> its way to the current interval using update_interval.  If
> gl_state->backward_i->position hasn't already been adjusted for the
> insertion, the interval update_interval returns won't have been adjusted
> either.
> 
> I'm reasonably sure this is what's happening:
> adjust_intervals_for_insertion is failing to adjust the cached intervals
> in gl_state.  It's a nasty cache invalidation problem.

Do we really have to guess here?  Isn't there anyone who knows how
this works?  Stefan?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 15:40:02 GMT) Full text and rfc822 format available.

Message #68 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 17:39:44 +0200
> Date: Tue, 26 Feb 2019 15:00:28 +0000
> Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> A simpler, but slower, alternative would be to set gl_state's intervals
> to NULL on any buffer change earlier in the buffer.

If you implement this simpler method as an experiment, does the
problem go away?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 16:17:01 GMT) Full text and rfc822 format available.

Message #71 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 16:11:29 +0000
Hello, Eli.

On Tue, Feb 26, 2019 at 17:39:44 +0200, Eli Zaretskii wrote:
> > Date: Tue, 26 Feb 2019 15:00:28 +0000
> > Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > A simpler, but slower, alternative would be to set gl_state's intervals
> > to NULL on any buffer change earlier in the buffer.

> If you implement this simpler method as an experiment, does the
> problem go away?

Yes!  :-)

What I actually did was at the start of update_syntax_table, always load
the interval `i' from scratch, overwriting any stored interval cache in
gl_state.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 16:43:01 GMT) Full text and rfc822 format available.

Message #74 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 18:42:36 +0200
> Date: Tue, 26 Feb 2019 16:11:29 +0000
> Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> > If you implement this simpler method as an experiment, does the
> > problem go away?
> 
> Yes!  :-)
> 
> What I actually did was at the start of update_syntax_table, always load
> the interval `i' from scratch, overwriting any stored interval cache in
> gl_state.

Can you show the patch you used for that?




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 17:00:02 GMT) Full text and rfc822 format available.

Message #77 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 16:55:05 +0000
Hello, Eli.

On Tue, Feb 26, 2019 at 18:42:36 +0200, Eli Zaretskii wrote:
> > Date: Tue, 26 Feb 2019 16:11:29 +0000
> > Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > > If you implement this simpler method as an experiment, does the
> > > problem go away?

> > Yes!  :-)

> > What I actually did was at the start of update_syntax_table, always load
> > the interval `i' from scratch, overwriting any stored interval cache in
> > gl_state.

> Can you show the patch you used for that?

Sorry, I should have enclosed the patch last time round.


diff --git a/src/syntax.c b/src/syntax.c
index 4616ae296f..e280d6b63a 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -330,6 +330,10 @@ update_syntax_table (ptrdiff_t charpos, EMACS_INT count, bool init,
   bool invalidate = true;
   INTERVAL i;
 
+  /* TEMPORARY STUFF, 2019-02-26 */
+  i = interval_of (charpos, object);
+  gl_state.backward_i = gl_state.forward_i = i;
+  /* END OF TEMPORARY STUFF */
   if (init)
     {
       gl_state.old_prop = Qnil;


-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 17:21:01 GMT) Full text and rfc822 format available.

Message #80 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 19:20:03 +0200
> Date: Tue, 26 Feb 2019 16:55:05 +0000
> Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -330,6 +330,10 @@ update_syntax_table (ptrdiff_t charpos, EMACS_INT count, bool init,
>    bool invalidate = true;
>    INTERVAL i;
>  
> +  /* TEMPORARY STUFF, 2019-02-26 */
> +  i = interval_of (charpos, object);
> +  gl_state.backward_i = gl_state.forward_i = i;
> +  /* END OF TEMPORARY STUFF */
>    if (init)
>      {
>        gl_state.old_prop = Qnil;

Does that slow down the search in any significant way?

In any case, this could be doe only if the buffer has been changed
since the last time the interval was cached, right?  We could even get
fancy and check whether the changes were before or after the cached
interval.




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 17:29:02 GMT) Full text and rfc822 format available.

Message #83 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 17:23:55 +0000
Hello, Eli

On Tue, Feb 26, 2019 at 19:20:03 +0200, Eli Zaretskii wrote:
> > Date: Tue, 26 Feb 2019 16:55:05 +0000
> > Cc: daniel.lopez999 <at> gmail.com, monnier <at> iro.umontreal.ca, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > --- a/src/syntax.c
> > +++ b/src/syntax.c
> > @@ -330,6 +330,10 @@ update_syntax_table (ptrdiff_t charpos, EMACS_INT count, bool init,
> >    bool invalidate = true;
> >    INTERVAL i;

> > +  /* TEMPORARY STUFF, 2019-02-26 */
> > +  i = interval_of (charpos, object);
> > +  gl_state.backward_i = gl_state.forward_i = i;
> > +  /* END OF TEMPORARY STUFF */
> >    if (init)
> >      {
> >        gl_state.old_prop = Qnil;

> Does that slow down the search in any significant way?

I think it does.  Hitting the space bar between the occurrences of Bitmap
in C-u M-% feels somewhat sluggish.  But I'm also running on an
unoptimised build, which I'm not used to.

> In any case, this could be done only if the buffer has been changed
> since the last time the interval was cached, right?

I just tried that, and the bug symptoms reappeared again.  It appears to
be a bit more subtle than I thought.  But I think that should be doable.

> We could even get fancy and check whether the changes were before or
> after the cached interval.

Indeed.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 20:10:01 GMT) Full text and rfc822 format available.

Message #86 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Alan Mackenzie <acm <at> muc.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 15:09:54 -0500
> gl_state contains a cached interval, gl_state->backward_i, and there
> is no guarantee that its ->position will have been updated by
> adjust_intervals_for_insertion.  In the current bug, I believe it
> hasn't been adjusted.

Hmm... gl_state is not supposed to be kept "live" across buffer
modifications.  It's supposed to be used only *within* read-only
primitives which set it from scratch at the beginning (by calling
SETUP_SYNTAX_TABLE, SETUP_BUFFER_SYNTAX_TABLE, or
SETUP_SYNTAX_TABLE_FOR_OBJECT).  The backward_i and forward_i fields are
actually reset in the first call to update_syntax_table, by passing it
a true value for the `init` arg.

So the problem you describe might be due to some place where we fail to
reset gl_state before using it, or maybe it's a bug in
SETUP_*_SYNTAX_TABLE*


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 21:50:01 GMT) Full text and rfc822 format available.

Message #89 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 21:45:09 +0000
Hello, Stefan.

On Tue, Feb 26, 2019 at 15:09:54 -0500, Stefan Monnier wrote:
> > gl_state contains a cached interval, gl_state->backward_i, and there
> > is no guarantee that its ->position will have been updated by
> > adjust_intervals_for_insertion.  In the current bug, I believe it
> > hasn't been adjusted.

> Hmm... gl_state is not supposed to be kept "live" across buffer
> modifications.  It's supposed to be used only *within* read-only
> primitives which set it from scratch at the beginning (by calling
> SETUP_SYNTAX_TABLE, SETUP_BUFFER_SYNTAX_TABLE, or
> SETUP_SYNTAX_TABLE_FOR_OBJECT).  The backward_i and forward_i fields are
> actually reset in the first call to update_syntax_table, by passing it
> a true value for the `init` arg.

> So the problem you describe might be due to some place where we fail to
> reset gl_state before using it, or maybe it's a bug in
> SETUP_*_SYNTAX_TABLE*

re_search_2 calls SETUP_SYNTAX_TABLE_FOR_OBJECT unconditionally near its
start.  S_S_T_F_O calls update_syntax_table with a non-zero `init'
conditioned only on parse_sexp_lookup_properties.  This initialises
gl_state.backward_i and gl_state.forward_i.

So, I agree with you, what I am seeing is impossible.  I'm seeing it,
though.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 22:11:01 GMT) Full text and rfc822 format available.

Message #92 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Alan Mackenzie <acm <at> muc.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 17:09:59 -0500
>> > gl_state contains a cached interval, gl_state->backward_i, and there
>> > is no guarantee that its ->position will have been updated by
>> > adjust_intervals_for_insertion.  In the current bug, I believe it
>> > hasn't been adjusted.
>
>> Hmm... gl_state is not supposed to be kept "live" across buffer
>> modifications.  It's supposed to be used only *within* read-only
>> primitives which set it from scratch at the beginning (by calling
>> SETUP_SYNTAX_TABLE, SETUP_BUFFER_SYNTAX_TABLE, or
>> SETUP_SYNTAX_TABLE_FOR_OBJECT).  The backward_i and forward_i fields are
>> actually reset in the first call to update_syntax_table, by passing it
>> a true value for the `init` arg.
>
>> So the problem you describe might be due to some place where we fail to
>> reset gl_state before using it, or maybe it's a bug in
>> SETUP_*_SYNTAX_TABLE*
>
> re_search_2 calls SETUP_SYNTAX_TABLE_FOR_OBJECT unconditionally near its
> start.  S_S_T_F_O calls update_syntax_table with a non-zero `init'
> conditioned only on parse_sexp_lookup_properties.  This initialises
> gl_state.backward_i and gl_state.forward_i.
>
> So, I agree with you, what I am seeing is impossible.  I'm seeing it,
> though.

Any chance the buffer is modified from within the regexp operation?
Or maybe some async thread?

This said, the more common problem is that UPDATE_SYNTAX_TABLE was not
called, or not for the right position, or UPDATE_SYNTAX_TABLE_FORWARD
was called when we actually moved (ever so slightly) backward or
vice-versa.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Tue, 26 Feb 2019 23:02:02 GMT) Full text and rfc822 format available.

Message #95 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Alan Mackenzie <acm <at> muc.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Tue, 26 Feb 2019 18:00:59 -0500
> After this buffer change, adjust_intervals_for_insertion gets called.
> This adds 6 onto the ->position field of each interval "adjusting all of
> its ancestors by adding LENGTH to them", according to the comment at the
> head of adjust_intervals_for_insertion.
>
> Note this only adjusts the ancestors of that interval early in the .h
> file, not all intervals in the tree.

Maybe the problem comes from here.  The lazy adjustment of ->position
may be too lazy here.  This is a fairly delicate/brittle part of intervals.c

> I'm reasonably sure this is what's happening:
> adjust_intervals_for_insertion is failing to adjust the cached intervals
> in gl_state.  It's a nasty cache invalidation problem.

As mentioned earlier, I think gl_state should be sufficiently transient
that this scenario can't happen.

Elsewhere you send a sample patch and Eli asks "Does that slow down the
search in any significant way?".  The answer should be "yes!" because
this patch pretty much defeats the purpose of the gl_state cache,
recomputing the current interval (O(logN)) every time we move from one
character to the next.

Maybe another way to catch the problem is: at the end of
upate_syntax_table, call `interval_of (charpos, object)` and verify that
the returned interval is the same as the one we had selected, plus
verify that the call to `interval_of` has not modified the value
of ->position.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 14:28:01 GMT) Full text and rfc822 format available.

Message #98 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 14:22:51 +0000
Hello, Stefan.

On Tue, Feb 26, 2019 at 15:09:54 -0500, Stefan Monnier wrote:
> > gl_state contains a cached interval, gl_state->backward_i, and there
> > is no guarantee that its ->position will have been updated by
> > adjust_intervals_for_insertion.  In the current bug, I believe it
> > hasn't been adjusted.

> Hmm... gl_state is not supposed to be kept "live" across buffer
> modifications.  It's supposed to be used only *within* read-only
> primitives which set it from scratch at the beginning (by calling
> SETUP_SYNTAX_TABLE, SETUP_BUFFER_SYNTAX_TABLE, or
> SETUP_SYNTAX_TABLE_FOR_OBJECT).  The backward_i and forward_i fields are
> actually reset in the first call to update_syntax_table, by passing it
> a true value for the `init` arg.

> So the problem you describe might be due to some place where we fail to
> reset gl_state before using it, or maybe it's a bug in
> SETUP_*_SYNTAX_TABLE*

I see another potential problem, and I'd like your view on it, please.

Namely, in next_interval, we have

  if (! NULL_RIGHT_CHILD (i))
    {
      i = i->right;
      while (! NULL_LEFT_CHILD (i))
        i = i->left;                  <===============

      i->position = next_position;
      return i;
    }

Here, in seeking the next interval, we go down a chain of `left's.  We
do not set the ->position field of these intervals, except for the last
one, which we return.

So the returned interval doesn't satisfy the condition that all its
parents have their ->position's set correctly.  Thus if we use this
interval as an argument to update_interval, we will likely fail.  I
think this can happen in update_syntax_table.

There is an analogous situation in previous_interval.

I might try adding code to this to set these ->position's.  Trouble is,
it might slow things down quite a bit.

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 15:14:02 GMT) Full text and rfc822 format available.

Message #101 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 15:08:49 +0000
Hello again, Stefan.

On Wed, Feb 27, 2019 at 14:22:51 +0000, Alan Mackenzie wrote:
> On Tue, Feb 26, 2019 at 15:09:54 -0500, Stefan Monnier wrote:
> > > gl_state contains a cached interval, gl_state->backward_i, and there
> > > is no guarantee that its ->position will have been updated by
> > > adjust_intervals_for_insertion.  In the current bug, I believe it
> > > hasn't been adjusted.

> > Hmm... gl_state is not supposed to be kept "live" across buffer
> > modifications.  It's supposed to be used only *within* read-only
> > primitives which set it from scratch at the beginning (by calling
> > SETUP_SYNTAX_TABLE, SETUP_BUFFER_SYNTAX_TABLE, or
> > SETUP_SYNTAX_TABLE_FOR_OBJECT).  The backward_i and forward_i fields are
> > actually reset in the first call to update_syntax_table, by passing it
> > a true value for the `init` arg.

> > So the problem you describe might be due to some place where we fail to
> > reset gl_state before using it, or maybe it's a bug in
> > SETUP_*_SYNTAX_TABLE*

> I see another potential problem, and I'd like your view on it, please.

> Namely, in next_interval, we have

>   if (! NULL_RIGHT_CHILD (i))
>     {
>       i = i->right;
>       while (! NULL_LEFT_CHILD (i))
>         i = i->left;                  <===============

>       i->position = next_position;
>       return i;
>     }

> Here, in seeking the next interval, we go down a chain of `left's.  We
> do not set the ->position field of these intervals, except for the last
> one, which we return.

> So the returned interval doesn't satisfy the condition that all its
> parents have their ->position's set correctly.  Thus if we use this
> interval as an argument to update_interval, we will likely fail.  I
> think this can happen in update_syntax_table.

> There is an analogous situation in previous_interval.

> I might try adding code to this to set these ->position's.  Trouble is,
> it might slow things down quite a bit.

I've done this, and it appears to have fixed the bug.  :-)  As for the
slowdown, I haven't timed it, yet.

Here is the diff of the current state of my changes:



diff --git a/src/intervals.c b/src/intervals.c
index 524bb944e5..d37ca64bd0 100644
--- a/src/intervals.c
+++ b/src/intervals.c
@@ -654,7 +654,14 @@ next_interval (register INTERVAL interval)
     {
       i = i->right;
       while (! NULL_LEFT_CHILD (i))
-	i = i->left;
+        /* OLD STOUGH, 2019-02-27 */
+	/* i = i->left; */
+        /* NEW STOUGH, 2019-02-27 */
+        {
+          i->position = next_position + LEFT_TOTAL_LENGTH (i);
+          i = i->left;
+        }
+      /* END OF NEW STOUGH */
 
       i->position = next_position;
       return i;
@@ -691,7 +698,15 @@ previous_interval (register INTERVAL interval)
     {
       i = interval->left;
       while (! NULL_RIGHT_CHILD (i))
-	i = i->right;
+	/* OLD STOUGH, 2019-02-27 */
+        /* i = i->right; */
+        /* NEW STOUGH, 2019-02-27 */
+        {
+          i->position = interval->position - TOTAL_LENGTH (i)
+            + LEFT_TOTAL_LENGTH(i);
+          i = i->right;
+        }
+      /* END OF NEW STOUGH */
 
       i->position = interval->position - LENGTH (i);
       return i;


> >         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 15:41:02 GMT) Full text and rfc822 format available.

Message #104 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Alan Mackenzie <acm <at> muc.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 10:40:19 -0500
>> Here, in seeking the next interval, we go down a chain of `left's.  We
>> do not set the ->position field of these intervals, except for the last
>> one, which we return.
>> So the returned interval doesn't satisfy the condition that all its
>> parents have their ->position's set correctly.
[...]
> I've done this, and it appears to have fixed the bug.  :-)

AFAICT the only place where we need the parents to have
a valid ->position is in update_interval.  So maybe another fix is to
change update_interval so it computes the parent's ->position rather
than rely on it having the right value.

I personally don't have a preference and I'm not sure which option would
be better performancewise.

If we opt (like your patch does) to have the invariant that
the ->position of parents is kept up-to-date, then maybe we should
change find_interval to guarantee this (which would basically be
a matter of moving the corresponding code from update_syntax_table where
we update the parents's ->position after calling find_interval) ?


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 16:40:01 GMT) Full text and rfc822 format available.

Message #107 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> IRO.UMontreal.CA, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 18:39:31 +0200
> Date: Wed, 27 Feb 2019 14:22:51 +0000
> Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
>   34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
>   if (! NULL_RIGHT_CHILD (i))
>     {
>       i = i->right;
>       while (! NULL_LEFT_CHILD (i))
>         i = i->left;                  <===============
> 
>       i->position = next_position;
>       return i;
>     }
> 
> Here, in seeking the next interval, we go down a chain of `left's.  We
> do not set the ->position field of these intervals, except for the last
> one, which we return.

The position field is just a cache, isn't it?

> So the returned interval doesn't satisfy the condition that all its
> parents have their ->position's set correctly.  Thus if we use this
> interval as an argument to update_interval, we will likely fail.  I
> think this can happen in update_syntax_table.

next_interval and previous_interval are used extensively, so I'm
having hard time believing that they have such a blatant bug.




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 17:16:01 GMT) Full text and rfc822 format available.

Message #110 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 17:10:36 +0000
Hello, Stefan.

On Wed, Feb 27, 2019 at 10:40:19 -0500, Stefan Monnier wrote:
> >> Here, in seeking the next interval, we go down a chain of `left's.  We
> >> do not set the ->position field of these intervals, except for the last
> >> one, which we return.
> >> So the returned interval doesn't satisfy the condition that all its
> >> parents have their ->position's set correctly.
> [...]
> > I've done this, and it appears to have fixed the bug.  :-)

> AFAICT the only place where we need the parents to have
> a valid ->position is in update_interval.  So maybe another fix is to
> change update_interval so it computes the parent's ->position rather
> than rely on it having the right value.

I'll think about this.

> I personally don't have a preference and I'm not sure which option would
> be better performancewise.

I've done some speed testing with my function M-: (time-scroll), which
scrolls through a buffer a screenful at a time, redisplaying each place
it stops.

On xdisp.c, there was no detectable difference between versions with the
bug fix and without.

On a largish C++ file with lots of template delimiters, the corrected
version was about 4% slower on unoptimised builds.  Between comparable
optimised builds, the differences were not detectable.

> If we opt (like your patch does) to have the invariant that
> the ->position of parents is kept up-to-date, then maybe we should
> change find_interval to guarantee this (which would basically be
> a matter of moving the corresponding code from update_syntax_table where
> we update the parents's ->position after calling find_interval) ?

This would be an excellent idea, something I was going to suggest
myself.  :-)

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 17:37:02 GMT) Full text and rfc822 format available.

Message #113 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> IRO.UMontreal.CA, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 17:31:32 +0000
Hello, Eli.

On Wed, Feb 27, 2019 at 18:39:31 +0200, Eli Zaretskii wrote:
> > Date: Wed, 27 Feb 2019 14:22:51 +0000
> > Cc: Eli Zaretskii <eliz <at> gnu.org>, daniel.lopez999 <at> gmail.com,
> >   34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> >   if (! NULL_RIGHT_CHILD (i))
> >     {
> >       i = i->right;
> >       while (! NULL_LEFT_CHILD (i))
> >         i = i->left;                  <===============

> >       i->position = next_position;
> >       return i;
> >     }

> > Here, in seeking the next interval, we go down a chain of `left's.  We
> > do not set the ->position field of these intervals, except for the last
> > one, which we return.

> The position field is just a cache, isn't it?

It's a cache, yes.  But it's used in update_interval, for example.  It
would appear bad things were happening because it wasn't being kept up to
date.

> > So the returned interval doesn't satisfy the condition that all its
> > parents have their ->position's set correctly.  Thus if we use this
> > interval as an argument to update_interval, we will likely fail.  I
> > think this can happen in update_syntax_table.

> next_interval and previous_interval are used extensively, so I'm
> having hard time believing that they have such a blatant bug.

Yes, how come we haven't seen many more consequences?

Maybe syntax-table text properties aren't as widely used as all that.

Also, the effect would have to be seen within the time between successive
calls to SETUP_SYNTAX_TABLE and friends, since each such call
reinitialises this cache, in an important sense.

And we only saw the effect when the replacement text "SharedBitmap" was
exactly twice the length of the original word "Bitmap".

Anyhow, fixing this item appears to fix the bug (see the tentative patch
in my post from 15:08:49 +0000 to Stefan).

I'm guessing this fix will be deemed too unsafe to make it into the
emacs-26 release branch.  ;-(

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 17:42:02 GMT) Full text and rfc822 format available.

Message #116 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Alan Mackenzie <acm <at> muc.de>, daniel.lopez999 <at> gmail.com,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 12:41:04 -0500
> next_interval and previous_interval are used extensively, so I'm
> having hard time believing that they have such a blatant bug.

I'm also wondering why this hasn't bitten us long ago, but the behavior
in the original bug-report is definitely weird.

E.g. I reproduced the bug using the lower-level (while
(re-search-forward RE) (replace-match)), and then added (how-many RE)
calls before re-search-forward and before replace-match: these should
always differ by 1 (since one occurrence of RE was skipped by
re-search-forward), but they often didn't (even though there was no
buffer modifications between the two how-many calls).

AFAICT the only place where the missing updates can bite us is when we
call update_interval, since it seems to be the only function that relies
on all parents having the ->position field correctly set.

update_interval is only called from update_syntax_table.

I'm actually wondering whether we should keep update_interval at all:
AFAICT update_syntax_table is almost always called "sequentially".
I.e. the new `charpos` is right next to the old one.  So a while loop
with next_interval/previous_interval should be just as efficient in
practice: a loop of next_interval/previous_interval has basically
a complexity O(n) where `n` is the distance we move, whereas
update_interval has complexity O(log n), so if `n` is almost always
1 the difference doesn't matter.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 18:08:01 GMT) Full text and rfc822 format available.

Message #119 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> IRO.UMontreal.CA, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 20:07:50 +0200
> Date: Wed, 27 Feb 2019 17:31:32 +0000
> Cc: monnier <at> IRO.UMontreal.CA, daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> > next_interval and previous_interval are used extensively, so I'm
> > having hard time believing that they have such a blatant bug.
> 
> Yes, how come we haven't seen many more consequences?
> 
> Maybe syntax-table text properties aren't as widely used as all that.

I actually had the text properties in mind.  They are used all over
the place.  Faces are a feature that would make any such bugs
immediately visible.

> I'm guessing this fix will be deemed too unsafe to make it into the
> emacs-26 release branch.  ;-(

That goes without saying.




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 18:49:02 GMT) Full text and rfc822 format available.

Message #122 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: acm <at> muc.de, daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 20:48:05 +0200
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: Alan Mackenzie <acm <at> muc.de>, daniel.lopez999 <at> gmail.com,
>         34525 <at> debbugs.gnu.org
> Date: Wed, 27 Feb 2019 12:41:04 -0500
> 
> AFAICT the only place where the missing updates can bite us is when we
> call update_interval, since it seems to be the only function that relies
> on all parents having the ->position field correctly set.
> 
> update_interval is only called from update_syntax_table.

That I could understand.  But then the fix should be in that one
function (if we keep it).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Wed, 27 Feb 2019 20:49:02 GMT) Full text and rfc822 format available.

Message #125 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, Stefan Monnier <monnier <at> IRO.UMontreal.CA>,
 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Wed, 27 Feb 2019 20:43:21 +0000
Hello, Eli.

On Wed, Feb 27, 2019 at 20:48:05 +0200, Eli Zaretskii wrote:
> > From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> > Cc: Alan Mackenzie <acm <at> muc.de>, daniel.lopez999 <at> gmail.com,
> >         34525 <at> debbugs.gnu.org
> > Date: Wed, 27 Feb 2019 12:41:04 -0500

> > AFAICT the only place where the missing updates can bite us is when we
> > call update_interval, since it seems to be the only function that relies
> > on all parents having the ->position field correctly set.

> > update_interval is only called from update_syntax_table.

> That I could understand.  But then the fix should be in that one
> function (if we keep it).

It would be easy enough to set the ->position of a node when moving to it
from a child in update_literal.  This would be an alternative to the
changes I've suggested in next_interval and previous_interval.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Thu, 28 Feb 2019 10:56:02 GMT) Full text and rfc822 format available.

Message #128 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Thu, 28 Feb 2019 10:50:25 +0000
Hello, Eli and Stefan.

On Wed, Feb 27, 2019 at 20:07:50 +0200, Eli Zaretskii wrote:
> > Date: Wed, 27 Feb 2019 17:31:32 +0000
> > Cc: monnier <at> IRO.UMontreal.CA, daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > > next_interval and previous_interval are used extensively, so I'm
> > > having hard time believing that they have such a blatant bug.

> > Yes, how come we haven't seen many more consequences?

> > Maybe syntax-table text properties aren't as widely used as all that.

> I actually had the text properties in mind.  They are used all over
> the place.  Faces are a feature that would make any such bugs
> immediately visible.

I think the mechanism with update_interval is only used in syntax.c, and
only for the syntax-table property.

> > I'm guessing this fix will be deemed too unsafe to make it into the
> > emacs-26 release branch.  ;-(

> That goes without saying.

OK.  Progress on the bug seems to have stalled somewhat.  Now that we
understand the cause, one of us needs to decide how to fix it.  I think
we've discussed the following alternatives:

(i) Calculate ->position's in previous_interval and next_interval, as my
  tentative patch already does.
(ii) Calculate the ->position's in update_interval, on moving to
  parents.
(iii) Do away with update_interval, replacing it in syntax.c with
  previous/next_interval in while loops.

At the moment, only (i) has been tried.

Speed-wise, it seems not to make any difference in an optimised GNU
build, though it did appear to be significantly (~4%) slower on an
unoptimised build which scrolls through a C++ file with lots of
templates.  I don't think it's worth the effort to make a systematic
speed comparison between the alternatives.

(iv) Additionally, there is a cleanup wanted, where setting ->position
  in the chain of parents should be moved from update_syntax_table to
  find_interval.

In (i), the convention for ->position would be that it is valid for the
target interval together with all its parents.  In (ii) and (iii), it
would only be valid in the final target intervals found by navigation.
I think this should be explicitly stated in a comment in struct
interval.

So, where do we go from here?  If it were up to me, I would probably
chose (i), simply because it's already been done, but I've no strong
feelings over it.  I'm also willing to implement (ii), or even (iii), if
that is wanted.  In any case, I would ask one of you for a careful code
review of my changes - this stuff is easy to get wrong.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Thu, 28 Feb 2019 17:42:02 GMT) Full text and rfc822 format available.

Message #131 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alan Mackenzie <acm <at> muc.de>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> IRO.UMontreal.CA, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Thu, 28 Feb 2019 19:41:12 +0200
> Date: Thu, 28 Feb 2019 10:50:25 +0000
> Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> From: Alan Mackenzie <acm <at> muc.de>
> 
> (i) Calculate ->position's in previous_interval and next_interval, as my
>   tentative patch already does.
> (ii) Calculate the ->position's in update_interval, on moving to
>   parents.
> (iii) Do away with update_interval, replacing it in syntax.c with
>   previous/next_interval in while loops.
> 
> At the moment, only (i) has been tried.
> 
> Speed-wise, it seems not to make any difference in an optimised GNU
> build, though it did appear to be significantly (~4%) slower on an
> unoptimised build which scrolls through a C++ file with lots of
> templates.  I don't think it's worth the effort to make a systematic
> speed comparison between the alternatives.
> 
> (iv) Additionally, there is a cleanup wanted, where setting ->position
>   in the chain of parents should be moved from update_syntax_table to
>   find_interval.
> 
> In (i), the convention for ->position would be that it is valid for the
> target interval together with all its parents.  In (ii) and (iii), it
> would only be valid in the final target intervals found by navigation.
> I think this should be explicitly stated in a comment in struct
> interval.
> 
> So, where do we go from here?  If it were up to me, I would probably
> chose (i), simply because it's already been done, but I've no strong
> feelings over it.

I prefer not to do (i) because it has much wider implications than
needed.  Either (ii) or (iii) are okay with me.  The former seems to
be simpler, so I tend to favor it slightly.




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Thu, 28 Feb 2019 22:00:02 GMT) Full text and rfc822 format available.

Message #134 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: daniel.lopez999 <at> gmail.com, monnier <at> IRO.UMontreal.CA, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Thu, 28 Feb 2019 21:54:14 +0000
Hello, Eli.

On Thu, Feb 28, 2019 at 19:41:12 +0200, Eli Zaretskii wrote:
> > Date: Thu, 28 Feb 2019 10:50:25 +0000
> > Cc: daniel.lopez999 <at> gmail.com, 34525 <at> debbugs.gnu.org
> > From: Alan Mackenzie <acm <at> muc.de>

> > (i) Calculate ->position's in previous_interval and next_interval, as my
> >   tentative patch already does.
> > (ii) Calculate the ->position's in update_interval, on moving to
> >   parents.
> > (iii) Do away with update_interval, replacing it in syntax.c with
> >   previous/next_interval in while loops.

> > In (i), the convention for ->position would be that it is valid for the
> > target interval together with all its parents.  In (ii) and (iii), it
> > would only be valid in the final target intervals found by navigation.
> > I think this should be explicitly stated in a comment in struct
> > interval.

Done.

> > So, where do we go from here?  If it were up to me, I would probably
> > chose (i), simply because it's already been done, but I've no strong
> > feelings over it.

> I prefer not to do (i) because it has much wider implications than
> needed.  Either (ii) or (iii) are okay with me.  The former seems to
> be simpler, so I tend to favor it slightly.

OK, I enclose a patch which codes up (ii).  As a matter of interest, it
seems to run a little faster on my benchmark of scrolling through
xdisp.c than the version without the fix.

And it fixes the OP's bug.  :-)



diff --git a/src/intervals.c b/src/intervals.c
index 524bb944e5..2ed913d5fb 100644
--- a/src/intervals.c
+++ b/src/intervals.c
@@ -713,11 +713,21 @@ previous_interval (register INTERVAL interval)
   return NULL;
 }
 
-/* Find the interval containing POS given some non-NULL INTERVAL
-   in the same tree.  Note that we need to update interval->position
-   if we go down the tree.
-   To speed up the process, we assume that the ->position of
-   I and all its parents is already uptodate.  */
+/* Set the ->position field of I's parent, based on I->position. */
+#define SET_PARENT_POSITION(i)                                  \
+  if (AM_LEFT_CHILD (i))                                        \
+    INTERVAL_PARENT (i)->position =                             \
+      i->position + TOTAL_LENGTH (i) - LEFT_TOTAL_LENGTH (i);   \
+  else                                                          \
+    INTERVAL_PARENT (i)->position =                             \
+      i->position - LEFT_TOTAL_LENGTH (i)                       \
+      - LENGTH (INTERVAL_PARENT (i))
+
+/* Find the interval containing POS given some non-NULL INTERVAL in
+   the same tree.  Note that we update interval->position in each
+   interval we traverse, assuming it is already correctly set for the
+   argument I.  We don't assume that any other interval already has a
+   correctly set ->position.  */
 INTERVAL
 update_interval (register INTERVAL i, ptrdiff_t pos)
 {
@@ -738,7 +748,10 @@ update_interval (register INTERVAL i, ptrdiff_t pos)
 	  else if (NULL_PARENT (i))
 	    error ("Point before start of properties");
 	  else
-	      i = INTERVAL_PARENT (i);
+            {
+              SET_PARENT_POSITION (i);
+              i = INTERVAL_PARENT (i);
+            }
 	  continue;
 	}
       else if (pos >= INTERVAL_LAST_POS (i))
@@ -753,7 +766,10 @@ update_interval (register INTERVAL i, ptrdiff_t pos)
 	  else if (NULL_PARENT (i))
 	    error ("Point %"pD"d after end of properties", pos);
 	  else
-            i = INTERVAL_PARENT (i);
+            {
+              SET_PARENT_POSITION (i);
+              i = INTERVAL_PARENT (i);
+            }
 	  continue;
 	}
       else
diff --git a/src/intervals.h b/src/intervals.h
index 9c5adf33a1..7608800116 100644
--- a/src/intervals.h
+++ b/src/intervals.h
@@ -31,11 +31,15 @@ struct interval
   /* The first group of entries deal with the tree structure.  */
   ptrdiff_t total_length;       /* Length of myself and both children.  */
   ptrdiff_t position;	        /* Cache of interval's character position.  */
-				/* This field is usually updated
-				   simultaneously with an interval
-				   traversal, there is no guarantee
-				   that it is valid for a random
-				   interval.  */
+                                /* This field is valid for the final
+                                   target interval returned by
+                                   find_interval, next_interval,
+                                   previous_interval and
+                                   update_interval.  It cannot be
+                                   depended upon for any intermediate
+                                   intevals traversed by these
+                                   functions, or any other
+                                   interval. */
   struct interval *left;	/* Intervals which precede me.  */
   struct interval *right;	/* Intervals which succeed me.  */
 
diff --git a/src/pdumper.c b/src/pdumper.c
index f9638d4357..3aea4ab0d6 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -2065,7 +2065,7 @@ dump_interval_tree (struct dump_context *ctx,
                     INTERVAL tree,
                     dump_off parent_offset)
 {
-#if CHECK_STRUCTS && !defined (HASH_interval_9110163DA0)
+#if CHECK_STRUCTS && !defined (HASH_interval_70865541E2)
 # error "interval changed. See CHECK_STRUCTS comment."
 #endif
   // TODO: output tree breadth-first?
diff --git a/src/syntax.c b/src/syntax.c
index 4616ae296f..faea1432cb 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -340,20 +340,6 @@ update_syntax_table (ptrdiff_t charpos, EMACS_INT count, bool init,
       invalidate = false;
       if (!i)
 	return;
-      /* interval_of updates only ->position of the return value, so
-	 update the parents manually to speed up update_interval.  */
-      while (!NULL_PARENT (i))
-	{
-	  if (AM_RIGHT_CHILD (i))
-	    INTERVAL_PARENT (i)->position = i->position
-	      - LEFT_TOTAL_LENGTH (i) + TOTAL_LENGTH (i) /* right end */
-	      - TOTAL_LENGTH (INTERVAL_PARENT (i))
-	      + LEFT_TOTAL_LENGTH (INTERVAL_PARENT (i));
-	  else
-	    INTERVAL_PARENT (i)->position = i->position - LEFT_TOTAL_LENGTH (i)
-	      + TOTAL_LENGTH (i);
-	  i = INTERVAL_PARENT (i);
-	}
       i = gl_state.forward_i;
       gl_state.b_property = i->position - gl_state.offset;
       gl_state.e_property = INTERVAL_LAST_POS (i) - gl_state.offset;


I've just noticed a typo in one of the comments in intervals.h, so the
above can't be final.  Sorry.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Fri, 01 Mar 2019 14:40:02 GMT) Full text and rfc822 format available.

Message #137 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Daniel Lopez <daniel.lopez999 <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Fri, 1 Mar 2019 14:34:14 +0000
Hello, Daniel.

On Wed, Feb 20, 2019 at 21:25:27 +0000, Daniel Lopez wrote:
> Hi Eli and Alan,

> Thanks for investigating.

[ .... ]

We now understand what went wrong.  There was a problem in the handling
of a position cache in the code for "intervals" (which support text
properties).  There was also a smaller problem with an off-by-1 in the
regex handling code.

I enclose below a patch to fix the bug, which should apply cleanly to
the Emacs 26.1 source tree.

The fix will NOT be incorporated into the upcoming 26.2 release, since
it is just too risky for the small amount of testing time we have left.
However, I think the patch will also apply cleanly to 26.2 when it gets
released.

Please feel free to try out the patch, and let me know of any
undesirable things you see with it.

Thanks again for taking the trouble to report this interesting bug.




diff --git a/src/intervals.c b/src/intervals.c
index e7595b23b3..36bbc122b0 100644
--- a/src/intervals.c
+++ b/src/intervals.c
@@ -713,11 +713,21 @@ previous_interval (register INTERVAL interval)
   return NULL;
 }
 
-/* Find the interval containing POS given some non-NULL INTERVAL
-   in the same tree.  Note that we need to update interval->position
-   if we go down the tree.
-   To speed up the process, we assume that the ->position of
-   I and all its parents is already uptodate.  */
+/* Set the ->position field of I's parent, based on I->position. */
+#define SET_PARENT_POSITION(i)                                  \
+  if (AM_LEFT_CHILD (i))                                        \
+    INTERVAL_PARENT (i)->position =                             \
+      i->position + TOTAL_LENGTH (i) - LEFT_TOTAL_LENGTH (i);   \
+  else                                                          \
+    INTERVAL_PARENT (i)->position =                             \
+      i->position - LEFT_TOTAL_LENGTH (i)                       \
+      - LENGTH (INTERVAL_PARENT (i))
+
+/* Find the interval containing POS, given some non-NULL INTERVAL in
+   the same tree.  Note that we update interval->position in each
+   interval we traverse, assuming it is already correctly set for the
+   argument I.  We don't assume that any other interval already has a
+   correctly set ->position.  */
 INTERVAL
 update_interval (register INTERVAL i, ptrdiff_t pos)
 {
@@ -738,7 +748,10 @@ update_interval (register INTERVAL i, ptrdiff_t pos)
 	  else if (NULL_PARENT (i))
 	    error ("Point before start of properties");
 	  else
-	      i = INTERVAL_PARENT (i);
+            {
+              SET_PARENT_POSITION (i);
+              i = INTERVAL_PARENT (i);
+            }
 	  continue;
 	}
       else if (pos >= INTERVAL_LAST_POS (i))
@@ -753,7 +766,10 @@ update_interval (register INTERVAL i, ptrdiff_t pos)
 	  else if (NULL_PARENT (i))
 	    error ("Point %"pD"d after end of properties", pos);
 	  else
-            i = INTERVAL_PARENT (i);
+            {
+              SET_PARENT_POSITION (i);
+              i = INTERVAL_PARENT (i);
+            }
 	  continue;
 	}
       else
diff --git a/src/intervals.h b/src/intervals.h
index 311ef79466..873e2748ea 100644
--- a/src/intervals.h
+++ b/src/intervals.h
@@ -32,11 +32,15 @@ struct interval
 
   ptrdiff_t total_length;       /* Length of myself and both children.  */
   ptrdiff_t position;	        /* Cache of interval's character position.  */
-				/* This field is usually updated
-				   simultaneously with an interval
-				   traversal, there is no guarantee
-				   that it is valid for a random
-				   interval.  */
+                                /* This field is valid in the final
+                                   target interval returned by
+                                   find_interval, next_interval,
+                                   previous_interval and
+                                   update_interval.  It cannot be
+                                   depended upon in any intermediate
+                                   intervals traversed by these
+                                   functions, or any other
+                                   interval. */
   struct interval *left;	/* Intervals which precede me.  */
   struct interval *right;	/* Intervals which succeed me.  */
 
diff --git a/src/regex.c b/src/regex.c
index 09ed64a6e1..d5d22f7188 100644
--- a/src/regex.c
+++ b/src/regex.c
@@ -5898,8 +5898,8 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
 		int s1, s2;
 		int dummy;
 #ifdef emacs
-		ssize_t offset = PTR_TO_OFFSET (d - 1);
-		ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+		ssize_t offset = PTR_TO_OFFSET (d);
+		ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset) - 1;
 		UPDATE_SYNTAX_TABLE_FAST (charpos);
 #endif
 		GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
@@ -5985,8 +5985,8 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
 	      int s1, s2;
 	      int dummy;
 #ifdef emacs
-	      ssize_t offset = PTR_TO_OFFSET (d) - 1;
-	      ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+	      ssize_t offset = PTR_TO_OFFSET (d);
+	      ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset) - 1;
 	      UPDATE_SYNTAX_TABLE_FAST (charpos);
 #endif
 	      GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
@@ -6002,7 +6002,7 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
 		  PREFETCH_NOLIMIT ();
 		  GET_CHAR_AFTER (c2, d, dummy);
 #ifdef emacs
-		  UPDATE_SYNTAX_TABLE_FORWARD_FAST (charpos);
+		  UPDATE_SYNTAX_TABLE_FORWARD_FAST (charpos + 1);
 #endif
 		  s2 = SYNTAX (c2);
 
@@ -6072,8 +6072,8 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1,
 	      re_wchar_t c1, c2;
 	      int s1, s2;
 #ifdef emacs
-	      ssize_t offset = PTR_TO_OFFSET (d) - 1;
-	      ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset);
+	      ssize_t offset = PTR_TO_OFFSET (d);
+	      ssize_t charpos = SYNTAX_TABLE_BYTE_TO_CHAR (offset) - 1;
 	      UPDATE_SYNTAX_TABLE_FAST (charpos);
 #endif
 	      GET_CHAR_BEFORE_2 (c1, d, string1, end1, string2, end2);
diff --git a/src/syntax.c b/src/syntax.c
index 3cc32094a8..ca9e2bbca9 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -339,20 +339,6 @@ update_syntax_table (ptrdiff_t charpos, EMACS_INT count, bool init,
       invalidate = false;
       if (!i)
 	return;
-      /* interval_of updates only ->position of the return value, so
-	 update the parents manually to speed up update_interval.  */
-      while (!NULL_PARENT (i))
-	{
-	  if (AM_RIGHT_CHILD (i))
-	    INTERVAL_PARENT (i)->position = i->position
-	      - LEFT_TOTAL_LENGTH (i) + TOTAL_LENGTH (i) /* right end */
-	      - TOTAL_LENGTH (INTERVAL_PARENT (i))
-	      + LEFT_TOTAL_LENGTH (INTERVAL_PARENT (i));
-	  else
-	    INTERVAL_PARENT (i)->position = i->position - LEFT_TOTAL_LENGTH (i)
-	      + TOTAL_LENGTH (i);
-	  i = INTERVAL_PARENT (i);
-	}
       i = gl_state.forward_i;
       gl_state.b_property = i->position - gl_state.offset;
       gl_state.e_property = INTERVAL_LAST_POS (i) - gl_state.offset;


> Daniel

-- 
Alan Mackenzie (Nuremberg, Germany).




Reply sent to Alan Mackenzie <acm <at> muc.de>:
You have taken responsibility. (Fri, 01 Mar 2019 17:48:02 GMT) Full text and rfc822 format available.

Notification sent to Daniel Lopez <daniel.lopez999 <at> gmail.com>:
bug acknowledged by developer. (Fri, 01 Mar 2019 17:48:02 GMT) Full text and rfc822 format available.

Message #142 received at 34525-done <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: 34525-done <at> debbugs.gnu.org
Cc: Daniel Lopez <daniel.lopez999 <at> gmail.com>
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Fri, 1 Mar 2019 17:42:42 +0000
Bug fixed in master.

Closing.

-- 
Alan Mackenzie (Nuremberg, Germany).




Information forwarded to bug-gnu-emacs <at> gnu.org, bug-cc-mode <at> gnu.org:
bug#34525; Package emacs,cc-mode. (Fri, 01 Mar 2019 18:01:02 GMT) Full text and rfc822 format available.

Message #145 received at 34525 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Lopez <daniel.lopez999 <at> gmail.com>
To: Alan Mackenzie <acm <at> muc.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 34525 <at> debbugs.gnu.org
Subject: Re: bug#34525: replace-regexp missing some matches
Date: Fri, 1 Mar 2019 17:58:11 +0000
Thanks very much Alan (and Eli and Stefan), I've just installed this 
locally. It looks good, and I'll let you know if anything seems to go 
awry subsequently, which I don't expect :)

Daniel





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 30 Mar 2019 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.