GNU bug report logs - #58158
29.0.50; [overlay] Interval tree iteration considered harmful

Previous Next

Package: emacs;

Reported by: Gerd Möllmann <gerd.moellmann <at> gmail.com>

Date: Thu, 29 Sep 2022 05:30:02 UTC

Severity: normal

Found in version 29.0.50

Fixed in version 30.1

Done: Gerd Möllmann <gerd.moellmann <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58158 in the body.
You can then email your comments to 58158 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 05:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Gerd Möllmann <gerd.moellmann <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 29 Sep 2022 05:30:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; [overlay] Interval tree iteration considered harmful
Date: Thu, 29 Sep 2022 07:29:25 +0200
In its current form, interval tree iteration works like this:

1. Call begin_iteration(T) to iterate over tree T
2. do stuff
3. Call end_iteration(T)

with the following rules:

- Begin_iteration and end_iteration must be paired.

- There can be only one iteration per tree at a time.  Nested iteration
  over the same tree is not supported (abort).

- No GC may happen in step 2.  This is because mark_buffer iterates over
  buffer overlays.

I think this is an exceedingly dangerous design.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 06:29:02 GMT) Full text and rfc822 format available.

Message #8 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50;
 [overlay] Interval tree iteration considered harmful
Date: Thu, 29 Sep 2022 09:28:03 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Date: Thu, 29 Sep 2022 07:29:25 +0200
> 
> In its current form, interval tree iteration works like this:
> 
> 1. Call begin_iteration(T) to iterate over tree T
> 2. do stuff
> 3. Call end_iteration(T)
> 
> with the following rules:
> 
> - Begin_iteration and end_iteration must be paired.
> 
> - There can be only one iteration per tree at a time.  Nested iteration
>   over the same tree is not supported (abort).
> 
> - No GC may happen in step 2.  This is because mark_buffer iterates over
>   buffer overlays.
> 
> I think this is an exceedingly dangerous design.

Why, because of "no GC" requirement?  We could ensure that by calling
inhibit_garbage_collection (if the code doesn't do that already).
That is not very elegant, and might even cause memory pressure in some
(hopefully rare) situations, but we do have some users of this in the
sources.

What higher-level operations require "interval tree iteration" that
you describe?  Which primitives end up doing such iterations?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 07:04:01 GMT) Full text and rfc822 format available.

Message #11 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 09:03:17 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
>> Date: Thu, 29 Sep 2022 07:29:25 +0200
>> 
>> In its current form, interval tree iteration works like this:
>> 
>> 1. Call begin_iteration(T) to iterate over tree T
>> 2. do stuff
>> 3. Call end_iteration(T)
>> 
>> with the following rules:
>> 
>> - Begin_iteration and end_iteration must be paired.
>> 
>> - There can be only one iteration per tree at a time.  Nested iteration
>>   over the same tree is not supported (abort).
>> 
>> - No GC may happen in step 2.  This is because mark_buffer iterates over
>>   buffer overlays.
>> 
>> I think this is an exceedingly dangerous design.
>
> Why, because of "no GC" requirement?  We could ensure that by calling
> inhibit_garbage_collection (if the code doesn't do that already).

It doesn't.

BTW, if anything signals in step 2, so that end_iteration isn't called,
we're also hosed.

> What higher-level operations require "interval tree iteration" that
> you describe?  Which primitives end up doing such iterations?

What has to do with overlays.  To name a few: overlay-at, overlays-in,
next-overlay-change, previous-overlay-change, overlay-lists, ...

I personally think this is a no-go.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 08:09:02 GMT) Full text and rfc822 format available.

Message #14 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 11:08:17 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Cc: 58158 <at> debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:03:17 +0200
> 
> >> - No GC may happen in step 2.  This is because mark_buffer iterates over
> >>   buffer overlays.
> >> 
> >> I think this is an exceedingly dangerous design.
> >
> > Why, because of "no GC" requirement?  We could ensure that by calling
> > inhibit_garbage_collection (if the code doesn't do that already).
> 
> It doesn't.

Should be easy to fix, no?

> BTW, if anything signals in step 2, so that end_iteration isn't called,
> we're also hosed.

record_unwind_protect should fix that, right?
(inhibit_garbage_collection already employs this mechanism).

> > What higher-level operations require "interval tree iteration" that
> > you describe?  Which primitives end up doing such iterations?
> 
> What has to do with overlays.  To name a few: overlay-at, overlays-in,
> next-overlay-change, previous-overlay-change, overlay-lists, ...
> 
> I personally think this is a no-go.

Really?  Even if we take all the measures mentioned above?  Why so?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 09:10:02 GMT) Full text and rfc822 format available.

Message #17 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58158 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 11:09:17 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
>> I personally think this is a no-go.
>
> Really?  Even if we take all the measures mentioned above?

Yes.

> Why so?

I think it simply can't be that what is basically walking a binary tree
requires such restrictions.  And if it does because of some quirk of the
interval tree in itree.c, something is seriously wrong with the design.

That's a bit harsh, but I mean it :-).





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 09:38:02 GMT) Full text and rfc822 format available.

Message #20 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 12:37:15 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>,  58158 <at> debbugs.gnu.org
> Date: Thu, 29 Sep 2022 11:09:17 +0200
> 
> I think it simply can't be that what is basically walking a binary tree
> requires such restrictions.  And if it does because of some quirk of the
> interval tree in itree.c, something is seriously wrong with the design.
> 
> That's a bit harsh, but I mean it :-).

Fair enough.

Can you propose a fix?

I guess the begin_iteration/end_iteration dance is because the tree
could be in inconsistent state in-between?  If so, what would it take
to avoid the inconsistency?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 10:06:02 GMT) Full text and rfc822 format available.

Message #23 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 12:05:50 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> Can you propose a fix?

Other than completely rewrite at least this part, no.

> I guess the begin_iteration/end_iteration dance is because the tree
> could be in inconsistent state in-between?  If so, what would it take
> to avoid the inconsistency?

I find it very hard to tell why this is done the way it is.  If someone
knowing the code can think of a reason, please speak up.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 10:44:02 GMT) Full text and rfc822 format available.

Message #26 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 13:43:05 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Cc: monnier <at> iro.umontreal.ca,  58158 <at> debbugs.gnu.org
> Date: Thu, 29 Sep 2022 12:05:50 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Can you propose a fix?
> 
> Other than completely rewrite at least this part, no.
> 
> > I guess the begin_iteration/end_iteration dance is because the tree
> > could be in inconsistent state in-between?  If so, what would it take
> > to avoid the inconsistency?
> 
> I find it very hard to tell why this is done the way it is.  If someone
> knowing the code can think of a reason, please speak up.

I may be missing something, but it looks like the sole purpose of the
iter_start/iter_finish dance is to ensure only one iteration per tree
is running at any given time, and that's because the iteration uses
some state variable(s) of which there's only one instance per tree.

Stefan, am I missing something?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 11:34:02 GMT) Full text and rfc822 format available.

Message #29 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 13:33:25 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I find it very hard to tell why this is done the way it is.  If someone
>> knowing the code can think of a reason, please speak up.
>
> I may be missing something, but it looks like the sole purpose of the
> iter_start/iter_finish dance is to ensure only one iteration per tree
> is running at any given time, and that's because the iteration uses
> some state variable(s) of which there's only one instance per tree.

BTW, It currently uses a "visited" flag in tree nodes, which is kind of
such a state, but something like that is normally not needed when
walking an rb-tree.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 13:11:01 GMT) Full text and rfc822 format available.

Message #32 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 09:10:19 -0400
> I may be missing something, but it looks like the sole purpose of the
> iter_start/iter_finish dance is to ensure only one iteration per tree
> is running at any given time, and that's because the iteration uses
> some state variable(s) of which there's only one instance per tree.
>
> Stefan, am I missing something?

One reason is that traversing a binary tree usually requires something
like recursion, but that wouldn't fit very conveniently with the current
code (nor with C in general since you can't make a local recursive
closure which accesses local variables from the surrounding function).

Another is the need to update the begin/end fields (these need updating
because of insertions/deletions but they're updated lazily while
traversing the tree to avoid an O(N) complexity during the
insertions/deletions).  Hiding that behind 'some kind of "next node"
function keeps the code more readable.

But yes, the current restriction to have a single iteration at a time is
a bit of a problem, especially because it's very "global".  I added
a comment yesterday describing how we could make it non-global (hence
getting rid of the `visited` flag in the nodes).

For now, I pushed a simple fix to traverse the tree "by hand" in the GC
rather than via the iterator.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 13:25:02 GMT) Full text and rfc822 format available.

Message #35 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 16:23:47 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>,
>   58158 <at> debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:10:19 -0400
> 
> > I may be missing something, but it looks like the sole purpose of the
> > iter_start/iter_finish dance is to ensure only one iteration per tree
> > is running at any given time, and that's because the iteration uses
> > some state variable(s) of which there's only one instance per tree.
> >
> > Stefan, am I missing something?
> 
> One reason is that traversing a binary tree usually requires something
> like recursion, but that wouldn't fit very conveniently with the current
> code (nor with C in general since you can't make a local recursive
> closure which accesses local variables from the surrounding function).

I'm not sure I understand how recursion is related.  Are you saying
that recursion is replaced with iteration?  But then, if _start and
_finish are called by the same caller, we don't really need the
protection, since no one can start another iteration while the first
one runs.  Right?

> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Where in the code do you see iteration that adds or deletes nodes?

> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
> rather than via the iterator.

So this removes the restriction of not having GC during iteration?

Also, I take it that you don't consider the current code is as
"harmful" as Gerd thinks it is?  IOW, you don't share his opinion that
this implementation is a "no-go"?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 13:41:01 GMT) Full text and rfc822 format available.

Message #38 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 16:40:10 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Gerd Möllmann <gerd.moellmann <at> gmail.com>,
>   58158 <at> debbugs.gnu.org
> Date: Thu, 29 Sep 2022 09:10:19 -0400
> 
> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Btw, couldn't we handle this by having a flag in the node that tells
us whether the begin/end fields can be trusted?  Then the first caller
who need them would run the update and reset the flags, and we still
have lazy update, albeit somewhat less lazy, but without the need for
guarding the iteration with start/finish calls.  Would that work?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 14:16:02 GMT) Full text and rfc822 format available.

Message #41 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 16:15:09 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> One reason is that traversing a binary tree usually requires something
> like recursion, but that wouldn't fit very conveniently with the current
> code (nor with C in general since you can't make a local recursive
> closure which accesses local variables from the surrounding function).

Ok, usually, but not necessarily.  The alternative is to implement an
iterator that starts with a node N, and an implementation of a successor
function, which return the successor of N in a given order.  This
requires a parent pointer in nodes, but that we have.

(Something like this is used for ordered containers like "map" and "set"
in C++ STL, for instance, which are based on rb-trees in GCC's
libstdc++.)

> Another is the need to update the begin/end fields (these need updating
> because of insertions/deletions but they're updated lazily while
> traversing the tree to avoid an O(N) complexity during the
> insertions/deletions).  Hiding that behind 'some kind of "next node"
> function keeps the code more readable.

Is this for buffer text changes, something akin to a delayed update of
marker positions?  I already wondered where that was.

> But yes, the current restriction to have a single iteration at a time is
> a bit of a problem, especially because it's very "global".  I added
> a comment yesterday describing how we could make it non-global (hence
> getting rid of the `visited` flag in the nodes).

BTW, and related to iteration directly, did you notice this in
interval_tree_insert?

      /* This suggests that nodes in the right subtree are strictly
         greater.  But this is not true due to later rotations. */
      child = node->begin <= child->begin ? child->left : child->right;




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 14:39:01 GMT) Full text and rfc822 format available.

Message #44 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 16:37:51 +0200
Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:

> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> Ok, usually, but not necessarily.  The alternative is to implement an
> iterator that starts with a node N, and an implementation of a successor
> function, which return the successor of N in a given order.  This
> requires a parent pointer in nodes, but that we have.
>
> (Something like this is used for ordered containers like "map" and "set"
> in C++ STL, for instance, which are based on rb-trees in GCC's
> libstdc++.)

The code from libstdc++ is this (from libstdc++-v3/src/c++98/tree.cc):

  static _Rb_tree_node_base*
  local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
  {
    if (__x->_M_right != 0)
      {
        __x = __x->_M_right;
        while (__x->_M_left != 0)
          __x = __x->_M_left;
      }
    else
      {
        _Rb_tree_node_base* __y = __x->_M_parent;
        while (__x == __y->_M_right)
          {
            __x = __y;
            __y = __y->_M_parent;
          }
        if (__x->_M_right != __y)
          __x = __y;
      }
    return __x;
  }

I hope one can read that.

The idea is to find the root of the smallest subtree containing the
current node, and proceed from there.  That's why the parent pointer is
needed.

Symmmetrical for max->min ordering.  And finding min/max is trivial.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 16:41:02 GMT) Full text and rfc822 format available.

Message #47 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 12:40:25 -0400
Gerd Möllmann [2022-09-29 16:15:09] wrote:
> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
> Ok, usually, but not necessarily.  The alternative is to implement an
> iterator that starts with a node N, and an implementation of a successor
> function, which return the successor of N in a given order.

The approach currently used is somewhat similar to that.  Some of the
difference is that we need an actual "iterator/generator" object to
remember the parameter of the filtering we want to apply to the set of
objects.  And the problem is that this "object" is currently implemented
not only as a global value (thus restricting us to one-iteration at
a time) but also with some parts of the data stored in the tree.
I think this is the part that really needs to be changed.

> This requires a parent pointer in nodes, but that we have.
>
> (Something like this is used for ordered containers like "map" and "set"
> in C++ STL, for instance, which are based on rb-trees in GCC's
> libstdc++.)

Another difference is that itree.c's iterator uses a local "work stack"
instead of traversing the tree exclusively via left/right/parent like in
the code you show.  I don't know if that difference is important, tho.

>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions).  Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Is this for buffer text changes, something akin to a delayed update of
> marker positions?

Yes, exactly.

>> But yes, the current restriction to have a single iteration at a time is
>> a bit of a problem, especially because it's very "global".  I added
>> a comment yesterday describing how we could make it non-global (hence
>> getting rid of the `visited` flag in the nodes).
>
> BTW, and related to iteration directly, did you notice this in
> interval_tree_insert?
>
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;

No, I had not noticed that yet, and I don't understand this comment.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 16:49:02 GMT) Full text and rfc822 format available.

Message #50 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 12:48:34 -0400
>> > I may be missing something, but it looks like the sole purpose of the
>> > iter_start/iter_finish dance is to ensure only one iteration per tree
>> > is running at any given time, and that's because the iteration uses
>> > some state variable(s) of which there's only one instance per tree.
>> >
>> > Stefan, am I missing something?
>> 
>> One reason is that traversing a binary tree usually requires something
>> like recursion, but that wouldn't fit very conveniently with the current
>> code (nor with C in general since you can't make a local recursive
>> closure which accesses local variables from the surrounding function).
>
> I'm not sure I understand how recursion is related.  Are you saying
> that recursion is replaced with iteration?  But then, if _start and
> _finish are called by the same caller, we don't really need the
> protection, since no one can start another iteration while the first
> one runs.  Right?

Typically, current code will look something like:

    int x;
    Lisp_Object y;

    buffer_overlay_iter_start (current_buffer, prev, pos, ITREE_DESCENDING);
    while ((node = buffer_overlay_iter_next (current_buffer)))
      {
        ... do something that updates x and y ...
      }
    buffer_overlay_iter_finish (current_buffer);

If we were to use recursion, then we'd need to define a new (recursive)
function which does what's currently done in the `while` loop, but this
function can't access `x` and `y`, so it would need to take them as
argument, or a reference to them...

The use of an iterator is definitely convenient (and is a standard
approach in many languages).

>> Another is the need to update the begin/end fields (these need updating
>> because of insertions/deletions but they're updated lazily while
>> traversing the tree to avoid an O(N) complexity during the
>> insertions/deletions).  Hiding that behind 'some kind of "next node"
>> function keeps the code more readable.
>
> Where in the code do you see iteration that adds or deletes nodes?

No, I meant insertion/deletion of text in the buffer, thus requiring
updates to `begin/end` fields.

> Btw, couldn't we handle this by having a flag in the node that tells
> us whether the begin/end fields can be trusted?  Then the first caller
> who need them would run the update and reset the flags, and we still
> have lazy update, albeit somewhat less lazy, but without the need for
> guarding the iteration with start/finish calls.  Would that work?

Yes, it would, but it's still O(N).
The current approach is inspired by the approach used in `intervals.c`
which also updates those fields lazily/ondemand so as to avoid the O(N)
performance impact.

>> For now, I pushed a simple fix to traverse the tree "by hand" in the GC
>> rather than via the iterator.
> So this removes the restriction of not having GC during iteration?

Yes.

> Also, I take it that you don't consider the current code is as
> "harmful" as Gerd thinks it is?

I don't like the global state it uses, but I think we can fix this
aspect without too much trouble.

> IOW, you don't share his opinion that this implementation is
> a "no-go"?

No, indeed, I don't.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 29 Sep 2022 22:10:02 GMT) Full text and rfc822 format available.

Message #53 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 29 Sep 2022 18:09:01 -0400
>   static _Rb_tree_node_base*
>   local_Rb_tree_increment(_Rb_tree_node_base* __x) throw ()
>   {
>     if (__x->_M_right != 0)
>       {
>         __x = __x->_M_right;
>         while (__x->_M_left != 0)
>           __x = __x->_M_left;
>       }
>     else
>       {
>         _Rb_tree_node_base* __y = __x->_M_parent;
>         while (__x == __y->_M_right)
>           {
>             __x = __y;
>             __y = __y->_M_parent;
>           }
>         if (__x->_M_right != __y)
>           __x = __y;
>       }
>     return __x;
>   }
>
> I hope one can read that.

I changed the code to store the `visited` bit in the work stack, but if
you could rewrite the `interval_generator_next` along the lines of the
code above that would be great.
[ An alternative would be to try and get rid of the `parent` field :-)  ]


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 05:29:01 GMT) Full text and rfc822 format available.

Message #56 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 07:28:26 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> I changed the code to store the `visited` bit in the work stack, but if
> you could rewrite the `interval_generator_next` along the lines of the
> code above that would be great.

Ok, I'll rewrite that :-).  When I understand what that "narrowing" is
and how and for what it is used.

BTW, what do you think of changing function names to something a bit
shorter?  I find myself constantly getting confused when reading the
code.  I think an "itree_" prefix would suffice for non-static
functions, and static ones without prefix.  Renaming that is hopefully
not that much work with LSP.

> [ An alternative would be to try and get rid of the `parent` field :-)
> ]

:-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 06:12:02 GMT) Full text and rfc822 format available.

Message #59 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 09:11:09 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  58158 <at> debbugs.gnu.org
> Date: Fri, 30 Sep 2022 07:28:26 +0200
> 
> Renaming that is hopefully not that much work with LSP.

You should be able to do this with M-? followed by
"M-x xref-query-replace-in-results RET" (the latter
should be invoked in the "*xref*" buffer produced by M-?).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 11:32:02 GMT) Full text and rfc822 format available.

Message #62 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Fri, 30 Sep 2022 13:31:21 +0200
On 22-09-30 8:11 , Eli Zaretskii wrote:
>> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  58158 <at> debbugs.gnu.org
>> Date: Fri, 30 Sep 2022 07:28:26 +0200
>>
>> Renaming that is hopefully not that much work with LSP.
> 
> You should be able to do this with M-? followed by
> "M-x xref-query-replace-in-results RET" (the latter
> should be invoked in the "*xref*" buffer produced by M-?).

Ok, thanks for the tip.

Info for Stefan: I've now removed the per-tree null node, and make check 
shows no unexpected results.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 13:26:02 GMT) Full text and rfc822 format available.

Message #65 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 09:25:29 -0400
Gerd Möllmann [2022-09-30 07:28:26] wrote:
> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>> I changed the code to store the `visited` bit in the work stack, but if
>> you could rewrite the `interval_generator_next` along the lines of the
>> code above that would be great.
> Ok, I'll rewrite that :-).  When I understand what that "narrowing" is
> and how and for what it is used.

The traversals are always bound by begin..end boundaries.  Usually we
know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but
when doing things like `next/previous-overlay-change` one of the bounds
is not know at first, so in order to try and avoid the O(N) complexity
the code refines that other bound on the fly (e.g. when searching
forward, after seeing an overlay that ends at POS we know that there's no
point looking further than POS so we can move the end boundary to POS).

> BTW, what do you think of changing function names to something a bit
> shorter?  I find myself constantly getting confused when reading the
> code.  I think an "itree_" prefix would suffice for non-static
> functions, and static ones without prefix.

Fine by me, yes.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 14:09:01 GMT) Full text and rfc822 format available.

Message #68 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 16:08:36 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> The traversals are always bound by begin..end boundaries.  Usually we
> know those bounds upfront (e.g. for `overlays-in` or `overlay-at`), but
> when doing things like `next/previous-overlay-change` one of the bounds
> is not know at first, so in order to try and avoid the O(N) complexity
> the code refines that other bound on the fly (e.g. when searching
> forward, after seeing an overlay that ends at POS we know that there's no
> point looking further than POS so we can move the end boundary to
> POS).

Thanks, that helps.

Maybe you could also help me with the questions below?

I'm assuming, from a comment somewhere, that an interval tree is an
rb-tree with keys being interval start positions, and allowing
duplicates.

That is, if N is a node, all nodes in the subtree N->left are strictly <
N, and nodes in N->right are >=.  Or in a picture, if [start, end] is an
interval, and we insert some intervals with the same start, we could
arrive at

                 [5, a]
                /      \
            [4, b]     [6, c]
              /\         /  \
             0  0       0   [6, d]
                              /\
                              ...

where 0 means null, and 6 is a duplicate start.

1. Is that correct?

2. Does the tree order duplicates in a particular way?  Maybe by overlay
priority, or by interval end?  And, perhaps, if it doesn't order
duplicates, would it be an idea to order them, maybe at some later
point?  I'm asking this because a successor(N) function would return
nodes in the order in the tree, always, and I don't know what the order
is now.

3. If my mental picture is right, we could of course end up with a tree
that has degenerated to a list.  Or a subtree, maybe.  Do you know if
that can happen?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 15:26:02 GMT) Full text and rfc822 format available.

Message #71 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 11:25:30 -0400
> Maybe you could also help me with the questions below?

I'll try (BTW, the original author is Andreas Politz who we can still
reach at <mail <at> andreas-politz.de>.  He doesn't have much time to devote
to it, tho, so best not to Cc him through all the conversations).

> I'm assuming, from a comment somewhere, that an interval tree is an
> rb-tree with keys being interval start positions, and allowing
> duplicates.

Yup.

> That is, if N is a node, all nodes in the subtree N->left are strictly <
> N, and nodes in N->right are >=.

The following code in `interval_tree_insert`:

  while (child != ITREE_NULL)
    {
      parent = child;
      offset += child->offset;
      child->limit = max (child->limit, node->end - offset);
      /* This suggests that nodes in the right subtree are strictly
         greater.  But this is not true due to later rotations. */
      child = node->begin <= child->begin ? child->left : child->right;
    }

suggests that N->left are <= and N->right are > but my reading of the
comment is that the only thing we can rely on is that N-<left is <= and
N->right is >=

[ I do understand this comment now :-)  ]

> 2. Does the tree order duplicates in a particular way?
> Maybe by overlay priority, or by interval end?

AFAICT, no it does not.

> And, perhaps, if it doesn't order duplicates, would it be an idea to
> order them, maybe at some later point?  I'm asking this because
> a successor(N) function would return nodes in the order in the tree,
> always, and I don't know what the order is now.

Ordering based on interval end could arguably make sense.  I'm not
completely sure how/where it would benefit us, tho.  Most times we look
at the itree, we just look at *all* the nodes within a specific
interval, so the order in which they're returned doesn't matter very
much (the ordering within the tree does matter in terms of how we manage
to prune the tree, but that has more to do with which elements are near
the root, which is a different kind of "ordering" than the "binary tree
ordering" itself).  Maybe the `next/previous-overlay-change` code
benefit from sub-ordering based on `end`, but I suspect the effect would
be lost in the noise: if we want to speed up that part of the code,
I expect that a better avenue is to rewrite the
`next/previous-single-overlay-change` so as not to work by (while ..
(next/previous-overlay-change ..) .. (get-char-property ..) ..) where
both functions do the O(log N) itree-iteration dance, but instead doing
the itree iteration itself.

> 3. If my mental picture is right, we could of course end up with a tree
> that has degenerated to a list.  Or a subtree, maybe.  Do you know if
> that can happen?

In terms of tree depth, no: the code preserves the RB invariants, which
ensure balance regardless of ordering (or lack thereof).


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 16:05:01 GMT) Full text and rfc822 format available.

Message #74 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: gerd.moellmann <at> gmail.com, mail <at> andreas-politz.de, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 19:04:05 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  58158 <at> debbugs.gnu.org, Andreas Politz
>  <mail <at> andreas-politz.de>
> Date: Fri, 30 Sep 2022 11:25:30 -0400
> 
> > And, perhaps, if it doesn't order duplicates, would it be an idea to
> > order them, maybe at some later point?  I'm asking this because
> > a successor(N) function would return nodes in the order in the tree,
> > always, and I don't know what the order is now.
> 
> Ordering based on interval end could arguably make sense.  I'm not
> completely sure how/where it would benefit us, tho.  Most times we look
> at the itree, we just look at *all* the nodes within a specific
> interval, so the order in which they're returned doesn't matter very
> much (the ordering within the tree does matter in terms of how we manage
> to prune the tree, but that has more to do with which elements are near
> the root, which is a different kind of "ordering" than the "binary tree
> ordering" itself).  Maybe the `next/previous-overlay-change` code
> benefit from sub-ordering based on `end`, but I suspect the effect would
> be lost in the noise: if we want to speed up that part of the code,
> I expect that a better avenue is to rewrite the
> `next/previous-single-overlay-change` so as not to work by (while ..
> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
> both functions do the O(log N) itree-iteration dance, but instead doing
> the itree iteration itself.

The display engine sorts the overlays, so order could be important.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 17:12:01 GMT) Full text and rfc822 format available.

Message #77 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, mail <at> andreas-politz.de, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 13:11:06 -0400
>> Ordering based on interval end could arguably make sense.  I'm not
>> completely sure how/where it would benefit us, tho.  Most times we look
>> at the itree, we just look at *all* the nodes within a specific
>> interval, so the order in which they're returned doesn't matter very
>> much (the ordering within the tree does matter in terms of how we manage
>> to prune the tree, but that has more to do with which elements are near
>> the root, which is a different kind of "ordering" than the "binary tree
>> ordering" itself).  Maybe the `next/previous-overlay-change` code
>> benefit from sub-ordering based on `end`, but I suspect the effect would
>> be lost in the noise: if we want to speed up that part of the code,
>> I expect that a better avenue is to rewrite the
>> `next/previous-single-overlay-change` so as not to work by (while ..
>> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
>> both functions do the O(log N) itree-iteration dance, but instead doing
>> the itree iteration itself.
>
> The display engine sorts the overlays, so order could be important.

Indeed, for things like `overlays_at` it would be handy if the iterator
could return the right overlays already in the right order, so we don't
have to `sort_overlays`.

But in `sort_overlays` the `priority` property takes precedence, so if
we order the RB tree based on that ordering, we will lose the main
purpose of the change, i.e. finding the overlays at POS in O(log N)
time, because the `compare_overlays` ordering does not let us know that
all of the left or right subtree is outside of a given interval.
IOW the itree has to use a different ordering than that of
`compare_overlays` anyway, so we're still forced to (re-)sort afterwards
with `sort_overlays` :-(


	Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 30 Sep 2022 18:31:02 GMT) Full text and rfc822 format available.

Message #80 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 30 Sep 2022 14:29:59 -0400
> Info for Stefan: I've now removed the per-tree null node, and make check
> shows no unexpected results.

Thanks.  It's an improvement, but I still don't understand how it
works :-(


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 01:58:02 GMT) Full text and rfc822 format available.

Message #83 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: eliz <at> gnu.org, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50;
 [overlay] Interval tree iteration considered harmful
Date: Fri, 30 Sep 2022 21:57:36 -0400
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I think it simply can't be that what is basically walking a binary tree
  > requires such restrictions.

If we allow running complicated code during the tree walk, that would
be asking for trouble.  But if the operations done during the tree
walk are simple and written in C, it should be clear that there is no
possibility of this kind of a bug.  Isn't that the case?

  > > What has to do with overlays.  To name a few: overlay-at, overlays-in,
  > > next-overlay-change, previous-overlay-change, overlay-lists, ...

Those operations can be done in C with no risk of signaling an error.

That ought to be sufficient, as long as we don't need the code to be
reentrant.  If the code needs to be reentrant then we would have
to eliminate the "visited" flag.  But if we don't run Lisp code
during the tree-walk, we should not need it to be reentrant.

-- 
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 05:08:02 GMT) Full text and rfc822 format available.

Message #86 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sat, 01 Oct 2022 07:06:50 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> That is, if N is a node, all nodes in the subtree N->left are strictly <
>> N, and nodes in N->right are >=.
>
> The following code in `interval_tree_insert`:
>
>   while (child != ITREE_NULL)
>     {
>       parent = child;
>       offset += child->offset;
>       child->limit = max (child->limit, node->end - offset);
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;
>     }
>
> suggests that N->left are <= and N->right are > but my reading of the
> comment is that the only thing we can rely on is that N-<left is <= and
> N->right is >=

Phew.  I'm not sure but I get the feeling that makes implementing a
successor function, let's say, challenging.

I wonder how std::multimap deals with that.  Hm.

Anyways, with your removal of the visited flag (N thumbs up, for large
values of N :-)), most of the problems I preceived are gone anyway.
So, I guess we can live with the iteration stack, which seems to work
fine, and the alternative is only nice to have, now.

(And impacts are getting closer in the real world here, so I'm
a bit distracted anyway.)

>
> [ I do understand this comment now :-)  ]

:-)

>
>> 2. Does the tree order duplicates in a particular way?
>> Maybe by overlay priority, or by interval end?
>
> AFAICT, no it does not.

Ok.

>> And, perhaps, if it doesn't order duplicates, would it be an idea to
>> order them, maybe at some later point?  I'm asking this because
>> a successor(N) function would return nodes in the order in the tree,
>> always, and I don't know what the order is now.
>
> Ordering based on interval end could arguably make sense.  I'm not
> completely sure how/where it would benefit us, tho.  Most times we look
> at the itree, we just look at *all* the nodes within a specific
> interval, so the order in which they're returned doesn't matter very
> much (the ordering within the tree does matter in terms of how we manage
> to prune the tree, but that has more to do with which elements are near
> the root, which is a different kind of "ordering" than the "binary tree
> ordering" itself).  Maybe the `next/previous-overlay-change` code
> benefit from sub-ordering based on `end`, but I suspect the effect would
> be lost in the noise: if we want to speed up that part of the code,
> I expect that a better avenue is to rewrite the
> `next/previous-single-overlay-change` so as not to work by (while ..
> (next/previous-overlay-change ..) .. (get-char-property ..) ..) where
> both functions do the O(log N) itree-iteration dance, but instead doing
> the itree iteration itself.

Thanks.

>
>> 3. If my mental picture is right, we could of course end up with a tree
>> that has degenerated to a list.  Or a subtree, maybe.  Do you know if
>> that can happen?
>
> In terms of tree depth, no: the code preserves the RB invariants, which
> ensure balance regardless of ordering (or lack thereof).

Ok, thanks for your insights!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 07:01:02 GMT) Full text and rfc822 format available.

Message #89 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: rms <at> gnu.org
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50;
 [overlay] Interval tree iteration considered harmful
Date: Sat, 01 Oct 2022 10:00:03 +0300
> From: Richard Stallman <rms <at> gnu.org>
> Cc: eliz <at> gnu.org, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> Date: Fri, 30 Sep 2022 21:57:36 -0400
> 
>   > > What has to do with overlays.  To name a few: overlay-at, overlays-in,
>   > > next-overlay-change, previous-overlay-change, overlay-lists, ...
> 
> Those operations can be done in C with no risk of signaling an error.

I think such assumptions were proven dangerous at best in the long
run.  The way Emacs develops, we constantly add more and more hooks to
C code, and more and more direct calls into Lisp from C.  These
invalidate any assumptions about "no risk of signaling an error", even
if they were originally true when the code was first written.

Moreover, we test for QUIT in operations that can be prolonged ones
(and for a good reason), so any long loop in C could potentially throw
to top level if the user pressed C-g.

All in all, experience shows that making such assumptions in Emacs is
unsafe.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 07:26:01 GMT) Full text and rfc822 format available.

Message #92 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sat, 01 Oct 2022 09:25:47 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> Maybe you could also help me with the questions below?
>
> I'll try (BTW, the original author is Andreas Politz who we can still
> reach at <mail <at> andreas-politz.de>.  He doesn't have much time to devote
> to it, tho, so best not to Cc him through all the conversations).
>
>> I'm assuming, from a comment somewhere, that an interval tree is an
>> rb-tree with keys being interval start positions, and allowing
>> duplicates.
>
> Yup.
>
>> That is, if N is a node, all nodes in the subtree N->left are strictly <
>> N, and nodes in N->right are >=.
>
> The following code in `interval_tree_insert`:
>
>   while (child != ITREE_NULL)
>     {
>       parent = child;
>       offset += child->offset;
>       child->limit = max (child->limit, node->end - offset);
>       /* This suggests that nodes in the right subtree are strictly
>          greater.  But this is not true due to later rotations. */
>       child = node->begin <= child->begin ? child->left : child->right;
>     }
>
> suggests that N->left are <= and N->right are > but my reading of the
> comment is that the only thing we can rely on is that N-<left is <= and
> N->right is >=
>

Yup.  I've used overlay-tree a bit (compile with ITREE_DEBUG defined
after pulling), and used this:

(defun make-ivs ()
  (with-current-buffer (get-buffer-create "*iv*")
    (delete-all-overlays)
    (erase-buffer)
    (insert (make-string 50 ?x))
    (let ((o1 (make-overlay 1 1))
	  (o2 (make-overlay 10 11))
	  (o3 (make-overlay 10 12))
	  (o4 (make-overlay 1 1))
	  )
      (move-overlay o4 10 13)
      (overlay-tree))))

(pp (make-ivs))
((:begin 10 :end 12 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
 ((:begin 1 :end 1 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
  nil
  ((:begin 10 :end 13 :limit 13 :offset 0 :rear-advance nil :front-advance nil)
   nil nil))
 ((:begin 10 :end 11 :limit 11 :offset 0 :rear-advance nil :front-advance nil)
  nil nil))

That's 

                    [10, 12]
                   /        \
                 [1, 1]      [10, 11]
                 /    \         /\
                    [10, 13]
                     /    \
                          
The 10 is found "all over the place".

I surmise no reasonable successor function can be written for such a
tree.

I have to look at std::multimap, they manage to do this somehow.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 10:56:02 GMT) Full text and rfc822 format available.

Message #95 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sat, 01 Oct 2022 12:55:42 +0200
Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:

> I have to look at std::multimap, they manage to do this somehow.

Well, I should have thought of that, because it's obvious from the fact
they are able to use successor/predecessor functions :-/.

The equivalent in our code would go like below, which is just written
down in a hurry, so please bear with me.  It's just about the idea.  So:

Insert a new interval_node only if overlay start is not in the tree
already.  If the contains a node for the start, make the node's value a
list of overlays, and add the new one to it in an order that's
convenient (The STL uses insertion order).


As a consequence, keys is the rb-tree are unique, and it is always
strictly ordered, rotations don't change that.  The min node is the
left-most node in a tree, and so on.

So far so good.


An iterator in the order min->max would have to record the interval_node
in the tree plus information where in the list of overlays it is, if it
is in any.  Finding the next node is implemented with a successor
function like the one I've shown from libstdc++.

To find all overlays containing a given position POS, find the node
whose start is <= POS.  Start at the root of the tree, and walk the left
link until we find the ndoes's start is <= POS.  Then iterate forward
until start > POS, returning only overlapping overlays.

Finding overlays intersecting an interval [BEG, END] is not as nice, but
we can exclude intervals starting after END.  So we have to iterate over
all overlays from the minimum of the tree, and iterate forward until we
reach the first excluded one.

That's so "easy" that even I can understand how it works :-).

What am I overlooking?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 13:55:02 GMT) Full text and rfc822 format available.

Message #98 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sat, 01 Oct 2022 09:54:36 -0400
>> The following code in `interval_tree_insert`:
[...]
>> suggests that N->left are <= and N->right are > but my reading of the
>> comment is that the only thing we can rely on is that N-<left is <= and
>> N->right is >=
>
> Phew.  I'm not sure but I get the feeling that makes implementing a
> successor function, let's say, challenging.

I don't think it makes any difference in that respect, no.  The notion of
"successor" is just "the successor in the in-order traversal of the
tree" and while rotations change the initial property that "N->right are >",
they don't change the in-order traversal output (they don't re-order
nodes w.r.t in-order traversal (or in reverse order for that matter)).

As alluded to at the end of my previous email, while RB trees are
typically used for your usual binary tree which comes with some notion
of ordering that makes lookup O(log N), the RB trees ensure balance
without relying on any notion of ordering since the rotations just
blindly preserve the order as a side effect but they don't themselves
need to call the ordering function to decide how to do their job.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 01 Oct 2022 14:02:01 GMT) Full text and rfc822 format available.

Message #101 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sat, 01 Oct 2022 10:01:04 -0400
> That's 
>
>                     [10, 12]
>                    /        \
>                  [1, 1]      [10, 11]
>                  /    \         /\
>                     [10, 13]
>                      /    \
>                           
> The 10 is found "all over the place".

"all over the place" if you consider the pre-order or post-order
traversal, indeed.  But if you look at the in-order traversal (e.g., the
one you'd get from C++ `local_Rb_tree_increment` you showed), you get
the expected result:

   [1, 1]
   [10, 13]
   [10, 12]
   [10, 11]


-- Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sun, 02 Oct 2022 08:07:02 GMT) Full text and rfc822 format available.

Message #104 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sun, 02 Oct 2022 10:06:01 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> Info for Stefan: I've now removed the per-tree null node, and make check
>> shows no unexpected results.
>
> Thanks.  It's an improvement, but I still don't understand how it
> works :-(

Is it still null->parent?

If so, maybe it helps to picture that null is a child of many parents
:-).  Every node having null on the left or right is a parent in that
sense.  Therefore, if some code uses null->parent to do something it can
not be right.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sun, 02 Oct 2022 08:23:01 GMT) Full text and rfc822 format available.

Message #107 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Politz <mail <at> andreas-politz.de>,
 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sun, 02 Oct 2022 10:22:21 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> Phew.  I'm not sure but I get the feeling that makes implementing a
>> successor function, let's say, challenging.
>
> I don't think it makes any difference in that respect, no.

The reason I find it challenging, and I'm sure now, is that I've written
the following code, and failed :-).

/* FIXME: This assumption is wrong.  Nodes on the left of P are <=,
   and nodes on the right are >=.  */

/* Value is the successor of interval node X in ascending order.  It
   is assumed that the tree is organized so that nodes < X are in
   X->left and nodes in X->right are >= X.  */

static struct interval_node *
in_order_successor (struct interval_node *x)
{
  if (x->right != ITREE_NULL)
    {
      /* X has a right child, which means X's right subtree has
	 elements >= X.  Proceed to the left-most child in the right
	 subtree, which is the smallest in that subtree.  */
      x = x->right;
      while (x->left != 0)
	x = x->left;
    }
  else
    {
      /* X's left subtree is uninteresting, because everything there
	 is < X.  Therefore follow the parent chain.  If X is the
	 parent's right child, this means the parent is < X,  We are
	 looking for a parent that is >=.  */
      struct interval_node *y = x->parent;
      while (x == y->right)
	{
	  x = y;
	  y = y->parent;
	}

      /* If we found a parent that's >=, the parent is what we sought.
	 Otherwise, X has arrived at the null node, whose right child
	 is the sentinel node itself.  */
      if (x->right != y)
	x = y;
    }

  return x;
}

I tried to change the comments and/or modify the code for a tree like we
have (left subtree <=, right >=), and couldn't explain why it works,
but I also couldn't produce a counter-example that the existing tree
code actually can produce.  IOW, I couldn't prove anything.

P.S.

With the "all over the place" I indented to hint at the fact that the
tree is not a "normal" BST.  I should probably have said that directly,
sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sun, 02 Oct 2022 16:52:02 GMT) Full text and rfc822 format available.

Message #110 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Politz <mail <at> andreas-politz.de>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Sun, 02 Oct 2022 18:32:19 +0200
[Message part 1 (text/plain, inline)]
I've managed to construct a prototype of this "stateless" iterator.

I've only implemented the in-order case and only applied it in the overlays_in function.

The only real trouble I had was with pushing the offset to the children
during the tree navigation in some kind of sensible way.  In the end I
gave up and simply pasted calls to the corresponding function "all over
the place".  It seems to work, at least buffer-tests are passing.

Andreas

[noverlay-stateless-iterator.diff (text/x-patch, inline)]
diff --git a/src/buffer.c b/src/buffer.c
index f59fddcbde..6a53b49aad 100644
--- a/src/buffer.c
+++ b/src/buffer.c
@@ -2948,15 +2948,16 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend,
   ptrdiff_t next = ZV;
   Lisp_Object *vec = *vec_ptr;
   struct interval_node *node;
+  struct interval_tree_iterator iterator;
 
-  ITREE_FOREACH (node, current_buffer->overlays, beg,
-                 /* Find empty OV at Z ? */
-                 (end >= ZV && empty) ? ZV + 1 : ZV, ASCENDING)
+  interval_tree_iterator_init(&iterator, current_buffer->overlays, beg,
+			      (end >= ZV && empty) ? ZV + 1 : ZV, ITREE_ASCENDING);
+
+  while ((node = interval_tree_iterator_next(&iterator)))
     {
       if (node->begin > end)
         {
           next = min (next, node->begin);
-          ITREE_FOREACH_ABORT ();
           break;
         }
       else if (node->begin == end)
@@ -2964,7 +2965,6 @@ overlays_in (ptrdiff_t beg, ptrdiff_t end, bool extend,
           next = node->begin;
           if ((! empty || end < ZV) && beg < end)
             {
-              ITREE_FOREACH_ABORT ();
               break;
             }
         }
diff --git a/src/itree.c b/src/itree.c
index eeecaf1839..21549fe8a7 100644
--- a/src/itree.c
+++ b/src/itree.c
@@ -1190,3 +1190,132 @@ interval_tree_subtree_min (const struct interval_tree *tree, struct interval_nod
  * +===================================================================================+ */
 
 /* See Foverlay_tree in buffer.c */
+
+
+/* +===================================================================================+
+ * | Stateless iterator
+ * +===================================================================================+ */
+
+static bool
+interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator,
+                              struct interval_node *node);
+static
+struct interval_node *
+interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator);
+
+void
+interval_tree_iterator_init(struct interval_tree_iterator *iterator,
+                            struct interval_tree *tree,
+                            ptrdiff_t begin,
+                            ptrdiff_t end,
+                            enum interval_tree_order order) {
+  iterator->tree = tree;
+  iterator->node = tree && begin <= end ? ITREE_NULL : NULL;
+  iterator->begin = begin;
+  iterator->end = end;
+  iterator->order = order;
+}
+
+struct interval_node *
+interval_tree_iterator_next(struct interval_tree_iterator *iterator) {
+  if (iterator->node) {
+    do {
+      switch (iterator->order) {
+      case ITREE_ASCENDING:
+        iterator->node = interval_tree_iterator_in_order_successor (iterator);
+        break;
+      default:
+        fprintf (stderr, "interval_tree_order != ITREE_ASCENDING not implemented");
+        emacs_abort ();
+      }
+    } while (iterator->node &&
+             ! interval_node_intersects (iterator->node, iterator->begin, iterator->end));
+  }
+
+  if (iterator->node == ITREE_NULL) {
+    fprintf (stderr, "Next node: ITREE_NULL\n");
+  } else if (! iterator->node) {
+    fprintf (stderr, "Next node: NULL\n");
+  } else {
+    fprintf (stderr, "Next node: begin = %ld, end = %ld (iterator: begin = %ld, end = %ld)\n",
+             iterator->node->begin, iterator->node->end, iterator->begin, iterator->end);
+  }
+  return iterator->node;
+}
+
+static bool
+interval_tree_iter_traverse_p(struct interval_tree_iterator *iterator,
+                              struct interval_node *node) {
+  eassert (node);
+
+  if (node == ITREE_NULL) {
+    return false;
+  } else {
+    eassert (node->parent != ITREE_NULL);
+    if (node->parent->left == node) {
+      return iterator->begin <= node->limit + node->offset;
+    } else {
+      return node->parent->begin <= iterator->end;
+    }
+  }
+}
+
+static
+struct interval_node *
+interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator)
+{
+  struct interval_node *node = iterator->node;
+
+  if (node != ITREE_NULL) {
+    interval_tree_inherit_offset (iterator->tree, node);
+  }
+
+  if (node == ITREE_NULL) {
+    node = iterator->tree->root;
+    if (node != ITREE_NULL) {
+      interval_tree_inherit_offset (iterator->tree, node);
+    }
+    while (interval_tree_iter_traverse_p(iterator, node->left)) {
+      node = node->left;
+      if (node != ITREE_NULL) {
+        interval_tree_inherit_offset (iterator->tree, node);
+      }
+    }
+  } else if (interval_tree_iter_traverse_p(iterator, node->right))
+    {
+      node = node->right;
+      if (node != ITREE_NULL) {
+        interval_tree_inherit_offset (iterator->tree, node);
+      }
+      while (interval_tree_iter_traverse_p(iterator, node->left)) {
+        node = node->left;
+        if (node != ITREE_NULL) {
+          interval_tree_inherit_offset (iterator->tree, node);
+        }
+      }
+    }
+  else
+    {
+      struct interval_node *parent = node->parent;
+      while (node == parent->right)
+        {
+          node = parent;
+          parent = parent->parent;
+        }
+      if (node != ITREE_NULL)
+        node = parent;
+    }
+
+  return node == ITREE_NULL ? NULL : node;
+}
+
+void
+interval_tree_iterator_narrow (struct interval_tree_iterator *iterator,
+                               ptrdiff_t begin,
+                               ptrdiff_t end)
+{
+  eassert (begin >= iterator->begin);
+  eassert (end <= iterator->end);
+  iterator->begin =  max (begin, iterator->begin);
+  iterator->end =  min (end, iterator->end);
+}
diff --git a/src/itree.h b/src/itree.h
index 1f019a2607..53f03cca35 100644
--- a/src/itree.h
+++ b/src/itree.h
@@ -72,6 +72,15 @@ #define ITREE_NULL (&itree_null)
   ITREE_PRE_ORDER,
 };
 
+struct interval_tree_iterator
+{
+  struct interval_tree *tree;
+  struct interval_node *node;
+  ptrdiff_t begin;
+  ptrdiff_t end;
+  enum interval_tree_order order;
+};
+
 void interval_node_init (struct interval_node *, ptrdiff_t, ptrdiff_t, bool, bool, Lisp_Object);
 ptrdiff_t interval_node_begin (struct interval_tree *, struct interval_node *);
 ptrdiff_t interval_node_end (struct interval_tree *, struct interval_node *);
@@ -135,4 +144,15 @@ #define ITREE_FOREACH_ABORT() \
 #define ITREE_FOREACH_NARROW(beg, end) \
   interval_generator_narrow (itree_iter_, beg, end)
 
+void interval_tree_nodes (struct interval_tree *tree, struct interval_node **nodes, enum interval_tree_order order);
+
+void interval_tree_iterator_init(struct interval_tree_iterator *iterator,
+				 struct interval_tree *tree,
+				 ptrdiff_t begin,
+				 ptrdiff_t end,
+				 enum interval_tree_order order);
+struct interval_node *interval_tree_iterator_next(struct interval_tree_iterator *iterator);
+void interval_tree_iterator_narrow (struct interval_tree_iterator *iterator,
+			       ptrdiff_t begin,
+			       ptrdiff_t end);
 #endif

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Mon, 03 Oct 2022 04:36:02 GMT) Full text and rfc822 format available.

Message #113 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Andreas Politz <mail <at> andreas-politz.de>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Mon, 03 Oct 2022 06:35:00 +0200
Andreas Politz <mail <at> andreas-politz.de> writes:

> It seems to work, at least buffer-tests are passing.

"seems to work" is a bit weak.  Can we prove it?

I don't mean mathematically, but by reasoning like I tried in the
comments in successor function I posted,




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Tue, 04 Oct 2022 11:06:04 GMT) Full text and rfc822 format available.

Message #116 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Politz <mail <at> andreas-politz.de>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#58158: 29.0.50;
 [overlay] Interval tree iteration considered harmful
Date: Tue, 4 Oct 2022 12:50:54 +0200
My implementation seems to work. The correctness of the algorithm would follow from 

1. The Rb tree algorithm produces a tree with left <= root <= right,
2. the algorithm you’ve posted, and I’ve  adapted, produces an in–order of the tree and
3. the conditions guiding the traversal are correct, i.e. they don’t exclude matching intervals.

Since I believe these statements are true, I believe the algorithm is correct.

Andreas



> Am 03.10.2022 um 06:35 schrieb Gerd Möllmann <gerd.moellmann <at> gmail.com>:
> 
> Andreas Politz <mail <at> andreas-politz.de> writes:
> 
>> It seems to work, at least buffer-tests are passing.
> 
> "seems to work" is a bit weak.  Can we prove it?
> 
> I don't mean mathematically, but by reasoning like I tried in the
> comments in successor function I posted,





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 06 Oct 2022 22:27:02 GMT) Full text and rfc822 format available.

Message #119 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Andreas Politz <mail <at> andreas-politz.de>, Gerd Möllmann
 <gerd.moellmann <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 58158 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Thu, 06 Oct 2022 15:26:37 -0700
Andreas Politz <mail <at> andreas-politz.de> writes:

> I've managed to construct a prototype of this "stateless" iterator.
>
> I've only implemented the in-order case and only applied it in the overlays_in function.
>
> The only real trouble I had was with pushing the offset to the children
> during the tree navigation in some kind of sensible way.  In the end I
> gave up and simply pasted calls to the corresponding function "all over
> the place".  It seems to work, at least buffer-tests are passing.

Hey Andreas, this looks pretty good to me but I have some questions on
it.


> +static
> +struct interval_node *
> +interval_tree_iterator_in_order_successor (struct interval_tree_iterator* iterator)
> +{
> +  struct interval_node *node = iterator->node;
> +
> +  if (node != ITREE_NULL) {
> +    interval_tree_inherit_offset (iterator->tree, node);
> +  }
> +  if (node == ITREE_NULL) {
> +    node = iterator->tree->root;

I don't understand this branch.  If the node is NULL upon entry to the
successor call, why start at the root?  Why is the "next in order node"
of NULL the root?  The root is just an node whose BEG is roughly in the
middle of the total inorder traversal, right?

The "stateless" inorder traversal algorithm I am used to starts with the
minimum node (walk only left links down from the root until at a leaf
and that is the minimum).  Then do inorder traversal from there.


> +    if (node != ITREE_NULL) {
> +      interval_tree_inherit_offset (iterator->tree, node);
> +    }
> +    while (interval_tree_iter_traverse_p(iterator, node->left)) {
> +      node = node->left;
> +      interval_tree_inherit_offset (iterator->tree, node);
> +    }
> +  } else if (interval_tree_iter_traverse_p(iterator, node->right))
> +    {
> +      node = node->right;
> +      interval_tree_inherit_offset (iterator->tree, node);
> +      while (interval_tree_iter_traverse_p(iterator, node->left)) {
> +        node = node->left;
> +        interval_tree_inherit_offset (iterator->tree, node);
> +      }
> +    }
> +  else
> +    {
> +      struct interval_node *parent = node->parent;
> +      while (node == parent->right)
> +        {
> +          node = parent;
> +          parent = parent->parent;
> +        }
> +      if (node != ITREE_NULL)
> +        node = parent;
> +    }
> +
> +  return node == ITREE_NULL ? NULL : node;
> +}

I don't understand how the last "else" case works correctly in the face
of a call to `interval_generator_narrow` during iteration.  Narrowing
could render a node's parent out of the "interesting" range, and so the
iterator should not return it but instead keep trying to find a
successor in range.  Right?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Thu, 06 Oct 2022 22:37:01 GMT) Full text and rfc822 format available.

Message #122 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>, Gerd Möllmann
 <gerd.moellmann <at> gmail.com>
Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Fri, 7 Oct 2022 01:36:17 +0300
On 30.09.2022 09:11, Eli Zaretskii wrote:
> "M-x xref-query-replace-in-results RET"

JFYI this command has the convenient binding 'r' in Xref output buffers.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 07 Oct 2022 19:48:02 GMT) Full text and rfc822 format available.

Message #125 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Fri, 07 Oct 2022 22:47:29 +0300
> Date: Fri, 7 Oct 2022 01:36:17 +0300
> Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> On 30.09.2022 09:11, Eli Zaretskii wrote:
> > "M-x xref-query-replace-in-results RET"
> 
> JFYI this command has the convenient binding 'r' in Xref output buffers.

It also works only when invoked from the Xref buffer, right?

Btw, what am I doing wrong below?

 emacs -Q
 C-x C-f src/character.h RET
 M-x visit-tags-table RET RET
 M-. char_string RET
 r whatever RET

Unexpected result: "No suitable matches here".  Huh? what did I miss?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Sat, 08 Oct 2022 18:51:01 GMT) Full text and rfc822 format available.

Message #128 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Sat, 8 Oct 2022 21:50:39 +0300
On 07.10.2022 22:47, Eli Zaretskii wrote:
>> Date: Fri, 7 Oct 2022 01:36:17 +0300
>> Cc: 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>> On 30.09.2022 09:11, Eli Zaretskii wrote:
>>> "M-x xref-query-replace-in-results RET"
>>
>> JFYI this command has the convenient binding 'r' in Xref output buffers.
> 
> It also works only when invoked from the Xref buffer, right?

Yep.

But we also have commands like xref-find-references-and-replace and 
dired-do-find-regexp-and-replace.

> Btw, what am I doing wrong below?
> 
>   emacs -Q
>   C-x C-f src/character.h RET
>   M-x visit-tags-table RET RET
>   M-. char_string RET
>   r whatever RET
> 
> Unexpected result: "No suitable matches here".  Huh? what did I miss?

We can't replace over "find definition" matches: they are more abstract 
and don't contain the necessary information to perform the replacement 
(such as the length of a match, for instance).

And such xrefs might navigate you to the beginning of the line, for 
example, rather than to the beginning of the name.

But that makes sense, doesn't it? If replacing over "find definitions" 
results worked fine, in the end you would get a codebase where all 
declarations of a method 'foo' got renamed, but all callsites of it 
remain unchanged. That couldn't have been your intention, could it?

The error message could use some improvement, I suppose, but I'm not 
sure how to make it better.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Mon, 10 Oct 2022 08:11:02 GMT) Full text and rfc822 format available.

Message #131 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Mon, 10 Oct 2022 11:10:39 +0300
> Date: Sat, 8 Oct 2022 21:50:39 +0300
> Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> > Btw, what am I doing wrong below?
> > 
> >   emacs -Q
> >   C-x C-f src/character.h RET
> >   M-x visit-tags-table RET RET
> >   M-. char_string RET
> >   r whatever RET
> > 
> > Unexpected result: "No suitable matches here".  Huh? what did I miss?
> 
> We can't replace over "find definition" matches: they are more abstract 
> and don't contain the necessary information to perform the replacement 
> (such as the length of a match, for instance).
> 
> And such xrefs might navigate you to the beginning of the line, for 
> example, rather than to the beginning of the name.
> 
> But that makes sense, doesn't it? If replacing over "find definitions" 
> results worked fine, in the end you would get a codebase where all 
> declarations of a method 'foo' got renamed, but all callsites of it 
> remain unchanged. That couldn't have been your intention, could it?
> 
> The error message could use some improvement, I suppose, but I'm not 
> sure how to make it better.

I tried to improve the situation, both in the error message, the doc
string, and the manual.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Tue, 11 Oct 2022 02:13:02 GMT) Full text and rfc822 format available.

Message #134 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Tue, 11 Oct 2022 05:12:11 +0300
On 10.10.2022 11:10, Eli Zaretskii wrote:
>> Date: Sat, 8 Oct 2022 21:50:39 +0300
>> Cc:gerd.moellmann <at> gmail.com,58158 <at> debbugs.gnu.org,monnier <at> iro.umontreal.ca
>> From: Dmitry Gutov<dgutov <at> yandex.ru>
>>
>>> Btw, what am I doing wrong below?
>>>
>>>    emacs -Q
>>>    C-x C-f src/character.h RET
>>>    M-x visit-tags-table RET RET
>>>    M-. char_string RET
>>>    r whatever RET
>>>
>>> Unexpected result: "No suitable matches here".  Huh? what did I miss?
>> We can't replace over "find definition" matches: they are more abstract
>> and don't contain the necessary information to perform the replacement
>> (such as the length of a match, for instance).
>>
>> And such xrefs might navigate you to the beginning of the line, for
>> example, rather than to the beginning of the name.
>>
>> But that makes sense, doesn't it? If replacing over "find definitions"
>> results worked fine, in the end you would get a codebase where all
>> declarations of a method 'foo' got renamed, but all callsites of it
>> remain unchanged. That couldn't have been your intention, could it?
>>
>> The error message could use some improvement, I suppose, but I'm not
>> sure how to make it better.
> I tried to improve the situation, both in the error message, the doc
> string, and the manual.

It's probably better now, but still likely confusing.

What is a "subset of matches"? The user makes a search, gets a bunch of 
matches. Is it fair to call them "only a subset" is the user only 
searched for definitions?

Or to take another example, the user can search for some regexp inside a 
particular directly, with dired-do-find-regexp. Imagine that that 
directory is not the whole project (perhaps just a minor subdirectory). 
Would it be fair to call the results "only a subset" then? But 
xref-query-replace-in-results will work in that case, because the search 
returns values which have enough info to perform the replacements.

Perhaps we should make the error very specific, like "you can't replace 
inside xref-find-definitions results". Since that is going to be the 
exception in like 99.9% of the cases.

It's possible for other backend commands (such as find-references) to 
return unsuitable xref values in third-party backends, or for other code 
to use xref-show-xrefs with such, but those will be really very rare, if 
happen at all.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Tue, 11 Oct 2022 06:38:02 GMT) Full text and rfc822 format available.

Message #137 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration considered
 harmful
Date: Tue, 11 Oct 2022 09:37:38 +0300
> Date: Tue, 11 Oct 2022 05:12:11 +0300
> Cc: gerd.moellmann <at> gmail.com, 58158 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> 
> What is a "subset of matches"?

Feel free to suggest a less vague description.  The idea is that the
list in Xref buffer doesn't show all the references to the identifier,
making renaming infeasible.

> Perhaps we should make the error very specific, like "you can't replace 
> inside xref-find-definitions results". Since that is going to be the 
> exception in like 99.9% of the cases.

That'd be my preference, but what are those 0.1% of cases where the
Xref buffer produced by other commands could fit?

More generally, what exactly does xref.el test to produce the error
message, and how to describe that in user-level terms?

> It's possible for other backend commands (such as find-references) to 
> return unsuitable xref values in third-party backends, or for other code 
> to use xref-show-xrefs with such, but those will be really very rare, if 
> happen at all.

You are saying that 'r' is only useful after M-?, is that right?  The
manual says so, but the manual doesn't have to say "the whole truth".
The doc string should.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58158; Package emacs. (Fri, 06 Oct 2023 13:16:01 GMT) Full text and rfc822 format available.

Message #140 received at 58158 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: 58158 <at> debbugs.gnu.org
Subject: Re: bug#58158: 29.0.50; [overlay] Interval tree iteration
 considered harmful
Date: Fri, 06 Oct 2023 15:14:42 +0200
Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:

> In its current form, interval tree iteration works like this:
>
> 1. Call begin_iteration(T) to iterate over tree T
> 2. do stuff
> 3. Call end_iteration(T)
>
> with the following rules:
>
> - Begin_iteration and end_iteration must be paired.
>
> - There can be only one iteration per tree at a time.  Nested iteration
>   over the same tree is not supported (abort).
>
> - No GC may happen in step 2.  This is because mark_buffer iterates over
>   buffer overlays.
>
> I think this is an exceedingly dangerous design.

Time has passed, and I think this is no longer relevant.
I'm therefore closing this bug.




bug marked as fixed in version 30.1, send any further explanations to 58158 <at> debbugs.gnu.org and Gerd Möllmann <gerd.moellmann <at> gmail.com> Request was from Gerd Möllmann <gerd.moellmann <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 06 Oct 2023 13:16:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 04 Nov 2023 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 145 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.