GNU bug report logs - #41520
28.0.50; Crash in character.h due to assertion error

Previous Next

Package: emacs;

Reported by: Stefan Kangas <stefan <at> marxist.se>

Date: Mon, 25 May 2020 07:06:01 UTC

Severity: normal

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41520 in the body.
You can then email your comments to 41520 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 07:06:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Kangas <stefan <at> marxist.se>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 25 May 2020 07:06:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 00:05:21 -0700
[Message part 1 (text/plain, inline)]
When editing an org-mode document, I saw a crash due to this assertion
error:

character.h:379: Emacs fatal error: assertion failed: 0xC0 <= c

The document contains utf-8 characters, but nothing more fancy than
basic Swedish characters (åäö) as far as I know.

I'm out of my depth when it comes to debugging this, but I do have the
gdb session still running if you need more information.

I'm using an Emacs recently built from master, see below for the exact
details.
[Message part 2 (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 07:30:03 GMT) Full text and rfc822 format available.

Message #8 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 41520 <at> debbugs.gnu.org
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 07:28:23 +0000
On Mon, May 25, 2020 at 7:06 AM Stefan Kangas <stefan <at> marxist.se> wrote:
> When editing an org-mode document, I saw a crash due to this assertion
> error:

It's a bug in this code in xdisp.c:

  else if (it->bidi_it.charpos == bob
       || (!string_p
           && (FETCH_CHAR (it->bidi_it.bytepos - 1) == '\n'
           || FETCH_CHAR (it->bidi_it.bytepos) == '\n')))

The first FETCH_CHAR should be a FETCH_BYTE to avoid the assertion error.

There's at least one other place that has the same error, so I'll grep
some more before sending a patch.

My suggestion is to drop the "eassume" on emacs-27, and fix FETCH_CHAR
to FETCH_BYTE on master.

(I'd like to reiterate my proposal to use a pos_t for bytepos/charpos
pairs, which would catch or silently fix (which happened in this case
on my pos_t branch) such bugs. The code on my branch reads:

  else if (POS_CHAR_EQUAL (it->bidi_it.pos, bob)
       || (!string_p
           && (FETCH_CHAR (dec_pos (it->bidi_it.pos)) == '\n'
           || FETCH_CHAR (it->bidi_it.pos) == '\n')))

which, while minimally slower, doesn't throw assertion errors.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 07:43:02 GMT) Full text and rfc822 format available.

Message #11 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Stefan Kangas <stefan <at> marxist.se>
Cc: 41520 <at> debbugs.gnu.org
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 07:41:54 +0000
[Message part 1 (text/plain, inline)]
On Mon, May 25, 2020 at 7:28 AM Pip Cet <pipcet <at> gmail.com> wrote:
> On Mon, May 25, 2020 at 7:06 AM Stefan Kangas <stefan <at> marxist.se> wrote:
> There's at least one other place that has the same error, so I'll grep
> some more before sending a patch.

Patch attached (it compiles, but I haven't tested it thoroughly). This
should fix two bugs in xdisp.c and a bug in Fend_of_line.

All of these bugs were silently fixed by my pos_t conversion.

At the very least, FETCH_CHAR should be renamed so it's clear that the
right way to use it is FETCH_CHAR (bytepos), not FETCH_CHAR (charpos).
I suggest FETCH_CHAR_AT_BYTEPOS.
[0001-Fix-assertion-errors-caused-by-bytepos-charpos-confu.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 14:19:01 GMT) Full text and rfc822 format available.

Message #14 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 17:18:25 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 07:28:23 +0000
> Cc: 41520 <at> debbugs.gnu.org
> 
> On Mon, May 25, 2020 at 7:06 AM Stefan Kangas <stefan <at> marxist.se> wrote:
> > When editing an org-mode document, I saw a crash due to this assertion
> > error:
> 
> It's a bug in this code in xdisp.c:
> 
>   else if (it->bidi_it.charpos == bob
>        || (!string_p
>            && (FETCH_CHAR (it->bidi_it.bytepos - 1) == '\n'
>            || FETCH_CHAR (it->bidi_it.bytepos) == '\n')))

Ouch!

> The first FETCH_CHAR should be a FETCH_BYTE to avoid the assertion error.
> 
> There's at least one other place that has the same error, so I'll grep
> some more before sending a patch.

Thanks.

> My suggestion is to drop the "eassume" on emacs-27, and fix FETCH_CHAR
> to FETCH_BYTE on master.

There's no eassume on emacs-27, this is new on master.  That is why
these problems were never exposed before: the old versions of macros
didn't signal any errors in these cases, they just produced some wrong
values, which can never be equal to a newline.

So I installed on emacs-27 branch a patch very similar to what you
sent, except that it uses FETCH_BYTE in all cases where we compare to
a newline: this is both more efficient and more correct.

> (I'd like to reiterate my proposal to use a pos_t for bytepos/charpos
> pairs, which would catch or silently fix (which happened in this case
> on my pos_t branch) such bugs. The code on my branch reads:
> 
>   else if (POS_CHAR_EQUAL (it->bidi_it.pos, bob)
>        || (!string_p
>            && (FETCH_CHAR (dec_pos (it->bidi_it.pos)) == '\n'
>            || FETCH_CHAR (it->bidi_it.pos) == '\n')))
> 
> which, while minimally slower, doesn't throw assertion errors.)

That would require us to maintain both character and byte positions
where we use these macros, something that could be redundant
overhead.  Moreover, I think we prefer having assertions in the debug
builds rather then silent fixups (and in production eassume compiles
into something that doesn't abort).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 14:32:01 GMT) Full text and rfc822 format available.

Message #17 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 14:30:55 +0000
[Message part 1 (text/plain, inline)]
On Mon, May 25, 2020 at 2:18 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Mon, 25 May 2020 07:28:23 +0000
> > Cc: 41520 <at> debbugs.gnu.org
> > My suggestion is to drop the "eassume" on emacs-27, and fix FETCH_CHAR
> > to FETCH_BYTE on master.
>
> There's no eassume on emacs-27, this is new on master.

Yeah, I realized after hitting "Send" that I'd looked at the wrong tree :-)

> So I installed on emacs-27 branch a patch very similar to what you
> sent, except that it uses FETCH_BYTE in all cases where we compare to
> a newline: this is both more efficient and more correct.

I'm attaching a patch with a few more cases. I don't have a strong
preference for either FETCH_BYTE or FETCH_CHAR where both will do, but
we should be consistent.

> > (I'd like to reiterate my proposal to use a pos_t for bytepos/charpos
> > pairs, which would catch or silently fix (which happened in this case
> > on my pos_t branch) such bugs. The code on my branch reads:
> >
> >   else if (POS_CHAR_EQUAL (it->bidi_it.pos, bob)
> >        || (!string_p
> >            && (FETCH_CHAR (dec_pos (it->bidi_it.pos)) == '\n'
> >            || FETCH_CHAR (it->bidi_it.pos) == '\n')))
> >
> > which, while minimally slower, doesn't throw assertion errors.)
>
> That would require us to maintain both character and byte positions
> where we use these macros,

For now, I'd like to introduce them in Emacs proper only where we have
pairs of variables anyway.

> something that could be redundant
> overhead.

It would be redundant to use a pos_t where a ptrdiff_t suffices, yes.
I'm not proposing to do that at present, though I think there are some
places that do charpos/bytepos conversions unnecessarily because only
one of them is passed down the call chain. So, no, it wouldn't be
redundant overhead, and it might actually make some code paths faster.

> Moreover, I think we prefer having assertions in the debug
> builds rather then silent fixups (and in production eassume compiles
> into something that doesn't abort).

I see no way to have assertions, and I think free bug fixes are better
than undiscovered bugs. It's really hard to get things wrong with a
stronger type.
[0001-Fix-more-single-byte-accesses.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 15:08:01 GMT) Full text and rfc822 format available.

Message #20 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 18:07:07 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 14:30:55 +0000
> Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> 
> I'm attaching a patch with a few more cases. I don't have a strong
> preference for either FETCH_BYTE or FETCH_CHAR where both will do, but
> we should be consistent.

I'm okay with those additional changes, but let's install them on
master, as they are a cleanup, not a bug.

> > That would require us to maintain both character and byte positions
> > where we use these macros,
> 
> For now, I'd like to introduce them in Emacs proper only where we have
> pairs of variables anyway.

Maybe a new macro that accepts a 'struct text_pos'?

But wouldn't it be strange to see a macro that accepts a struct, but
only uses one member of that struct?

> > Moreover, I think we prefer having assertions in the debug
> > builds rather then silent fixups (and in production eassume compiles
> > into something that doesn't abort).
> 
> I see no way to have assertions

I mean we already have assertions: that's what eassume does in a debug
build.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 15:17:01 GMT) Full text and rfc822 format available.

Message #23 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 15:16:09 +0000
On Mon, May 25, 2020 at 3:07 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Mon, 25 May 2020 14:30:55 +0000
> > Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> >
> > I'm attaching a patch with a few more cases. I don't have a strong
> > preference for either FETCH_BYTE or FETCH_CHAR where both will do, but
> > we should be consistent.
>
> I'm okay with those additional changes, but let's install them on
> master, as they are a cleanup, not a bug.

Sure.

> > > That would require us to maintain both character and byte positions
> > > where we use these macros,
> >
> > For now, I'd like to introduce them in Emacs proper only where we have
> > pairs of variables anyway.
>
> Maybe a new macro that accepts a 'struct text_pos'?

Yes, a pos_t would look like a struct text_pos, at least in non-debug builds.

> But wouldn't it be strange to see a macro that accepts a struct, but
> only uses one member of that struct?

I don't think so. CHARPOS and BYTEPOS already exist, and that's
precisely what they do.

What is a little strange is that the ancient convention of not
returning struct types is still followed in much of Emacs.

> > > Moreover, I think we prefer having assertions in the debug
> > > builds rather then silent fixups (and in production eassume compiles
> > > into something that doesn't abort).
> >
> > I see no way to have assertions
>
> I mean we already have assertions: that's what eassume does in a debug
> build.

Yes, but we could do with some stricter checking, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 16:10:02 GMT) Full text and rfc822 format available.

Message #26 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 19:09:25 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 15:16:09 +0000
> Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> 
> > But wouldn't it be strange to see a macro that accepts a struct, but
> > only uses one member of that struct?
> 
> I don't think so. CHARPOS and BYTEPOS already exist, and that's
> precisely what they do.
> 
> What is a little strange is that the ancient convention of not
> returning struct types is still followed in much of Emacs.

It's more expensive.  That's what I meant when I said "strange": why
would we fill 2 fields of a struct, but use only one?

> > I mean we already have assertions: that's what eassume does in a debug
> > build.
> 
> Yes, but we could do with some stricter checking, I think.

It cannot catch the cases where we put a character position into the
byte position slot.  That's the general problem with using simple
scalars.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 17:55:02 GMT) Full text and rfc822 format available.

Message #29 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 17:54:01 +0000
On Mon, May 25, 2020 at 4:09 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Mon, 25 May 2020 15:16:09 +0000
> > Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> >
> > > But wouldn't it be strange to see a macro that accepts a struct, but
> > > only uses one member of that struct?
> >
> > I don't think so. CHARPOS and BYTEPOS already exist, and that's
> > precisely what they do.
> >
> > What is a little strange is that the ancient convention of not
> > returning struct types is still followed in much of Emacs.
>
> It's more expensive.

Only for very large structs, or on old architectures.

> That's what I meant when I said "strange": why
> would we fill 2 fields of a struct, but use only one?

As I said, I'm not talking about cases in which one variable suffices.
It's those cases where we have:

ptrdiff_t charpos;
ptrdiff_t bytepos;

(not usually named like that, or indeed consistently).

My suggestion is we use

pos_t pos;

and then pos.charpos and pos.bytepos as appropriate.

> > > I mean we already have assertions: that's what eassume does in a debug
> > > build.
> >
> > Yes, but we could do with some stricter checking, I think.
>
> It cannot catch the cases where we put a character position into the
> byte position slot.  That's the general problem with using simple
> scalars.

Incorrect code becomes way more obvious.

bytepos = PT;

is incorrect but shorter than the correct version.

pos.bytepos = PT_POS.charpos;
pos.charpos = PT_POS.bytepos;

is much more obviously wrong, and the correct version is simply:

pos =  PT_POS;

On non-obsolescent architectures, returning a two-word struct is
cheaper than accessing two parameters through pointers, too; and
that's only relevant for those few cases in which the function isn't
inlined, anyway.

All I'm hoping for, at this point, is a "maybe, show me a patch".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 19:31:02 GMT) Full text and rfc822 format available.

Message #32 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 22:30:13 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 17:54:01 +0000
> Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> 
> All I'm hoping for, at this point, is a "maybe, show me a patch".

I don't know what to say, since it sounds like the "appetite comes
with eating".  We never talked about PT_POS before.  Is the plan to
make any popular position a struct?  Like GPT, for example?  What
about BEGV and ZV?

IOW, I don't understand the goal here.  I think I did understand when
we were talking about accessing characters by buffer positions, and
the bugs related to incorrect usage there, but now it sounds like the
plot thickens?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Mon, 25 May 2020 20:40:01 GMT) Full text and rfc822 format available.

Message #35 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Mon, 25 May 2020 20:39:06 +0000
On Mon, May 25, 2020 at 7:30 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Mon, 25 May 2020 17:54:01 +0000
> > Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> >
> > All I'm hoping for, at this point, is a "maybe, show me a patch".
>
> I don't know what to say, since it sounds like the "appetite comes
> with eating".  We never talked about PT_POS before.

You're correct. Sorry about that.

> Is the plan to make any popular position a struct?

The plan is to introduce additional struct-valued macros for things
like PT/PT_BYTE:

#define PT_POS POS (PT, PT_BYTE)

In particular, it's not an lvalue. That's important to me, since
assigning to PT_POS would be a severe bug.

> Like GPT, for example?

That's difficult. GPT is, of course, very special. I still think the
code is slightly more readable with an additional GPT_POS macro being
used where appropriate. There's a particular problem because using GPT
very often means modifying it, so maybe GPT_POS should be an lval.

> What about BEGV and ZV?

BEG, BEGV, ZV, and Z would all have _POS equivalents, and very often
using them results in more readable code.

> IOW, I don't understand the goal here.

There are multiple goals: I think this significantly aids readability,
and I think there might still be some minor bugs to catch, and future
bugs to avoid.

For debug builds only, it might make sense to include the object that
the bytepos-charpos relation is valid for, to catch cases where one
object's correspondence is used for another object.

> I think I did understand when we were talking about accessing characters by buffer positions, and
> the bugs related to incorrect usage there, but now it sounds like the plot thickens?

I hope that most code will follow a basic structure of being passed a
Lisp_Object or two (charpos/marker and object), converting that to a
pos_t, handing that to internal APIs, potentially receiving a pos_t
back and converting it back to a Lisp_Object, with only a few lines of
code deep down the call stacks actually unwrapping the pos_t and
manipulating it directly. That means there are a few more cases than
accessing buffer text: comparing two positions, for example, walking a
buffer or string by incrementing or decrementing them, adding two
positions or subtracting them.

(It's true that all kinds of crazy experiments would be easier with
code that follows this structure, but that's a side effect: the goal
really is to increase readability a little in a lot of places.)

Take, for example, this code:

    ch = bidi_fetch_char (cpos += nc, bpos += clen, &disp_pos, &dpp, &bs,
                  bidi_it->w, fwp, &clen, &nc)

"nc" and "clen" belong together, and so do cpos and bpos. I find the
names don't make that very obvious, and simply reducing the number of
arguments bidi_fetch_char takes by two helps a little.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Tue, 26 May 2020 16:18:02 GMT) Full text and rfc822 format available.

Message #38 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Tue, 26 May 2020 19:17:15 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 20:39:06 +0000
> Cc: stefan <at> marxist.se, 41520 <at> debbugs.gnu.org
> 
> The plan is to introduce additional struct-valued macros for things
> like PT/PT_BYTE:
> 
> #define PT_POS POS (PT, PT_BYTE)
> 
> In particular, it's not an lvalue. That's important to me, since
> assigning to PT_POS would be a severe bug.

So all the places where we now access PT and PT_BYTE separately will
now dereference a struct?

(Btw, PT and PT_BYTE are already non-lvalues, so we have this
covered.)

> > Like GPT, for example?
> 
> That's difficult. GPT is, of course, very special.

How so?  It's just like any other buffer position.

> > What about BEGV and ZV?
> 
> BEG, BEGV, ZV, and Z would all have _POS equivalents, and very often
> using them results in more readable code.
> 
> > IOW, I don't understand the goal here.
> 
> There are multiple goals: I think this significantly aids readability,
> and I think there might still be some minor bugs to catch, and future
> bugs to avoid.
> 
> For debug builds only, it might make sense to include the object that
> the bytepos-charpos relation is valid for, to catch cases where one
> object's correspondence is used for another object.
> 
> > I think I did understand when we were talking about accessing characters by buffer positions, and
> > the bugs related to incorrect usage there, but now it sounds like the plot thickens?
> 
> I hope that most code will follow a basic structure of being passed a
> Lisp_Object or two (charpos/marker and object), converting that to a
> pos_t, handing that to internal APIs, potentially receiving a pos_t
> back and converting it back to a Lisp_Object, with only a few lines of
> code deep down the call stacks actually unwrapping the pos_t and
> manipulating it directly. That means there are a few more cases than
> accessing buffer text: comparing two positions, for example, walking a
> buffer or string by incrementing or decrementing them, adding two
> positions or subtracting them.
> 
> (It's true that all kinds of crazy experiments would be easier with
> code that follows this structure, but that's a side effect: the goal
> really is to increase readability a little in a lot of places.)

I cannot judge readability: I'm too accustomed to the current
variables.  But it's clear that this will hurt speed, and in the
innermost loops of our code.  Having to maintain 2 values, recompute
one from another, and move them into and out of a structure each adds
overhead, some small, some large.  They will add up.  I don't think I
see how we can justify that, as the current code is not horribly
unreadable.

Let's see what others think.

>     ch = bidi_fetch_char (cpos += nc, bpos += clen, &disp_pos, &dpp, &bs,
>                   bidi_it->w, fwp, &clen, &nc)
> 
> "nc" and "clen" belong together, and so do cpos and bpos. I find the
> names don't make that very obvious, and simply reducing the number of
> arguments bidi_fetch_char takes by two helps a little.

We can use more descriptive names, that's easy and has zero overhead.
Converting all of our sources to using a struct as positions is
something different.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Sun, 27 Sep 2020 14:37:01 GMT) Full text and rfc822 format available.

Message #41 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se, Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Sun, 27 Sep 2020 16:36:21 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> I'm attaching a patch with a few more cases. I don't have a strong
>> preference for either FETCH_BYTE or FETCH_CHAR where both will do, but
>> we should be consistent.
>
> I'm okay with those additional changes, but let's install them on
> master, as they are a cleanup, not a bug.

Pip's patch from May no longer applies cleanly, so I've respun it for
the trunk now.

Does this still look OK?

diff --git a/src/cmds.c b/src/cmds.c
index 90526612b7..c29cf00dad 100644
--- a/src/cmds.c
+++ b/src/cmds.c
@@ -390,7 +390,7 @@ internal_self_insert (int c, EMACS_INT n)
 		     by spaces so that the remaining text won't move.  */
 		  ptrdiff_t actual = PT_BYTE;
 		  actual -= prev_char_len (actual);
-		  if (FETCH_CHAR (actual) == '\t')
+		  if (FETCH_BYTE (actual) == '\t')
 		    /* Rather than add spaces, let's just keep the tab. */
 		    chars_to_delete--;
 		  else
diff --git a/src/syntax.c b/src/syntax.c
index 066972e6d8..cbef61dfbe 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -731,7 +731,6 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
     {
       rarely_quit (++quit_count);
 
-      ptrdiff_t temp_byte;
       int prev_syntax;
       bool com2start, com2end, comstart;
 
@@ -882,8 +881,7 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  if (open_paren_in_column_0_is_defun_start
               && NILP (Vcomment_use_syntax_ppss)
 	      && (from == stop
-		  || (temp_byte = dec_bytepos (from_byte),
-		      FETCH_CHAR (temp_byte) == '\n')))
+		  || (FETCH_BYTE (from_byte - 1) == '\n')))
 	    {
 	      defun_start = from;
 	      defun_start_byte = from_byte;
diff --git a/src/xdisp.c b/src/xdisp.c
index 3d40878be6..ecd23e0d0f 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -10619,7 +10619,7 @@ DEFUN ("window-text-pixel-size", Fwindow_text_pixel_size, Swindow_text_pixel_siz
       while (bpos > BEGV_BYTE)
 	{
 	  dec_both (&start, &bpos);
-	  c = FETCH_CHAR (bpos);
+	  c = FETCH_BYTE (bpos);
 	  if (!(c == ' ' || c == '\t'))
 	    break;
 	}
@@ -10641,7 +10641,7 @@ DEFUN ("window-text-pixel-size", Fwindow_text_pixel_size, Swindow_text_pixel_siz
       while (bpos > BEGV_BYTE)
 	{
 	  dec_both (&end, &bpos);
-	  c = FETCH_CHAR (bpos);
+	  c = FETCH_BYTE (bpos);
 	  if (!(c == ' ' || c == '\t' || c == '\n' || c == '\r'))
 	    break;
 	}
@@ -22277,7 +22277,7 @@ trailing_whitespace_p (ptrdiff_t charpos)
   int c = 0;
 
   while (bytepos < ZV_BYTE
-	 && (c = FETCH_CHAR (bytepos),
+	 && (c = FETCH_BYTE (bytepos),
 	     c == ' ' || c == '\t'))
     ++bytepos;
 


-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Sun, 27 Sep 2020 15:17:01 GMT) Full text and rfc822 format available.

Message #44 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se, pipcet <at> gmail.com
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Sun, 27 Sep 2020 18:15:53 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Pip Cet <pipcet <at> gmail.com>,  41520 <at> debbugs.gnu.org,  stefan <at> marxist.se
> Date: Sun, 27 Sep 2020 16:36:21 +0200
> 
> > I'm okay with those additional changes, but let's install them on
> > master, as they are a cleanup, not a bug.
> 
> Pip's patch from May no longer applies cleanly, so I've respun it for
> the trunk now.
> 
> Does this still look OK?

Almost.  I'd rather skip this part:

> @@ -882,8 +881,7 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	  if (open_paren_in_column_0_is_defun_start
>                && NILP (Vcomment_use_syntax_ppss)
>  	      && (from == stop
> -		  || (temp_byte = dec_bytepos (from_byte),
> -		      FETCH_CHAR (temp_byte) == '\n')))
> +		  || (FETCH_BYTE (from_byte - 1) == '\n')))

It might be that I'm being paranoid here, but I don't like calling
FETCH_BYTE when we are potentially in the middle of a multibyte
sequence or near the gap.

The rest should be fine, thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Sun, 27 Sep 2020 15:43:02 GMT) Full text and rfc822 format available.

Message #47 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se, pipcet <at> gmail.com
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Sun, 27 Sep 2020 17:42:27 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> Almost.  I'd rather skip this part:

OK; applied to the trunk now.

There was then some followup discussion about renaming some of these
macros, since they have names that sometimes lead to some confusion, but
I think it's perhaps too much churn for too little gain.

So I think everything is fixed here now?  The reported crash had a fix
in bidi.c at the time, so this latest commit was just a mop-up of other
confusions.  So I'm closing this bug report.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 28.1, send any further explanations to 41520 <at> debbugs.gnu.org and Stefan Kangas <stefan <at> marxist.se> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 27 Sep 2020 15:43:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41520; Package emacs. (Sun, 27 Sep 2020 16:02:02 GMT) Full text and rfc822 format available.

Message #52 received at 41520 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 41520 <at> debbugs.gnu.org, stefan <at> marxist.se, pipcet <at> gmail.com
Subject: Re: bug#41520: 28.0.50; Crash in character.h due to assertion error
Date: Sun, 27 Sep 2020 19:00:44 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 41520 <at> debbugs.gnu.org,  stefan <at> marxist.se,  pipcet <at> gmail.com
> Date: Sun, 27 Sep 2020 17:42:27 +0200
> 
> There was then some followup discussion about renaming some of these
> macros, since they have names that sometimes lead to some confusion, but
> I think it's perhaps too much churn for too little gain.

Yes.  In general, I'd prefer not to rename veteran macros
unnecessarily, as they are more or less burned into the muscle memory
of those who frequently work on the Emacs internals.  Into my memory,
definitely.  These two macros are from that group.

> So I think everything is fixed here now?

Yes.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 26 Oct 2020 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 182 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.