GNU bug report logs - #9771
24.0.90; Redisplay problems with control characters

Previous Next

Package: emacs;

Reported by: Johan Bockgård <bojohan <at> gnu.org>

Date: Sun, 16 Oct 2011 22:26:02 UTC

Severity: normal

Found in version 24.0.90

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9771 in the body.
You can then email your comments to 9771 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Sun, 16 Oct 2011 22:26:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Johan Bockgård <bojohan <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 16 Oct 2011 22:26:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Johan Bockgård <bojohan <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 00:24:30 +0200
There are several redisplay problems with control characters
(GET_FROM_DISPLAY_VECTOR method).


1. Abort in move_it_by_lines (with bidi)

emacs -Q
C-u 2000 C-q 0 RET
M-<
M->

Fatal error (6)Aborted (core dumped)

(gdb) bt
#0  abort () at emacs.c:385
#1  0x0000000000451a48 in move_it_by_lines (it=0x7fffffffc620, dvpos=0)
    at xdisp.c:8907
#2  0x00000000004c0717 in Frecenter (arg=10) at window.c:5107
#3  0x000000000060ce90 in Ffuncall (nargs=<optimized out>, args=0x7fffffffd460)
    at eval.c:2974
[...]

The following assertion fails:

  [move_it_by_lines]

    if (dvpos == 0)
      {
        /* DVPOS == 0 means move to the start of the screen line.  */
        move_it_vertically_backward (it, 0);
        xassert (it->current_x == 0 && it->hpos == 0);

(gdb) p it->current_x
$1 = 10
(gdb) p it->hpos
$2 = 1


The non-zero values for it->hpos and it->current_x come from move_it_to
in this piece of code:

 [move_it_vertically_backward]

    /* The above code moves us to some position NLINES down,
       usually to its first glyph (leftmost in an L2R line), but
       that's not necessarily the start of the line, under bidi
       reordering.  We want to get to the character position
       that is immediately after the newline of the previous
       line.  */
    if (it->bidi_p && IT_CHARPOS (*it) > BEGV
        && FETCH_BYTE (IT_BYTEPOS (*it) - 1) != '\n')
      {
        EMACS_INT nl_pos =
          find_next_newline_no_quit (IT_CHARPOS (*it) - 1, -1);

        move_it_to (it, nl_pos, -1, -1, -1, MOVE_TO_POS);
      }


2. (Old) problem in BUFFER_POS_REACHED_P

The position hpos = 1 above is not just non-zero; it's also in the
middle of the ^@ control character (the screen line starts with ^). It's
produced by move_it_in_display_line_to:

    #define BUFFER_POS_REACHED_P()                                  \
    [...]
       && (it->method == GET_FROM_BUFFER                            \
           || (it->method == GET_FROM_DISPLAY_VECTOR                \
               && it->dpvec + it->current.dpvec_index + 1 >= it->dpend)))

According to the condition above, the position in column 0 before the
^ glyph (dpvec_index = 0) is not a possible stop point, but the position
between ^ and @ is.

Cf. in_display_vector_p:

    /* Return 1 if IT points into the middle of a display vector.  */
    in_display_vector_p (struct it *it)
    {
      return (it->method == GET_FROM_DISPLAY_VECTOR
              && it->current.dpvec_index > 0
              && it->dpvec + it->current.dpvec_index != it->dpend);
    }


3. Long lines with display vectors make Emacs really slow (with bidi)

emacs -Q
C-u 2000 C-q 0 RET
M-<
Type text...


4. Abort in push_display_prop

emacs -Q

(setq wrap-prefix "x")

C-u 100 C-q 0 RET

Fatal error (6)Aborted (core dumped)

(gdb) bt
#0  abort () at emacs.c:385
#1  0x0000000000477316 in push_display_prop (it=0x7fffffff8cd0, prop=24535585)
    at xdisp.c:18432
#2  0x0000000000477857 in handle_line_prefix (it=0x7fffffff8cd0)
    at xdisp.c:18536
#3  0x00000000004782d4 in display_line (it=0x7fffffff8cd0) at xdisp.c:18802

  xassert (it->method == GET_FROM_BUFFER
	   || it->method == GET_FROM_STRING);

(gdb) p it->method 
$1 = GET_FROM_DISPLAY_VECTOR




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Mon, 17 Oct 2011 07:50:01 GMT) Full text and rfc822 format available.

Message #8 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Johan Bockgård <bojohan <at> gnu.org>
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 03:48:27 -0400
> From: Johan Bockgård <bojohan <at> gnu.org>
> Date: Mon, 17 Oct 2011 00:24:30 +0200
> 
> 2. (Old) problem in BUFFER_POS_REACHED_P
> 
> The position hpos = 1 above is not just non-zero; it's also in the
> middle of the ^@ control character (the screen line starts with ^). It's
> produced by move_it_in_display_line_to:
> 
>     #define BUFFER_POS_REACHED_P()                                  \
>     [...]
>        && (it->method == GET_FROM_BUFFER                            \
>            || (it->method == GET_FROM_DISPLAY_VECTOR                \
>                && it->dpvec + it->current.dpvec_index + 1 >= it->dpend)))
> 
> According to the condition above, the position in column 0 before the
> ^ glyph (dpvec_index = 0) is not a possible stop point, but the position
> between ^ and @ is.

Okay, but what is the practical problem with this?

> 3. Long lines with display vectors make Emacs really slow (with bidi)

It's not the display vectors in general that cause this.  It's
specifically the display of control characters.  E.g., the following
test case, which is a small variation of yours, shows no visible
slowdown:

  emacs -Q
  M-: (require 'disp-table) RET
  M-: (aset standard-display-table ?x (vconcat "^A")) RET
  C-u 2000 x
  M-<
  Type text...

I'll look into this, but is there some real-life use case behind your
recipe?  Binary nulls in a file generally cause Emacs to make the
buffer unibyte, where bidi reordering is disabled.  If some
optimizations are in order to speed up redisplay in this situation, it
would help to have a real-life use case handy, to make sure the
optimizations really make a difference.

The other parts of the report are more or less clear.

Btw, in the future please make a separate bug report for each issue,
it makes references to bugs in log messages more straightforward.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Mon, 17 Oct 2011 14:20:01 GMT) Full text and rfc822 format available.

Message #11 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: bojohan <at> gnu.org, 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 10:18:42 -0400
> Date: Mon, 17 Oct 2011 03:48:27 -0400
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 9771 <at> debbugs.gnu.org
> 
> > 3. Long lines with display vectors make Emacs really slow (with bidi)
> 
> It's not the display vectors in general that cause this.  It's
> specifically the display of control characters.

The reason for the slowness is the bidirectional property of the
control characters, and a peculiarity of the UBA, the Unicode
Bidirectional Algorithms, regarding the processing of sequences of
characters with that particular property.  In the particular case in
point, it causes Emacs to search to the end of the long line, when it
is about to display the first null character.

So now this question becomes very relevant:

> is there some real-life use case behind your recipe?

Depending on the answer, it may or may not be a good idea to look for
some heuristics for this case before 24.1 is released, because AFAICS
the current implementation of the UBA requirements is correct.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Mon, 17 Oct 2011 17:08:02 GMT) Full text and rfc822 format available.

Message #14 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Johan Bockgård <bojohan <at> gnu.org>
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 19:06:04 +0200
> From: Johan Bockgård <bojohan <at> gnu.org>
> Date: Mon, 17 Oct 2011 00:24:30 +0200
> 
> 4. Abort in push_display_prop
> 
> emacs -Q
> 
> (setq wrap-prefix "x")
> 
> C-u 100 C-q 0 RET
> 
> Fatal error (6)Aborted (core dumped)

This part should be fixed in revision 106102 on the trunk.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Mon, 17 Oct 2011 19:19:02 GMT) Full text and rfc822 format available.

Message #17 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Johan Bockgård <bojohan <at> gnu.org>
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 21:16:38 +0200
> From: Johan Bockgård <bojohan <at> gnu.org>
> Date: Mon, 17 Oct 2011 00:24:30 +0200
> 
> 1. Abort in move_it_by_lines (with bidi)
> 
> emacs -Q
> C-u 2000 C-q 0 RET
> M-<
> M->
> 
> Fatal error (6)Aborted (core dumped)
> 
> (gdb) bt
> #0  abort () at emacs.c:385
> #1  0x0000000000451a48 in move_it_by_lines (it=0x7fffffffc620, dvpos=0)
>     at xdisp.c:8907
> #2  0x00000000004c0717 in Frecenter (arg=10) at window.c:5107
> #3  0x000000000060ce90 in Ffuncall (nargs=<optimized out>, args=0x7fffffffd460)
>     at eval.c:2974
> [...]
> 
> The following assertion fails:
> 
>   [move_it_by_lines]
> 
>     if (dvpos == 0)
>       {
>         /* DVPOS == 0 means move to the start of the screen line.  */
>         move_it_vertically_backward (it, 0);
>         xassert (it->current_x == 0 && it->hpos == 0);
> 
> (gdb) p it->current_x
> $1 = 10
> (gdb) p it->hpos
> $2 = 1

This part should be fixed in revision 106106 on the trunk.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Mon, 17 Oct 2011 21:06:02 GMT) Full text and rfc822 format available.

Message #20 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Johan Bockgård <bojohan <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Mon, 17 Oct 2011 23:04:46 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> According to the condition above, the position in column 0 before the
>> ^ glyph (dpvec_index = 0) is not a possible stop point, but the position
>> between ^ and @ is.
>
> Okay, but what is the practical problem with this?

Nothing, really. But it doesn't seem correct.

> Binary nulls in a file generally cause Emacs to make the buffer
> unibyte, where bidi reordering is disabled.

That doesn't seem to be working, then.

If I do

  emacs -Q /usr/bin/emacs

the resulting buffer is multibyte.

Type M-> and Emacs hangs. (I waited 5 minutes and had to kill the
process.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Tue, 18 Oct 2011 12:58:01 GMT) Full text and rfc822 format available.

Message #23 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Johan Bockgård <bojohan <at> gnu.org>, handa <at> m17n.org
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Tue, 18 Oct 2011 08:56:34 -0400
> From: Johan Bockgård <bojohan <at> gnu.org>
> Cc: 9771 <at> debbugs.gnu.org
> Date: Mon, 17 Oct 2011 23:04:46 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> >> According to the condition above, the position in column 0 before the
> >> ^ glyph (dpvec_index = 0) is not a possible stop point, but the position
> >> between ^ and @ is.
> >
> > Okay, but what is the practical problem with this?
> 
> Nothing, really. But it doesn't seem correct.

I'm not sure.  Maybe I'm missing something (move_it_in_display_line_to
is one of the trickiest functions in the display engine), so please
bear with me as I explain why I think the test is correct.

You wrote:

>     #define BUFFER_POS_REACHED_P()                                  \
>     [...]
>        && (it->method == GET_FROM_BUFFER                            \
>            || (it->method == GET_FROM_DISPLAY_VECTOR                \
>                && it->dpvec + it->current.dpvec_index + 1 >= it->dpend)))
> 
> According to the condition above, the position in column 0 before the
> ^ glyph (dpvec_index = 0) is not a possible stop point, but the position
> between ^ and @ is.

A display vector has its glyphs stored at indices 0..it->dpend-1.
IOW, the last glyph's index is it->dpend-1.  BUFFER_POS_REACHED_P is
invoked always after a call to get_next_display_element, but before
the call to set_iterator_to_next.  The former just consumes the glyph
at it->current.dpvec_index; the latter advances the index to the next
glyph of the display vector.  Therefore, after a call to
get_next_display_element, the glyph at dpvec_index was already
consumed, but the index was not yet advanced to the next position.
Thus, when the index is at dpend-1, as the condition in
BUFFER_POS_REACHED_P says, we have already consumed the last glyph.

Consuming the last glyph of a display vector means that the next call
to set_iterator_to_next will detect that the display vector is
exhausted, and will advance to the next buffer position, the one after
the position which we just passed by consuming all the glyphs from the
display vector used to display the character at that position.  Thus,
"buffer position reached".

Does this make sense?

In practice, after correcting the bug that caused the assertion
violation, I can no longer reproduce the situation where we stop in
the middle of the ^@ character.  If you can show me a recipe for
winding up in the middle of a display vector under these or similar
circumstances, I will have another look.

> > Binary nulls in a file generally cause Emacs to make the buffer
> > unibyte, where bidi reordering is disabled.
> 
> That doesn't seem to be working, then.
> 
> If I do
> 
>   emacs -Q /usr/bin/emacs
> 
> the resulting buffer is multibyte.

Indeed.  However, buffer-file-coding-system is no-conversion, which is
confusingly inconsistent with "C-x RET c no-conversion C-x C-f".
Perhaps Handa-san could shed some light on this inconsistency.

Anyway, this being so, I found a way to optimize a couple of expensive
loops inside bidi.c for the important special case of a plain L2R text
in the buffer.  With these optimizations, the bidi redisplay of the
entire window showing a buffer with 2000 binary nulls is only a few
milliseconds slower than the unidirectional one.

I will install those changes later today, after some more testing.

> Type M-> and Emacs hangs. (I waited 5 minutes and had to kill the
> process.)

It takes 1.5 seconds now (with a 44MB emacs binary ;-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Tue, 18 Oct 2011 17:00:02 GMT) Full text and rfc822 format available.

Message #26 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: bojohan <at> gnu.org
Cc: 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Tue, 18 Oct 2011 18:58:41 +0200
> Date: Tue, 18 Oct 2011 08:56:34 -0400
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 9771 <at> debbugs.gnu.org
> 
> Anyway, this being so, I found a way to optimize a couple of expensive
> loops inside bidi.c for the important special case of a plain L2R text
> in the buffer.  With these optimizations, the bidi redisplay of the
> entire window showing a buffer with 2000 binary nulls is only a few
> milliseconds slower than the unidirectional one.
> 
> I will install those changes later today, after some more testing.

Done (revision 106124 on the trunk).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Wed, 19 Oct 2011 01:08:01 GMT) Full text and rfc822 format available.

Message #29 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Johan Bockgård <bojohan <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 9771 <at> debbugs.gnu.org, handa <at> m17n.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Wed, 19 Oct 2011 03:06:09 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> Consuming the last glyph of a display vector means that the next call
> to set_iterator_to_next will detect that the display vector is
> exhausted, and will advance to the next buffer position, the one after
> the position which we just passed by consuming all the glyphs from the
> display vector used to display the character at that position. Thus,
> "buffer position reached".

It seems that BUFFER_POS_REACHED_P at least misses the opportunity to
stop immediately before *any* glyph in the display vector has been
consumed, i.e. when dpvec_index == 0. (Or the problem is not with
BUFFER_POS_REACHED_P but with how it's being used.)

[...]
> In practice, after correcting the bug that caused the assertion
> violation, I can no longer reproduce the situation where we stop in
> the middle of the ^@ character.  If you can show me a recipe for
> winding up in the middle of a display vector under these or similar
> circumstances, I will have another look.

emacs -Q

C-u 2000 C-q 0 RET

Notice that the lines visible in the window start with "@^".

Press <up> until the top of the buffer just scrolls into view.

Notice that lines before point start with "^@", and lines after
point start with "@^".

Change BUFFER_POS_REACHED_P (or something) so that dpvec_index == 0 is
an acceptable stop position and this problem doesn't happen.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Wed, 19 Oct 2011 11:52:01 GMT) Full text and rfc822 format available.

Notification sent to Johan Bockgård <bojohan <at> gnu.org>:
bug acknowledged by developer. (Wed, 19 Oct 2011 11:52:01 GMT) Full text and rfc822 format available.

Message #34 received at 9771-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Johan Bockgård <bojohan <at> gnu.org>
Cc: 9771-done <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Wed, 19 Oct 2011 13:49:54 +0200
> From: Johan Bockgård <bojohan <at> gnu.org>
> Cc: handa <at> m17n.org,  9771 <at> debbugs.gnu.org
> Date: Wed, 19 Oct 2011 03:06:09 +0200
> 
> emacs -Q
> 
> C-u 2000 C-q 0 RET
> 
> Notice that the lines visible in the window start with "@^".
> 
> Press <up> until the top of the buffer just scrolls into view.
> 
> Notice that lines before point start with "^@", and lines after
> point start with "@^".

Thanks for the test case.

> Change BUFFER_POS_REACHED_P (or something) so that dpvec_index == 0 is
> an acceptable stop position and this problem doesn't happen.

I'm not sure a change in BUFFER_POS_REACHED_P is the right fix for
this particular problem.  Even if it is, I'm reluctant to make in
move_it_in_display_line_to a change that could have effect in many
unrelated places, since this function is such a central piece of the
display engine.  I'd rather make a change that targets this specific
problem, certainly while we are in a pretest.

I think I found such a change.  The problem is that when we start a
new redisplay cycle of a window, we begin drawing the window's first
line at the last glyph of the display vector used to display the first
character position visible in the window.  I made a simple change that
should fix this (in revision 106131), please see if it resolves the
issue.

Another problem I found was that one of the redisplay optimizations
would redraw too few lines of a window, when the last redrawn line
ends in glyphs from a display vector.  I fixed that in revision
106133.

With all the 4 parts of the original report being addressed now, I'm
closing this bug report.  If there are any left-overs, please submit
separate bug reports about them.

Thanks.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9771; Package emacs. (Thu, 20 Oct 2011 12:45:04 GMT) Full text and rfc822 format available.

Message #37 received at 9771 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: bojohan <at> gnu.org, 9771 <at> debbugs.gnu.org
Subject: Re: bug#9771: 24.0.90; Redisplay problems with control characters
Date: Thu, 20 Oct 2011 14:43:45 +0200
> Date: Tue, 18 Oct 2011 18:58:41 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 9771 <at> debbugs.gnu.org
> 
> > Date: Tue, 18 Oct 2011 08:56:34 -0400
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > Cc: 9771 <at> debbugs.gnu.org
> > 
> > Anyway, this being so, I found a way to optimize a couple of expensive
> > loops inside bidi.c for the important special case of a plain L2R text
> > in the buffer.  With these optimizations, the bidi redisplay of the
> > entire window showing a buffer with 2000 binary nulls is only a few
> > milliseconds slower than the unidirectional one.
> > 
> > I will install those changes later today, after some more testing.
> 
> Done (revision 106124 on the trunk).

For the record, revision 106151 speeds up the bidi display in a more
general class of use cases.  (Previously, appending some character,
like `a', to the 2000 nulls in the original example would still slow
down redisplay, even after the changes in r106124.)




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 18 Nov 2011 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 12 years and 171 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.