GNU bug report logs - #39977
28.0.50; Unhelpful stack trace

Package: emacs;

Reported by: Madhu <enometh <at> meer.net>

Date: Sat, 7 Mar 2020 18:09:01 UTC

Severity: normal

Tags: fixed

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39977 in the body.
You can then email your comments to 39977 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 07 Mar 2020 18:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Madhu <enometh <at> meer.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 07 Mar 2020 18:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Madhu <enometh <at> meer.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; Unhelpful stack trace
Date: Sat, 07 Mar 2020 23:13:16 +0530
[Message part 1 (text/plain, inline)]
Thanks Eli, Following up on the emacs-devel message, I recompiled emacs
and get substantially the same stack trace. This is with my ~/.emacs and
all my local customizations loaded.

When I evaluate a form containing
 (break)
in a SLY Lisp buffer with M-x sly-eval-last-sexp, a SLY debugger window
pops up in a new frame.  Quitting that window (and its frame) tries to
select the previous window, and the crash apparently happens at this
point.  I'm attaching the backtraces.

With a little guidance I expect to be able to investigate this further.

[bt.txt (text/plain, attachment)]
[bt-full.txt (text/plain, attachment)]
[Message part 4 (text/plain, inline)]
In GNU Emacs 28.0.50 (build 3, x86_64-pc-linux-gnu, GTK+ Version 3.24.13, cairo version 1.16.0)
 of 2020-03-07 built on leonis4
Repository revision: 64c791cecf6a9a8593c6e818b0007fcba5cc1549
Repository branch: madhu-tip
Windowing system distributor 'The X.Org Foundation', version 11.0.12006000
System Description: Gentoo/Linux

Configured using:
 'configure -C --with-harfbuzz --with-cairo --with-x-toolkit=gtk
 'CFLAGS=-ggdb -O0''

Configured features:
XPM JPEG TIFF GIF PNG RSVG CAIRO SOUND GPM DBUS GSETTINGS GLIB NOTIFY
INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT LIBOTF
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM MODULES THREADS JSON PDUMPER
LCMS2 GMP

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: en_US.utf8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  savehist-mode: t
  xclip-mode: t
  elisp-slime-nav-mode: t
  ivy-prescient-mode: t
  prescient-persist-mode: t
  ivy-mode: t
  save-place-mode: t
  recentf-mode: t
  show-paren-mode: t
  shell-dirtrack-mode: t
  minibuffer-depth-indicate-mode: t
  display-time-mode: t
  which-function-mode: t
  foo-clear-output-mode: t
  foo-translate-kbd-paren-mode: t
  new-shell-activate-mode: t
  foo-mode: t
  tooltip-mode: t
  eldoc-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg epg-config mm-decode mm-bodies mm-encode mail-parse rfc2231
mailabbrev gmm-utils mailheader sendmail lw-manual lww61-manual-data
savehist xclip elisp-slime-nav gnus nnheader gnus-util rmail
rmail-loaddefs rfc2047 rfc2045 ietf-drums text-property-search
mail-utils mm-util mail-prsvr company pcase cus-start cus-load
ivy-prescient prescient counsel xdg swiper ivy delsel colir color
ivy-overlay ggtags etags fileloop generator xref project compile ewoc
zenicb-color zenicb-whereis zenicb-complete zenicb-stamp zenicb-history
zenicb-away zenicb zenirc-sasl erc-goodies erc erc-backend erc-compat pp
erc-loaddefs zenirc-color zenirc-stamp zenirc-trigger zenirc-notify
zenirc-netsplit zenirc-ignore zenirc-history zenirc-format zenirc-dcc
zenirc-complete zenirc-command-queue zenirc-away zenirc org-mew mew-auth
mew-config mew-imap2 mew-imap mew-nntp2 mew-nntp mew-pop mew-smtp
mew-ssl mew-ssh mew-net mew-highlight mew-sort mew-fib mew-ext
mew-refile mew-demo mew-attach mew-draft mew-message mew-thread
mew-virtual mew-summary4 mew-summary3 mew-summary2 mew-summary
mew-search mew-pick mew-passwd mew-scan mew-syntax mew-bq mew-smime
mew-pgp mew-header mew-exec mew-mark mew-mime mew-unix mew-edit
mew-decode mew-encode mew-cache mew-minibuf mew-complete mew-addrbook
mew-local mew-vars3 mew-vars2 mew-vars mew-env mew-mule3 mew-mule
mew-gemacs mew-key mew-func mew-blvs mew-const mew server winner
windmove whitespace tramp-sh tramp tramp-loaddefs trampver
tramp-integration files-x tramp-compat ls-lisp ange-ftp term disp-table
ehelp saveplace recentf tree-widget wid-edit paren ob-lisp ob-shell
shell org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro
org-footnote org-src ob-comint org-pcomplete pcomplete comint ring
org-list org-faces org-entities noutline outline org-version
ob-emacs-lisp ob-core ob-eval org-table ol org-keys org-compat org-macs
org-loaddefs format-spec find-func cal-menu calendar cal-loaddefs
rng-nxml rng-valid rng-loc rng-uri rng-parse nxml-parse rng-match rng-dt
rng-util rng-pttrn nxml-ns nxml-mode nxml-outln nxml-rap sgml-mode dom
nxml-util nxml-enc xmltok mb-depth ffap thingatpt battery dbus xml time
so-long which-func imenu parse-time iso8601 time-date cookie1 diff
generic ansi-color derived easy-mmode edmacro kmacro advice cl-extra
help-mode dired-x dired dired-loaddefs cl gh-common marshal eieio-compat
rx info package easymenu browse-url url-handlers url-parse auth-source
cl-seq eieio eieio-core cl-macs eieio-loaddefs password-cache json
subr-x map url-vars seq byte-opt gv bytecomp byte-compile cconv
cl-loaddefs cl-lib tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify lcms2 dynamic-setting system-font-setting font-render-setting
cairo move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 496211 16301)
 (symbols 48 37922 1)
 (strings 32 133899 4857)
 (string-bytes 1 3903977)
 (vectors 16 38303)
 (vector-slots 8 455102 14242)
 (floats 8 588 350)
 (intervals 56 1694 0)
 (buffers 1000 12))



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 07 Mar 2020 18:51:01 GMT) Full text and rfc822 format available.

Message #8 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Madhu <enometh <at> meer.net>
Cc: 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 07 Mar 2020 20:50:42 +0200
> From: Madhu <enometh <at> meer.net>
> Date: Sat, 07 Mar 2020 23:13:16 +0530
> 
> Thanks Eli, Following up on the emacs-devel message, I recompiled emacs
> and get substantially the same stack trace. This is with my ~/.emacs and
> all my local customizations loaded.
> 
> when evaluating a form in a sly lisp buffer with
>  (break)
> M-x sly-eval-last-sexp
> pops up a sly debug window in a new frame
> quitting the window (and frame) tries to select the previous window
> and the crash apparently happens at this point. I'm attaching the
> backtraces as attachments
> 
> With a little guidance I expect to be able to investigate this further

Thanks.  Please try the patch below, and tell if it makes the crash
go away.

diff --git a/src/window.c b/src/window.c
index 8cdad27..863fac4 100644
--- a/src/window.c
+++ b/src/window.c
@@ -541,8 +541,11 @@ select_window (Lisp_Object window, Lisp_Object norecord,
   else
     redisplay_other_windows ();
 
-  sf = SELECTED_FRAME ();
-  if (XFRAME (WINDOW_FRAME (w)) != sf)
+  if (FRAMEP (selected_frame) && FRAME_LIVE_P (XFRAME (selected_frame)))
+    sf = XFRAME (selected_frame);
+  else
+    sf = NULL;
+  if (!sf || XFRAME (WINDOW_FRAME (w)) != sf)
     {
       fset_selected_window (XFRAME (WINDOW_FRAME (w)), window);
       /* Use this rather than Fhandle_switch_frame




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Fri, 13 Mar 2020 09:56:02 GMT) Full text and rfc822 format available.

Message #11 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: enometh <at> meer.net, martin rudalics <rudalics <at> gmx.at>
Cc: 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Fri, 13 Mar 2020 11:55:26 +0200
Ping!  Can you please try the proposed patch?

Martin, any thoughts or comments about this?

> Date: Sat, 07 Mar 2020 20:50:42 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 39977 <at> debbugs.gnu.org
> 
> > From: Madhu <enometh <at> meer.net>
> > Date: Sat, 07 Mar 2020 23:13:16 +0530
> > 
> > Thanks Eli, Following up on the emacs-devel message, I recompiled emacs
> > and get substantially the same stack trace. This is with my ~/.emacs and
> > all my local customizations loaded.
> > 
> > when evaluating a form in a sly lisp buffer with
> >  (break)
> > M-x sly-eval-last-sexp
> > pops up a sly debug window in a new frame
> > quitting the window (and frame) tries to select the previous window
> > and the crash apparently happens at this point. I'm attaching the
> > backtraces as attachments
> > 
> > With a little guidance I expect to be able to investigate this further
> 
> Thanks.  Please try the patch below, and tell if it makes the crash
> go away.
> 
> diff --git a/src/window.c b/src/window.c
> index 8cdad27..863fac4 100644
> --- a/src/window.c
> +++ b/src/window.c
> @@ -541,8 +541,11 @@ select_window (Lisp_Object window, Lisp_Object norecord,
>    else
>      redisplay_other_windows ();
>  
> -  sf = SELECTED_FRAME ();
> -  if (XFRAME (WINDOW_FRAME (w)) != sf)
> +  if (FRAMEP (selected_frame) && FRAME_LIVE_P (XFRAME (selected_frame)))
> +    sf = XFRAME (selected_frame);
> +  else
> +    sf = NULL;
> +  if (!sf || XFRAME (WINDOW_FRAME (w)) != sf)
>      {
>        fset_selected_window (XFRAME (WINDOW_FRAME (w)), window);
>        /* Use this rather than Fhandle_switch_frame
> 
> 
> 
> 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Fri, 13 Mar 2020 16:30:03 GMT) Full text and rfc822 format available.

Message #14 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>, enometh <at> meer.net
Cc: 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Fri, 13 Mar 2020 17:28:53 +0100
> Martin, any thoughts or comments about this?

The selected frame must be invariantly live.  Madhu, could you find out
why we apparently manage to return from delete_frame in frame.c without
selecting another frame?

The dividing area is the part written as

  /* At this point, we are committed to deleting the frame.
     There is no more chance for errors to prevent it.  */
  minibuffer_selected = EQ (minibuf_window, selected_window);
  sf = SELECTED_FRAME ();
  /* Don't let the frame remain selected.  */
  if (f == sf)

starting around line 2012 in delete_frame.  Put a breakpoint anywhere
there and run your sly function.  If the (f == sf) check is not true, we
are lost.  Otherwise, try to step through the following FOR_EACH_FRAME
and tell us why it doesn't break out of that loop (and the subsequent
one).  It requires a bit of intuition, but since you probably will not
have more than one frame you should be able to find out quickly.

Other than that I cannot imagine what could have gone wrong here and/or
how to test this.  In either case sf = NULL; is not TRT but I think you
are aware of that.

martin
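
The invariant described above (the selected frame stays live across a
frame deletion) can be sketched with a small self-contained C model.
This is not the Emacs source: struct toy_frame, toy_frames, toy_selected
and the toy_* functions are hypothetical names used only for
illustration, and the real delete_frame does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a frame; not the real struct frame.  */
struct toy_frame
{
  bool live;
};

#define TOY_MAX_FRAMES 4
static struct toy_frame toy_frames[TOY_MAX_FRAMES];
static struct toy_frame *toy_selected;

/* Start with two live frames and select the first one.  */
static void
toy_init (void)
{
  for (size_t i = 0; i < TOY_MAX_FRAMES; i++)
    toy_frames[i].live = false;
  toy_frames[0].live = true;
  toy_frames[1].live = true;
  toy_selected = &toy_frames[0];
}

/* Return some live frame other than F, or NULL if there is none.  */
static struct toy_frame *
toy_other_live_frame (struct toy_frame *f)
{
  for (size_t i = 0; i < TOY_MAX_FRAMES; i++)
    if (&toy_frames[i] != f && toy_frames[i].live)
      return &toy_frames[i];
  return NULL;
}

/* Delete F.  If F is the selected frame, first move the selection to
   another live frame; refuse to delete the only live frame.
   Return true if F was deleted.  */
static bool
toy_delete_frame (struct toy_frame *f)
{
  if (!f->live)
    return false;
  if (f == toy_selected)
    {
      struct toy_frame *other = toy_other_live_frame (f);
      if (!other)
        return false;
      toy_selected = other;
    }
  f->live = false;
  /* The invariant: the selected frame is live when we return.  */
  assert (toy_selected && toy_selected->live);
  return true;
}
```

In this model the "f == sf" branch is the only place where the selection
has to move.  A frame that dies while it is selected without going
through that branch is the situation the thread is trying to track down.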




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Fri, 13 Mar 2020 19:44:01 GMT) Full text and rfc822 format available.

Message #17 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Fri, 13 Mar 2020 21:43:11 +0200
> Cc: 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Fri, 13 Mar 2020 17:28:53 +0100
> 
> In either case sf = NULL; is not TRT but I think you are aware of
> that.

No, I don't think I'm aware of that.  It's just a local variable, so
why assigning NULL could not be TRT?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 14 Mar 2020 08:49:02 GMT) Full text and rfc822 format available.

Message #20 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 14 Mar 2020 09:48:20 +0100
>> In either case sf = NULL; is not TRT but I think you are aware of
>> that.
>
> No, I don't think I'm aware of that.  It's just a local variable, so
> why assigning NULL could not be TRT?

Because it hides the underlying error.  The abort in SELECTED_FRAME is
there so we can find its cause and that's why I said you are aware of
it.  Obviously, we can set it to NULL to avoid an abort when, as in the
case at hand, we construct the mode line or the frame title.  But in
general doing such a thing in select_window is not TRT.  At least that's
what I learned from you.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 14 Mar 2020 10:11:02 GMT) Full text and rfc822 format available.

Message #23 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 14 Mar 2020 12:10:09 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Sat, 14 Mar 2020 09:48:20 +0100
> 
>  >> In either case sf = NULL; is not TRT but I think you are aware of
>  >> that.
>  >
>  > No, I don't think I'm aware of that.  It's just a local variable, so
>  > why assigning NULL could not be TRT?
> 
> Because it hides the underlying error.  The abort in SELECTED_FRAME is
> there so we can find its cause and that's why I said you are aware of
> it.  Obviously, we can set it to NULL to avoid an abort when, as in the
> case at hand, we construct the mode line or the frame title.  But in
> general doing such a thing in select_window is not TRT.  At least that's
> what I learned from you.

My understanding of the scenario in this report was that the value of
selected_frame didn't have time to become updated before redisplay
kicked in.  If you think the problem is elsewhere, I'm okay with
leaving this crash in emacs-27 until we understand the cause of that
and fix it elsewhere.  I just hope you will have the solution quickly
enough to not release Emacs 27 with this crash.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 14 Mar 2020 10:38:01 GMT) Full text and rfc822 format available.

Message #26 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 14 Mar 2020 11:37:24 +0100
> My understanding of the scenario in this report was that the value of
> selected_frame didn't have time to become updated before redisplay
> kicked in.

My understanding is the same.  But we should not allow redisplay to kick
in before selected_frame is updated.

> If you think the problem is elsewhere, I'm okay with
> leaving this crash in emacs-27 until we understand the cause of that
> and fix it elsewhere. I just hope you will have the solution quickly
> enough to not release Emacs 27 with this crash.

I don't oppose your patch and that's why I didn't comment on it initially.
If you think it's urgent to fix this specific crash, push it.  But we
obviously might also let more serious bugs pass through that mechanism
then.  If Madhu has a simple recipe to reproduce the crash, we should
IMHO try to profit from it.  If he responds at all.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 14 Mar 2020 18:56:02 GMT) Full text and rfc822 format available.

Message #29 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 14 Mar 2020 19:55:34 +0100
Maybe the following two changes would not harm:

(1) In fast_set_selected_frame check whether FRAME is live before doing
    selected_frame = frame;

(2) In display_mode_lines check whether new_frame is live before doing

    selected_frame = new_frame;

    and maybe also for old_selected_frame before

    selected_frame = old_selected_frame;

Maybe display_mode_lines should better use

   record_unwind_protect (fast_set_selected_frame, selected_frame);

What was the rationale for protecting frame reselection when drawing the
tab bar or the tool bar and not protecting it when drawing mode lines?

I have no idea whether these could help in any way but since we are in
redisplay and the selected frame has become dead all of a sudden ...

Another point is obviously the

      do_switch_frame (frame1, 0, 1, Qnil);
      sf = SELECTED_FRAME ();

combination in delete_frame itself.  If frame1 is dead, we select the
frame we are about to delete.  But this should not produce the abort at
hand since the assignment to selected_frame happens in do_switch_frame.
Guarding that assignment would not harm either and then we're done IMO.
WDYT?

martin
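
The two guards suggested above might look roughly like the following
self-contained C sketch.  It is a toy model under stated assumptions,
not code from xdisp.c or frame.c: struct toy_frame, toy_selected,
toy_victim and the toy_* functions are all hypothetical, and the
callback only simulates a mode-line form that deletes a frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a frame; not the real struct frame.  */
struct toy_frame
{
  bool live;
};

static struct toy_frame *toy_selected;

/* Guarded setter: ignore requests to select a dead or null frame.
   Return true if the selection was changed.  */
static bool
toy_set_selected_frame (struct toy_frame *f)
{
  if (!f || !f->live)
    return false;
  toy_selected = f;
  return true;
}

/* Temporarily select TARGET, run BODY, then restore the previous
   selection, but only if that frame is still live.  */
static void
toy_with_selected_frame (struct toy_frame *target, void (*body) (void))
{
  struct toy_frame *old = toy_selected;
  if (!toy_set_selected_frame (target))
    return;
  body ();
  /* If OLD died while BODY ran, the guard leaves TARGET selected
     instead of pointing the selection at a dead frame.  */
  toy_set_selected_frame (old);
}

/* Hypothetical callback simulating Lisp code that deletes a frame
   while the display code has another frame temporarily selected.  */
static struct toy_frame *toy_victim;

static void
toy_kill_victim (void)
{
  if (toy_victim)
    toy_victim->live = false;
}
```

The sketch deliberately leaves one case undecided: if both the
temporarily selected frame and the previously selected frame are dead
after BODY returns, the guard alone does not say which frame should be
selected.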




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 14 Mar 2020 20:10:01 GMT) Full text and rfc822 format available.

Message #32 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 14 Mar 2020 22:09:14 +0200
> From: martin rudalics <rudalics <at> gmx.at>
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> Date: Sat, 14 Mar 2020 19:55:34 +0100
> 
> Maybe the following two changes would not harm:
> 
> (1) In fast_set_selected_frame check whether FRAME is live before doing
>      selected_frame = frame;
> 
> (2) In display_mode_lines check whether new_frame is live before doing
> 
>      selected_frame = new_frame;
> 
>      and maybe also for old_selected_frame before
> 
>      selected_frame = old_selected_frame;

And what do you suggest to do if the frame at RHS is not live?

> I have no idea whether these could help in any way but since we are in
> redisplay and the selected frame has become dead all of a sudden ...

Why do you think this is a problem for redisplay?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sun, 15 Mar 2020 17:50:02 GMT) Full text and rfc822 format available.

Message #35 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sun, 15 Mar 2020 18:49:21 +0100
[Message part 1 (text/plain, inline)]
> And what do you suggest to do if the frame at RHS is not live?

Sorry, what is RHS?  As far as xdisp.c is concerned it simply must not
set selected_frame to a dead frame.  Never ever.  As far as frame.c is
concerned, it should do something like in the attached patch.  In the
worst case this might make it impossible to remove a specific frame but
this can usually be fixed by C-x 5 2 followed by C-x 5 1 unless things
are awfully broken.  But at least this is a place from where to continue
investigating.  I have no better idea.

>> I have no idea whether these could help in any way but since we are in
>> redisplay and the selected frame has become dead all of a sudden ...
>
> Why do you think this is a problem for redisplay?

I didn't say that this is a problem for redisplay.  What I wanted to say
is that your fix which is supposed to handle a (maybe only temporary)
problem inside redisplay might cause more serious problems when
selecting a dead frame outside of redisplay.  But maybe I'm confusing
things.

martin
[frame.c.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sun, 15 Mar 2020 18:42:01 GMT) Full text and rfc822 format available.

Message #38 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sun, 15 Mar 2020 20:41:42 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Sun, 15 Mar 2020 18:49:21 +0100
> 
>  > And what do you suggest to do if the frame at RHS is not live?
> 
> Sorry, what is RHS?

Right-hand side.

> As far as xdisp.c is concerned it simply must not set selected_frame
> to a dead frame.

I don't think that's possible in xdisp.c cases you've shown.

> Never ever.

Why not?

> As far as frame.c is concerned, it should do something like in the
> attached patch.

We cannot punt like that in the display engine.

>  > Why do you think this is a problem for redisplay?
> 
> I didn't say that this is a problem for redisplay.  What I wanted to say
> is that your fix which is supposed to handle a (maybe only temporary)
> problem inside redisplay might cause more serious problems when
> selecting a dead frame outside of redisplay.  But maybe I'm confusing
> things.

So you are saying that selecting such a frame will cause trouble to
some other code, not to the display engine?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 16 Mar 2020 02:43:01 GMT) Full text and rfc822 format available.

Message #41 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Madhu <enometh <at> meer.net>
To: rudalics <at> gmx.at
Cc: eliz <at> gnu.org, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Mon, 16 Mar 2020 08:12:18 +0530 (IST)
[Sorry for the delay; I hadn't checked my email until now.  I wasn't
subscribed to the debbugs newsgroup.]

* I didn't get Eli's message #8 (Sat, 07 Mar 2020 20:50:42 +0200) in
my mailbox.  The patch supplied in this message does indeed make the
crash go away.

* I did get Eli's message #11 (Fri, 13 Mar 2020 11:55:26 +0200) to my
mailbox. Subsequent messages seem to be delivered to my email address.

* Re Martin's message #14 (Fri, 13 Mar 2020 17:28:53 +0100), the check
(f == sf) in delete_frame is not true when I trigger the crash.

* Re Eli's message #23 (Sat, 14 Mar 2020 12:10:09 +0200): It seems
that I am going out of my way to trigger the crash - I may be
introducing a "bug" in SLY code, or exposing a defect in SLY design.

Presumably under normal circumstances the crash should not occur.

I haven't understood the sequence of events which leads to this
crash.  I'm a little embarrassed to reveal it, but I can try to
pass on the recipe to Martin.

* Re Martin's message #35 (Sun, 15 Mar 2020 18:49:21 +0100), the patch
frame.c.diff does make the crash go away.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 16 Mar 2020 09:25:02 GMT) Full text and rfc822 format available.

Message #44 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Mon, 16 Mar 2020 10:24:14 +0100
>> As far as xdisp.c is concerned it simply must not set selected_frame
>> to a dead frame.
>
> I don't think that's possible in xdisp.c cases you've shown.
>
>> Never ever.
>
> Why not?

Because it might shift the abort to the next instance of SELECTED_FRAME.

>> As far as frame.c is concerned, it should do something like in the
>> attached patch.
>
> We cannot punt like that in the display engine.

Why not?  At least one of the frame restorations is unprotected anyway
and might leave the temporarily selected frame selected.

> So you are saying that selecting such a frame will cause trouble to
> some other code, not to the display engine?

Not "will" but "may".  The problem is that it then might be harder
to find the cause.

With emacs -Q evaluate

(defvar foo
  '(:eval
    (when (> (length (frame-list)) 1)
      (delete-frame (next-frame)))))

(setq-default mode-line-format foo)

and do C-x 5 2.  The backtrace I get here is


#0  0x000000000063f7a3 in terminate_due_to_signal (sig=6, backtrace_limit=40) at ../../src/emacs.c:371
#1  0x000000000068ac8a in emacs_abort () at ../../src/sysdep.c:2448
#2  0x00000000004ee088 in select_window (window=XIL(0x1be5745), norecord=XIL(0x30), inhibit_point_swap=false) at ../../src/window.c:544
#3  0x00000000004ee2f9 in Fselect_window (window=XIL(0x1be5745), norecord=XIL(0x30)) at ../../src/window.c:630
#4  0x0000000000484c33 in gui_consider_frame_title (frame=XIL(0x1be5505)) at ../../src/xdisp.c:12318
#5  0x00000000004974b6 in redisplay_window (window=XIL(0x1be5745), just_this_one_p=false) at ../../src/xdisp.c:18940
#6  0x000000000048cb00 in redisplay_window_0 (window=XIL(0x1be5745)) at ../../src/xdisp.c:16179
#7  0x00000000007b10dd in internal_condition_case_1 (bfun=0x48cabe <redisplay_window_0>, arg=XIL(0x1be5745), handlers=XIL(0x7ffff40bafbb), hfun=0x48ca86 <redisplay_window_error>) at ../../src/eval.c:1379
#8  0x000000000048ca58 in redisplay_windows (window=XIL(0x1be5745)) at ../../src/xdisp.c:16159
#9  0x000000000048b486 in redisplay_internal () at ../../src/xdisp.c:15627
#10 0x0000000000489084 in redisplay () at ../../src/xdisp.c:14854
#11 0x0000000000650828 in read_char (commandflag=1, map=XIL(0x17ce993), prev_event=XIL(0), used_mouse_menu=0x7fffffffe13f, end_time=0x0) at ../../src/keyboard.c:2493
#12 0x0000000000663705 in read_key_sequence (keybuf=0x7fffffffe2d0, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at ../../src/keyboard.c:9549
#13 0x000000000064ccee in command_loop_1 () at ../../src/keyboard.c:1350
#14 0x00000000007b1002 in internal_condition_case (bfun=0x64c872 <command_loop_1>, handlers=XIL(0x90), hfun=0x64be81 <cmd_error>) at ../../src/eval.c:1355
#15 0x000000000064c457 in command_loop_2 (ignore=XIL(0)) at ../../src/keyboard.c:1091
#16 0x00000000007b04b6 in internal_catch (tag=XIL(0xd0e0), func=0x64c42a <command_loop_2>, arg=XIL(0)) at ../../src/eval.c:1116
#17 0x000000000064c3f5 in command_loop () at ../../src/keyboard.c:1070
#18 0x000000000064b968 in recursive_edit_1 () at ../../src/keyboard.c:714
#19 0x000000000064bb60 in Frecursive_edit () at ../../src/keyboard.c:786
#20 0x0000000000641f98 in main (argc=2, argv=0x7fffffffe7c8) at ../../src/emacs.c:2035


which is almost the same as Madhu's.  So maybe the display engine should
simply set a global boolean inhibit_frame_changes while evaluating mode
lines or frame titles, and have at least delete_frame refuse to delete a
frame while that variable is set, if we decide that fixing such a problem
is urgent.

martin
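
A minimal sketch of the inhibit_frame_changes idea, as a self-contained
C model.  Only the flag's name comes from the message above; struct
toy_frame and the toy_* functions are hypothetical, and nothing here is
taken from the Emacs sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a frame; not the real struct frame.  */
struct toy_frame
{
  bool live;
};

/* Hypothetical global, named after the suggestion above.  */
static bool inhibit_frame_changes;

/* Refuse to delete F while frame changes are inhibited.
   Return true if F was deleted.  */
static bool
toy_delete_frame (struct toy_frame *f)
{
  if (inhibit_frame_changes || !f->live)
    return false;
  f->live = false;
  return true;
}

/* Stand-in for evaluating a mode-line or frame-title form: CALLBACK
   runs with the flag set, and the previous value is restored
   afterwards so that nested evaluations keep the flag set.  */
static void
toy_eval_mode_line (void (*callback) (struct toy_frame *),
                    struct toy_frame *arg)
{
  bool old = inhibit_frame_changes;
  inhibit_frame_changes = true;
  callback (arg);
  inhibit_frame_changes = old;
}

/* Sample callback mimicking a (:eval (delete-frame ...)) form.  */
static void
toy_try_delete (struct toy_frame *f)
{
  toy_delete_frame (f);
}
```

In this model, a deletion attempted from within toy_eval_mode_line is
silently refused, while the same call made afterwards succeeds.  Real
code would also have to decide whether to signal an error and how to
restore the flag on a non-local exit.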




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 16 Mar 2020 09:26:01 GMT) Full text and rfc822 format available.

Message #47 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Madhu <enometh <at> meer.net>
Cc: eliz <at> gnu.org, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Mon, 16 Mar 2020 10:25:04 +0100
> I haven't understood the sequence of events which leads to this
> crash.  I'm a little embarrassed to reveal it, but I can try to
> pass on the recipe to Martin.

Do you ever evaluate something that could cause a frame deletion when
setting a mode-line or header-line string or a frame title?  Do you use
the tab-bar?  Where in your code does the "quitting the window (and
frame)" you mentioned earlier happen?

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 16 Mar 2020 15:34:02 GMT) Full text and rfc822 format available.

Message #50 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Mon, 16 Mar 2020 17:33:32 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Mon, 16 Mar 2020 10:24:14 +0100
> 
>  >> As far as xdisp.c is concerned it simply must not set selected_frame
>  >> to a dead frame.
>  >
>  > I don't think that's possible in xdisp.c cases you've shown.
>  >
>  >> Never ever.
>  >
>  > Why not?
> 
> Because it might shift the abort to the next instance of SELECTED_FRAME.

Why does it matter which SELECTED_FRAME crashes?

Anyway, my point was a different one: it was that we cannot simply
"not select" such a frame, we need to do something else.  What exactly
is not trivial, and I didn't understand what you were suggesting to
do.

>  >> As far as frame.c is concerned, it should do something like in the
>  >> attached patch.
>  >
>  > We cannot punt like that in the display engine.
> 
> Why not?

Because we must have a frame that we were supposed to redisplay.

> At least one of the frame restorations is unprotected anyway
> and might leave the temporarily selected frame selected.

The display engine doesn't select frames to show them to the user, it
selects them to redraw their windows.  So the considerations what to
do in this case are different from those we need to consider when the
user selects a frame.

>  > So you are saying that selecting such a frame will cause trouble to
>  > some other code, not to the display engine?
> 
> Not "will" but "may".  The problem is that it then might be harder
> to find the cause.
> 
> With emacs -Q evaluate
> 
> (defvar foo
>    '(:eval
>      (when (> (length (frame-list)) 1)
>        (delete-frame (next-frame)))))
> 
> (setq-default mode-line-format foo)
> 
> and do C-x 5 2.  The backtrace I get here is

Which just means we need to add the protection to SELECTED_FRAME
itself, so that it runs everywhere.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 09:39:01 GMT) Full text and rfc822 format available.

Message #53 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 10:38:11 +0100
> Why does it matter which SELECTED_FRAME crashes?

Because the next crash may happen at some time in the future.  Why not
cure the first crash we have right away?

> Anyway, my point was a different one: it was that we cannot simply
> "not select" such a frame, we need to do something else.  What exactly
> is not trivial, and I didn't understand what you were suggesting to
> do.
>
>>   >> As far as frame.c is concerned, it should do something like in the
>>   >> attached patch.
>>   >
>>   > We cannot punt like that in the display engine.
>>
>> Why not?
>
> Because we must have a frame that we were supposed to redisplay.

Either we are miscommunicating or I'm just dumb.  I would in no way
restrict the display engine in choosing whatever live frame it wants to
redisplay.

> The display engine doesn't select frames to show them to the user, it
> selects them to redraw their windows.  So the considerations what to
> do in this case are different from those we need to consider when the
> user selects a frame.

As I said above: This is not about the frame whose windows it has to
redraw.  It's about the display engine trying to select a frame after
it has redrawn (parts of) another frame's windows.

>> Not "will" but "may".  The problem is that it then might be harder
>> to find the cause.
>>
>> With emacs -Q evaluate
>>
>> (defvar foo
>>     '(:eval
>>       (when (> (length (frame-list)) 1)
>>         (delete-frame (next-frame)))))
>>
>> (setq-default mode-line-format foo)
>>
>> and do C-x 5 2.  The backtrace I get here is
>
> Which just means we need to add the protection to SELECTED_FRAME
> itself, so that it runs everywhere.

But SELECTED_FRAME is not the cause of this problem.  The cause of the
problem is AFAICT the fact that :eval is allowed to do silly things
while the display engine tries to redraw windows.  The example above is
only a mirror of

----------------------
With emacs -Q evaluate

(defvar foo
  '(:eval
    (when (> (length (frame-list)) 1)
      (delete-frame))))

(setq-default mode-line-format foo)

and do C-x 5 2.
---------------

where I'm told that

:eval deleted the frame being displayed

So the display engine is, in principle, aware of one incarnation of the
problem - the one where an :eval tries to delete under its feet the
frame it currently tries to redraw and the comment correctly says that

  This is a nonsensical thing to do,
  and signaling an error from redisplay might be
  dangerous, but we cannot continue with an invalid frame.

So here the display engine bows out.  OTOH we allow it to set
selected_frame to an equally invalid frame.  Isn't that a bit selfish?

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 15:53:01 GMT) Full text and rfc822 format available.

Message #56 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 17:51:37 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Tue, 17 Mar 2020 10:38:11 +0100
> 
>  > Why does it matter which SELECTED_FRAME crashes?
> 
> Because the next crash may happen at some time in the future.  Why not
> cure the first crash we have right away?

If we can cure it, sure.  But I don't yet see what kind of cure you
are suggesting.  And in any case, the cure is not in SELECTED_FRAME.

>  >>   >> As far as frame.c is concerned, it should do something like in the
>  >>   >> attached patch.
>  >>   >
>  >>   > We cannot punt like that in the display engine.
>  >>
>  >> Why not?
>  >
>  > Because we must have a frame that we were supposed to redisplay.
> 
> Either we are miscommunicating or I'm just dumb.  I would in no way
> restrict the display engine in choosing whatever live frame it wants to
> redisplay.

The original crash, and the crash you reported a couple of messages
upthread, are both in redisplay, though.  So I'm looking for a
solution to those.  Assigning some arbitrary value to a local variable
and/or switching to a different frame can be such solutions, albeit
not optimal ones; the changes you propose for frame.c cannot.

So I'm still unsure what exactly you would propose for the display
engine to do when it needs to examine the selected frame and discovers
that this frame is invalid.

>  > The display engine doesn't select frames to show them to the user, it
>  > selects them to redraw their windows.  So the considerations what to
>  > do in this case are different from those we need to consider when the
>  > user selects a frame.
> 
> As I said above: This is not about the frame whose windows it has to
> redraw.  It's about the display engine trying to select a frame after
> it has redrawn (parts of) another frame's windows.

The display engine selects a frame because it needs to display
something related to that frame.  If it cannot select it, it should do
something about that, not just punt.

> :eval deleted the frame being displayed
> 
> So the display engine is, in principle, aware of one incarnation of the
> problem - the one where an :eval tries to delete under its feet the
> frame it currently tries to redraw and the comment correctly says that
> 
>    This is a nonsensical thing to do,
>    and signaling an error from redisplay might be
>    dangerous, but we cannot continue with an invalid frame.

You are proposing that we find all the places where SELECTED_FRAME is
used and fix them one by one?  I thought it could be better to fix
them all at once as part of SELECTED_FRAME.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 17:32:02 GMT) Full text and rfc822 format available.

Message #59 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 18:31:18 +0100
[Message part 1 (text/plain, inline)]
> You are proposing that we find all the places where SELECTED_FRAME is
> used and fix them one by one?  I thought it could be better to fix
> them all at once as part of SELECTED_FRAME.

We are still miscommunicating.  I only want to fix the parts that
restore selected_frame so as to make sure that they never set it to a dead
frame.  See the attached selected_frame.diff.

martin
[selected_frame.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 17:47:02 GMT) Full text and rfc822 format available.

Message #62 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 19:45:39 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Tue, 17 Mar 2020 18:31:18 +0100
> 
>  > You are proposing that we find all the places where SELECTED_FRAME is
>  > used and fix them one by one?  I thought it could be better to fix
>  > them all at once as part of SELECTED_FRAME.
> 
> We are still miscommunicating.  I only want to fix the parts that
> restore selected_frame so as to make sure that they never set it to a dead
> frame.

Why?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 18:40:02 GMT) Full text and rfc822 format available.

Message #65 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 19:39:18 +0100
> Why?

So that SELECTED_FRAME does not abort.

martin






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 17 Mar 2020 19:42:01 GMT) Full text and rfc822 format available.

Message #68 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 17 Mar 2020 21:41:26 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Tue, 17 Mar 2020 19:39:18 +0100
> 
> > Why?
> 
> So that SELECTED_FRAME does not abort.

I'm sorry, I think I no longer know what we are discussing.  Feel free
to fix this (whatever it is) as you see fit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 18 Mar 2020 09:13:01 GMT) Full text and rfc822 format available.

Message #71 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 18 Mar 2020 10:12:32 +0100
> I'm sorry, I think I no longer know what we are discussing.

For me the present abort is just another instance of Bug#29726 where you
said:

  The reason for the crash is that the ':eval' form which you have on
  the header-line can delete the frame whose header-line Emacs is
  redrawing!  The Lisp-level backtrace below shows how delete-frame is
  called from your code; hopefully, this backtrace will allow you to fix
  your code so it doesn't do such nonsensical things.

Only that in the case at hand 'delete-frame' does not try to delete the
frame the display engine is working on and so your fix for Bug#29726
won't catch it.  Rather, the frame that gets deleted is the frame that
was selected before the display engine started to process the :eval
form.  When, after processing the :eval form and the containing mode
line or title bar format, the display engine wants to restore the
previously selected frame, it sets selected_frame to a dead frame.  And
the next attempt to use selected_frame via SELECTED_FRAME results in the
abort.
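
The sequence described above can be modelled outside Emacs.  What
follows is only a minimal C sketch under stated assumptions, not Emacs
source: struct toy_frame, toy_selected_frame and
toy_redisplay_with_eval are hypothetical names, and the "live" flag
stands in for whatever liveness test the real code would use.  It shows
how restoring the previously selected frame without a liveness check
leaves a dead frame selected, and how a guarded restore avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a frame object; only liveness matters.  */
struct toy_frame
{
  bool live;
};

/* Toy counterpart of the global selected_frame.  */
static struct toy_frame *toy_selected_frame;

/* Simulate redisplay temporarily selecting TARGET, running an :eval
   form that deletes VICTIM, and then restoring the old selection.
   When CHECK_LIVE is false the old frame is restored blindly, which
   reproduces the failure mode described above.  Return true if the
   frame selected afterwards is live.  */
static bool
toy_redisplay_with_eval (struct toy_frame *target,
                         struct toy_frame *victim,
                         bool check_live)
{
  struct toy_frame *old = toy_selected_frame;
  assert (old != NULL && target != NULL && target->live);

  toy_selected_frame = target;  /* Temporary selection for redrawing.  */
  victim->live = false;         /* The :eval form deletes a frame.  */

  /* Restore step: the guarded variant keeps TARGET selected when the
     previously selected frame died in the meantime.  */
  if (!check_live || old->live)
    toy_selected_frame = old;

  return toy_selected_frame->live;
}
```

Calling toy_redisplay_with_eval with VICTIM equal to the previously
selected frame and CHECK_LIVE false ends with a dead frame selected;
with CHECK_LIVE true the temporary frame simply stays selected.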

> Feel free
> to fix this (whatever it is) as you see fit.

The longer I'm looking into this, the more I think that we should be
much more restrictive wrt what an :eval form in mode line or title name
processing should be allowed to do.  Tab bars could provide even more
confusion.  I think we should disallow any such :eval to kill buffers
and delete windows or frames at the very least.

Maybe it should be also disallowed to select a window or frame or
whatever the display engine tries to restore after processing these
forms.  Such selections would be usually undone anyway by the display
engine.  Probably, we should disallow such :eval forms to modify
"anything" at all but I have no idea how to do that.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 18 Mar 2020 14:54:01 GMT) Full text and rfc822 format available.

Message #74 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 18 Mar 2020 16:53:13 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Wed, 18 Mar 2020 10:12:32 +0100
> 
>  > Feel free
>  > to fix this (whatever it is) as you see fit.
> 
> The longer I'm looking into this, the more I think that we should be
> much more restrictive wrt what an :eval form in mode line or title name
> processing should be allowed to do.  Tab bars could provide even more
> confusion.  I think we should disallow any such :eval to kill buffers
> and delete windows or frames at the very least.

So we are talking about :eval in mode-line-format (and similar
variables)?

> Maybe it should be also disallowed to select a window or frame or
> whatever the display engine tries to restore after processing these
> forms.  Such selections would be usually undone anyway by the display
> engine.  Probably, we should disallow such :eval forms to modify
> "anything" at all but I have no idea how to do that.

I'm not sure we can detect these actions reliably, as Lisp code can be
very complex.  I think we can only handle the consequences of those
actions.  Which is why I proposed to deal with that in SELECTED_FRAME
(we could, of course, find some other place where the disastrous
results of such code can be detected).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 18 Mar 2020 18:49:01 GMT) Full text and rfc822 format available.

Message #77 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 18 Mar 2020 19:48:10 +0100
> So we are talking about :eval in mode-line-format (and similar
> variables)?

I am but I might be wrong.  Maybe Madhu can tell us where that window
quitting operation is issued.

> I'm not sure we can detect these actions reliably, as Lisp code can be
> very complex.  I think we can only handle the consequences of those
> actions.

We already disallow deleting the last live or visible frame and the last
window on a frame.  So the redisplay code, whenever it runs Lisp in
between, could simply set a boolean that will disallow deleting any
window or frame as well as setting the window configuration and other
dangerous operations that implicitly might kill a window or a buffer.
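
As a rough illustration of the boolean idea, here is a minimal C sketch.
It is an assumption-laden model rather than a proposal for the real
code: toy_inhibit_deletion, toy_delete_frame and toy_eval_in_redisplay
are hypothetical names, and a plain function pointer stands in for the
Lisp form being evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a frame object.  */
struct toy_frame
{
  bool live;
};

/* Hypothetical flag: true while redisplay is running Lisp.  */
static bool toy_inhibit_deletion;

/* Delete F unless deletion is currently inhibited or F is already
   dead.  Return true if the frame was actually deleted.  */
static bool
toy_delete_frame (struct toy_frame *f)
{
  assert (f != NULL);
  if (toy_inhibit_deletion || !f->live)
    return false;
  f->live = false;
  return true;
}

/* Run CALLBACK (standing in for an :eval form) on ARG with deletion
   inhibited, restoring the previous flag value afterwards.  */
static void
toy_eval_in_redisplay (void (*callback) (struct toy_frame *),
                       struct toy_frame *arg)
{
  bool old = toy_inhibit_deletion;
  toy_inhibit_deletion = true;
  callback (arg);
  toy_inhibit_deletion = old;
}

/* An :eval-like callback that tries to delete the frame it is given.  */
static void
toy_nasty_eval (struct toy_frame *f)
{
  (void) toy_delete_frame (f);
}
```

Running toy_nasty_eval through toy_eval_in_redisplay leaves the frame
alive, while the same deletion succeeds once the flag has been restored.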

> Which is why I proposed to deal with that in SELECTED_FRAME
> (we could, of course, find some other place where the disastrous
> results of such code can be detected).

SELECTED_FRAME does not necessarily have to abort.  It could return some
other live frame, maybe selecting it on-the-fly, in the hope that the
configuration stabilizes sooner or later.  But this doesn't help with
the fact that such an :eval can do a lot more nasty things like deleting
windows or killing buffers.

martin





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 18 Mar 2020 19:37:01 GMT) Full text and rfc822 format available.

Message #80 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 18 Mar 2020 21:36:04 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Wed, 18 Mar 2020 19:48:10 +0100
> 
>  > I'm not sure we can detect these actions reliably, as Lisp code can be
>  > very complex.  I think we can only handle the consequences of those
>  > actions.
> 
> We already disallow deleting the last live or visible frame and the last
> window on a frame.

Those situations are easy to detect, so we do that.  You are now
proposing something more sophisticated than that, and I'm afraid that
doing so is not as straightforward as in those few simple cases we
already handle.

> So the redisplay code, whenever it runs Lisp in between, could
> simply set a boolean that will disallow deleting any window or frame
> as well as setting the window configuration and other dangerous
> operations that implicitly might kill a window or a buffer.

The problem is how to do this without breaking legitimate code.  For
example, changing the window configuration temporarily, then changing
it back is quite legitimate, so summarily disallowing such actions is
too drastic and will be hard to justify.

>  > Which is why I proposed to deal with that in SELECTED_FRAME
>  > (we could, of course, find some other place where the disastrous
>  > results of such code can be detected).
> 
> SELECTED_FRAME does not necessarily have to abort.  It could return some
> other live frame, maybe selecting it on-the-fly, in the hope that the
> configuration stabilizes sooner or later.  But this doesn't help with
> the fact that such an :eval can do a lot more nasty things like deleting
> windows or killing buffers.

All we need to do is avoid crashing and keep the display
up-to-date; any other outcome: error messages, code that doesn't do
what the author expected/intended, and any other annoyance -- are
completely fine, because whoever writes such nasty code will learn a
lesson.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Thu, 19 Mar 2020 03:49:01 GMT) Full text and rfc822 format available.

Message #83 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Madhu <enometh <at> meer.net>
To: rudalics <at> gmx.at
Cc: eliz <at> gnu.org, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Thu, 19 Mar 2020 09:18:31 +0530 (IST)
*  martin rudalics
Wrote on Wed, 18 Mar 2020 19:48:10 +0100
>> So we are talking about :eval in mode-line-format (and similar
>> variables)?
>
> I am but I might be wrong.  Maybe Madhu can tell us where that
> window quitting operation is issued.

I'm afraid I haven't made any progress in figuring it out.  Unless I
hear from you I will send you a mail shortly with my setup (prereqs:
sly and some supported-by-sly lisp implementation already installed on
your system) with the recipe for the crash - (I don't want to post it
to this list)





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Thu, 19 Mar 2020 08:56:01 GMT) Full text and rfc822 format available.

Message #86 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Thu, 19 Mar 2020 09:55:07 +0100
>> We already disallow deleting the last live or visible frame and the last
>> window on a frame.
>
> Those situations are easy to detect, so we do that.

For some value of easy.

> You are now
> proposing something more sophisticated than that, and I'm afraid that
> doing so is not as straightforward as in those few simple cases we
> already handle.

I'm afraid that we already might mishandle some of those simple cases.

>> So the redisplay code, whenever it runs Lisp in between, could
>> simply set a boolean that will disallow deleting any window or frame
>> as well as setting the window configuration and other dangerous
>> operations that implicitly might kill a window or a buffer.
>
> The problem is how to do this without breaking legitimate code.  For
> example, changing the window configuration temporarily, then changing
> it back is quite legitimate,

Right in the middle of redisplay, while constructing the mode line or
the title format?  I won't object but this is something we should decide
ASAP in order to decide which kind of solution to pursue.

> so summarily disallowing such actions is
> too drastic and will be hard to justify.
[...]
> All we need to do is avoid crashing and keep the display
> up-to-date; any other outcome: error messages, code that doesn't do
> what the author expected/intended, and any other annoyance -- are
> completely fine, because whoever writes such nasty code will learn a
> lesson.

Hmmm...  I thought we have all those emacs_abort instances to ease
debugging.

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Thu, 19 Mar 2020 14:35:02 GMT) Full text and rfc822 format available.

Message #89 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Thu, 19 Mar 2020 16:33:53 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Thu, 19 Mar 2020 09:55:07 +0100
> 
>  >> We already disallow deleting the last live or visible frame and the last
>  >> window on a frame.
>  >
>  > Those situations are easy to detect, so we do that.
> 
> For some value of easy.

Relatively easy.

>  > You are now
>  > proposing something more sophisticated than that, and I'm afraid that
>  > doing so is not as straightforward as in those few simple cases we
>  > already handle.
> 
> I'm afraid that we already might mishandle some of those simple cases.

That just makes my point stronger, doesn't it?

>  >> So the redisplay code, whenever it runs Lisp in between, could
>  >> simply set a boolean that will disallow deleting any window or frame
>  >> as well as setting the window configuration and other dangerous
>  >> operations that implicitly might kill a window or a buffer.
>  >
>  > The problem is how to do this without breaking legitimate code.  For
>  > example, changing the window configuration temporarily, then changing
>  > it back is quite legitimate,
> 
> Right in the middle of redisplay, while constructing the mode line or
> the title format?

Why not?  As long as things are back as they were by the time :eval
returns, I see no reason to disallow such code.

>  > so summarily disallowing such actions is
>  > too drastic and will be hard to justify.
> [...]
>  > All we need to do is avoid crashing and keep the display
>  > up-to-date; any other outcome: error messages, code that doesn't do
>  > what the author expected/intended, and any other annoyance -- are
>  > completely fine, because whoever writes such nasty code will learn a
>  > lesson.
> 
> Hmmm...  I thought we have all those emacs_abort instances to ease
> debugging.

No, they are there in cases where we simply don't know how to
continue.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 21 Mar 2020 09:33:01 GMT) Full text and rfc822 format available.

Message #92 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 21 Mar 2020 10:32:24 +0100
>> I'm afraid that we already might mishandle some of those simple cases.
>
> That just makes my point stronger, doesn't it?

Not really.  It's easy for delete_frame to refuse deleting a frame right
at the beginning.  But once it has accepted a deletion, it might become
hard to deal with all the consequences.

>>   > The problem is how to do this without breaking legitimate code.  For
>>   > example, changing the window configuration temporarily, then changing
>>   > it back is quite legitimate,
>>
>> Right in the middle of redisplay, while constructing the mode line or
>> the title format?
>
> Why not?  As long as things are back as they were by the time :eval
> returns, I see no reason to disallow such code.

Such a change in the window configuration would take place in a state
where certain variables have temporary settings only.  Selected frame,
selected window and current buffer have been set by redisplay in a fast,
improvised manner.  I would never trust the outcome of save_window_save
or 'set-window-configuration' in such a state.

> No, they are there in cases where we simply don't know how to
> continue.

If that's the reason, then SELECTED_FRAME can easily set selected_frame
to some live frame and continue.
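
A minimal C sketch of that fallback, written as a toy model and not as
the actual SELECTED_FRAME definition: toy_frame_list,
toy_get_selected_frame and the fixed-size array are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a frame object.  */
struct toy_frame
{
  bool live;
};

#define TOY_MAX_FRAMES 8

/* Toy counterparts of the frame list and of selected_frame.  */
static struct toy_frame *toy_frame_list[TOY_MAX_FRAMES];
static size_t toy_frame_count;
static struct toy_frame *toy_selected_frame;

/* Register F in the toy frame list.  */
static void
toy_add_frame (struct toy_frame *f)
{
  assert (f != NULL && toy_frame_count < TOY_MAX_FRAMES);
  toy_frame_list[toy_frame_count++] = f;
}

/* Return the selected frame.  If it is dead, select the first live
   frame on the list instead of aborting.  Return NULL only when no
   live frame is left at all.  */
static struct toy_frame *
toy_get_selected_frame (void)
{
  if (toy_selected_frame != NULL && toy_selected_frame->live)
    return toy_selected_frame;

  for (size_t i = 0; i < toy_frame_count; i++)
    if (toy_frame_list[i]->live)
      {
        toy_selected_frame = toy_frame_list[i];
        return toy_selected_frame;
      }

  return NULL;
}
```

The model only repairs the frame selection; it does nothing about the
selected window or the current buffer, which would need the same care.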

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 21 Mar 2020 13:16:02 GMT) Full text and rfc822 format available.

Message #95 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 21 Mar 2020 15:15:42 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Sat, 21 Mar 2020 10:32:24 +0100
> 
>  >> I'm afraid that we already might mishandle some of those simple cases.
>  >
>  > That just makes my point stronger, doesn't it?
> 
> Not really.  It's easy for delete_frame to refuse deleting a frame right
> at the beginning.  But once it has accepted a deletion, it might become
> hard to deal with all the consequences.

I don't think I understand where you are going with this.

>  >>   > The problem is how to do this without breaking legitimate code.  For
>  >>   > example, changing the window configuration temporarily, then changing
>  >>   > it back is quite legitimate,
>  >>
>  >> Right in the middle of redisplay, while constructing the mode line or
>  >> the title format?
>  >
>  > Why not?  As long as things are back as they were by the time :eval
>  > returns, I see no reason to disallow such code.
> 
> Such a change in the window configuration would take place in a state
> where certain variables have temporary settings only.  Selected frame,
> selected window and current buffer have been set by redisplay in a fast,
> improvised manner.  I would never trust the outcome of save_window_save
> or 'set-window-configuration' in such a state.

This isn't about trust.  This is about letting users' Lisp do anything
they want as long as the results allow redisplay to continue after
that Lisp returns.  I don't think it's right to disallow certain
actions just because they _might_ cause problems.

>  > No, they are there in cases where we simply don't know how to
>  > continue.
> 
> If that's the reason, then SELECTED_FRAME can easily set selected_frame
> to some live frame and continue.

Something like that, yes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sun, 22 Mar 2020 18:21:01 GMT) Full text and rfc822 format available.

Message #98 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sun, 22 Mar 2020 19:20:33 +0100
[Message part 1 (text/plain, inline)]
>> Not really.  It's easy for delete_frame to refuse deleting a frame right
>> at the beginning.  But once it has accepted a deletion, it might become
>> hard to deal with all the consequences.
>
> I don't think I understand where you are going with this.

Once an initial frame has been created, Lisp code should be always able
to rely on the truth of

(and (frame-live-p (selected-frame))
     (window-live-p (selected-window))
     (eq (frame-selected-window (selected-frame))
	 (selected-window)))

Whenever this basic invariant is violated, there is no guarantee that
frame and window management will produce correct results.  Currently,
this invariant is no longer guaranteed if Lisp code is allowed to
manipulate frames and windows arbitrarily while processing the mode line
or the frame title.  IMO there are three possible ways to deal with this
problem:

(1) Let the redisplay code handle it.

(2) Let the frame and window management handle it by disallowing such
operations while they are issued by the mode line or frame title
processing code.

(3) Ignore it and let the frame/window management routines catch up with
it later.

Using (1) was my initial idea.  The patch I proposed handles simple
cases like Madhu's bug.  It will certainly not handle more sophisticated
cases where, for example, an application kills two frames in a row.

(2) is by far the simplest and most reliable approach but it will restrict
applications in what they are allowed to do when processing a mode line
or frame title.

(3) means that frame/window management proceeds in a non-deterministic
fashion as long as it has not detected that its basic invariant has been
violated.
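
For illustration, the invariant itself can be written down as a
checkable predicate.  The following C sketch is a toy model under stated
assumptions: toy_frame, toy_window, toy_invariant_holds and
toy_select_frame are hypothetical names, and each frame merely records
which window it considers selected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for window and frame objects.  */
struct toy_window
{
  bool live;
};

struct toy_frame
{
  bool live;
  struct toy_window *selected_window;
};

/* Toy counterparts of selected_frame and selected_window.  */
static struct toy_frame *toy_selected_frame;
static struct toy_window *toy_selected_window;

/* Return true if the selected frame is live, the selected window is
   live, and the selected frame's selected window is the selected
   window.  */
static bool
toy_invariant_holds (void)
{
  return (toy_selected_frame != NULL
          && toy_selected_frame->live
          && toy_selected_window != NULL
          && toy_selected_window->live
          && toy_selected_frame->selected_window == toy_selected_window);
}

/* Select the live frame F together with its selected window, so that
   both globals change in step.  */
static void
toy_select_frame (struct toy_frame *f)
{
  assert (f != NULL && f->live);
  toy_selected_frame = f;
  toy_selected_window = f->selected_window;
}
```

Marking the selected frame and its window dead makes
toy_invariant_holds return false until toy_select_frame is called on
another live frame.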

> This isn't about trust.  This is about letting users' Lisp do anything
> they want as long as the results allow redisplay to continue after
> that Lisp returns.  I don't think it's right to disallow certain
> actions just because they _might_ cause problems.

You again care about redisplay only.  Which means that frame/window
management is second-class as far as safety is concerned.

>>   > No, they are there in cases where we simply don't know how to
>>   > continue.
>>
>> If that's the reason, then SELECTED_FRAME can easily set selected_frame
>> to some live frame and continue.
>
> Something like that, yes.

I attach a patch that does that.  If you try it with a recipe like
loading


(defvar foo
  '(:eval
    (when (> (length (frame-list)) 1)
      (delete-frame (next-frame)))))

(setq-default mode-line-format foo)

(make-frame)


with emacs -Q you will see that while it works around the crash it still
produces a

Wrong type argument: window-live-p, #<window 3>

error in redisplay.

martin
[SELECTD_FRAME.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 23 Mar 2020 14:49:01 GMT) Full text and rfc822 format available.

Message #101 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Mon, 23 Mar 2020 16:48:27 +0200
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Sun, 22 Mar 2020 19:20:33 +0100
> 
> Once an initial frame has been created, Lisp code should be always able
> to rely on the truth of
> 
> (and (frame-live-p (selected-frame))
>       (window-live-p (selected-window))
>       (eq (frame-selected-window (selected-frame))
> 	 (selected-window)))

I agree.  But note that selected-frame could switch frames internally,
if the last selected frame is dead; as long as selected-frame also
adjusts the selected window, the above will still hold.

> (1) Let the redisplay code handle it.
> 
> (2) Let the frame and window management handle it by disallowing such
> operations while they are issued by the mode line or frame title
> processing code.
> 
> (3) Ignore it and let the frame/window management routines catch up with
> it later.
> 
> Using (1) was my initial idea.  The patch I proposed handles simple
> cases like Madhu's bug.  It will certainly not handle more sophisticated
> cases where, for example, an application kills two frames in a row.
> 
> (2) is by far the most simple and reliable approach but it will restrict
> applications in what they are allowed to do when processing a mode line
> or frame title.
> 
> (3) means that frame/window management proceeds in a non-deterministic
> fashion as long as it has not detected that its basic invariant has been
> violated.

I'm okay with having non-deterministic behavior triggered by code that
violates that invariant.  We will tell people who write such Lisp code
"if it hurts, don't do that".

>  > This isn't about trust.  This is about letting users' Lisp do anything
>  > they want as long as the results allow redisplay to continue after
>  > that Lisp returns.  I don't think it's right to disallow certain
>  > actions just because they _might_ cause problems.
> 
> You again care about redisplay only.

Only because the crashes we are discussing were in redisplay.  Not in
general.

>  >> If that's the reason, then SELECTED_FRAME can easily set selected_frame
>  >> to some live frame and continue.
>  >
>  > Something like that, yes.
> 
> I attach a patch that does that.  If you try it with a recipe like
> loading
> 
> 
> (defvar foo
>    '(:eval
>      (when (> (length (frame-list)) 1)
>        (delete-frame (next-frame)))))
> 
> (setq-default mode-line-format foo)
> 
> (make-frame)
> 
> 
> with emacs -Q you will see that while it works around the crash it still
> produces a
> 
> Wrong type argument: window-live-p, #<window 3>
> 
> error in redisplay.

That might not be the best solution, but it's "good enough" in my
book.  The programmer who writes such code deserves punishment, and an
error in redisplay that doesn't lock up Emacs (or does it?) is ample
punishment, IMO.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Tue, 24 Mar 2020 09:46:02 GMT) Full text and rfc822 format available.

Message #104 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Tue, 24 Mar 2020 10:45:16 +0100
>> (and (frame-live-p (selected-frame))
>>        (window-live-p (selected-window))
>>        (eq (frame-selected-window (selected-frame))
>> 	 (selected-window)))
>
> I agree.  But note that selected-frame could switch frames internally,
> if the last selected frame is dead; as long as selected-frame also
> adjusts the selected window, the above will still hold.

Do you mean 'select-frame' instead of 'selected-frame'?  If so, please
note that the problems occur due to the fact that we set selected_frame
and selected_window directly without going through do_switch_frame.

> I'm okay with having non-deterministic behavior triggered by code that
> violates that invariant.  We will tell people who write such Lisp code
> "if it hurts, don't do that".

But till then we may have to handle reports of bugs that are very hard
to reproduce.  In the case at hand the mode-line code runs a function
'sly-db-exit' (https://github.com/joaotavora/sly/blob/master/sly.el)
where practically every single function call can have unpredictable
consequences.  And 'sly-db-exit' might be one of the milder examples of
what code can possibly do there.

>> Wrong type argument: window-live-p, #<window 3>
>>
>> error in redisplay.
>
> That might not be the best solution, but it's "good enough" in my
> book.  The programmer who writes such code deserves punishment, and an
> error in redisplay that doesn't lock up Emacs (or does it?) is ample
> punishment, IMO.

This error might be due to the fact that _any_ of old_top_frame,
old_window and target_frame_window in unwind_format_mode_line can be
dead at the time of unwinding.  unwind_format_mode_line is much too
fragile in this regard.  And I have no idea yet why we need an extra
unwind for restoring selected_frame and selected_window.  Shouldn't
these go hand in hand with what unwind_format_mode_line does?  Does the
one even know about the other?

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Sat, 28 Mar 2020 08:24:01 GMT) Full text and rfc822 format available.

Message #107 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 28 Mar 2020 11:23:29 +0300
> Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
> From: martin rudalics <rudalics <at> gmx.at>
> Date: Tue, 24 Mar 2020 10:45:16 +0100
> 
>  >> (and (frame-live-p (selected-frame))
>  >>        (window-live-p (selected-window))
>  >>        (eq (frame-selected-window (selected-frame))
>  >> 	 (selected-window)))
>  >
>  > I agree.  But note that selected-frame could switch frames internally,
>  > if the last selected frame is dead; as long as selected-frame also
>  > adjusts the selected window, the above will still hold.
> 
> Do you mean 'select-frame' instead of 'selected-frame'?

No, I meant selected-frame.

>  > I'm okay with having non-deterministic behavior triggered by code that
>  > violates that invariant.  We will tell people who write such Lisp code
>  > "if it hurts, don't do that".
> 
> But till then we may have to handle reports of bugs that are very hard
> to reproduce.

Bugs that are caused by such invalid Lisp, and that manifest
themselves by unexpected or unpredictable behavior, are fine with me.
Of course, it would be good to find the causes of such bugs and point
them out to the responsible Lisp programmer, but as long as we don't
crash or lock up, we are in a relatively good shape.

>  >> Wrong type argument: window-live-p, #<window 3>
>  >>
>  >> error in redisplay.
>  >
>  > That might not be the best solution, but it's "good enough" in my
>  > book.  The programmer who writes such code deserves punishment, and an
>  > error in redisplay that doesn't lock up Emacs (or does it?) is ample
>  > punishment, IMO.
> 
> This error might be due to the fact that _any_ of old_top_frame,
> old_window and target_frame_window in unwind_format_mode_line can be
> dead at the time of unwinding.  unwind_format_mode_line is much too
> fragile in this regard.

Perhaps we should make unwind_format_mode_line less fragile, then.

> And I have no idea yet why we need an extra unwind for restoring
> selected_frame and selected_window.  Shouldn't these go hand in hand
> with what unwind_format_mode_line does?  Does the one even know
> about the other?

I don't think I understand which extra unwind you are talking about
here.  Can you provide a more specific pointer to the relevant code?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Mon, 30 Mar 2020 02:36:31 GMT) Full text and rfc822 format available.

Message #110 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Sat, 28 Mar 2020 19:38:24 +0100
>>   >> (and (frame-live-p (selected-frame))
>>   >>        (window-live-p (selected-window))
>>   >>        (eq (frame-selected-window (selected-frame))
>>   >> 	 (selected-window)))
>>   >
>>   > I agree.  But note that selected-frame could switch frames internally,
>>   > if the last selected frame is dead; as long as selected-frame also
>>   > adjusts the selected window, the above will still hold.
>>
>> Do you mean 'select-frame' instead of 'selected-frame'?
>
> No, I meant selected-frame.

So you meant a 'selected-frame' based on the SELECTED_FRAME I proposed.
The problem is that that macro won't protect the invariant if it was
broken before, for example, by setting the selected window to a dead
window as in Madhu's scenario.

> Perhaps we should make unwind_format_mode_line less fragile, then.

The problematic step happens already at the time we set it up.  In a
nutshell, Madhu's scenario goes as follows: In display_mode_lines we do

  Lisp_Object old_selected_window = selected_window;
  Lisp_Object old_selected_frame = selected_frame;
...
      display_mode_line (w, CURRENT_MODE_LINE_FACE_ID_3 (sel_w, sel_w, w),
			 NILP (window_mode_line_format)
			 ? BVAR (current_buffer, mode_line_format)
			 : window_mode_line_format);
...
  XFRAME (new_frame)->selected_window = old_frame_selected_window;
  selected_frame = old_selected_frame;
  selected_window = old_selected_window;

where display_mode_line deletes both old_selected_frame and
old_selected_window.  So we end up with selected_frame and
selected_window both referencing dead objects.  The subsequent call of
gui_consider_frame_title now does

      record_unwind_protect (unwind_format_mode_line,
			     format_mode_line_unwind_data
			       (f, current_buffer, selected_window, false));

where selected_window is already a dead window.  Since in
unwind_format_mode_line old_window is non-nil, it will call

      Fselect_window (old_window, Qt);

which first chokes on

   CHECK_LIVE_WINDOW (window);

which is the error reported when emacs does not crash and finally on

  sf = SELECTED_FRAME ();

which crashes emacs due to the fact that selected_frame is dead.  In
either case, making unwind_format_mode_line less fragile won't avoid
any crash, it might just postpone it.
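
The save/run/restore sequence described above can be modelled in a few lines of standalone C. This is a hedged sketch, not the code in xdisp.c and not the attached patch (whose contents are not shown in this thread): every `toy_` name is hypothetical, and the callback merely stands in for mode-line Lisp that deletes the selected frame and window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for Emacs frames and windows; NOT the real structs.  */
struct toy_window;
struct toy_frame { bool live; struct toy_window *selected_window; };
struct toy_window { bool live; struct toy_frame *frame; };

struct toy_frame toy_frames[2];
struct toy_window toy_windows[2];

/* Toy counterparts of the selected_frame and selected_window globals.  */
struct toy_frame *toy_selected_frame;
struct toy_window *toy_selected_window;

/* Hypothetical callback type standing in for mode-line (:eval ...) code.  */
typedef void (*toy_mode_line_fn) (void);

/* Make both frames live and select frame 0 with its window.  */
void
toy_reset (void)
{
  for (int i = 0; i < 2; i++)
    {
      toy_frames[i].live = true;
      toy_frames[i].selected_window = &toy_windows[i];
      toy_windows[i].live = true;
      toy_windows[i].frame = &toy_frames[i];
    }
  toy_selected_frame = &toy_frames[0];
  toy_selected_window = &toy_windows[0];
}

/* Callback: mark frame 0 and its window dead, then select frame 1.  */
void
toy_delete_selected (void)
{
  toy_frames[0].live = false;
  toy_windows[0].live = false;
  toy_selected_frame = &toy_frames[1];
  toy_selected_window = &toy_windows[1];
}

bool
toy_selection_is_live (void)
{
  return toy_selected_frame->live && toy_selected_window->live;
}

/* Blind restore: put the saved values back whatever FN did.  */
void
toy_display_blind (toy_mode_line_fn fn)
{
  assert (fn != NULL);
  struct toy_frame *old_frame = toy_selected_frame;
  struct toy_window *old_window = toy_selected_window;
  fn ();
  toy_selected_frame = old_frame;
  toy_selected_window = old_window;
}

/* Checked restore: put the saved values back only if both are still
   live; otherwise keep whatever FN left selected.  */
void
toy_display_checked (toy_mode_line_fn fn)
{
  assert (fn != NULL);
  struct toy_frame *old_frame = toy_selected_frame;
  struct toy_window *old_window = toy_selected_window;
  fn ();
  if (old_frame->live && old_window->live)
    {
      toy_selected_frame = old_frame;
      toy_selected_window = old_window;
    }
}
```

Calling `toy_reset ()` and then `toy_display_blind (toy_delete_selected)` leaves the toy globals pointing at dead objects, which is the state the analysis above blames for the later failures. The checked variant leaves frame 1 selected instead. Whether a real fix should fall back this way is a design question the sketch does not settle.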

>> And I have no idea yet why we need an extra unwind for restoring
>> selected_frame and selected_window.  Shouldn't these go hand in hand
>> with what unwind_format_mode_line does?  Does the one even know
>> about the other?
>
> I don't think I understand what extra unwind are you talking about
> here.  Can you provide a more specific pointer to the relevant code?

I meant the fact that we already do unwind_format_mode_line when
formatting the mode line and that function could restore the selected
window in a safe way.  I'm far from proposing to use that approach when
drawing the mode lines, though.

Attached find a patch which should solve the more grave problems caused
by a function deleting the previously selected frame or window.  It
intentionally does not change SELECTED_FRAME.  Any abort there should be
reserved to obscure bugs we have not been able to trace yet.  Please
read it with your usual care, it took me some time to convince myself
that it selects its frame in a reasonable way.

On master, I would then like to use restore_selected_window also for
gui_consider_frame_title.  The overhead caused by that is a great
annoyance (especially when debugging frame switching code) and we could
then hopefully get rid of the old_window stuff and the Bug#32777 fix as
well.

What any patch I provide here cannot do is to fix problems when the mode
line code deletes the selected window right away.  Code like


(defvar window (split-window))

(defvar foo
  '(:eval
    (if (or (not (window-live-p window))
	    (eq window (frame-first-window)))
	(setq window (split-window))
      (delete-window window))))

(setq-default mode-line-format foo)


will continue to segfault unless you can cure that.  I tried to fix it
in the spirit of

		if (!FRAME_LIVE_P (it->f))
		  signal_error (":eval deleted the frame being displayed", elt);

but that just caused emacs to hang.

martin
[restore_selected_window.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Fri, 03 Apr 2020 16:33:01 GMT) Full text and rfc822 format available.

Message #113 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Fri, 3 Apr 2020 18:32:18 +0200
> Attached find a patch which should solve the more grave problems caused
> by a function deleting the previously selected frame or window.

Fixing a few more things as attached.

martin
[restore_selected_window.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Fri, 10 Apr 2020 11:52:02 GMT) Full text and rfc822 format available.

Message #116 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Madhu <enometh <at> meer.net>
To: rudalics <at> gmx.at
Cc: eliz <at> gnu.org, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Fri, 10 Apr 2020 17:21:13 +0530 (IST)
*  martin rudalics <e0a053fb-7752-3d55-ef8c-d05b44d2fdc3 <at> gmx.at>
Wrote on Fri, 3 Apr 2020 18:32:18 +0200
> Fixing a few more things as attached.

I've been running this patch for a while and believe it handles the
situation I reported adequately.

(PS I was incorrect in an earlier report above about a particular
patch working)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 30 Sep 2020 15:08:01 GMT) Full text and rfc822 format available.

Message #119 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39977 <at> debbugs.gnu.org, enometh <at> meer.net
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 30 Sep 2020 17:06:52 +0200
martin rudalics <rudalics <at> gmx.at> writes:

>> Attached find a patch which should solve the more grave problems caused
>> by a function deleting the previously selected frame or window.
>
> Fixing a few more things as attached.

Madhu <enometh <at> meer.net> writes:

> *  martin rudalics <e0a053fb-7752-3d55-ef8c-d05b44d2fdc3 <at> gmx.at>
> Wrote on Fri, 3 Apr 2020 18:32:18 +0200
>> Fixing a few more things as attached.
>
> I've been running this patch for a while and believe it handles the
> situation I reported adequately.

This was the final message in this long thread (in April).  As far as I
can see, Martin's patch was not applied.  I haven't read the entire
thread, but were there other problems here?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 30 Sep 2020 15:33:01 GMT) Full text and rfc822 format available.

Message #122 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: rudalics <at> gmx.at, enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 30 Sep 2020 18:31:37 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  enometh <at> meer.net,  39977 <at> debbugs.gnu.org
> Date: Wed, 30 Sep 2020 17:06:52 +0200
> 
> martin rudalics <rudalics <at> gmx.at> writes:
> 
> >> Attached find a patch which should solve the more grave problems caused
> >> by a function deleting the previously selected frame or window.
> >
> > Fixing a few more things as attached.
> 
> Madhu <enometh <at> meer.net> writes:
> 
> > *  martin rudalics <e0a053fb-7752-3d55-ef8c-d05b44d2fdc3 <at> gmx.at>
> > Wrote on Fri, 3 Apr 2020 18:32:18 +0200
> >> Fixing a few more things as attached.
> >
> > I've been running this patch for a while and believe it handles the
> > situation I reported adequately.
> 
> This was the final message in this long thread (in April).  As far as I
> can see, Martin's patch was not applied.  I haven't read the entire
> thread, but were there other problems here?

I think Martin's patch should be installed, but let's wait for Martin
to chime in.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Wed, 30 Sep 2020 17:30:02 GMT) Full text and rfc822 format available.

Message #125 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: enometh <at> meer.net, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Wed, 30 Sep 2020 19:29:15 +0200
> I think Martin's patch should be installed, but let's wait for Martin
> to chime in.

AFAICT I wrote that shortly before two of my OS installations crashed
almost simultaneously and I lost some of my work including this.  So I
might have run Emacs with that patch applied for a couple of weeks at
most.  I hope though that my reasoning was mostly correct back then and
since IIUC Madhu confirmed that it fixes his use case it should be safe
to install (WITHOUT WARRANTY).

martin




Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 01 Oct 2020 00:01:01 GMT) Full text and rfc822 format available.

bug marked as fixed in version 28.1, send any further explanations to 39977 <at> debbugs.gnu.org and Madhu <enometh <at> meer.net> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 01 Oct 2020 00:01:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Thu, 01 Oct 2020 00:02:02 GMT) Full text and rfc822 format available.

Message #132 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 39977 <at> debbugs.gnu.org, enometh <at> meer.net
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Thu, 01 Oct 2020 02:01:19 +0200
martin rudalics <rudalics <at> gmx.at> writes:

> AFAICT I wrote that shortly before two of my OS installations crashed
> almost simultaneously and I lost some of my work including this.  So I
> might have run Emacs with that patch applied for a couple of weeks at
> most.  I hope though that my reasoning was mostly correct back then and
> since IIUC Madhu confirmed that it fixes his use case it should be safe
> to install (WITHOUT WARRANTY).

OK, I've now applied the patch to Emacs 28 after some very light testing.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#39977; Package emacs. (Thu, 01 Oct 2020 04:39:01 GMT) Full text and rfc822 format available.

Message #135 received at 39977 <at> debbugs.gnu.org (full text, mbox):

From: Madhu <enometh <at> meer.net>
To: larsi <at> gnus.org
Cc: rudalics <at> gmx.at, eliz <at> gnu.org, 39977 <at> debbugs.gnu.org
Subject: Re: bug#39977: 28.0.50; Unhelpful stack trace
Date: Thu, 01 Oct 2020 10:07:43 +0530 (IST)
I had been running this patch locally all this while. [On rebasing my
local branch on master today I noticed this patch was dropped because
it was already on master, so it was the identical thing I was running]






bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 29 Oct 2020 11:24:08 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 192 days ago.


