GNU bug report logs - #41321
27.0.91; Emacs aborts due to invalid pseudovector objects

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Sat, 16 May 2020 10:34:02 UTC

Severity: normal

Found in version 27.0.91

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41321 in the body.
You can then email your comments to 41321 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 16 May 2020 10:34:02 GMT)

Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 16 May 2020 10:34:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.91; Emacs aborts due to invalid pseudovector objects
Date: Sat, 16 May 2020 13:33:13 +0300
I don't have a reproducible recipe, unfortunately.

What happens is that Emacs aborts a short time after reverting a
buffer (reverted because the file it is visiting was changed on disk).
So far, I've seen this in a C Mode buffer reverted because "git pull"
brought a modified version, and in an Info mode buffer reverted
because the manual was rebuilt after the Texinfo sources were
modified.  In the latter case I captured a backtrace, see below.

The problem seems to involve invalid markers, perhaps markers that were
unchained and put on the free list (witness the PVEC_FREE object that
caused the abort in the backtrace below, where Emacs seems to be
trying to display an error message about an invalid marker).

I don't think I saw such problems in Emacs 27.0.90, so I walked
through all the changes since then up to the 27.0.91 release, but didn't
see anything that could explain the problem.

Needless to say, this is a serious problem, so I'd like to ask
everyone to please run the latest pretest under a debugger and report
any similar problems with all the details they can provide.

Here's the backtrace and some additional information from the session
where it happened last:

Thread 1 hit Breakpoint 3, 0x77c36bb3 in msvcrt!abort ()
   from C:\WINDOWS\system32\msvcrt.dll
(gdb) bt
#0  0x77c36bb3 in msvcrt!abort () from C:\WINDOWS\system32\msvcrt.dll
#1  0x011cfdd8 in emacs_abort () at w32fns.c:10893
#2  0x01175f3a in print_vectorlike (obj=<optimized out>,
    printcharfun=XIL(0x30), escapeflag=escapeflag <at> entry=true,
    buf=buf <at> entry=0x82f07a "") at print.c:1830
#3  0x01172055 in print_object (obj=<optimized out>, printcharfun=XIL(0x30),
    escapeflag=true) at print.c:2148
#4  0x01172f04 in print (obj=<optimized out>, printcharfun=<optimized out>,
    escapeflag=<optimized out>, escapeflag <at> entry=true) at print.c:1147
#5  0x01173355 in Fprin1 (object=XIL(0xa00000001c9866d8),
    printcharfun=<optimized out>) at print.c:653
#6  0x0117483b in print_error_message (data=<optimized out>,
    stream=<optimized out>, context=<optimized out>, caller=<optimized out>)
    at print.c:979
#7  0x010c13c5 in Fcommand_error_default_function (
    data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118),
    sys_signal=XIL(0x5d72548)) at keyboard.c:1029
#8  0x0114fb99 in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs <at> entry=3, args=<optimized out>,
    args <at> entry=0x82f498) at eval.c:2872
#9  0x0114d9fd in Ffuncall (nargs=4, args=0x82f490) at eval.c:2794
#10 0x0114dca3 in Fapply (nargs=<optimized out>, args=<optimized out>)
    at eval.c:2424
#11 0x0114d9fd in Ffuncall (nargs=3, args=args <at> entry=0x82f590) at eval.c:2794
#12 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=3,
    args=<optimized out>, args <at> entry=0x82f888) at bytecode.c:633
#13 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=3,
    arg_vector=arg_vector <at> entry=0x82f888) at eval.c:2989
#14 0x0114d953 in Ffuncall (nargs=nargs <at> entry=4, args=args <at> entry=0x82f880)
    at eval.c:2808
#15 0x01151d29 in call3 (fn=XIL(0xa000000005e00b20),
    arg1=XIL(0xc000000000ff92e0), arg2=XIL(0x80000000058e9118),
    arg3=XIL(0x5d72548)) at eval.c:2668
#16 0x010c5020 in cmd_error_internal (data=XIL(0xc000000000ff92e0),
    context=context <at> entry=0x82f92e "") at keyboard.c:984
#17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953
#18 0x0114c952 in internal_condition_case (
    bfun=bfun <at> entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
    hfun=hfun <at> entry=0x10c5049 <cmd_error>) at eval.c:1351
#19 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#20 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0),
    func=func <at> entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
#21 0x010bdb5d in command_loop () at keyboard.c:1070
#22 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
#23 0x010c4f0c in Frecursive_edit () at keyboard.c:786
#24 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>)
    at emacs.c:2054

Lisp Backtrace:
"command-error-default-function" (0x82f498)
"apply" (0x82f598)
0x5e00b20 PVEC_COMPILED
(gdb) fr 2
#2  0x01175f3a in print_vectorlike (obj=<optimized out>,
    printcharfun=XIL(0x30), escapeflag=escapeflag <at> entry=true,
    buf=buf <at> entry=0x82f07a "") at print.c:1830
1830          emacs_abort ();
(gdb) fr 7
#7  0x010c13c5 in Fcommand_error_default_function (
    data=XIL(0xc000000000ff92e0), context=XIL(0x80000000058e9118),
    sys_signal=XIL(0x5d72548)) at keyboard.c:1029
1029          print_error_message (data, Qt, SSDATA (context), signal);
(gdb) p data
$1 = XIL(0xc000000000ff92e0)
(gdb) xtype
Lisp_Cons
(gdb) xcar
$2 = 0xfd80
(gdb) xtype
Lisp_Symbol
(gdb) xsym
xsymbol   xsymname
(gdb) xsymbol
$3 = (struct Lisp_Symbol *) 0x15d9f60 <lispsym+64896>
"wrong-type-argument"
(gdb) p data
$4 = XIL(0xc000000000ff92e0)
(gdb) xcdr
$5 = 0xc000000000ff9300
(gdb) xtype
Lisp_Cons
(gdb) xcar
$6 = 0x9810
(gdb) xtype
Lisp_Symbol
(gdb) xsymbol
$7 = (struct Lisp_Symbol *) 0x15d39f0 <lispsym+38928>
"markerp"
(gdb) p data
$8 = XIL(0xc000000000ff92e0)
(gdb) xcdr
$9 = 0xc000000000ff9300
(gdb) xcdr
$10 = 0xc000000000ff9310
(gdb) xtype
Lisp_Cons
(gdb) xcar
$11 = 0xa00000001c9866d8
(gdb) xtype
Lisp_Vectorlike
PVEC_FREE
(gdb) fr 17
#17 0x010c51e6 in cmd_error (data=XIL(0xc000000000ff92e0)) at keyboard.c:953
953       cmd_error_internal (data, macroerror);

In GNU Emacs 27.0.91 (build 1, i686-pc-mingw32)
 of 2020-04-18 built on HOME-C4E4A596F7
Windowing system distributor 'Microsoft Corp.', version 5.1.2600
System Description: Microsoft Windows XP Service Pack 3 (v5.1.0.2600)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/d/usr --with-wide-int --with-modules 'CFLAGS=-O2
 -gdwarf-4 -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY W32NOTIFY ACL GNUTLS LIBXML2
HARFBUZZ ZLIB TOOLKIT_SCROLL_BARS MODULES THREADS JSON PDUMPER LCMS2 GMP

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1255

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs text-property-search time-date
subr-x seq byte-opt gv bytecomp byte-compile cconv mm-decode mm-bodies
mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs
cl-lib sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win w32-vars
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch timer
select scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame minibuffer cl-generic cham georgian utf-8-lang
misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms
cp51932 hebrew greek romanian slovak czech european ethiopic indian
cyrillic chinese composite charscript charprop case-table epa-hook
jka-cmpr-hook help simple abbrev obarray cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote threads w32notify w32 lcms2 multi-tty make-network-process
emacs)

Memory information:
((conses 16 50536 10936)
 (symbols 48 7172 1)
 (strings 16 18837 2268)
 (string-bytes 1 532938)
 (vectors 16 9527)
 (vector-slots 8 127687 7318)
 (floats 8 21 170)
 (intervals 40 254 84)
 (buffers 888 11))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 16 May 2020 16:34:02 GMT)

Message #8 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: 27.0.91; Emacs aborts due to invalid pseudovector objects
Date: Sat, 16 May 2020 09:33:35 -0700
I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
--with-wide-int) and couldn't reproduce it. I'll keep trying.

Could you give more details about the failures you observed? That might help
attempts at reproducing. How did you revert your info buffer - was it by typing
"M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 16 May 2020 16:48:02 GMT)

Message #11 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: 27.0.91; Emacs aborts due to invalid pseudovector objects
Date: Sat, 16 May 2020 19:47:29 +0300
> Cc: 41321 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 16 May 2020 09:33:35 -0700
> 
> I fooled around a bit with emacs-27 on Ubuntu 18.04.4 (compiled in 32-bit mode
> --with-wide-int) and couldn't reproduce it. I'll keep trying.

Yes, I didn't succeed in reproducing it on purpose, either.  Not sure
why, maybe there's some other factor that is at work, e.g. how many
markers are there in the buffer.

> Could you give more details about the failures you observed? That might help
> attempts at reproducing. How did you revert your info buffer - was it by typing
> "M-x revert-buffer"? Are you using auto-revert-mode? That sort of thing.

Just "M-x revert-buffer RET" followed by 'y'.  I don't use
auto-revert-mode.

In the Git case, I would usually switch to a buffer visiting the file,
perhaps via "M-.", and Emacs would ask me whether to re-read the file
into its buffer, I'd say yes, and then I see an error about a bad
marker; the next command would abort Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 17 May 2020 10:58:02 GMT)

Message #14 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 17 May 2020 10:56:28 +0000
On Sat, May 16, 2020 at 10:34 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> So far, I've seen this in a C Mode buffer reverted because "git pull"
> brought a modified version, and in an Info mode buffer reverted
> because the manual was rebuilt after the Texinfo sources were
> modified.  In the latter case I captured a backtrace, see below.
>
> The problem seems to involve invalid markers, perhaps markers that were
> unchained and put on the free list

Even unchained markers shouldn't be put on the free list as long as
they're still reachable, so I suspect the problem is more likely to be
caused by that.

> (witness the PVEC_FREE object that
> caused the abort in the backtrace below, where Emacs seems to be
> trying to display an error message about an invalid marker).

What I would do next is run with a breakpoint on wrong_type_argument
(if that's impossible, change the code in CHECK_MARKER to abort upon
encountering a PVEC_FREE vector) to see where the reference to the
freed pseudovector came from. An undo list, maybe?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 17 May 2020 15:29:01 GMT)

Message #17 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 17 May 2020 18:28:04 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sun, 17 May 2020 10:56:28 +0000
> Cc: 41321 <at> debbugs.gnu.org
> 
> What I would do next is run with a breakpoint on wrong_type_argument
> (if that's impossible, change the code in CHECK_MARKER to abort upon
> encountering a PVEC_FREE vector) to see where the reference to the
> freed pseudovector came from. An undo list, maybe?

I'm already running with such a breakpoint, let's how it will catch
something.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 17 May 2020 15:59:01 GMT)

Message #20 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 17 May 2020 18:57:53 +0300
> Date: Sun, 17 May 2020 18:28:04 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org
> 
> I'm already running with such a breakpoint, let's how it will catch
> something.                                        ^^^

Should have been "hope".  Sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 07:24:01 GMT)

Message #23 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 10:22:56 +0300
> Date: Sun, 17 May 2020 18:57:53 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org
> 
> > Date: Sun, 17 May 2020 18:28:04 +0300
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > Cc: 41321 <at> debbugs.gnu.org
> > 
> > I'm already running with such a breakpoint, let's how it will catch
> > something.                                        ^^^
> 
> Should have been "hope".  Sorry.

It happened again, and now insert-file-contents wasn't involved, so I
guess it's off the hook.  The command which triggered the problem was
self-insert-command, as shown in the backtrace below.  The problem
seems to be with handling overlays when buffer text changes.

The backtrace below, as well as some tinkering with values of relevant
variables, indicate that the buffer has two overlays, both of which
point to invalid memory.  The crash happens here:

  /* Now run the before-change-functions if any.  */
  if (!NILP (Vbefore_change_functions))
    {
      rvoe_arg.location = &Vbefore_change_functions;
      rvoe_arg.errorp = 1;

      PRESERVE_VALUE;
      PRESERVE_START_END;

      /* Mark before-change-functions to be reset to nil in case of error.  */
      record_unwind_protect_ptr (reset_var_on_error, &rvoe_arg);

      /* Actually run the hook functions.  */
      CALLN (Frun_hook_with_args, Qbefore_change_functions,
	     FETCH_START, FETCH_END);

      /* There was no error: unarm the reset_on_error.  */
      rvoe_arg.errorp = 0;
    }

  if (buffer_has_overlays ())
    {
      PRESERVE_VALUE;
      report_overlay_modification (FETCH_START, FETCH_END, 0,  /* <<< crash here */
				   FETCH_START, FETCH_END, Qnil);
    }

FETCH_END calls marker-position, and that segfaults because the marker
points to invalid memory, which was probably unmapped from the process
address space (so I guess this is w32-specific, as GNU systems don't
really return memory to the system).  The start_marker is also
invalid, it's just that FETCH_END is called first.

Since the previous call to before-change-functions already used the
same overlay markers, I suspect that the call to
before-change-functions caused the memory to be unmapped (perhaps due
to GC).  As you see below, the value of before-change-functions is

  (t syntax-ppss-flush-cache)

So the prime suspect is what happens when syntax-ppss-flush-cache
runs, and thus I CC Stefan.  The main question to answer now from my
POV is how come a marker on buffer position 3116 which was valid
before before-change-functions was called became invalid as result of
some Lisp, in particular as result of calling before-change-functions.

Here's the backtrace; ideas for further debugging are welcome.

  Thread 1 received signal SIGSEGV, Segmentation fault.
  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
  1720          return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike,
  (gdb) bt
  #0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
  #1  MARKERP (x=<optimized out>) at lisp.h:2618
  #2  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
  #3  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
      at marker.c:452
  #4  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116,
      start_int=3116) at insdel.c:2179
  #5  prepare_to_modify_buffer_1 (start=start <at> entry=3116, end=end <at> entry=3116,
      preserve_ptr=preserve_ptr <at> entry=0x0) at insdel.c:2007
  #6  0x010ee27d in prepare_to_modify_buffer (start=3116, end=3116,
      preserve_ptr=preserve_ptr <at> entry=0x0) at insdel.c:2018
  #7  0x010ee54d in insert_1_both (string=string <at> entry=0x82ef1b "r",
      nchars=nchars <at> entry=1, nbytes=nbytes <at> entry=1, inherit=inherit <at> entry=true,
      prepare=prepare <at> entry=true, before_markers=before_markers <at> entry=false)
      at insdel.c:896
  #8  0x010ef005 in insert_1_both (before_markers=false, prepare=true,
      inherit=true, nbytes=1, nchars=1, string=0x82ef1b "r") at insdel.c:697
  #9  insert_and_inherit (string=string <at> entry=0x82ef1b "r",
      nbytes=nbytes <at> entry=1) at insdel.c:692
  #10 0x01107160 in internal_self_insert (c=114, n=<optimized out>)
      at cmds.c:477
  #11 0x01107804 in Fself_insert_command (n=make_fixnum(1), c=<optimized out>)
      at cmds.c:302
  #12 0x0114fb6c in funcall_subr (subr=<optimized out>,
      numargs=<optimized out>, numargs <at> entry=2, args=<optimized out>,
      args <at> entry=0x82f120) at eval.c:2869
  #13 0x0114d9fd in Ffuncall (nargs=nargs <at> entry=3, args=args <at> entry=0x82f118)
      at eval.c:2794
  #14 0x01148f7d in Ffuncall_interactively (nargs=3, args=0x82f118)
      at callint.c:254
  #15 0x0114d9fd in Ffuncall (nargs=4, args=0x82f110) at eval.c:2794
  #16 0x0114dca3 in Fapply (nargs=<optimized out>, nargs <at> entry=3,
      args=<optimized out>, args <at> entry=0x82f288) at eval.c:2424
  #17 0x0114aecb in Fcall_interactively (function=XIL(0x43b3350),
      record_flag=<optimized out>, keys=XIL(0xa000000016a31530))
      at callint.c:342
  #18 0x0114fb99 in funcall_subr (subr=<optimized out>,
      numargs=<optimized out>, numargs <at> entry=3, args=<optimized out>,
      args <at> entry=0x82f430) at eval.c:2872
  #19 0x0114d9fd in Ffuncall (nargs=4, args=args <at> entry=0x82f428) at eval.c:2794
  #20 0x0118eaf7 in exec_byte_code (bytestr=<optimized out>,
      vector=<optimized out>, maxdepth=<optimized out>,
      args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=1,
      args=<optimized out>, args <at> entry=0x82f7b8) at bytecode.c:633
  #21 0x0115125f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=1,
      arg_vector=arg_vector <at> entry=0x82f7b8) at eval.c:2989
  #22 0x0114d953 in Ffuncall (nargs=nargs <at> entry=2, args=args <at> entry=0x82f7b0)
      at eval.c:2808
  #23 0x0114db2c in call1 (fn=XIL(0x3f30), arg1=XIL(0x43b3350)) at eval.c:2654
  #24 0x010d0efe in command_loop_1 () at keyboard.c:1463
  #25 0x0114c91f in internal_condition_case (
      bfun=bfun <at> entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
      hfun=hfun <at> entry=0x10c5049 <cmd_error>) at eval.c:1355
  #26 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
  #27 0x0114c8a6 in internal_catch (tag=XIL(0xdfb0),
      func=func <at> entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
  #28 0x010bdb5d in command_loop () at keyboard.c:1070
  #29 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
  #30 0x010c4f0c in Frecursive_edit () at keyboard.c:786
  #31 0x0124a4a4 in main (argc=<optimized out>, argv=<optimized out>)
      at emacs.c:2054

  Lisp Backtrace:
  "self-insert-command" (0x82f120)
  "funcall-interactively" (0x82f118)
  "call-interactively" (0x82f430)
  "command-execute" (0x82f7b8)
  (gdb) fr 3
  #3  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
  133       CHECK_TYPE (MARKERP (x), Qmarkerp, x);
  (gdb) up
  #4  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
      at marker.c:452
  452       CHECK_MARKER (marker);
  (gdb) p marker
  $3 = XIL(0xa000000018ac0518)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac0518
  (gdb) p marker+0
  $4 = -6917529027227155176
  (gdb) p/x marker+0
  $5 = 0xa000000018ac0518
  (gdb) up
  #5  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=3116,
      start_int=3116) at insdel.c:2179
  2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
  (gdb) p Vbefore_change_functions
  $6 = XIL(0xc000000018dbef20)
  (gdb) xtype
  Lisp_Cons
  (gdb) xcar
  $7 = 0x30
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsymbol
  $8 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
  "t"
  (gdb) p Vbefore_change_functions
  $9 = XIL(0xc000000018dbef20)
  (gdb) xcdr
  $10 = 0xc000000018dbf410
  (gdb) xcar
  $11 = 0xd5c0
  (gdb) xtype
  Lisp_Symbol
  (gdb) xsym
  xsymbol   xsymname
  (gdb) xsymbol
  $12 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
  "syntax-ppss-flush-cache"
  (gdb) p Vbefore_change_functions
  $13 = XIL(0xc000000018dbef20)
  (gdb) xcdr
  $14 = 0xc000000018dbf410
  (gdb) xcdr
  $15 = 0x0
  (gdb) p start
  $16 = <optimized out>
  (gdb) p start_int
  $17 = 3116
  (gdb) p end_int
  $18 = 3116
  (gdb) p start_marker
  $19 = XIL(0xa000000018ac04f8)
  (gdb) p end_marker
  $20 = XIL(0xa000000018ac0518)
  (gdb) p start_marker
  $21 = XIL(0xa000000018ac04f8)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p preserve_ptr
  $22 = (ptrdiff_t *) 0x0
  (gdb) p *(current_buffer->text->beg+3000)
  $23 = 115 's'
  (gdb) p *(current_buffer->text->beg+3000)@200
  $24 = "sense would then\nsuggest us that the feature should be extended to other means of\ndkispaying messages in the echo a", '\000' <repeats 84 times>
  (gdb) p *(current_buffer->text->beg+3116)
  $25 = 0 '\000'
  (gdb) p GPT
  $26 = 3116
  (gdb) p GPT_ADDR
  $27 = (unsigned char *) 0x7d80c2b ""
  (gdb) p current_buffer->overlays_before
  $28 = (struct Lisp_Overlay *) 0x170cb080
  (gdb) p $28->start
  $29 = XIL(0xa0000000170cb040)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p $28->next
  $30 = (struct Lisp_Overlay *) 0x13050320
  (gdb) p $28->next->start
  $31 = XIL(0xa000000016172310)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p current_buffer->overlays_after
  $32 = (struct Lisp_Overlay *) 0x0
  (gdb) p $28->next->next
  $33 = (struct Lisp_Overlay *) 0x0
  (gdb) p rvoe_arg.location
  $35 = (Lisp_Object *) 0x15c9298 <globals+120>
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p rvoe_arg.errorp
  $36 = false





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 08:37:02 GMT)

Message #26 received at 41321 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 08:35:55 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> FETCH_END calls marker-position, and that segfaults because the marker
> points to invalid memory, which was probably unmapped from the process
> address space (so I guess this is w32-specific, as GNU systems don't
> really return memory to the system).  The start_marker is also
> invalid, it's just that FETCH_END is called first.
>
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC).  As you see below, the value of before-change-functions is
>
>   (t syntax-ppss-flush-cache)
>
> So the prime suspect is what happens when syntax-ppss-flush-cache
> runs, and thus I CC Stefan.  The main question to answer now from my
> POV is how come a marker on buffer position 3116 which was valid
> before before-change-functions was called became invalid as result of
> some Lisp, in particular as result of calling before-change-functions.
>
> Here's the backtrace; ideas for further debugging are welcome.

Hi Eli,

I'd be curious about the outcome if you had a look at your 'garbage_collect'
assembly to investigate the possible relation with 41357 as suggested
here
https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Hope it helps

  Andrea

-- 
akrl <at> sdf.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 10:55:01 GMT)

Message #29 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 13:54:02 +0300
> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org
> 
> Since the previous call to before-change-functions already used the
> same overlay markers, I suspect that the call to
> before-change-functions caused the memory to be unmapped (perhaps due
> to GC).

FTR: I am now running the 27.0.91 pretest with the patch for bug#40661
applied.  It's a long shot, since the problem here is not with
pointers to buffer text, but I just want to be sure I didn't
rediscover a complicated way to reproduce that bug ;-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 11:05:01 GMT)

Message #32 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andrea Corallo <akrl <at> sdf.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 14:04:03 +0300
> From: Andrea Corallo <akrl <at> sdf.org>
> Cc: pipcet <at> gmail.com, Stefan Monnier <monnier <at> iro.umontreal.ca>,
>         41321 <at> debbugs.gnu.org
> Date: Fri, 22 May 2020 08:35:55 +0000
> 
> I'd be curious about the outcome if you had a look at your 'garbage_collect'
> assembly to investigate the possible relation with 41357 as suggested
> here
> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html

Sorry, I'm not sure I understand what you mean by the above.  Did you
mean whether I disassembled garbage_collect and looked at the code?
If so, the answer is NO, I didn't yet have time for that.

However, given the latest findings, I now doubt even more that the
issue you identified can have any relation to this problem.  As seen
by the backtrace I've shown in my last message, the buffer's overlay
list has invalid overlay objects at the point of the crash.  The 2
pointers to the overlay lists of a buffer are unconditionally marked
in mark_buffer, so I don't see how problems in GC with Lisp objects in
registers could interfere in this case.  Am I missing something?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 11:48:02 GMT)

Message #35 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 11:47:03 +0000
On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
>   (gdb) p current_buffer->overlays_before
>   $28 = (struct Lisp_Overlay *) 0x170cb080
>   (gdb) p $28->start
>   $29 = XIL(0xa0000000170cb040)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Note that it didn't try to print $29, but the original invalid marker. In
particular, I believe 0x170cb040 is a pointer to a valid marker.

>   (gdb) p $28->next
>   $30 = (struct Lisp_Overlay *) 0x13050320
>   (gdb) p $28->next->start
>   $31 = XIL(0xa000000016172310)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Same here.

If you could disassemble signal_before_change, we'd know whether
start_marker and end_marker live in callee-saved registers, and thus
whether this is likely to be Andrea's bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 12:15:01 GMT)

Message #38 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 15:13:58 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> >   (gdb) p current_buffer->overlays_before
> >   $28 = (struct Lisp_Overlay *) 0x170cb080
> >   (gdb) p $28->start
> >   $29 = XIL(0xa0000000170cb040)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Note that it didn't try to print $29, but the original invalid marker.

Sorry, I don't follow.  "xtype" shows the type of the last result,
AFAIK, in this case the type of $29.  If this changed somehow, either
we have a bug in .gdbinit or I have been using GDB incorrectly for I
don't know how many years.

What am I missing?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 12:33:02 GMT)

Message #41 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 15:32:42 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> >   (gdb) p current_buffer->overlays_before
> >   $28 = (struct Lisp_Overlay *) 0x170cb080
> >   (gdb) p $28->start
> >   $29 = XIL(0xa0000000170cb040)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Note that it didn't try to print $29, but the original invalid marker. In
> particular, I believe 0x170cb040 is a pointer to a valid marker.
> 
> >   (gdb) p $28->next
> >   $30 = (struct Lisp_Overlay *) 0x13050320
> >   (gdb) p $28->next->start
> >   $31 = XIL(0xa000000016172310)
> >   (gdb) xtype
> >   Lisp_Vectorlike
> >   Cannot access memory at address 0x18ac04f8
> 
> Same here.
> 
> If you could disassemble signal_before_change, we'd know whether
> start_marker and end_marker live in callee-saved registers, and thus
> whether this is likely to be Andrea's bug.

Since $28 is neither start_marker nor end_marker, but the first
overlay on the buffer's overlay chain, how could it be affected by
whether start_marker or end_marker are in a callee-saved register?
What am I missing here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 12:41:01 GMT)

Message #44 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 12:39:27 +0000
On Fri, May 22, 2020 at 12:13 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 22 May 2020 11:47:03 +0000
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > >   (gdb) p current_buffer->overlays_before
> > >   $28 = (struct Lisp_Overlay *) 0x170cb080
> > >   (gdb) p $28->start
> > >   $29 = XIL(0xa0000000170cb040)
> > >   (gdb) xtype
> > >   Lisp_Vectorlike
> > >   Cannot access memory at address 0x18ac04f8
> >
> > Note that it didn't try to print $29, but the original invalid marker.
>
> Sorry, I don't follow.  "xtype" shows the type of the last result,
> AFAIK, in this case the type of $29.  If this changed somehow, either
> we have a bug in .gdbinit or I have been using GDB incorrectly for I
> don't know how many years.

I think it's most likely to be a GDB bug, and I can't reproduce it here.

But it's definitely trying to access memory at address 0x18ac04f8,
which corresponds to start_marker.

  (gdb) p rvoe_arg.location
  $35 = (Lisp_Object *) 0x15c9298 <globals+120>
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p rvoe_arg.errorp
  $36 = false

Surely rvoe_arg.location isn't a vectorlike, so that also points to
GDB not dealing with things correctly.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 12:49:01 GMT)

Message #47 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 15:48:40 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 22 May 2020 12:39:27 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> > Sorry, I don't follow.  "xtype" shows the type of the last result,
> > AFAIK, in this case the type of $29.  If this changed somehow, either
> > we have a bug in .gdbinit or I have been using GDB incorrectly for I
> > don't know how many years.
> 
> I think it's most likely to be a GDB bug, and I can't reproduce it here.
> 
> But it's definitely trying to access memory at address 0x18ac04f8,
> which corresponds to start_marker.

My interpretation of that equality was that both start_marker and the
buffer's overlay chain got invalidated because some code relocated
objects and unmapped the previously referenced memory, perhaps due to
GC.  I don't yet have an explanation for how this could happen, so
maybe this hypothesis is wrong.

>   (gdb) p rvoe_arg.location
>   $35 = (Lisp_Object *) 0x15c9298 <globals+120>
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8
>   (gdb) p rvoe_arg.errorp
>   $36 = false
> 
> Surely rvoe_arg.location isn't a vectorlike, so that also points to
> GDB not dealing with things correctly.

rvoe_arg.location should be a pointer to the value of
before-change-functions, so yes, it isn't supposed to be vectorlike.
But I very much doubt there's such a blatant bug in GDB: this is the
latest GDB 9.1, and I'm using these commands from .gdbinit all the
time.  I tend to think this is somehow part of the bug that caused the
crash.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 12:56:01 GMT)

Message #50 received at 41321 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 12:55:18 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Andrea Corallo <akrl <at> sdf.org>
>> Cc: pipcet <at> gmail.com, Stefan Monnier <monnier <at> iro.umontreal.ca>,
>>         41321 <at> debbugs.gnu.org
>> Date: Fri, 22 May 2020 08:35:55 +0000
>> 
>> I'd be curious about the outcome if you had a look at your 'garbage_collect'
>> assembly to investigate the possible relation with 41357 as suggested
>> here
>> https://lists.gnu.org/archive/html/bug-gnu-emacs/2020-05/msg01095.html
>
> Sorry, I'm not sure I understand what you mean by the above.  Did you
> mean whether I disassembled garbage_collect and looked at the code?

Yes, it should be quick to see whether callee-saved registers are pushed.

> However, given the latest findings, I now doubt even more that the
> issue you identified can have any relation to this problem.  As seen
> by the backtrace I've shown in my last message, the buffer's overlay
> list has invalid overlay objects at the point of the crash.  The 2
> pointers to the overlay lists of a buffer are unconditionally marked
> in mark_buffer, so I don't see how problems in GC with Lisp objects in
> registers could interfere in this case.  Am I missing something?

Not that I'm aware of; I'm no expert on the piece of code you are looking
at and haven't investigated it.  It was just a 'cheap' idea to take a
potential problem off the table.

  Andrea

-- 
akrl <at> sdf.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 14:05:02 GMT)

Message #53 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 14:04:03 +0000
On Fri, May 22, 2020 at 12:48 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 22 May 2020 12:39:27 +0000
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > > Sorry, I don't follow.  "xtype" shows the type of the last result,
> > > AFAIK, in this case the type of $29.  If this changed somehow, either
> > > we have a bug in .gdbinit or I have been using GDB incorrectly for I
> > > don't know how many years.
> >
> > I think it's most likely to be a GDB bug, and I can't reproduce it here.
> >
> > But it's definitely trying to access memory at address 0x18ac04f8,
> > which corresponds to start_marker.
>
> My interpretation of that equality was that both start_marker and the
> buffer's overlay chain got invalidated because some code relocated
> objects and unmapped the previously referenced memory, perhaps due to
> GC.  I don't yet have an explanation for how this could happen, so
> maybe this hypothesis is wrong.

I think it has to be, because the error message would then read
"Cannot access memory at address 0x170cb040", which is the only
address xvectype is supposed to look at.
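A minimal C sketch of the address arithmetic behind this point, assuming a --with-wide-int build on a 32-bit host where a 3-bit type tag sits in the top bits of a 64-bit word. That layout is an assumption consistent with xtype reporting Lisp_Vectorlike for the 0xa... values above; it is not code taken from lisp.h or .gdbinit, and the helper names are hypothetical. Stripping the tag from XIL(0xa0000000170cb040) leaves 0x170cb040, the only address a header lookup on that object would need to read.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (hypothetical, for illustration only): 64-bit Lisp
   objects with a 3-bit type tag in the most significant bits.  */
enum { SKETCH_TAG_BITS = 3, SKETCH_WORD_BITS = 64 };

/* Return the type tag stored in the top bits of XIL.  */
static unsigned
sketch_xtype (uint64_t xil)
{
  return (unsigned) (xil >> (SKETCH_WORD_BITS - SKETCH_TAG_BITS));
}

/* Return XIL with the tag bits cleared; for a vectorlike object this
   is the address of its header, which a type-inspecting command reads.  */
static uint64_t
sketch_xuntag (uint64_t xil)
{
  return xil & (UINT64_MAX >> SKETCH_TAG_BITS);
}
```

With these definitions, sketch_xtype (0xa0000000170cb040) is 5, the value Emacs 27's lisp.h uses for Lisp_Vectorlike, and sketch_xuntag of the same value is 0x170cb040; nothing in that computation leads to 0x18ac04f8.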

> But I very much doubt there's such a blatant bug in GDB: this is the
> latest GDB 9.1, and I'm using these commands from .gdbinit all the
> time.  I tend to think this is somehow part of the bug that caused the
> crash.

I'm not sure how it could be. I don't think posting the disassembled
code for `signal_before_change' can hurt, since there's no easy way
for anyone else to reproduce it.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 14:27:02 GMT)

Message #56 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 17:26:53 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 22 May 2020 14:04:03 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> I don't think posting the disassembled code for
> `signal_before_change' can hurt, since there's no easy way for
> anyone else to reproduce it.

I see this on two different systems where Emacs was compiled with two
different versions of GCC.  So if you want to see the disassembly, any
32-bit GCC will do, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 14:41:02 GMT)

Message #59 received at 41321 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 14:40:05 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Pip Cet <pipcet <at> gmail.com>
>> Date: Fri, 22 May 2020 14:04:03 +0000
>> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
>> 
>> I don't think posting the disassembled code for
>> `signal_before_change' can hurt, since there's no easy way for
>> anyone else to reproduce it.
>
> I see this on two different systems where Emacs was compiled with two
> different versions of GCC.  So if you want to see the disassembly, any
> 32-bit GCC will do, I think.

I believe the target triplet can make a difference, since the calling
convention can change, no?  CFLAGS are clearly a factor too.

-- 
akrl <at> sdf.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 22 May 2020 19:04:01 GMT)

Message #62 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andrea Corallo <akrl <at> sdf.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 22 May 2020 22:03:04 +0300
> From: Andrea Corallo <akrl <at> sdf.org>
> Cc: Pip Cet <pipcet <at> gmail.com>, 41321 <at> debbugs.gnu.org,
>         monnier <at> iro.umontreal.ca
> Date: Fri, 22 May 2020 14:40:05 +0000
> 
> > I see this on two different systems where Emacs was compiled with two
> > different versions of GCC.  So if you want to see the disassembly, any
> > 32-bit GCC will do, I think.
> 
> I believe the target triplet can make a difference, since the calling
> convention can change, no?  CFLAGS are clearly a factor too.

My CFLAGS are in my original report of this bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 07:02:01 GMT)

Message #65 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 23 May 2020 07:00:56 +0000
I believe this isn't the problem we're looking for, but it might be
related anyway.

I'm seeing this in the assembler source code for insdel.c produced
with the mingw cross compiler (i686-w64-mingw32-gcc-win32):

    movl    60(%esp), %eax
    movl    %eax, (%esp)
    movl    72(%esp), %eax
    movl    %eax, 4(%esp)
    call    _Fmarker_position

If I'm reading this correctly, it's of some concern for wide-int
builds: the two 32-bit halves of a Lisp_Object are stored
non-consecutively.

Our stack marking doesn't catch that; at least, it doesn't for
symbols, where the less-significant half isn't a valid pointer. For
pseudovectors, things should still work...

So I think we have a problem with such --wide-int builds in cases
where a stack temporary holds an unpinned uninterned symbol while GC
is called. Something like

(prog1
  (gensym)
  (garbage-collect))

might trigger it. No problem with gcc -m32 on GNU/Linux, for some reason.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 17:59:02 GMT)

Message #68 received at 41321 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 23 May 2020 17:58:19 +0000
Pip Cet <pipcet <at> gmail.com> writes:

> I believe this isn't the problem we're looking for, but it might be
> related anyway.
>
> I'm seeing this in the assembler source code for insdel.c produced
> with the mingw cross compiler (i686-w64-mingw32-gcc-win32):
>
>     movl    60(%esp), %eax
>     movl    %eax, (%esp)
>     movl    72(%esp), %eax
>     movl    %eax, 4(%esp)
>     call    _Fmarker_position
>
> If I'm reading this correctly, it's of some concern for wide-int
> builds: the two 32-bit halves of a Lisp_Object are stored
> non-consecutively.
>
> Our stack marking doesn't catch that; at least, it doesn't for
> symbols, where the less-significant half isn't a valid pointer. For
> pseudovectors, things should still work...
>
> So I think we have a problem with such --wide-int builds in cases
> where a stack temporary holds an unpinned uninterned symbol while GC
> is called. Something like
>
> (prog1
>   (gensym)
>   (garbage-collect))
>
> might trigger it. No problem with gcc -m32 on GNU/Linux, for some reason.

Very interesting.  AFAIK there's no guarantee that the compiler will spill
a DImode (64-bit) value into adjacent memory.  Also, from reading the GC
code, your observation seems correct to me.

-- 
akrl <at> sdf.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 22:39:01 GMT)

Message #71 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Andrea Corallo <akrl <at> sdf.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 23 May 2020 18:37:57 -0400
>> If I'm reading this correctly, it's of some concern for wide-int
>> builds: the two 32-bit halves of a Lisp_Object are stored
>> non-consecutively.

This shouldn't be a problem: wide-int builds use MSB tagging, so all
Lisp_Objects which contain a pointer have their lowest 32bits exactly
identical to that pointer (and the higher 32bits just contain the tag).
So we'll find them in the stack even if the two halves are separate
simply because the pointer-part will be found like any other pointer.


        Stefan
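Stefan's argument can be illustrated with a small C model. Everything here is hypothetical: a 64-bit object is built by placing a 3-bit tag above a 32-bit address, split into the two 32-bit words a compiler might spill to separate stack slots, and scanned by a naive conservative scanner that compares each 32-bit word against a known object address. Because the low word is the address itself, the scan succeeds even when the two halves are far apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-tagged encoding: tag in the top 3 bits, a 32-bit
   address in the low word.  */
static uint64_t
sketch_make_tagged (uint32_t addr, unsigned tag)
{
  return ((uint64_t) tag << 61) | addr;
}

/* The two halves a compiler may store in non-adjacent stack words.  */
static uint32_t
sketch_low_word (uint64_t obj)
{
  return (uint32_t) obj;
}

static uint32_t
sketch_high_word (uint64_t obj)
{
  return (uint32_t) (obj >> 32);
}

/* Naive conservative scan: does any 32-bit word equal ADDR?  */
static int
sketch_stack_mentions (const uint32_t *stack, size_t n, uint32_t addr)
{
  for (size_t i = 0; i < n; i++)
    if (stack[i] == addr)
      return 1;
  return 0;
}
```

In this model the high word carries only the tag (0xa0000000 for tag 5), so it is irrelevant to the scan; finding the object depends entirely on the low word being a real address.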





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 22:43:01 GMT)

Message #74 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 23 May 2020 22:41:41 +0000
On Sat, May 23, 2020 at 10:38 PM Stefan Monnier
<monnier <at> iro.umontreal.ca> wrote:
> >> If I'm reading this correctly, it's of some concern for wide-int
> >> builds: the two 32-bit halves of a Lisp_Object are stored
> >> non-consecutively.
>
> This shouldn't be a problem: wide-int builds use MSB tagging, so all
> Lisp_Objects which contain a pointer have their lowest 32bits exactly
> identical to that pointer (and the higher 32bits just contain the tag).

As I said, I don't believe that's true for symbols. Qnil is always
binary 0, so we offset all symbols by the offset of lispsym.

> So we'll find them in the stack even if the two halves are separate
> simply because the pointer-part will be found like any other pointer.

Yes, that's what I meant to say when I said it should still work for
pseudovectors.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 23:27:01 GMT)

Message #77 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 23 May 2020 19:26:09 -0400
>> This shouldn't be a problem: wide-int builds use MSB tagging, so all
>> Lisp_Objects which contain a pointer have their lowest 32bits exactly
>> identical to that pointer (and the higher 32bits just contain the tag).
> As I said, I don't believe that's true for symbols.  Qnil is always
> binary 0, so we offset all symbols by the offset of lispsym.

Oh, right, good point: I had completely forgotten about that "detail".
We should probably adjust our conservative stack scanning accordingly.


        Stefan
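The symbol case Pip points out, and the kind of adjustment Stefan suggests, can be sketched the same way. This is a hypothetical model rather than Emacs code: a symbol's low word holds its address minus the address of lispsym (so Qnil, at the start of lispsym, encodes as 0), and the base address 0x015c0000 is made up. A scan that only compares words against addresses misses such a symbol; re-adding the lispsym base to each word before comparing finds it.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up address of the static lispsym array, for illustration.  */
static const uint32_t SKETCH_LISPSYM_BASE = 0x015c0000u;

/* Low word of a symbol object: its address relative to lispsym.
   Unsigned arithmetic wraps modulo 2^32, so the round trip through
   sketch_word_refers_to is exact for any address.  */
static uint32_t
sketch_symbol_low_word (uint32_t sym_addr)
{
  return sym_addr - SKETCH_LISPSYM_BASE;
}

/* Does stack WORD refer to the object at TARGET?  With ADJUST set,
   also try interpreting WORD as a lispsym-relative symbol offset.  */
static int
sketch_word_refers_to (uint32_t word, uint32_t target, int adjust)
{
  if (word == target)
    return 1;
  return adjust && (uint32_t) (word + SKETCH_LISPSYM_BASE) == target;
}
```

In this model an uninterned symbol at the (hypothetical) heap address 0x18ac0500 would leave only 0x17500500 in a spilled low word, which the naive comparison rejects and the adjusted one accepts.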





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 23 May 2020 23:56:01 GMT)

Message #80 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 23 May 2020 23:54:17 +0000
On Fri, May 22, 2020 at 7:22 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
>   #0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
>   #1  MARKERP (x=<optimized out>) at lisp.h:2618
>   #2  CHECK_MARKER (x=XIL(0xa000000018ac0518)) at marker.c:133
>   #3  0x010f073c in Fmarker_position (marker=XIL(0xa000000018ac0518))
>       at marker.c:452

I think I've worked it out: it's this mingw bug:
https://sourceforge.net/p/mingw-w64/bugs/778/

On mingw, if <stdint.h> is included before/instead of stddef.h,
alignof (max_align_t) == 16. However, as can be seen by the backtrace
above, Eli's malloc only returned an 8-byte-aligned block. That's not
normally a problem, because mark_maybe_object doesn't care about
alignment; but in conjunction with the gcc behavior change, we rely on
mark_maybe_pointer to mark the pointer, and it doesn't, because the
pointer is not aligned to a LISP_ALIGNMENT = 16-byte boundary.
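The alignment filter described here reduces to a modulus test. A minimal sketch, assuming LISP_ALIGNMENT is 16 as stated above; the function name is hypothetical and this is not the code in alloc.c. The marker address 0x18ac0518 from the backtrace is 8-byte aligned but fails a 16-byte test, so a filter of this shape would skip it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value assumed from the discussion above.  */
enum { SKETCH_LISP_ALIGNMENT = 16 };

/* Hypothetical stand-in for a pointer filter that ignores candidates
   not on a LISP_ALIGNMENT boundary.  */
static bool
sketch_is_lisp_aligned (uintptr_t addr)
{
  return addr % SKETCH_LISP_ALIGNMENT == 0;
}
```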

Brute-force patch attached until we can work out how to fix this properly.
[0001-Accept-unaligned-pointers-in-maybe_lisp_pointer.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 14:25:02 GMT)

Message #83 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 17:24:41 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 23 May 2020 23:54:17 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> I think I've worked it out: it's this mingw bug:
> https://sourceforge.net/p/mingw-w64/bugs/778/

Thank you for working on this tricky problem.

FTR, I don't use that flavor of MinGW.

> On mingw, if <stdint.h> is included before/instead of stddef.h,
> alignof (max_align_t) == 16.

The problem with the order of inclusion doesn't exist in my header
files, so alignof (max_align_t) is always 16.

> However, as can be seen by the backtrace
> above, Eli's malloc only returned an 8-byte-aligned block.

Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
missing something?

> That's not normally a problem, because mark_maybe_object doesn't
> care about alignment; but in conjunction with the gcc behavior
> change, we rely on mark_maybe_pointer to mark the pointer, and it
> doesn't, because the pointer is not aligned to a LISP_ALIGNMENT =
> 16-byte boundary.

I still very much doubt that this has anything to do with stack
marking during GC, since I've shown in my backtrace that
current_buffer->overlays_before points to an overlay with invalid
markers.  And GC always marks buffer's overlays (and thus their
markers), as can be seen in mark_buffer.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 15:02:02 GMT)

Message #86 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 15:00:36 +0000
On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sat, 23 May 2020 23:54:17 +0000
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > I think I've worked it out: it's this mingw bug:
> > https://sourceforge.net/p/mingw-w64/bugs/778/
>
> Thank you for working on this tricky problem.
>
> FTR, I don't use that flavor of MinGW.

So your flavor is even more broken than what Debian ships? That's
interesting, which flavor is it?

> > On mingw, if <stdint.h> is included before/instead of stddef.h,
> > alignof (max_align_t) == 16.
>
> The problem with the order of inclusion doesn't exist in my header
> files, so alignof (max_align_t) is always 16.

Okay, so that is our bug.

> > However, as can be seen by the backtrace
> > above, Eli's malloc only returned an 8-byte-aligned block.
>
> Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
> lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
> missing something?

No, it relies on the compile-time constants and never checks.

The relevant code is:

enum { MALLOC_IS_LISP_ALIGNED = alignof (max_align_t) % LISP_ALIGNMENT == 0 };

static bool
laligned (void *p, size_t size)
{
  return (MALLOC_IS_LISP_ALIGNED || (intptr_t) p % LISP_ALIGNMENT == 0
          || size % LISP_ALIGNMENT != 0);
}

... so laligned is a constant "true" function on your machine, since
alignof (max_align_t) is 16 and LISP_ALIGNMENT is 16.

static void *
lmalloc (size_t size, bool clearit)
{
#ifdef USE_ALIGNED_ALLOC
  if (! MALLOC_IS_LISP_ALIGNED && size % LISP_ALIGNMENT == 0)
    {
      void *p = aligned_alloc (LISP_ALIGNMENT, size);
      if (clearit && p)
        memclear (p, size);
      return p;
    }
#endif

  while (true)
    {
      void *p = clearit ? calloc (1, size) : malloc (size);
      if (laligned (p, size))
        return p;
      free (p);
      size_t bigger = size + LISP_ALIGNMENT;
      if (size < bigger)
        size = bigger;
    }
}

That optimizes down to returning the malloc/calloc return value directly.

IOW, alloc.c relies on malloc() being max_align_t-aligned, and never
checks, not even in debug builds. That's something that needs to be
fixed, since broken-malloc environments such as yours exist.
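One possible direction, sketched in C under stated assumptions: make the alignment test inspect the pointer at run time instead of short-circuiting on alignof (max_align_t). The names are hypothetical, LISP_ALIGNMENT is assumed to be 16, and this is not taken from any patch in this thread. The retry loop mirrors the lmalloc quoted above, minus the aligned_alloc path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum { SKETCH_LISP_ALIGNMENT = 16 };

/* Hypothetical variant of laligned that always inspects the pointer
   instead of trusting alignof (max_align_t).  Like the original,
   blocks whose size is not a multiple of LISP_ALIGNMENT are accepted.  */
static bool
sketch_laligned_checked (void *p, size_t size)
{
  return ((uintptr_t) p % SKETCH_LISP_ALIGNMENT == 0
          || size % SKETCH_LISP_ALIGNMENT != 0);
}

/* Hypothetical allocator mirroring lmalloc's retry loop, but relying
   only on the run-time check above.  */
static void *
sketch_lmalloc (size_t size, bool clearit)
{
  while (true)
    {
      void *p = clearit ? calloc (1, size) : malloc (size);
      if (!p || sketch_laligned_checked (p, size))
        return p;
      free (p);
      size_t bigger = size + SKETCH_LISP_ALIGNMENT;
      if (size < bigger)
        size = bigger;
    }
}
```

The explicit !p test keeps the original behavior of returning NULL on allocation failure; in the quoted code that happens implicitly, because a null pointer passes the modulus check.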

> > That's not normally a problem, because mark_maybe_object doesn't
> > care about alignment; but in conjunction with the gcc behavior
> > change, we rely or mark_maybe_pointer to mark the pointer, and it
> > doesn't, because the pointer is not aligned to a LISP_ALIGNMENT =
> > 16-byte boundary.
>
> I still very much doubt that this has anything to do with stack
> marking during GC, since I've shown in my backtrace that
> current_buffer->overlays_before points to an overlay with invalid
> markers.

You haven't.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 16:26:02 GMT)

Message #89 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 19:25:14 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sun, 24 May 2020 15:00:36 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> > FTR, I don't use that flavor of MinGW.
> 
> So your flavor is even more broken than what Debian ships?

Why _more_ broken?

> That's interesting, which flavor is it?

mingw.org's MinGW.

> > Isn't that strange?  Lisp data is allocated via lmalloc, AFAIK, and
> > lmalloc is supposed to guarantee LISP_ALIGNMENT alignment.  Or am I
> > missing something?
> 
> No, it relies on the compile-time constants and never checks.

So that is the bug to fix, no?

> > I still very much doubt that this has anything to do with stack
> > marking during GC, since I've shown in my backtrace that
> > current_buffer->overlays_before points to an overlay with invalid
> > markers.
> 
> You haven't.

Of course, I have.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 16:56:02 GMT)

Message #92 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 19:55:29 +0300
> Date: Sun, 24 May 2020 19:25:14 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> 
> > > I still very much doubt that this has anything to do with stack
> > > marking during GC, since I've shown in my backtrace that
> > > current_buffer->overlays_before points to an overlay with invalid
> > > markers.
> > 
> > You haven't.
> 
> Of course, I have.

Here's how healthy overlays look in a healthy buffer:

  (gdb) p current_buffer->overlays_after
  $10 = (struct Lisp_Overlay *) 0x0
  (gdb) p current_buffer->overlays_before
  $11 = (struct Lisp_Overlay *) 0x7728258
  (gdb) p $11->start
  $12 = XIL(0xa000000007728218)
  (gdb) xtype
  Lisp_Vectorlike
  PVEC_MARKER
  (gdb) xmarker
  $13 = (struct Lisp_Marker *) 0x7728218
  (gdb) p *$
  $14 = {
    header = {
      size = 1124081664
    },
    buffer = 0x728fc38,
    need_adjustment = 0,
    insertion_type = 0,
    next = 0x765eae8,
    charpos = 13968,
    bytepos = 13968
  }
  (gdb) p $11->next
  $15 = (struct Lisp_Overlay *) 0x0

And here's a reminder of how the same looked in the session that
segfaulted:

  (gdb) p current_buffer->overlays_before
  $28 = (struct Lisp_Overlay *) 0x170cb080
  (gdb) p $28->start
  $29 = XIL(0xa0000000170cb040)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p $28->next
  $30 = (struct Lisp_Overlay *) 0x13050320
  (gdb) p $28->next->start
  $31 = XIL(0xa000000016172310)
  (gdb) xtype
  Lisp_Vectorlike
  Cannot access memory at address 0x18ac04f8
  (gdb) p current_buffer->overlays_after
  $32 = (struct Lisp_Overlay *) 0x0
  (gdb) p $28->next->next
  $33 = (struct Lisp_Overlay *) 0x0

If you still claim that I didn't demonstrate that the buffer's overlay
chain got corrupted as part of the bug that caused the segfault,
please point out what I missed here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 18:05:02 GMT)

Message #95 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 18:03:57 +0000
On Sun, May 24, 2020 at 4:55 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> And here's a reminder of how the same looked in the session that
> segfaulted:
>
>   (gdb) p current_buffer->overlays_before
>   $28 = (struct Lisp_Overlay *) 0x170cb080
>   (gdb) p $28->start
>   $29 = XIL(0xa0000000170cb040)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

That should read "Cannot access memory at address 0x170cb040". It
doesn't. It doesn't tell you whether the memory at page 0x170cb000 is
mapped, because gdb, for whatever reason (a bug in .gdbinit, a bug in
gdb, some weird command entered at the gdb prompt before the
transcript started, or even, as you yourself suggested, somehow as the
result of the memory corruption that caused the crash), looked in the
wrong place.

Instead, it tells you that the page at 0x18ac0000 isn't mapped. Which we knew.

>   (gdb) p $28->next
>   $30 = (struct Lisp_Overlay *) 0x13050320
>   (gdb) p $28->next->start
>   $31 = XIL(0xa000000016172310)
>   (gdb) xtype
>   Lisp_Vectorlike
>   Cannot access memory at address 0x18ac04f8

Same here. It should read "Cannot access memory at address 0x16172310".

> If you still claim that I didn't demonstrate that the buffer's overlay
> chain got corrupted

I do, of course. The message GDB prints simply does not say anything
problematic about the buffer's overlay chain.

> as part of the bug that caused the segfault,
> please point out what I missed here.

You omitted the third call to xtype, which was even more clearly
nonsensical: xtype was misbehaving. We don't know in which way it was
misbehaving. So there's no evidence either way.

FWIW, running into gdb bugs is something that happens to me almost on
a regular basis. There's no point reporting those, as there's
generally no response. In your case, you're in an unusual environment
with a rather large and complicated .gdbinit file which does very
strange things to avoid running into GDB bugs that we know about. All
that increases the likelihood of your encountering a gdb bug that no
one else has, or that has been reported but never responded to.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 18:41:01 GMT)

Message #98 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 21:40:34 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sun, 24 May 2020 18:03:57 +0000
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > If you still claim that I didn't demonstrate that the buffer's overlay
> > chain got corrupted
> 
> I do, of course. The message GDB prints simply does not say anything
> problematic about the buffer's overlay chain.
> 
> > as part of the bug that caused the segfault,
> > please point out what I missed here.
> 
> You omitted the third call to xtype, which was even more clearly
> nonsensical: xtype was misbehaving. We don't know in which way it was
> misbehaving. So there's no evidence either way.
> 
> FWIW, running into gdb bugs is something that happens to me almost on
> a regular basis. There's no point reporting those, as there's
> generally no response. In your case, you're in an unusual environment
> with a rather large and complicated .gdbinit file which does very
> strange things to avoid running into GDB bugs that we know about. All
> that increases the likelihood of your encountering a gdb bug that no
> one else has, or that has been reported but never responded to.

I don't buy this, sorry.  I use GDB every day in this very "unusual
environment", both when debugging Emacs and other programs.  The
probability of these being due to some bug in GDB or in .gdbinit
commands is very low, as I and others use them all the time.  It is
much more probable that the commands I've shown are signs of a real
trouble in Emacs and not in GDB.  I'm not willing to disregard what
those commands show me because they don't match your theory.  I prefer
facts.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 19:01:02 GMT)

Message #101 received at submit <at> debbugs.gnu.org:

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 20:00:16 +0100
On Sun 24 May 2020, Pip Cet wrote:

> On Sun, May 24, 2020 at 2:24 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>> > From: Pip Cet <pipcet <at> gmail.com>
>> > Date: Sat, 23 May 2020 23:54:17 +0000
>> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
>> >
>> > I think I've worked it out: it's this mingw bug:
>> > https://sourceforge.net/p/mingw-w64/bugs/778/
>>
>> Thank you for working on this tricky problem.
>>
>> FTR, I don't use that flavor of MinGW.
>
> So your flavor is even more broken than what Debian ships? That's
> interesting, which flavor is it?

FYI, there are two separate projects:
  mingw.org: 32bit only.
  mingw-w64: 32bit and 64bit, using a different C runtime.

On my machine a simple test program shows:

--------------------------------------------------------------
  project     gcc     cpu   alignof(max_align_t)
--------------------------------------------------------------
mingw.org   9.2.0    i686   16
mingw-w64  10.1.0    i686   16 (stdint.h before stddef.h)
                             8 (stdint.h after  stddef.h)
mingw-w64  10.1.0  x86_64   16
--------------------------------------------------------------

This problem only appears with the 32bit mingw-w64 toolchain.

Eli uses the mingw.org toolchain. Linux distros initially used
mingw.org, but switched to mingw-w64 cross compilers several years ago.

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 19:11:02 GMT)

Message #104 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 19:09:28 +0000
On Sun, May 24, 2020 at 7:01 PM Andy Moreton <andrewjmoreton <at> gmail.com> wrote:
> > So your flavor is even more broken than what Debian ships? That's
> > interesting, which flavor is it?
>
> FYI, there are two separate projects:
>   mingw.org: 32bit only.
>   mingw-w64: 32bit and 64bit, using a different C runtime.
>
> On my machine a simple test program shows:
>
> --------------------------------------------------------------
>   project     gcc     cpu   alignof(max_align_t)
> --------------------------------------------------------------
> mingw.org   9.2.0    i686   16
> mingw-w64  10.1.0    i686   16 (stdint.h before stddef.h)
>                              8 (stdint.h after  stddef.h)
> mingw-w64  10.1.0  x86_64   16
> --------------------------------------------------------------

Thanks!

> This problem only appears with the 32bit mingw-w64 toolchain.

FWIW, the problem is that the incorrect value of 16 is returned in
some cases. All 32bit toolchains appear to be broken. I said that
mingw.org was "more broken" than mingw-w64 because it _always_ returns
the incorrect value, rather than doing so only for an unfortunate
combination of #includes.

> Eli uses the mingw.org toolchain. Linux distros initially used
> mingw.org, but switched to mingw-w64 cross compilers several years ago.

I couldn't get the mingw.org toolchain to work at all...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 24 May 2020 19:41:01 GMT)

Message #107 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 24 May 2020 19:40:09 +0000
On Sun, May 24, 2020 at 6:40 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sun, 24 May 2020 18:03:57 +0000
> > Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> >
> > > If you still claim that I didn't demonstrate that the buffer's overlay
> > > chain got corrupted
> >
> > I do, of course. The message GDB prints simply does not say anything
> > problematic about the buffer's overlay chain.
> >
> > > as part of the bug that caused the segfault,
> > > please point out what I missed here.
> >
> > You omitted the third call to xtype, which was even more clearly
> > nonsensical: xtype was misbehaving. We don't know in which way it was
> > misbehaving. So there's no evidence either way.
> >
> > FWIW, running into gdb bugs is something that happens to me almost on
> > a regular basis. There's no point reporting those, as there's
> > generally no response. In your case, you're in an unusual environment
> > with a rather large and complicated .gdbinit file which does very
> > strange things to avoid running into GDB bugs that we know about. All
> > that increases the likelihood of your encountering a gdb bug that no
> > one else has, or that has been reported but never responded to.
>
> I don't buy this, sorry.

So you think there's a second bug, located in Emacs, which causes GDB,
which isn't supposed to be broken by anything the debuggee does, to be
broken and respond in nonsensical ways?

> I use GDB every day in this very "unusual
> environment", both when debugging Emacs and other programs.

And you've never run into GDB bugs?

> The
> probability of these being due to some bug in GDB or in .gdbinit
> commands is very low, as I and others use them all the time.

I'm perfectly willing to help you trace down this bug (in GDB or
.gdbinit; we've already found the bug in mingw and the one in Emacs)
if it serves any purpose, but I suspect you don't have the time.

But I can't conceive of an explanation in which a bug in Emacs could
cause a bug-free GDB to respond in the nonsensical way your last
invocation of xtype did.

> It is much more probable that the commands I've shown are signs of a real
> trouble in Emacs and not in GDB.

Are you saying the bug I've found isn't "a real trouble"? I'm curious
as to what trouble you're imagining.

> I'm not willing to disregard what
> those commands show me because they don't match your theory.

What they show you is that memory at a certain address, which they
helpfully specify, isn't mapped.

You conclude that memory at a totally different address isn't mapped,
even though GDB quite explicitly never says so.

That conclusion is invalid.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 02:31:01 GMT)

Message #110 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 05:30:37 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sun, 24 May 2020 19:40:09 +0000
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > I use GDB every day in this very "unusual
> > environment", both when debugging Emacs and other programs.
> 
> And you've never run into GDB bugs?

Not such blatant ones, no, and not lately.

> Are you saying the bug I've found isn't "a real trouble"?

I'm saying I'm not convinced that problem has anything to do with this
particular segfault.

> What they show you is that memory at a certain address, which they
> helpfully specify, isn't mapped.
> 
> You conclude that memory at a totally different address isn't mapped,
> even though GDB quite explicitly never says so.
> 
> That conclusion is invalid.

Your opinion, not mine, not yet anyway.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 06:41:02 GMT)

Message #113 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 06:40:11 +0000
On Mon, May 25, 2020 at 2:30 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sun, 24 May 2020 19:40:09 +0000
> > Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> > What they show you is that memory at a certain address, which they
> > helpfully specify, isn't mapped.
> >
> > You conclude that memory at a totally different address isn't mapped,
> > even though GDB quite explicitly never says so.
> >
> > That conclusion is invalid.
> Your opinion, not mine, not yet anyway.

Maybe I'm approaching this the wrong way: What are you actually planning to do?

I think we should work around the mingw bug on both the master and
emacs-27 branches.

We should also fix the (symbol-related) Emacs bug before it bites us:
on both branches, unless we can get a mingw user to provide the output
of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've
already decided to keep crashable GC bugs on the emacs-27 branch).

And we should wait and see whether similar crashes keep happening.

What we should not do is encourage people to keep looking for another
Emacs bug based on the existing backtraces.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 11:30:02 GMT)

Message #116 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 11:28:46 +0000
On Mon, May 25, 2020 at 6:40 AM Pip Cet <pipcet <at> gmail.com> wrote:
> We should also fix the (symbol-related) Emacs bug before it bites us:
> on both branches, unless we can get a mingw user to provide the output
> of "disassemble Fprog1" (and a bunch of other functions). (OTOH, we've
> already decided to keep crashable GC bugs on the emacs-27 branch).

And I just noticed strings aren't aligned to LISP_ALIGNMENT on
x86_64-pc-linux-gnu.

I think we're going to have to weaken the maybe_lisp_pointer check to
check only for GC_ALIGNMENT.

The commit that introduced this problem, for what it's worth, is
967d2c55ef3908fd378e05b2a0070663ae45f6de




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 14:54:01 GMT)

Message #119 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>, eggert <at> cs.ucla.edu
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 17:53:17 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 11:28:46 +0000
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> x86_64-pc-linux-gnu.
> 
> I think we're going to have to weaken the maybe_lisp_pointer check to
> check only for GC_ALIGNMENT.

I tend to agree.

Paul, why did we move to max_align_t as the alignment requirement?
AFAIU, GCC enlarged that recently to allow for _Float128 type (at
least on 32-bit hosts), but do we really need that?

Also, what does this mean for stack-based Lisp objects?  AFAIU, we
previously required 8-byte alignment on 32-bit hosts (and on
MS-Windows we jump through some hoops to guarantee that in callbacks
of Windows APIs and in thread functions that manipulate Lisp objects).
Does the use of max_align_t mean that now stack-based Lisp objects
will need to have 16-byte alignment on 32-bit Windows?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 15:14:01 GMT)

Message #122 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Mon, 25 May 2020 11:12:53 -0400
>> I think we're going to have to weaken the maybe_lisp_pointer check to
>> check only for GC_ALIGNMENT.

Sounds about right: the only alignment we really need for Lisp_Objects
is the GC_ALIGNMENT that allows us to use the 3 LSB for tags.
src/alloc.c makes efforts to ensure this alignment and for some objects
(e.g. Lisp_Floats as well as (on 32bit hosts) Lisp_Cons cells) that's
the only alignment we can meaningfully impose since those objects are
only 64bit in size.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 15:15:02 GMT)

Message #125 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 18:14:22 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Mon, 25 May 2020 06:40:11 +0000
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> What are you actually planning to do?

Given the fact that I'm the only one who sees these problems?  Not
much: I intend to continue running Emacs under GDB and collect data
about the crashes until either I figure out what causes the crashes,
or the crashes disappear (which would mean the problem was fixed
indirectly by some other change).

> I think we should work around the mingw bug on both the master and
> emacs-27 branches.

That depends on what the proposed solution or workaround will be.  We
need to see where the discussion of the alignment issue goes and what
we decide to do about that.

> What we should not do is encourage people to keep looking for another
> Emacs bug based on the existing backtraces.

Indeed, I'm posting the backtraces for the record; no one should feel
compelled to study them unless they are interested.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 25 May 2020 17:43:01 GMT)

Message #128 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 25 May 2020 17:41:32 +0000
On Mon, May 25, 2020 at 3:14 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Mon, 25 May 2020 06:40:11 +0000
> > Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> >
> > What are you actually planning to do?
> Not
> much: I intend to continue running Emacs under GDB and collect data
> about the crashes until either I figure out what causes the crashes,
> or the crashes disappear (which would mean the problem was fixed
> indirectly by some other change).

(Or directly, of course. I still believe my "theory" about your bug is correct.)

> > I think we should work around the mingw bug on both the master and
> > emacs-27 branches.
>
> That depends on what the proposed solution or workaround will be.

For emacs-27, reducing the alignment requirement in
maybe_lisp_pointer: that will only make us check more pointers, not
fewer, so while it is a GC change it's one that makes sense.

For master, I'd consider setting LISP_ALIGNMENT to 8 on the mingw32
platform, where memory is already scarce. I don't trust the alleged
performance hit of 20%, so we might have to collect some actual
performance data. But we definitely need to make strings aligned to
LISP_ALIGNMENT, one way or the other, because that's the original
reason for maybe_mark_pointer.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 03:34:01 GMT)

Message #131 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Mon, 25 May 2020 20:33:32 -0700
On 5/25/20 4:28 AM, Pip Cet wrote:

> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> x86_64-pc-linux-gnu.

Could you explain? Strings are allocated via allocate_string -> lisp_malloc ->
lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just
like it does for other Lisp objects.

String data (struct sdata) is not Lisp-aligned, but it doesn't need to be.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 03:40:01 GMT)

Message #134 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Mon, 25 May 2020 20:39:24 -0700
On 5/25/20 7:53 AM, Eli Zaretskii wrote:

> why did we move to max_align_t as the alignment requirement?
> AFAIU, GCC enlarged that recently to allow for _Float128 type (at
> least on 32-bit hosts), but do we really need that?

Not on current glibc on any platform that I know, no. I was merely trying to
keep the code portable to platforms where (say) alignof (pthread_cond_t) == 16.
POSIX allows this, and this sort of thing is likely to happen somewhere in the
not-too-distant future, for performance reasons.

> Does the use of max_align_t mean that now stack-based Lisp objects
> will need to have 16-byte alignment on 32-bit Windows?

No, because we don't need to GC stack-based objects themselves (the stack will
reclaim them) and the GC finds everything they point to (as it scans the stack).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 06:20:01 GMT)

Message #137 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Tue, 26 May 2020 06:18:52 +0000
On Tue, May 26, 2020 at 3:33 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/25/20 4:28 AM, Pip Cet wrote:
>
> > And I just noticed strings aren't aligned to LISP_ALIGNMENT on
> > x86_64-pc-linux-gnu.
>
> Could you explain? Strings are allocated via allocate_string -> lisp_malloc ->
> lmalloc, and lmalloc is supposed to align to LISP_ALIGNMENT for strings just
> like it does for other Lisp objects.

Sorry. You're right, the non-aligned strings aren't relevant for GC.

However, this is only because struct Lisp_String happens to have an
even number of words. If someone changes that, the old code would
break...

We're still going to have to deal with symbols on --wide-int builds
when the two halves of the wide int are saved non-consecutively.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 06:47:02 GMT)

Message #140 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Mon, 25 May 2020 23:46:02 -0700
On 5/25/20 8:33 PM, Paul Eggert wrote:
> On 5/25/20 4:28 AM, Pip Cet wrote:
> 
>> And I just noticed strings aren't aligned to LISP_ALIGNMENT on
>> x86_64-pc-linux-gnu.
> 
> Could you explain?

Oh, never mind, I figured it out. Sorry about the noise.

I installed the first attached patch to fix the bug on master (as a series of
commits, the leading ones not quite right unfortunately). This patch does what
you proposed, and also tightens up some of the related alignment checks.

I propose the second patch for emacs-27; it's limited to what you proposed,
namely, it weakens maybe_lisp_pointer to check only for GC_ALIGNMENT.
[emacs.diff (text/x-patch, attachment)]
[0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 07:52:01 GMT)

Message #143 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Tue, 26 May 2020 00:51:05 -0700
On 5/25/20 11:18 PM, Pip Cet wrote:

> However, this is only because struct Lisp_String happens to have an
> even number of words. If someone changes that, the old code would
> break...

No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is
always GC-aligned, and (for older compilers that don't support alignas (8)) this
is checked statically via 'verify (GCALIGNED (struct Lisp_String))'.

Now that I've looked at it, though, I see that I forgot to do something similar
with struct Lisp_Float, which has the same issue. Fixed by installing the
attached patch on master.

> We're still going to have to deal with symbols on --wide-int builds
> when the two halves of the wide int are saved non-consecutively.

Yes, I think that's the most pressing issue in this area. I will have to take a
break now, though, since I have sleep and other work to do.
[0001-Port-struct-Lisp_FLoat-to-oddball-platforms.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 08:29:02 GMT)

Message #146 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Tue, 26 May 2020 08:27:41 +0000
On Tue, May 26, 2020 at 7:51 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/25/20 11:18 PM, Pip Cet wrote:
> > However, this is only because struct Lisp_String happens to have an
> > even number of words. If someone changes that, the old code would
> > break...
>
> No, because struct Lisp_String contains a GCALIGNED_UNION_MEMBER, so it is
> always GC-aligned, and (for older compilers that don't support alignas (8)) this
> is checked statically via 'verify (GCALIGNED (struct Lisp_String))'.

As I said, this was specific to the old code, where LISP_ALIGNMENT,
not GCALIGNMENT, was used by maybe_lisp_pointer. Things should be fine
now (apart from the issue below)!

> Now that I've looked at it, though, I see that I forgot to do something similar
> with struct Lisp_Float, which has the same issue. Fixed by installing the
> attached patch on master.

LGTM.

> > We're still going to have to deal with symbols on --wide-int builds
> > when the two halves of the wide int are saved non-consecutively.
>
> Yes, I think that's the most pressing issue in this area.
> I will have to take a
> break now, though, since I have sleep and other work to do.

Thanks for all the patches and comments!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 15:19:02 GMT)

Message #149 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Tue, 26 May 2020 18:17:51 +0300
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> Date: Mon, 25 May 2020 23:46:02 -0700
> 
> I propose the second patch for emacs-27; it's limited to what you proposed,
> namely, it weakens maybe_lisp_pointer to check only for GC_ALIGNMENT.
> 
>  static bool
>  maybe_lisp_pointer (void *p)
>  {
> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> +  return (uintptr_t) p % GCALIGNMENT == 0;
>  }

On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
right (or maybe I'm missing something).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Tue, 26 May 2020 22:50:02 GMT)

Message #152 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Tue, 26 May 2020 15:49:24 -0700
On 5/26/20 8:17 AM, Eli Zaretskii wrote:
>>  static bool
>>  maybe_lisp_pointer (void *p)
>>  {
>> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
>> +  return (uintptr_t) p % GCALIGNMENT == 0;
>>  }
> On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
> right (or maybe I'm missing something).

Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed
emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always
return true. Although this hurts GC performance it doesn't affect correctness
and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like
it) is needed for emacs-27.

I installed the attached patch into master to fix the !USE_LSB_TAG performance
issue you raised.  This patch does not fix crashes; it's merely a performance tweak.

I am planning on looking into related crashes for Lisp_Symbol next. Perhaps we
should wait on that before worrying about what exact patch should go into emacs-27.
[0001-Tweak-GC-performance-if-USE_LSB_TAG.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 15:27:01 GMT)

Message #155 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 18:26:36 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 26 May 2020 15:49:24 -0700
> 
> > On non-USE_LSB_TAG systems, GCALIGNMENT is 1, so this doesn't look
> > right (or maybe I'm missing something).
> 
> Good point; I'd neglected that. I.e., on !USE_LSB_TAG systems the proposed
> emacs-27 patch is overly-conservative, as it causes maybe_lisp_pointer to always
> return true. Although this hurts GC performance it doesn't affect correctness
> and the patch does fix a crash on USE_LSB_TAG systems, so it (or something like
> it) is needed for emacs-27.

We used to rely on 8-byte alignment on those systems, and I don't see
any reason to stop relying on that and punish those systems'
performance.  What would we gain?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 16:59:02 GMT)

Message #158 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 09:58:11 -0700
On 5/27/20 8:26 AM, Eli Zaretskii wrote:
> We used to rely on 8-byte alignment on those systems, and I don't see
> any reason to stop relying on that and punish those systems'
> performance.  What would we gain?

In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
in that compilers can make pointers into a Lisp object while losing the address
of the original object (and we've seen them do this) and there's no guarantee
that these sub-pointers are GCALIGNED. This sort of failure should be quite rare
but can cause crashes such as the one you observed. I am looking into a fix and
plan to apply it to master (I've already installed fixes for some minor glitches I
observed on the way); we can then talk about what to do with emacs-27.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 17:34:01 GMT)

Message #161 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 20:33:14 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 27 May 2020 09:58:11 -0700
> 
> In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
> in that compilers can make pointers into a Lisp object while losing the address
> of the original object (and we've seen them do this) and there's no guarantee
> that these sub-pointers are GCALIGNED.

Sorry, I don't follow: what do you mean by "losing the address of the
original object" in this case?  Can you show an example?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 17:54:02 GMT)

Message #164 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 10:53:22 -0700
On 5/27/20 10:33 AM, Eli Zaretskii wrote:
> Sorry, I don't follow: what do you mean by "losing the address of the
> original object" in this case?  Can you show an example?

The source code says

   for (i = 0; i < size; i++)
      foo (AREF (obj, i));

This is the last reference to obj, so the compiler reuses the register R holding
obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR
(obj)->contents[1], etc. each time through the loop, and transforms the call
into foo (*R) as an optimization. When foo calls the garbage collector,
maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
object: it points somewhere into the middle of a Lisp object and R's value is
not GC-aligned.

We've seen compilers do that.
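The addresses described above, &XVECTOR (obj)->contents[i], are not necessarily multiples of GCALIGNMENT. Here is a hypothetical model, assuming a 32-bit layout with a 4-byte header followed by 4-byte slots and an 8-byte-aligned object. The struct and constant names are made up for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

enum { MODEL_GCALIGNMENT = 8 };

/* Hypothetical model of a vector on a 32-bit host: a 4-byte header
   followed by 4-byte slots, with the whole object 8-byte aligned.  */
struct model_vector
{
  alignas (MODEL_GCALIGNMENT) uint32_t header;
  uint32_t contents[6];
};

/* Return nonzero if the address of slot I is a multiple of
   MODEL_GCALIGNMENT, i.e. would survive an alignment-based filter.  */
static int
slot_is_gcaligned (struct model_vector *v, int i)
{
  assert (0 <= i && i < 6);
  return (uintptr_t) &v->contents[i] % MODEL_GCALIGNMENT == 0;
}
```

In this model every even-numbered slot sits 4 bytes off an 8-byte boundary. An alignment-based filter would therefore reject a register holding that slot's address, even though it points into a live object.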




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 17:59:02 GMT)

Message #167 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Wed, 27 May 2020 17:57:31 +0000
On Wed, May 27, 2020 at 4:58 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/27/20 8:26 AM, Eli Zaretskii wrote:
> > We used to rely on 8-byte alignment on those systems, and I don't see
> > any reason to stop relying on that and punish those systems'
> > performance.  What would we gain?
>
> In looking into this more, it appears that the maybe_lisp_pointer idea is wrong,
> in that compilers can make pointers into a Lisp object while losing the address
> of the original object (and we've seen them do this) and there's no guarantee
> that these sub-pointers are GCALIGNED.

Do you know of anything like this happening on 64-bit systems? Because
I think it doesn't; Emacs GC does rely, and has always relied since
GCPRO was removed, on compilers being sensible about what they put on
the stack. There's no guarantee in the C standard that that's true,
but there never will be.

> This sort of failure should be quite rare
> but can cause crashes such as the one you observed.

I'm pretty sure we figured out the crash that Eli observed. It's not
anything that involved, just a Lisp_Object being stored
non-consecutively and simultaneously being misaligned for the purposes
of maybe_lisp_pointer.

> I am looking into a fix and
> plan to apply it to master (I've already installed fixes for some minor glitches I
> observed on the way); we can then talk about what to do with emacs-27.

I may be out of line, but I think it's rash to change things like
that, even on master, with no opportunity for prior discussion. This
isn't a minor bug, or a spelling fix: it's a fundamental change in
what we expect from our C compiler and how GC works. In particular, I
don't see how you plan to solve it without treating any pointer that
points even in the vicinity of a valid lisp object as keeping that
object alive.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 18:25:02 GMT)

Message #170 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 21:24:26 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 27 May 2020 10:53:22 -0700
> 
> The source code says
> 
>    for (i = 0; i < size; i++)
>       foo (AREF (obj, i));
> 
> This is the last reference to obj, so the compiler reuses the register R holding
> obj, and has that register R contain &XVECTOR (obj)->contents[0], &XVECTOR
> (obj)->contents[1], etc. each time through the loop, and transforms the call
> into foo (*R) as an optimization. When foo calls the garbage collector,
> maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
> object: it points somewhere into the middle of a Lisp object and R's value is
> not GC-aligned.

For this to cause trouble, there would have to be no other reference
to obj, either elsewhere up the call stack or from another object we
will mark.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 18:40:02 GMT)

Message #173 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 11:39:06 -0700
On 5/27/20 10:57 AM, Pip Cet wrote:

> Do you know of anything like this happening on 64-bit systems?

I think it's unlikely on 64-bit systems; it'd happen only on platforms where
alignof (void *) < 8, such as 32-bit x86.

> Emacs GC does rely, and has always relied since
> GCPRO was removed, on compilers being sensible about what they put on
> the stack.

This isn't merely an issue about what compilers put into the stack; it's also
an issue of what's in registers. There may not be any pointer in the stack that
points into the Lisp object. And compilers are not always "sensible" about
temps; they may cache &P->x into a temp with no copy of P anywhere.

> I'm pretty sure we figured out the crash that Eli observed. It's not
> anything that involved, just a Lisp_Object being stored
> non-consecutively and simultaneously being misaligned for the purposes
> of maybe_lisp_pointer.

Not sure what the point is here. None of this is "that involved". We can have
pointers into Lisp objects, pointers that are not aligned for the purposes of
maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli
happened to observe.

> I don't see how you plan to solve it without treating any pointer that
> points even in the vicinity of a valid lisp object as keeping that
> object alive.
Yes, of course. Any pointer that points somewhere within a Lisp object (in the C
sense) should count as pointing to the object. If memory serves, we already
treat pointers that way in some places; unfortunately we're not doing it
consistently.
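Counting any address inside an object as a reference means mapping the address back to the object's start, much as the live_*_holding functions do. The sketch below assumes a single contiguous block of equal-size objects. The real code first finds the block in a red-black tree, and the names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block of NOBJS fixed-size objects starting at BASE.  */
struct model_block
{
  uintptr_t base;
  size_t objsize;
  size_t nobjs;
};

/* If P points anywhere inside one of the block's objects, return the
   address of that object's first byte; otherwise return 0.  Unlike an
   alignment filter, this accepts interior pointers.  */
static uintptr_t
object_holding (const struct model_block *b, uintptr_t p)
{
  assert (b->objsize > 0);
  if (p < b->base)
    return 0;
  size_t index = (p - b->base) / b->objsize;
  if (index >= b->nobjs)
    return 0;
  return b->base + index * b->objsize;
}
```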

But I take your point; I'll post the change here before committing to master.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 18:40:02 GMT)

Message #176 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 11:39:21 -0700
On 5/27/20 11:24 AM, Eli Zaretskii wrote:
> For this to cause trouble, there would have to be no other reference
> to obj, either elsewhere up the call stack or from another object we
> will mark.

Yes, that's right. It's unlikely, but it does happen and we've seen it happen in
the past.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Wed, 27 May 2020 18:58:01 GMT)

Message #179 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Wed, 27 May 2020 18:56:17 +0000
On Wed, May 27, 2020 at 6:39 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/27/20 10:57 AM, Pip Cet wrote:
>
> > Do you know of anything like this happening on 64-bit systems?
>
> I think it's unlikely on 64-bit systems; it'd happen only on platforms where
> alignof (void *) < 8, such as 32-bit x86.
>
> > Emacs GC does rely, and has always relied since
> > GCPRO was removed, on compilers being sensible about what they put on
> > the stack.
>
> This isn't merely an issue about what compilers put into the stack; it's also
> an issue of what's in registers. There may not be any pointer in the stack that
> points into the Lisp object. And compilers are not always "sensible" about
> temps; they may cache &P->x into a temp with no copy of P anywhere.

Or they may cache &P->x + 1, and use negative offsets to access it.
That used to be the most efficient way of accessing arrays on some
machines. We simply can't cater to that.

Think about code like:

Lisp_Object reverse (Lisp_Object vector)
{
  ptrdiff_t count = ASIZE (vector);
  Lisp_Object new_vector = make_nil_vector (count);
  /* p starts one past the last element of vector.  */
  Lisp_Object *p = aref_addr (vector, count);
  Lisp_Object *q = XVECTOR (new_vector)->contents;
  while (count--)
    {
      garbage_collect ();
      *q++ = *--p;
    }
  return new_vector;
}

(which is what many compilers would generate from more sensible code).
On the first iteration, p points one past the end of vector, i.e. at
a totally different vector or some random other object, but it still
needs to keep vector alive.

So, at the very least, we need to always keep the immediately
preceding object alive if we go that way.
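The reverse example shows why honoring one-past-the-end pointers has a cost. In a block of adjacent objects, an address at a boundary is both the start of one object and one past the end of the previous one, so both must be kept alive. Here is a hypothetical sketch of that rule under a simplified block model, with illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: collect the objects a conservative scanner would keep
   alive for P if it also honors pointers one past an object's end.
   Objects of OBJSIZE bytes are laid out contiguously from BASE.
   Writes up to two object start addresses to OUT; returns the count.  */
static int
objects_kept_alive (uintptr_t base, size_t objsize, size_t nobjs,
                    uintptr_t p, uintptr_t out[2])
{
  assert (objsize > 0);
  int n = 0;
  uintptr_t end = base + objsize * nobjs;
  /* The object P points into, if any.  */
  if (base <= p && p < end)
    out[n++] = base + (p - base) / objsize * objsize;
  /* The object that ends exactly at P, if any (one past the end).  */
  if (base < p && p <= end && (p - base) % objsize == 0)
    out[n++] = p - objsize;
  return n;
}
```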

> > I'm pretty sure we figured out the crash that Eli observed. It's not
> > anything that involved, just a Lisp_Object being stored
> > non-consecutively and simultaneously being misaligned for the purposes
> > of maybe_lisp_pointer.
>
> Not sure what the point is here. None of this is "that involved". We can have
> pointers into Lisp objects, pointers that are not aligned for the purposes of
> maybe_lisp_pointer. Emacs should follow all of them, not just the one that Eli
> happened to observe.

Or pointers past them, and that's a significant overhead because it
usually means two objects are being kept alive by one reference.

> > I don't see how you plan to solve it without treating any pointer that
> > points even in the vicinity of a valid lisp object as keeping that
> > object alive.

> Yes, of course.

I didn't mean just "within the object", I did mean "in the vicinity".
With prefetch instructions, it's quite likely the compiler concludes
it's easiest to prefetch something 256 bytes ahead of where it
actually makes the access, then make the actual access relative to
that address...

> Any pointer that points somewhere within a Lisp object (in the C
> sense) should count as pointing to the object.

The C standard explicitly allows pointers (and that's C pointers) to
point one past the end of an allocated array, I believe.

> If memory serves, we already
> treat pointers that way in some places; unfortunately we're not doing it
> consistently.

Yes, we do.

> But I take your point; I'll post the change here before committing to master.

I'm sorry, I misunderstood. If you want to fix only pointers within
objects, that is quite a small change, but I believe it is incomplete.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 01:22:01 GMT)

Message #182 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 18:21:04 -0700
On 5/27/20 11:56 AM, Pip Cet wrote:

> So, at the very least, we need to always keep the immediately
> preceding object alive if we go that way.

Yes, I'm assuming that. I'll check that the code is doing that (if it isn't
doing it already).

> that's a significant overhead because it
> usually means two objects are being kept alive by one reference.

For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags
mean the pointers won't tie down two objects. For Lisp_Symbols (whose tags are
zero) it is an issue; also for untagged pointers to the start of objects.
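The remark about nonzero tags can be checked with a small model. It assumes LSB tagging in which a tag smaller than GCALIGNMENT is added to an aligned object address, and object sizes are multiples of GCALIGNMENT. The names, sizes, and tag values are illustrative, not Emacs's actual values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_GCALIGNMENT = 8 };

/* Hypothetical LSB tagging: add a tag smaller than MODEL_GCALIGNMENT to
   an object address that is a multiple of MODEL_GCALIGNMENT.  */
static uintptr_t
tag_address (uintptr_t addr, unsigned tag)
{
  assert (addr % MODEL_GCALIGNMENT == 0 && tag < MODEL_GCALIGNMENT);
  return addr + tag;
}

/* Nonzero if V sits exactly on a boundary between OBJSIZE-byte objects
   laid out from BASE, so it could be read both as the start of one
   object and as one past the end of the previous one.  */
static int
on_object_boundary (uintptr_t base, size_t objsize, uintptr_t v)
{
  assert (objsize > 0);
  return v >= base && (v - base) % objsize == 0;
}
```

A zero tag, as for symbols, or an untagged pointer to an object's start lands on a boundary and is therefore ambiguous. A nonzero tag never does. This reasoning assumes a USE_LSB_TAG build.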

I'll measure how much overhead is involved in my usual 'make compile-always'
benchmark. If it's not that much then we'll be OK. I'm hoping that's the case.
If not, there are some more measures we can take.

> With prefetch instructions, it's quite likely the compiler concludes
> it's easiest to prefetch something 256 bytes ahead of where it
> actually makes the access, then make the actual access relative to
> that address...

I wouldn't worry about that; it's so unlikely that it's not a practical concern.
"Some C optimizers may lose the last undisguised pointer to a memory object as a
consequence of clever optimizations. This has almost never been observed in
practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in
practice" that Hans-J. Boehm was talking about were for C code deliberately
designed to fool the compiler / GC combination.

I think it unlikely that a modern compiler would break all the code out there
that uses conservative GC.

(Besides, if that stuff really were of practical concern we'd have to give up on
conservative GC entirely. :-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 02:45:02 GMT)

Message #185 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Wed, 27 May 2020 22:43:52 -0400
> (obj)->contents[1], etc. each time through the loop, and transforms the call
> into foo (*R) as an optimization. When foo calls the garbage collector,
> maybe_lisp_pointer (R) can be false because R doesn't point directly at a Lisp
> object: it points somewhere into the middle of a Lisp object and R's value is
> not GC-aligned.

Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
into replacing `live_string_p` with `live_string_holding` (i.e. to
recognize anything that points into any part of a Lisp_String so as to
prevent collecting it).


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 06:33:02 GMT)

Message #188 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Thu, 28 May 2020 06:31:44 +0000
On Thu, May 28, 2020 at 1:21 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/27/20 11:56 AM, Pip Cet wrote:
>
> > So, at the very least, we need to always keep the immediately
> > preceding object alive if we go that way.
>
> Yes, I'm assuming that. I'll check that the code is doing that (if it isn't
> doing it already).

Okay, that makes sense.

> > that's a significant overhead because it
> > usually means two objects are being kept alive by one reference.
>
> For Lisp_Objects with nonzero tags this shouldn't be an issue, since the tags
> mean the pointers won't tie down two objects.

On USE_LSB_TAG systems, you're correct.

> I'll measure how much overhead is involved in my usual 'make compile-always'
> benchmark. If it's not that much then we'll be OK. I'm hoping that's the case.
> If not, there are some more measures we can take.

I suspect that garbage collection is only slowed down significantly
when there are large objects on the stack; that happens when GC
happens during redisplay, for example. (All the more reason to
heap-allocate the struct it objects that now live on the stack, as I'd proposed.)

> > With prefetch instructions, it's quite likely the compiler concludes
> > it's easiest to prefetch something 256 bytes ahead of where it
> > actually makes the access, then make the actual access relative to
> > that address...
>
> I wouldn't worry about that; it's so unlikely that it's not a practical concern.

Fingers crossed.

> "Some C optimizers may lose the last undisguised pointer to a memory object as a
> consequence of clever optimizations. This has almost never been observed in
> practice." <https://github.com/ivmai/bdwgc> As I understand it, the times "in
> practice" that Hans-J. Boehm was talking about were for C code deliberately
> designed to fool the compiler / GC combination.
>
> I think it unlikely that a modern compiler would break all the code out there
> that uses conservative GC.
>
> (Besides, if that stuff really were of practical concern we'd have to give up on
> conservative GC entirely. :-)

I hope you're right, in that compilers will support GC better before
they move on to clever optimizations that break it :-)

(I'm not sure what the current state is of "real" GC support in LLVM;
I'm pretty sure not much has happened in GCC.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 07:28:02 GMT)

Message #191 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 10:27:15 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  pipcet <at> gmail.com,  41321 <at> debbugs.gnu.org
> Date: Wed, 27 May 2020 22:43:52 -0400
> 
> Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
> into replacing `live_string_p` with `live_string_holding` (i.e. to
> recognize anything that points into any part of a Lisp_String so as to
> prevent collecting it).

You are suggesting that we go back to using live_string_p?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 07:42:02 GMT)

Message #194 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 00:41:33 -0700
On 5/28/20 12:27 AM, Eli Zaretskii wrote:
>> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
>> Date: Wed, 27 May 2020 22:43:52 -0400
>>
>> Indeed, basically `maybe_lisp_pointer` goes against the effort we've put
>> into replacing `live_string_p` with `live_string_holding` (i.e. to
>> recognize anything that points into any part of a Lisp_String so as to
>> prevent collecting it).
> 
> You are suggesting that we go back to using live_string_p?

I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
mistake, in that it goes against the (solid) reasons we've replaced some calls
to live_string_p with calls to live_string_holding.

After looking into it I agree. I'll propose a patch shortly that does away with
maybe_lisp_pointer.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 07:48:01 GMT)

Message #197 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 00:47:05 -0700
On 5/27/20 11:31 PM, Pip Cet wrote:
> I hope you're right, in that compilers will support GC better before
> they move on to clever optimizations that break it :-)

After looking into it, I decided it wasn't worth the hassle of treating pointers
just past the end of a Lisp object as pointing into the object. Although such
pointers can exist, I can't think of a realistic-with-today's-compilers scenario
at the machine level where (1) a pointer like that will exist, (2) no pointers
into the middle or start of the object will exist, and (3) the object might be
accessed later. In contrast we have seen scenarios with pointers into the middle
of Lisp objects.

With that in mind, attached is a proposed patch to master that I hope deals with
some of the more-serious problems mentioned so far in this thread, in particular
the problem with Lisp_Object representations of symbols being split into two
registers in a --with-wide-int build. I haven't tested this as much as I'd like,
but I need to turn my attention to sleep and work and so this is a good place to
broadcast a checkpoint.

This patch doesn't address the LISP_ALIGNMENT issues you mentioned, both in
lisp.h and in the pdumper; I can work on that soon, I think.

PS. Thanks for helping bring this problem to our attention; it's been fun to
look into it.
[0001-Fix-crashes-due-to-misidentified-pointers.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 08:13:02 GMT)

Message #200 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Thu, 28 May 2020 08:11:42 +0000
On Thu, May 28, 2020 at 7:47 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/27/20 11:31 PM, Pip Cet wrote:
> > I hope you're right, in that compilers will support GC better before
> > they move on to clever optimizations that break it :-)
>
> After looking into it, I decided it wasn't worth the hassle of treating pointers
> just past the end of a Lisp object as pointing into the object. Although such
> pointers can exist, I can't think of a realistic-with-today's-compilers scenario
> at the machine level where (1) a pointer like that will exist, (2) no pointers
> into the middle or start of the object will exist, and (3) the object might be
> accessed later. In contrast we have seen scenarios with pointers into the middle
> of Lisp objects.

Okay. I was about to write that I'd concluded the same thing, after
failing to come up with an example other than that hypothetical
Freverse implementation.

> With that in mind, attached is a proposed patch to master that I hope deals with
> some of the more-serious problems mentioned so far in this thread, in particular
> the problem with Lisp_Object representations of symbols being split into two
> registers in a --with-wide-int build. I haven't tested this as much as I'd like,
> but I need to turn my attention to sleep and work and so this is a good place to
> broadcast a checkpoint.

Thanks! Looks great generally, though I confess I haven't checked what
would happen in a (hypothetical?) !USE_LSB_TAG 64-bit case.

+      if (!symbol_only && live_float_p (m, p))
+        obj = make_lisp_ptr (cp - (uintptr_t) cp % GCALIGNMENT, Lisp_Float);
       break;

I'm not sure about this code, though: it assumes GCALIGNMENT == sizeof
(struct Lisp_Float).
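The hunk above rounds down by GCALIGNMENT. That finds the start of a float only if each float occupies exactly GCALIGNMENT bytes. The sketch below is a hypothetical comparison with rounding by object size. It assumes objects laid out contiguously from a GCALIGNMENT-aligned block start, and the sizes and addresses are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_GCALIGNMENT = 8 };

/* What the quoted hunk computes: round P down to a multiple of
   MODEL_GCALIGNMENT.  */
static uintptr_t
round_down_by_alignment (uintptr_t p)
{
  return p - p % MODEL_GCALIGNMENT;
}

/* Round P down to the start of the OBJSIZE-byte object containing it,
   for objects laid out contiguously from BASE.  */
static uintptr_t
round_down_by_object (uintptr_t base, size_t objsize, uintptr_t p)
{
  assert (objsize > 0 && p >= base);
  return base + (p - base) / objsize * objsize;
}
```

With 8-byte objects the two formulas agree. With 16-byte objects, an address in the second half of an object is rounded down to the middle of that object by the alignment-based formula.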

> PS. Thanks for helping bring this problem to our attention; it's been fun to
> look into it.

I agree. I'll certainly continue looking for bugs and working on
Emacs, but at this point I'm unsure it's worth it to actually share
such work with anyone. But that doesn't really belong here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 13:31:02 GMT)

Message #203 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 09:30:20 -0400
>> You are suggesting that we go back to using live_string_p?
> I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
> mistake, in that it goes against the (solid) reasons we've replaced some calls
> to live_string_p with calls to live_string_holding.
> After looking into it I agree. I'll propose a patch shortly that does away with
> maybe_lisp_pointer.

Exactly.  More specifically, `maybe_lisp_pointer` tries to filter out
false positives but does it based on the assumption that we should only
accept numbers that look like pointers to the beginning of
a Lisp_Object.

If we still want to try and filter out false positives we need to do it
more carefully by considering what is the smallest alignment possible
for a pointer to an internal field of a Lisp_Object.

And if this least alignment is not the same for all Lisp_Objects, then
this test should likely be moved to the respective `live_<foo>_holding`.

I suspect that for vectorlike objects, the least alignment is 1 because
of some `char` or `bool` fields in some of the pseudovectors.
Of course, we could do better by checking for "false positives" after
checking the specific kind of vectorlike object (so as to use
a different least-alignment-check for those objects that contain
`char`s than for those that only contain `int`s, for example).
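The per-type refinement can be sketched as follows. Compute the weakest alignment among a type's field types, then reject only addresses that no field of that type could have. The two layouts below are made up for illustration and are not actual pseudovector structs.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Two made-up object layouts, standing in for pseudovector types.  */
struct only_ints { int64_t header; int32_t a; int32_t b; };
struct has_bool  { int64_t header; int32_t a; bool flag; };

/* Weakest alignment among each layout's field types; no pointer to a
   field of that type can be less aligned than this.  */
enum
{
  ONLY_INTS_MIN_ALIGN = MIN2 (alignof (int64_t), alignof (int32_t)),
  HAS_BOOL_MIN_ALIGN = MIN2 (MIN2 (alignof (int64_t), alignof (int32_t)),
                             alignof (bool))
};

/* Per-type filter: P can point at some field of an object whose weakest
   field alignment is MIN_ALIGN only if P is a multiple of MIN_ALIGN.  */
static bool
could_point_at_field (uintptr_t p, size_t min_align)
{
  assert (min_align > 0);
  return p % min_align == 0;
}
```

On common ABIs, a layout containing a bool field admits addresses of any alignment (alignment 1), so the filter rejects nothing for it. This is the concern raised about vectorlike objects.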


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 14:30:02 GMT)

Message #206 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Thu, 28 May 2020 14:28:29 +0000
On Thu, May 28, 2020 at 1:30 PM Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
> >> You are suggesting that we go back to using live_string_p?
> > I think he's saying just the opposite: namely, that maybe_lisp_pointer is a
> > mistake, in that it goes against the (solid) reasons we've replaced some calls
> > to live_string_p with calls to live_string_holding.
> > After looking into it I agree. I'll propose a patch shortly that does away with
> > maybe_lisp_pointer.
>
> Exactly.  More specifically, `maybe_lisp_pointer` tries to filter out
> false positives but does it based on the assumption that we should only
> accept numbers that look like pointers to the beginning of
> a Lisp_Object.
>
> If we still want to try and filter out false positives we need to do it
> more carefully by considering what is the smallest alignment possible
> for a pointer to an internal field of a Lisp_Object.
>
> And if this least alignment is not the same for all Lisp_Objects, then
> this test should likely be moved to the respective `live_<foo>_holding`.

But at that point, we already have walked the rbtree, which is
probably the main performance problem.

My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree
twice, once at their proper address and once at the lispsym-based
offset.

We could then look up each pointer precisely once, though sometimes
the blocks might overlap and we'd end up marking two objects for one
pointer.

But that would lead to overlapping rbtree entries, and that requires
some extra code which wouldn't be exercised very often... still, I
think it might be worth doing, particularly since there are relatively
few symbol blocks on most systems.
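The double-registration idea might look roughly like the sketch below. It assumes, as the thread implies, that a symbol's Lisp_Object value holds its offset from lispsym. A linear table stands in for the red-black tree, symbol_base stands in for lispsym, and all names are hypothetical. The table does not handle overlapping entries, which is the extra code mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval table standing in for the red-black tree.  */
struct interval { uintptr_t start, end; int block_id; };

enum { MAX_INTERVALS = 16 };
static struct interval table[MAX_INTERVALS];
static size_t ntable;

static void
register_interval (uintptr_t start, size_t size, int block_id)
{
  assert (ntable < MAX_INTERVALS);
  table[ntable++] = (struct interval) { start, start + size, block_id };
}

/* Register a symbol block twice: at its real address, and at the offset
   from SYMBOL_BASE that a zero-tagged symbol value would hold.  */
static void
register_symbol_block (uintptr_t addr, size_t size, uintptr_t symbol_base,
                       int block_id)
{
  register_interval (addr, size, block_id);
  register_interval (addr - symbol_base, size, block_id);
}

/* Return the id of the first block whose interval contains V, or -1.  */
static int
lookup (uintptr_t v)
{
  for (size_t i = 0; i < ntable; i++)
    if (table[i].start <= v && v < table[i].end)
      return table[i].block_id;
  return -1;
}
```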

> I suspect that for vectorlike objects, the least alignment is 1 because
> of some `char` or `bool` fields in some of the pseudovectors.
> Of course, we could do better by checking for "false positives" after
> checking the specific kind of vectorlike object (so as to use
> a different least-alignment-check for those objects that contain
> `char`s than for those that only contain `int`s, for example).

I think the point of maybe_lisp_pointer wasn't to mark fewer objects,
it was to look up fewer pointers in the rbtree. I might be wrong.

On 64-bit systems with ASLR, at least, it's quite unlikely that we
have something that looks like a valid pointer into a Lisp object but
that we could rule out based on its offset or alignment...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 16:26:02 GMT)

Message #209 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 12:24:53 -0400
> But at that point, we already have walked the rbtree, which is
> probably the main performance problem.

Indeed, maybe_lisp_pointer can avoid this cost, but I was more concerned
with the risk of increasing the number of objects kept live because of
false-positives (i.e. a random integer/float/younameit that happens to
look like it's pointing into the object).

> I think the point of maybe_lisp_pointer wasn't to mark fewer objects,
> it was to look up fewer pointers in the rbtree.  I might be wrong.

You might be right.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 18:28:02 GMT)

Message #212 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 21:27:31 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 27 May 2020 09:58:11 -0700
> 
> we can then talk about what to do with emacs-27.

After thinking about this some, I think the only sensible thing to do
on emacs-27 is to return to 8-byte alignment test in GC for 32-bit
MinGW builds.  That is, replace max_align_t with just 8 in the
definition of LISP_ALIGNMENT in that case.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Thu, 28 May 2020 19:34:02 GMT)

Message #215 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Thu, 28 May 2020 12:33:10 -0700
On 5/28/20 11:27 AM, Eli Zaretskii wrote:
>> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
>> From: Paul Eggert <eggert <at> cs.ucla.edu>
>> Date: Wed, 27 May 2020 09:58:11 -0700
>>
>> we can then talk about what to do with emacs-27.
> 
> After thinking about this some, I think the only sensible thing to do
> on emacs-27 is to return to 8-byte alignment test in GC for 32-bit
> MinGW builds.  That is, replace max_align_t with just 8 in the
> definition of LISP_ALIGNMENT in that case.

Exactly the same problem can occur for other x86 platforms (e.g., GNU/Linux, GCC
7-and-later, glibc 2.25-and-earlier), because these other platforms also have
the bug that malloc can return a pointer that is 8 modulo 16 even though alignof
(max_align_t) is 16.  So I suggest doing the replacement for those platforms
too, as in the attached patch.
[0001-Fix-aborts-due-to-GC-losing-pseudovectors.patch (text/x-patch, attachment)]
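You can probe a given malloc directly for the mismatch Paul describes. This is a rough sketch, not a test from the Emacs tree, and the function names are hypothetical. It allocates a batch of small blocks and keeps them all live, so the allocator cannot simply hand back the same block. It then counts how many results are not a multiple of alignof (max_align_t). A nonzero count on a platform where that alignment is 16 matches the "8 modulo 16" behavior. A zero count does not prove the allocator always conforms, because the result depends on the C library and on the allocation pattern.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { DEMO_TRIALS = 256 };

/* True if address P is not a multiple of ALIGN (a power of two).  */
static bool
demo_misaligned (uintptr_t p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return p % align != 0;
}

/* Allocate DEMO_TRIALS small blocks of varying size, keep them all
   live, and count how many violate alignof (max_align_t).  */
static int
demo_count_misaligned_mallocs (void)
{
  void *blocks[DEMO_TRIALS];
  int n, bad = 0;
  for (n = 0; n < DEMO_TRIALS; n++)
    {
      blocks[n] = malloc (1 + n % 64);
      if (!blocks[n])
        break;
      if (demo_misaligned ((uintptr_t) blocks[n], alignof (max_align_t)))
        bad++;
    }
  for (int i = 0; i < n; i++)
    free (blocks[i]);
  return bad;
}
```

To check a particular build environment, print the result of demo_count_misaligned_mallocs () together with alignof (max_align_t).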

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 06:20:01 GMT)

Message #218 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 09:19:34 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 28 May 2020 12:33:10 -0700
> 
> > After thinking about this some, I think the only sensible thing to do
> > on emacs-27 is to return to 8-byte alignment test in GC for 32-bit
> > MinGW builds.  That is, replace max_align_t with just 8 in the
> > definition of LISP_ALIGNMENT in that case.
> 
> Exactly the same problem can occur for other x86 platforms (e.g., GNU/Linux, GCC
> 7-and-later, glibc 2.25-and-earlier), because these other platforms also have
> the bug that malloc can return a pointer that is 8 modulo 16 even though alignof
> (max_align_t) is 16.  So I suggest doing the replacement for those platforms
> too, as in the attached patch.

I'm okay with doing this on other platforms, but...

>  static bool
>  maybe_lisp_pointer (void *p)
>  {
> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> +  return (uintptr_t) p % GCALIGNMENT == 0;
>  }

...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
right to me: by keeping the current value of LISP_ALIGNMENT, we
basically declare that Lisp objects shall be aligned on that boundary,
whereas that isn't really the case.  Why not change the value of
LISP_ALIGNMENT instead?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 08:26:02 GMT)

Message #221 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 08:25:09 +0000
On Thu, May 28, 2020 at 7:33 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> too, as in the attached patch.

Are you sure you attached the correct file? This patch is identical to
one you'd sent earlier, and which Eli criticized for being overly
conservative on GCALIGNMENT==1 systems.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 09:45:02 GMT)

Message #224 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 09:43:24 +0000
On Thu, May 28, 2020 at 2:28 PM Pip Cet <pipcet <at> gmail.com> wrote:
> My suggestion is instead to put MEM_TYPE_SYMBOL blocks into the rbtree
> twice, once at their proper address and once at the lispsym-based
> offset.
>
> We could then look up each pointer precisely once, though sometimes
> the blocks might overlap and we'd end up marking two objects for one
> pointer.
>
> But that would lead to overlapping rbtree entries, and that requires
> some extra code which wouldn't be exercised very often... still, I
> think it might be worth doing, particularly since there are relatively
> few symbol blocks on most systems.

Okay, here's some initial code that does that. It's a little tricky,
because real addresses and symbol offsets can overlap arbitrarily and
become mapped and unmapped in any order. The basic idea is that symbol
offsets are marked two ways:
1. an overlaps_with_symbols flag on a "normal" memory node
2. a mem node type of MEM_TYPE_SYMBOL_ADJUSTED

(2) implies (1), but not the other way around. There's only one flag
per normal memory node, which is true if any of the addresses in the
node are also valid symbol offsets. MEM_TYPE_SYMBOL_ADJUSTED nodes
have start and end addresses that do not necessarily correspond to
symbol blocks or even symbols; their length is arbitrary.

When we insert or delete memory nodes, we perform the obvious
operations to keep MEM_TYPE_SYMBOL_ADJUSTED blocks accurate: i.e.,
when a MEM_TYPE_SYMBOL_ADJUSTED node is split by an
intervening/overlapping normal node, we insert one or two new
MEM_TYPE_SYMBOL_ADJUSTED nodes to cover the remaining offsets, and set
the overlaps_with_symbols flag on the normal node, to cover those,
etc.

As I said, the code is tricky (i.e. might contain bugs that can only
be discovered through extensive testing on 32-bit systems), and it
complicates what should be generic functions for the rbtree
implementation, so this is probably a 32-bit optimization that is too
late because 32-bit systems are no longer that relevant...
[0001-snapshot.patch (text/x-patch, attachment)]
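The splitting rule described in this message can be sketched on plain half-open intervals. This is one reading of the description, not code from the attached snapshot patch, and the names are hypothetical. Suppose a normal node is inserted over part of a symbol-offset (adjusted) interval. The uncovered remainder survives as zero, one or two adjusted pieces, and the normal node is the one that would get the overlaps_with_symbols flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_interval { uintptr_t start, end; };  /* half-open */

/* Split the adjusted interval ADJ around a newly inserted normal node
   NODE.  Store the surviving pieces of ADJ in OUT (at most two) and
   return how many there are.  Set *OVERLAPS when NODE covers some
   symbol offsets, i.e. when it needs the overlaps_with_symbols flag.  */
static size_t
demo_split_adjusted (struct demo_interval adj, struct demo_interval node,
                     struct demo_interval out[2], bool *overlaps)
{
  assert (adj.start < adj.end && node.start < node.end);
  size_t n = 0;
  *overlaps = node.start < adj.end && adj.start < node.end;
  if (!*overlaps)
    {
      out[n++] = adj;   /* disjoint: the adjusted node is unchanged */
      return n;
    }
  if (adj.start < node.start)
    out[n++] = (struct demo_interval) { adj.start, node.start };
  if (node.end < adj.end)
    out[n++] = (struct demo_interval) { node.end, adj.end };
  return n;
}
```

A normal node strictly inside the adjusted interval leaves two pieces. One overlapping an edge leaves one piece. One covering it entirely leaves none, and in that case the flag alone records the symbol offsets.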

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 09:52:01 GMT)

Message #227 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 12:51:17 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 22 May 2020 11:47:03 +0000
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> If you could disassemble signal_before_change, we'd know whether
> start_marker and end_marker live in callee-saved registers, and thus
> whether this is likely to be Andrea's bug.

signal_before_change cannot be disassembled because it's inlined.
Disassembling its caller, prepare_to_modify_buffer_1, seems to
indicate that start_marker and end_marker are pushed onto the stack
when they are returned by copy-marker, and taken from there when we
later call marker-position (which segfaults):

2163          PRESERVE_START_END;
   0x010ed99e <+834>:   mov    0x58(%esp),%eax
   0x010ed9a2 <+838>:   or     0x4c(%esp),%eax
   0x010ed9a6 <+842>:   je     0x10edd77 <prepare_to_modify_buffer_1+1819>
   0x010ed9ac <+848>:   mov    0x44(%esp),%ecx
   0x010ed9b0 <+852>:   or     0x38(%esp),%ecx
   0x010ed9b4 <+856>:   je     0x10edf90 <prepare_to_modify_buffer_1+2356>
   0x010edd77 <+1819>:  movl   $0x0,0x8(%esp)
   0x010edd7f <+1827>:  movl   $0x0,0xc(%esp)
   0x010edd87 <+1835>:  mov    0x50(%esp),%eax
   0x010edd8b <+1839>:  mov    0x54(%esp),%edx
   0x010edd8f <+1843>:  mov    %eax,(%esp)
   0x010edd92 <+1846>:  mov    %edx,0x4(%esp)
   0x010edd96 <+1850>:  call   0x10f15a5 <Fcopy_marker>
   0x010edd9b <+1855>:  mov    %eax,0x4c(%esp)   <<<<<<<<<<<<<<<<<<<<<
   0x010edd9f <+1859>:  mov    %edx,0x58(%esp)   <<<<<<<<<<<<<<<<<<<<<
   0x010edda3 <+1863>:  mov    0x44(%esp),%ecx
   0x010edda7 <+1867>:  or     0x38(%esp),%ecx
   0x010eddab <+1871>:  jne    0x10ede59 <prepare_to_modify_buffer_1+2045>
   0x010eddb1 <+1877>:  movl   $0x0,0x8(%esp)
   0x010eddb9 <+1885>:  movl   $0x0,0xc(%esp)
   0x010eddc1 <+1893>:  mov    %esi,(%esp)
   0x010eddc4 <+1896>:  mov    %edi,0x4(%esp)
   0x010eddc8 <+1900>:  call   0x10f15a5 <Fcopy_marker>
   0x010eddcd <+1905>:  mov    %eax,0x38(%esp)   <<<<<<<<<<<<<<<<<<<<
   0x010eddd1 <+1909>:  mov    %edx,0x44(%esp)   <<<<<<<<<<<<<<<<<<<<
   0x010edf90 <+2356>:  movl   $0x0,0x8(%esp)
   0x010edf98 <+2364>:  movl   $0x0,0xc(%esp)
   0x010edfa0 <+2372>:  mov    %esi,(%esp)
   0x010edfa3 <+2375>:  mov    %edi,0x4(%esp)
   0x010edfa7 <+2379>:  call   0x10f15a5 <Fcopy_marker>
   0x010edfac <+2384>:  mov    %eax,0x38(%esp)
   0x010edfb0 <+2388>:  mov    %edx,0x44(%esp)
   [...]
2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
   0x010eda5f <+1027>:  mov    0x44(%esp),%eax
   0x010eda63 <+1031>:  or     0x38(%esp),%eax
   0x010eda67 <+1035>:  jne    0x10edd20 <prepare_to_modify_buffer_1+1732>
   0x010eda6d <+1041>:  mov    0x58(%esp),%ecx
   0x010eda71 <+1045>:  or     0x4c(%esp),%ecx
   0x010eda75 <+1049>:  jne    0x10edf1e <prepare_to_modify_buffer_1+2242>
   0x010eda7b <+1055>:  mov    %esi,0x68(%esp)
   0x010eda7f <+1059>:  mov    %edi,0x6c(%esp)
   0x010eda83 <+1063>:  mov    0x50(%esp),%eax
   0x010eda87 <+1067>:  mov    0x54(%esp),%edx
   0x010eda8b <+1071>:  mov    %eax,0x60(%esp)
   0x010eda8f <+1075>:  mov    %edx,0x64(%esp)
   0x010eda93 <+1079>:  movl   $0x0,0x24(%esp)
   0x010eda9b <+1087>:  movl   $0x0,0x28(%esp)
   0x010edaa3 <+1095>:  mov    0x68(%esp),%eax
   0x010edaa7 <+1099>:  mov    0x6c(%esp),%edx
   0x010edaab <+1103>:  mov    %eax,0x1c(%esp)
   0x010edaaf <+1107>:  mov    %edx,0x20(%esp)
   0x010edab3 <+1111>:  mov    0x60(%esp),%eax
   0x010edab7 <+1115>:  mov    0x64(%esp),%edx
   0x010edabb <+1119>:  mov    %eax,0x14(%esp)
   0x010edabf <+1123>:  mov    %edx,0x18(%esp)
   0x010edac3 <+1127>:  movl   $0x0,0x10(%esp)
   0x010edacb <+1135>:  mov    %esi,0x8(%esp)
   0x010edacf <+1139>:  mov    %edi,0xc(%esp)
   0x010edad3 <+1143>:  mov    0x50(%esp),%eax
   0x010edad7 <+1147>:  mov    0x54(%esp),%edx
   0x010edadb <+1151>:  mov    %eax,(%esp)
   0x010edade <+1154>:  mov    %edx,0x4(%esp)
   0x010edae2 <+1158>:  call   0x10e76ea <report_overlay_modification>
   0x010edd20 <+1732>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edd24 <+1736>:  mov    %eax,(%esp)
   0x010edd27 <+1739>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edd2b <+1743>:  mov    %eax,0x4(%esp)
   0x010edd2f <+1747>:  call   0x10f072a <Fmarker_position>
   0x010edd34 <+1752>:  mov    %eax,0x68(%esp)
   0x010edd38 <+1756>:  mov    %edx,0x6c(%esp)
   0x010edd3c <+1760>:  mov    0x58(%esp),%eax 
   0x010edd40 <+1764>:  or     0x4c(%esp),%eax
   0x010edd44 <+1768>:  jne    0x10edeba <prepare_to_modify_buffer_1+2142>
   0x010edd4a <+1774>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<
   0x010edd4e <+1778>:  mov    %eax,(%esp)
   0x010edd51 <+1781>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<
   0x010edd55 <+1785>:  mov    %eax,0x4(%esp)
   0x010edd59 <+1789>:  call   0x10f072a <Fmarker_position>
   0x010edd5e <+1794>:  mov    %eax,%esi
   0x010edd60 <+1796>:  mov    %edx,%edi
   0x010edd62 <+1798>:  mov    0x50(%esp),%eax
   0x010edd66 <+1802>:  mov    0x54(%esp),%edx
   0x010edd6a <+1806>:  mov    %eax,0x60(%esp)
   0x010edd6e <+1810>:  mov    %edx,0x64(%esp)
   0x010edd72 <+1814>:  jmp    0x10eda93 <prepare_to_modify_buffer_1+1079>
   0x010edeba <+2142>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edebe <+2146>:  mov    %eax,(%esp)
   0x010edec1 <+2149>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edec5 <+2153>:  mov    %eax,0x4(%esp)
   0x010edec9 <+2157>:  call   0x10f072a <Fmarker_position>
   0x010edece <+2162>:  mov    %eax,0x60(%esp)
   0x010eded2 <+2166>:  mov    %edx,0x64(%esp)
   0x010eded6 <+2170>:  mov    0x38(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010ededa <+2174>:  mov    %eax,(%esp)
   0x010ededd <+2177>:  mov    0x44(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edee1 <+2181>:  mov    %eax,0x4(%esp)
   0x010edee5 <+2185>:  call   0x10f072a <Fmarker_position>
   0x010edeea <+2190>:  mov    %eax,%esi
   0x010edeec <+2192>:  mov    %edx,%edi
   0x010edeee <+2194>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edef2 <+2198>:  mov    %eax,(%esp)
   0x010edef5 <+2201>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edef9 <+2205>:  mov    %eax,0x4(%esp)
   0x010edefd <+2209>:  call   0x10f072a <Fmarker_position>
   0x010edf02 <+2214>:  mov    %eax,0x50(%esp)
   0x010edf06 <+2218>:  mov    %edx,0x54(%esp)
   0x010edf0a <+2222>:  jmp    0x10eda93 <prepare_to_modify_buffer_1+1079>
   0x010edf1e <+2242>:  mov    0x4c(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edf22 <+2246>:  mov    %eax,(%esp)
   0x010edf25 <+2249>:  mov    0x58(%esp),%eax   <<<<<<<<<<<<<<<<<<<<<<
   0x010edf29 <+2253>:  mov    %eax,0x4(%esp)
   0x010edf2d <+2257>:  call   0x10f072a <Fmarker_position>




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 10:02:02 GMT)

Message #230 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 10:00:39 +0000
On Fri, May 29, 2020 at 9:51 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 22 May 2020 11:47:03 +0000
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > If you could disassemble signal_before_change, we'd know whether
> > start_marker and end_marker live in callee-saved registers, and thus
> > whether this is likely to be Andrea's bug.
>
> signal_before_change cannot be disassembled because it's inlined.

Sorry. On my system, gdb does the right thing if I enter "disassemble
signal_before_change".

> Disassembling its caller, prepare_to_modify_buffer_1, seems to
> indicate that start_marker and end_marker are pushed onto the stack
> when they are returned by copy-marker, and taken from there when we
> later call marker-position (which segfaults):

That's my reading as well.

>    0x010edd96 <+1850>:  call   0x10f15a5 <Fcopy_marker>
>    0x010edd9b <+1855>:  mov    %eax,0x4c(%esp)   <<<<<<<<<<<<<<<<<<<<<
>    0x010edd9f <+1859>:  mov    %edx,0x58(%esp)   <<<<<<<<<<<<<<<<<<<<<

As you can see, the stack positions aren't consecutive: the
Lisp_Object is split between bytes 0x58..5b(%esp) and bytes
0x4c..0x4f(%esp).

>    0x010eddc8 <+1900>:  call   0x10f15a5 <Fcopy_marker>
>    0x010eddcd <+1905>:  mov    %eax,0x38(%esp)   <<<<<<<<<<<<<<<<<<<<
>    0x010eddd1 <+1909>:  mov    %edx,0x44(%esp)   <<<<<<<<<<<<<<<<<<<<

Same here.

So we know (from your backtrace) these objects aren't 16-byte-aligned,
and we know your GC won't mark them because they're
discontinuously-stored and max_align_t has an alignment of 16 on your
system. We also know the only reference to them is on the stack.
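Pip's observation can be reproduced with a simplified model of a conservative stack scan on a 32-bit host. This is a sketch, not Emacs's mark_memory, and the names and stack size are hypothetical. The model stores a 64-bit Lisp_Object the way the disassembly does: low half (the address) at 0x4c(%esp), high half (the tag) at 0x58(%esp). A scan for whole 64-bit values in adjacent words cannot find it. The object therefore survives only if the lone address half passes the alignment filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_STACK_WORDS = 32 };

/* Store OBJ as the disassembly shows: low half at byte offset 0x4c,
   high half at 0x58 (word indices 19 and 22).  */
static void
demo_store_split (uint32_t *stack, size_t n, uint64_t obj)
{
  assert (0x58 / 4 < n);
  stack[0x4c / 4] = (uint32_t) obj;
  stack[0x58 / 4] = (uint32_t) (obj >> 32);
}

/* Look for the whole 64-bit value in adjacent word pairs (low word
   first, as on little-endian x86).  */
static bool
demo_scan_pairs (const uint32_t *stack, size_t n, uint64_t obj)
{
  for (size_t i = 0; i + 1 < n; i++)
    if ((((uint64_t) stack[i + 1] << 32) | stack[i]) == obj)
      return true;
  return false;
}

/* Treat each 32-bit word as a possible raw pointer, discarding words
   that fail an alignment filter like maybe_lisp_pointer's.  */
static bool
demo_scan_words (const uint32_t *stack, size_t n, uint32_t target,
                 uint32_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  for (size_t i = 0; i < n; i++)
    if (stack[i] % align == 0 && stack[i] == target)
      return true;
  return false;
}
```

Take end_marker's value from Eli's debugging session in this thread, 0xa00000001ffac2c8, and store it split. demo_scan_pairs misses it. demo_scan_words with an alignment of 16 rejects 0x1ffac2c8, because that address is 8 modulo 16. Only an alignment of 8 lets the word scan find it.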




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 10:17:02 GMT)

Message #233 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 13:16:17 +0300
> Date: Fri, 22 May 2020 10:22:56 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org
> 
> > > I'm already running with such a breakpoint, let's how it will catch
> > > something.                                        ^^^
> > 
> > Should have been "hope".  Sorry.
> 
> It happened again, and now insert-file-contents wasn't involved, so I
> guess it's off the hook.  The command which triggered the problem was
> self-insert-command, as shown in the backtrace below.  The problem
> seems to be with handling overlays when buffer text changes.

One more segfault very similar to the last one I reported: it happened
when calling report_overlay_modification due to text being inserted
into a buffer.

The backtrace and the debugging session are below.  Noteworthy
observations:

. The buffer's overlay chain and the buffer's marker chain are both
  intact and valid.

. The two markers, start_marker and end_marker, which are created by
  PRESERVE_START_END before calling before-change-functions, are NOT
  in the buffer's marker chain after run-hook-with-args returns.  This
  most probably means GC was invoked while run-hook-with-args ran and
  decided to GC those 2 markers, and then unchained them via
  unchain_dead_markers.

. last_marked[] doesn't seem to mention start_marker or end_marker, at
  least not in its last 470 slots:

    (gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2c8
    Pattern not found.

  This seems to be supporting evidence that those two markers were
  GC'ed.

. start_marker and end_marker encode pointers which are 8-byte
  aligned, not 16-byte aligned.  The values of the pointers are
  0x1ffac2a8 and 0x1ffac2c8, as can be seen from the debug session.

. There's nothing wrong with rvoe_arg.location; in the previous
  sessions we forgot to dereference it (it's a pointer to a Lisp
  object).  Here's how it looks when shown correctly:

    (gdb) p rvoe_arg.location
    $14 = (Lisp_Object *) 0x15c9298 <globals+120>
    (gdb) p *rvoe_arg.location
    $15 = XIL(0xc00000001646b9b0)
    (gdb) xtype
    Lisp_Cons
    (gdb) xcar
    $16 = 0x30
    (gdb) xsymbol
    $17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
    "t"
    (gdb) p *rvoe_arg.location
    $18 = XIL(0xc00000001646b9b0)
    (gdb) xcdr
    $19 = 0xc00000001646b9d0
    (gdb) xtype
    Lisp_Cons
    (gdb) xcar
    $20 = 0xd5c0
    (gdb) xtype
    Lisp_Symbol
    (gdb) xsymbol
    $21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
    "syntax-ppss-flush-cache"
    (gdb) p *rvoe_arg.location
    $22 = XIL(0xc00000001646b9b0)
    (gdb) xcdr
    $23 = 0xc00000001646b9d0
    (gdb) xcdr
    $24 = 0x0
    [...]
    (gdb) pp *rvoe_arg.location
    (t syntax-ppss-flush-cache)

. There's nothing wrong with GDB's xtype command: it fails when a Lisp
  object encodes a pointer to invalid memory:

    (gdb) p start_marker
    $25 = XIL(0xa00000001ffac2a8)
    (gdb) xtype
    Lisp_Vectorlike
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x start_marker
    $26 = 0xa00000001ffac2a8
    (gdb) xgettype $26
    (gdb) p $type
    $27 = Lisp_Vectorlike
    (gdb) xvectype $26
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)->header.size
    warning: value truncated
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)->header
    warning: value truncated
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *) $26)
    warning: value truncated
    $35 = 0x1ffac2a8
    (gdb) p/x end_marker
    $38 = 0xa00000001ffac2c8
    (gdb) xtype
    Lisp_Vectorlike
    Cannot access memory at address 0x1ffac2a8
    (gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header
    Cannot access memory at address 0x1ffac2c8

. Provisional conclusion: the two temporary markers created by
  signal_before_change were on the stack (see my other message with
  code disassembly), and were GC'ed as a side effect of running
  syntax-ppss-flush-cache via before-change-functions.  So we should
  see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy
  fixes this problem.
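The marker-chain observation above is that dead markers vanish from the buffer's chain after GC. A minimal unlink-if-unmarked pass over a singly linked list illustrates how that happens. The struct and function names are hypothetical stand-ins in the spirit of unchain_dead_markers, not its actual implementation. Any marker the mark phase did not reach is removed from the chain. That includes a start_marker or end_marker whose only reference on the stack was missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer marker; not struct Lisp_Marker.  */
struct demo_marker
{
  bool marked;               /* set by the mark phase if reachable */
  struct demo_marker *next;  /* the buffer's marker chain */
};

/* Unlink every unmarked marker from the chain starting at *HEAD and
   return how many were removed.  */
static size_t
demo_unchain_dead_markers (struct demo_marker **head)
{
  size_t removed = 0;
  struct demo_marker **prev = head;
  while (*prev)
    {
      struct demo_marker *m = *prev;
      if (m->marked)
        prev = &m->next;       /* live: keep it and move on */
      else
        {
          *prev = m->next;     /* dead: splice it out of the chain */
          m->next = NULL;
          removed++;
        }
    }
  return removed;
}
```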

Here's the backtrace and the full debug session after the crash, with
some omissions:

Thread 1 received signal SIGSEGV, Segmentation fault.
PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
1720          return PSEUDOVECTOR_TYPEP (XUNTAG (a, Lisp_Vectorlike,
(gdb) bt
#0  PSEUDOVECTORP (code=<optimized out>, a=<optimized out>) at lisp.h:1720
#1  MARKERP (x=<optimized out>) at lisp.h:2618
#2  CHECK_MARKER (x=XIL(0xa00000001ffac2c8)) at marker.c:133
#3  0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8))
    at marker.c:452
#4  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884,
    start_int=276884) at insdel.c:2179
#5  prepare_to_modify_buffer_1 (start=start <at> entry=276884,
    end=end <at> entry=276884, preserve_ptr=preserve_ptr <at> entry=0x0)
    at insdel.c:2007
#6  0x010ee27d in prepare_to_modify_buffer (start=276884, end=276884,
    preserve_ptr=preserve_ptr <at> entry=0x0) at insdel.c:2018
#7  0x010ee54d in insert_1_both (
    string=0x1e3c9c08 " 2823D 26-May  gdb-patches <at> sourceware.or [244] Re: [PATCH, testsuite] Fix some duplicate test names\n\r...",
    nchars=100, nbytes=100, inherit=false, prepare=true, before_markers=false)
    at insdel.c:896
#8  0x010ee5c5 in insert_1_both (string=<optimized out>,
    nchars=<optimized out>, nchars <at> entry=100, nbytes=<optimized out>,
    nbytes <at> entry=100, inherit=inherit <at> entry=false,
    prepare=prepare <at> entry=true, before_markers=before_markers <at> entry=false)
    at insdel.c:947
#9  0x01174188 in Fprinc (object=XIL(0x800000001e05f278),
    printcharfun=<optimized out>) at print.c:734
#10 0x0114fc5c in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs <at> entry=2, args=<optimized out>,
    args <at> entry=0x82d9b8) at eval.c:2869
#11 0x0114daed in Ffuncall (nargs=3, args=args <at> entry=0x82d9b0) at eval.c:2794
#12 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=4,
    args=<optimized out>, args <at> entry=0x82dde8) at bytecode.c:633
#13 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=4,
    arg_vector=arg_vector <at> entry=0x82dde8) at eval.c:2989
#14 0x0114da43 in Ffuncall (nargs=5, args=args <at> entry=0x82dde0) at eval.c:2808
#15 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=3,
    args=<optimized out>, args <at> entry=0x82e1b0) at bytecode.c:633
#16 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=3,
    arg_vector=arg_vector <at> entry=0x82e1b0) at eval.c:2989
#17 0x0114da43 in Ffuncall (nargs=4, args=args <at> entry=0x82e1a8) at eval.c:2808
#18 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=0,
    args=<optimized out>, args <at> entry=0x82e570) at bytecode.c:633
#19 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=0,
    arg_vector=arg_vector <at> entry=0x82e570) at eval.c:2989
#20 0x0114da43 in Ffuncall (nargs=nargs <at> entry=1, args=args <at> entry=0x82e568)
    at eval.c:2808
#21 0x0114de2d in Fapply (nargs=2, args=0x82e568) at eval.c:2377
#22 0x0114daed in Ffuncall (nargs=3, args=args <at> entry=0x82e560) at eval.c:2794
#23 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=0,
    args=<optimized out>, args <at> entry=0x82e8c0) at bytecode.c:633
#24 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=0,
    arg_vector=arg_vector <at> entry=0x82e8c0) at eval.c:2989
#25 0x0114da43 in Ffuncall (nargs=1, args=args <at> entry=0x82e8b8) at eval.c:2808
#26 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=3,
    args=<optimized out>, args <at> entry=0x82ed30) at bytecode.c:633
#27 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=3,
    arg_vector=arg_vector <at> entry=0x82ed30) at eval.c:2989
#28 0x0114da43 in Ffuncall (nargs=4, args=args <at> entry=0x82ed28) at eval.c:2808
#29 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=1,
    args=<optimized out>, args <at> entry=0x82f298) at bytecode.c:633
#30 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=1,
    arg_vector=arg_vector <at> entry=0x82f298) at eval.c:2989
#31 0x0114da43 in Ffuncall (nargs=nargs <at> entry=2, args=args <at> entry=0x82f290)
    at eval.c:2808
#32 0x0114906d in Ffuncall_interactively (nargs=2, args=0x82f290)
    at callint.c:254
#33 0x0114daed in Ffuncall (nargs=nargs <at> entry=3, args=args <at> entry=0x82f288)
    at eval.c:2794
#34 0x0114df22 in Fapply (nargs=nargs <at> entry=3, args=args <at> entry=0x82f288)
    at eval.c:2381
#35 0x0114afbb in Fcall_interactively (function=XIL(0x5f2c790),
    record_flag=<optimized out>, keys=XIL(0xa00000000759f578))
    at callint.c:342
#36 0x0114fc89 in funcall_subr (subr=<optimized out>,
    numargs=<optimized out>, numargs <at> entry=3, args=<optimized out>,
    args <at> entry=0x82f430) at eval.c:2872
#37 0x0114daed in Ffuncall (nargs=4, args=args <at> entry=0x82f428) at eval.c:2794
#38 0x0118ebe7 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=<optimized out>, nargs <at> entry=1,
    args=<optimized out>, args <at> entry=0x82f7b8) at bytecode.c:633
#39 0x0115134f in funcall_lambda (fun=<optimized out>, nargs=nargs <at> entry=1,
    arg_vector=arg_vector <at> entry=0x82f7b8) at eval.c:2989
#40 0x0114da43 in Ffuncall (nargs=nargs <at> entry=2, args=args <at> entry=0x82f7b0)
    at eval.c:2808
#41 0x0114dc1c in call1 (fn=XIL(0x3f30), arg1=XIL(0x5f2c790)) at eval.c:2654
#42 0x010d0efe in command_loop_1 () at keyboard.c:1463
#43 0x0114ca0f in internal_condition_case (
    bfun=bfun <at> entry=0x10d0a0e <command_loop_1>, handlers=XIL(0x90),
    hfun=hfun <at> entry=0x10c5049 <cmd_error>) at eval.c:1355
#44 0x010bdbda in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
#45 0x0114c996 in internal_catch (tag=XIL(0xdfb0),
    func=func <at> entry=0x10bdbb3 <command_loop_2>, arg=XIL(0)) at eval.c:1116
#46 0x010bdb5d in command_loop () at keyboard.c:1070
#47 0x010c4bf3 in recursive_edit_1 () at keyboard.c:714
#48 0x010c4f0c in Frecursive_edit () at keyboard.c:786
#49 0x0124a594 in main (argc=<optimized out>, argv=<optimized out>)
    at emacs.c:2054

Lisp Backtrace:
"princ" (0x82d9b8)
"rmail-new-summary-1" (0x82dde8)
"rmail-new-summary" (0x82e1b0)
"rmail-summary" (0x82e570)
"apply" (0x82e568)
"rmail-update-summary" (0x82e8c0)
"rmail-get-new-mail-1" (0x82ed30)
"rmail-get-new-mail" (0x82f298)
"funcall-interactively" (0x82f290)
"call-interactively" (0x82f430)
"command-execute" (0x82f7b8)
(gdb) fr 4
#4  0x010f073c in Fmarker_position (marker=XIL(0xa00000001ffac2c8))
    at marker.c:452
452       CHECK_MARKER (marker);
(gdb) up
#5  0x010edd34 in signal_before_change (preserve_ptr=0x0, end_int=276884,
    start_int=276884) at insdel.c:2179
2179          report_overlay_modification (FETCH_START, FETCH_END, 0,
(gdb) p current_buffer->overlays_before
$1 = (struct Lisp_Overlay *) 0x75ac520
(gdb) p *$
$2 = {
  header = {
    size = 1140854787
  },
  start = XIL(0xa0000000075ac4e0),
  end = XIL(0xa0000000075ac500),
  plist = XIL(0xc0000000077f2340),
  next = 0x0
}
(gdb) p/x $1->header.size
$3 = 0x44001003
(gdb) p current_buffer->name_
$4 = XIL(0x8000000007364540)
(gdb) xtype
Lisp_String
(gdb) xstring
$5 = (struct Lisp_String *) 0x7364540
"INBOX-summary"
(gdb) p current_buffer->overlays_before->start
$6 = XIL(0xa0000000075ac4e0)
(gdb) p *$
$7 = 1124081664
(gdb) p current_buffer->overlays_before->start
$8 = XIL(0xa0000000075ac4e0)
(gdb) xtype
Lisp_Vectorlike
PVEC_MARKER
(gdb) xmarker
$9 = (struct Lisp_Marker *) 0x75ac4e0
(gdb) p *$
$10 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->overlays_before->next
$11 = (struct Lisp_Overlay *) 0x0
(gdb) p current_buffer->overlays_after
$12 = (struct Lisp_Overlay *) 0x0
(gdb) p rvoe_arg
$13 = {
  location = 0x15c9298 <globals+120>,
  errorp = false
}
(gdb) p rvoe_arg.location
$14 = (Lisp_Object *) 0x15c9298 <globals+120>
(gdb) p *rvoe_arg.location
$15 = XIL(0xc00000001646b9b0)
(gdb) xtype
Lisp_Cons
(gdb) xcar
$16 = 0x30
(gdb) xsymbol
$17 = (struct Lisp_Symbol *) 0x15ca210 <lispsym+48>
"t"
(gdb) p *rvoe_arg.location
$18 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$19 = 0xc00000001646b9d0
(gdb) xtype
Lisp_Cons
(gdb) xcar
$20 = 0xd5c0
(gdb) xtype
Lisp_Symbol
(gdb) xsymbol
$21 = (struct Lisp_Symbol *) 0x15d77a0 <lispsym+54720>
"syntax-ppss-flush-cache"
(gdb) p *rvoe_arg.location
$22 = XIL(0xc00000001646b9b0)
(gdb) xcdr
$23 = 0xc00000001646b9d0
(gdb) xcdr
$24 = 0x0
(gdb) p start_marker
$25 = XIL(0xa00000001ffac2a8)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x start_marker
$26 = 0xa00000001ffac2a8
(gdb) xgettype $26
(gdb) p $type
$27 = Lisp_Vectorlike
(gdb) xvectype $26
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header.size
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)->header
warning: value truncated
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *) $26)
warning: value truncated
$35 = 0x1ffac2a8
(gdb) p/x $26
$36 = 0xa00000001ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8
A syntax error in expression, near `'.
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2a8)
$37 = 0x1ffac2a8
(gdb) p/x *((struct Lisp_Vector *)0x1ffac2a8)
Cannot access memory at address 0x1ffac2a8
(gdb) p/x end_marker
$38 = 0xa00000001ffac2c8
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) p/x ((struct Lisp_Vector *)0x1ffac2c8)->header
Cannot access memory at address 0x1ffac2c8
(gdb) p Vfirst_change_hook
$39 = XIL(0)
(gdb) p current_buffer->text->markers
$40 = (struct Lisp_Marker *) 0x76353a0
(gdb) p *$
$41 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x76353e0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next
$42 = (struct Lisp_Marker *) 0x76353e0
(gdb) p *$
$43 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x7635420,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next->next
$44 = (struct Lisp_Marker *) 0x7635420
(gdb) p *$
$45 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x16b6a5d0,
  charpos = 1,
  bytepos = 1
}
(gdb) p current_buffer->text->markers->next->next->next
$46 = (struct Lisp_Marker *) 0x16b6a5d0
(gdb) p *$
$47 = {
  header = {
    size = 1124081664
  },
  buffer = 0x7519948,
  need_adjustment = 0,
  insertion_type = 0,
  next = 0x16b6a5b0,
  charpos = 1,
  bytepos = 1
}
(gdb) p/x start_marker
$98 = 0xa00000001ffac2c8
(gdb) pp *rvoe_arg.location
(t syntax-ppss-flush-cache)
(gdb) p last_mar
last_marked        last_marked_index
(gdb) p last_marked_index
$99 = 498
(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8
(gdb) find /g1 &last_marked[0], last_marked[last_marked_index-1], 0xa00000001ffac2a8
Pattern not found.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 10:36:01 GMT)

Message #236 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 10:34:20 +0000
On Fri, May 29, 2020 at 10:16 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > Date: Fri, 22 May 2020 10:22:56 +0300
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > Cc: 41321 <at> debbugs.gnu.org
> >
> > > > I'm already running with such a breakpoint, let's how it will catch
> > > > something.                                        ^^^
> > >
> > > Should have been "hope".  Sorry.
> >
> > It happened again, and now insert-file-contents wasn't involved, so I
> > guess it's off the hook.  The command which triggered the problem was
> > self-insert-command, as shown in the backtrace below.  The problem
> > seems to be with handling overlays when buffer text changes.
>
> One more segfault very similar to the last one I reported: it happened
> when calling report_overlay_modification due to text being inserted
> into a buffer.

Everything looks consistent with the bug I described.

> . There's nothing wrong with GDB's xtype command: it fails when a Lisp
>   object encodes a pointer to invalid memory:

(gdb) p last_marked[497]
$100 = XIL(0x439c370)
(gdb) xtype
Lisp_Vectorlike
Cannot access memory at address 0x1ffac2a8

Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
and it's not at address 0x1ffac2a8.

So my suspicion remains that this is a gdb bug, and it appears to be a
reproducible one!

> . So we should
>   see whether fixing the LISP_ALIGNMENT vs GCALIGNMENT discrepancy
>   fixes this problem.

I concur.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 10:56:01 GMT)

Message #239 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 13:55:22 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 29 May 2020 10:34:20 +0000
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> >   object encodes a pointer to invalid memory:
> 
> (gdb) p last_marked[497]
> $100 = XIL(0x439c370)
> (gdb) xtype
> Lisp_Vectorlike
> Cannot access memory at address 0x1ffac2a8
> 
> Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> and it's not at address 0x1ffac2a8.
> 
> So my suspicion remains that this is a gdb bug, and it appears to be a
> reproducible one!

There's no bug: the $size variable was not updated inside pvectype
because the 'set' command tried to access invalid memory.  So the rest
is using the stale value of $size.  Puff! no miracle and no bug.

You just don't need to assign too much importance to the address the
error message displays, it might not be the problematic address.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 11:49:02 GMT)

Message #242 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 11:47:46 +0000
On Fri, May 29, 2020 at 10:55 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 29 May 2020 10:34:20 +0000
> > Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > > . There's nothing wrong with GDB's xtype command: it fails when a Lisp
> > >   object encodes a pointer to invalid memory:
> >
> > (gdb) p last_marked[497]
> > $100 = XIL(0x439c370)
> > (gdb) xtype
> > Lisp_Vectorlike
> > Cannot access memory at address 0x1ffac2a8
> >
> > Again, that can't be right. $100 is a Lisp_Symbol, not a vectorlike,
> > and it's not at address 0x1ffac2a8.
> >
> > So my suspicion remains that this is a gdb bug, and it appears to be a
> > reproducible one!
>
> There's no bug:

I believe there is.

> the $size variable was not updated inside pvectype
> because the 'set' command tried to access invalid memory.

Why would pvectype be called at all? xtype should have said
"Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
all.

Feel free to try that, in a fresh GDB session:

p 0x439c370
xtype

> So the rest
> is using the stale value of $size.  Puff! no miracle and no bug.

Which rest? There's no message after "Cannot access memory at address
0x1ffac2a8"

> You just don't need to assign too much importance to the address the
> error message displays, it might not be the problematic address.

Or there might not be a problematic address, because xtype is somehow
using the value of $ which it used when it encountered the initial bug
even for subsequent calls. It doesn't do that here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 13:54:02 GMT)

Message #245 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 16:52:54 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 29 May 2020 11:47:46 +0000
> Cc: eggert <at> cs.ucla.edu, Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> 
> > There's no bug:
> 
> I believe there is.
> 
> > the $size variable was not updated inside pvectype
> > because the 'set' command tried to access invalid memory.
> 
> Why would pvectype be called at all? xtype should have said
> "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> all.

Look at what xtype does, and you will see.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 14:21:01 GMT)

Message #248 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 14:19:04 +0000
On Fri, May 29, 2020 at 1:53 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 29 May 2020 11:47:46 +0000
> > Cc: eggert <at> cs.ucla.edu, Stefan Monnier <monnier <at> iro.umontreal.ca>, 41321 <at> debbugs.gnu.org
> >
> > > There's no bug:
> >
> > I believe there is.
> >
> > > the $size variable was not updated inside pvectype
> > > because the 'set' command tried to access invalid memory.
> >
> > Why would pvectype be called at all? xtype should have said
> > "Lisp_Symbol", not "Lisp_Vectorlike", and never gotten to pvectype at
> > all.
>
> Look at what xtype does, and you will see.

So you think it's a bug in xtype?

The relevant definitions are:

define xtype
  xgettype $
  output $type
  echo \n
  if $type == Lisp_Vectorlike
    xvectype
  end
end

define xgettype
  if (CHECK_LISP_OBJECT_TYPE)
    set $bugfix = $arg0.i
  else
    set $bugfix = $arg0
  end
  set $type = (enum Lisp_Type) (USE_LSB_TAG ? (EMACS_INT) $bugfix & (1 << GCTYPEBITS) - 1 : (EMACS_UINT) $bugfix >> VALBITS)
end

Both look fine to me: xtype calls xgettype (not xvectype), which sets
$type to the type bits, then outputs them. But the bug must have
happened by then, because what's output is "Lisp_Vectorlike" even
though $ is a Lisp_Symbol. I fail to see how xvectype and pvectype are
relevant at all...
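The type computation in xgettype can be checked outside GDB. The sketch below reproduces both branches of its formula in C and applies them to XIL values from the session earlier in this thread. It assumes the configuration those values suggest (a 32-bit build with 64-bit EMACS_INT and the tag in the top GCTYPEBITS = 3 bits, i.e. !USE_LSB_TAG) and assumes the Emacs 27 tag numbering; `msb_type` and `lsb_tag` are illustrative names, not Emacs or .gdbinit functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 32-bit host built with 64-bit EMACS_INT
   (wide-int), tag in the top GCTYPEBITS bits (!USE_LSB_TAG).  */
#define GCTYPEBITS 3
#define VALBITS (64 - GCTYPEBITS)

/* Tag numbers assumed to follow Emacs 27's enum Lisp_Type for the
   !USE_LSB_TAG case; Lisp_Int1 and Lisp_Cons differ under LSB tags.  */
enum lisp_type
{
  Lisp_Symbol = 0,
  Lisp_Int0 = 2,
  Lisp_Int1 = 3,
  Lisp_String = 4,
  Lisp_Vectorlike = 5,
  Lisp_Cons = 6,
  Lisp_Float = 7
};

static_assert (Lisp_Float < (1 << GCTYPEBITS),
               "every tag must fit in GCTYPEBITS bits");

/* Same arithmetic as xgettype's !USE_LSB_TAG branch:
   the type is the value shifted right by VALBITS.  */
static enum lisp_type
msb_type (uint64_t obj)
{
  return (enum lisp_type) (obj >> VALBITS);
}

/* Same arithmetic as the USE_LSB_TAG branch: the low GCTYPEBITS bits.  */
static unsigned
lsb_tag (uint64_t obj)
{
  return (unsigned) (obj & ((1u << GCTYPEBITS) - 1));
}
```

Under either branch 0x439c370 yields tag 0, Lisp_Symbol, while 0xa00000001ffac2a8 yields Lisp_Vectorlike. The sketch checks only the arithmetic; it does not model how GDB resolved `$` or handled the failed memory access in that session.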




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 18:32:01 GMT)

Message #251 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 11:31:05 -0700
On 5/29/20 2:43 AM, Pip Cet wrote:
> As I said, the code is tricky (i.e. might contain bugs that can only
> be discovered through extensive testing on 32-bit systems), and it
> complicates what should be generic functions for the rbtree
> implementation, so this is probably a 32-bit optimization that is too
> late because 32-bit systems are no longer that relevant...

At least at first, it may make more sense to keep the red-black trees as-is, and
to look up what appear to be symbol-tagged pointers twice, once as-is (to find
any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find
only symbols). Although a bit slower, this won't require any changes to the
rbtree code so it's cleaner. We can then time the optimization you have in mind,
to see whether it's worth doing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 18:39:01 GMT)

Message #254 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 18:37:42 +0000
On Fri, May 29, 2020 at 6:31 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/29/20 2:43 AM, Pip Cet wrote:
> > As I said, the code is tricky (i.e. might contain bugs that can only
> > be discovered through extensive testing on 32-bit systems), and it
> > complicates what should be generic functions for the rbtree
> > implementation, so this is probably a 32-bit optimization that is too
> > late because 32-bit systems are no longer that relevant...
>
> At least at first, it may make more sense to keep the red-black trees as-is, and
> to look up what appear to be symbol-tagged pointers twice, once as-is (to find
> any kind of object) and once offset by '(char *) lispsym - Lisp_Symbol' (to find
> only symbols).

Having had some time to think about this, I agree. I'm certainly not
very confident in that code.

But the main reason is that it's not an optimization in all
circumstances: if you have a very large vector, and a symbol block
aliasing it as symbol offsets goes away, you have to search for other
symbol blocks with that property, which might take a long time.

However, I wonder what you mean by "what appear to be symbol-tagged
pointers"? Surely we need to look up all pointers twice, no matter
what their tag is, since they might be a reference to something inside
the struct Lisp_Symbol.

Of course, on 64-bit machines, this line of code would usually save us
the trouble:

  if (start < min_heap_address || start > max_heap_address)
    return MEM_NIL;

So that's another reason to leave the code as it is for now.
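Paul's two-lookup idea, together with the heap-bounds early exit quoted above, can be modeled as follows. This is a hypothetical sketch, not alloc.c code: a linear table stands in for the red-black tree, `struct heap`, `range_find` and `word_may_reference_object` are invented for illustration, and the Lisp_Symbol tag is assumed to be 0, so the offset is simply lispsym.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for alloc.c's red-black tree of memory blocks:
   a plain table of half-open address ranges [start, end).  */
enum mem_kind { MEM_KIND_VECTOR, MEM_KIND_SYMBOL };

struct mem_range
{
  uintptr_t start, end;
  enum mem_kind kind;
};

struct heap
{
  const struct mem_range *ranges;
  size_t n;
  uintptr_t min_addr, max_addr;   /* lowest and highest heap addresses */
};

/* Find the block containing ADDR, or NULL.  The bounds test plays the
   role of the min_heap_address/max_heap_address check.  */
static const struct mem_range *
range_find (const struct heap *h, uintptr_t addr)
{
  if (addr < h->min_addr || addr > h->max_addr)
    return NULL;
  for (size_t i = 0; i < h->n; i++)   /* the real code searches a tree */
    if (h->ranges[i].start <= addr && addr < h->ranges[i].end)
      return &h->ranges[i];
  return NULL;
}

/* Could WORD, found on the C stack, keep a heap object alive?
   LISPSYM is the assumed address of the lispsym array, relative to
   which symbol Lisp_Objects are encoded.  */
static bool
word_may_reference_object (const struct heap *h, uintptr_t lispsym,
                           uintptr_t word)
{
  assert (h->min_addr <= h->max_addr);

  /* Lookup 1: WORD as an ordinary pointer to any kind of object.  */
  if (range_find (h, word))
    return true;

  /* Lookup 2: WORD as a lispsym-relative symbol offset.  Unsigned
     arithmetic wraps around; only a hit in a symbol block counts.  */
  const struct mem_range *m = range_find (h, word + lispsym);
  return m != NULL && m->kind == MEM_KIND_SYMBOL;
}
```

With a symbol block at 0x5000 and lispsym assumed at 0x3000, the word 0x2010 is found only by the second lookup. A word that lands in a vector block only after lispsym is added is rejected, because the second lookup accepts symbol blocks alone.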

> Although a bit slower, this won't require any changes to the
> rbtree code so it's cleaner.

> We can then time the optimization you have in mind, to see whether it's worth doing.

... or something simpler that might actually work better :-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 19:33:02 GMT)

Message #257 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 12:32:05 -0700
On 5/29/20 11:37 AM, Pip Cet wrote:
> if you have a very large vector, and a symbol block
> aliasing it as symbol offsets goes away, you have to search for other
> symbol blocks with that property, which might take a long time.

It shouldn't be that bad, because when you are worrying about symbols offset by
'lispsym', you need to look only for symbol blocks; it won't matter if these
values appear to point into a vector because you won't follow them in that case.

> However, I wonder what you mean by "what appear to be symbol-tagged
> pointers"? Surely we need to look up all pointers twice, no matter
> what their tag is, since they might be a reference to something inside
> the struct Lisp_Symbol.

What I was trying to say is that if a pointer lacks the symbol tag, then we
needn't worry about it being offset by 'lispsym'. These pointers need to be
looked up only once, even if they happen to be pointers into a struct
Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
is a symbol, and add a small offset to it without also adding 'lispsym'.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 19:39:02 GMT)

Message #260 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 19:37:29 +0000
On Fri, May 29, 2020 at 7:32 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/29/20 11:37 AM, Pip Cet wrote:
> > if you have a very large vector, and a symbol block
> > aliasing it as symbol offsets goes away, you have to search for other
> > symbol blocks with that property, which might take a long time.
>
> It shouldn't be that bad, because when you are worrying about symbols offset by
> 'lispsym', you need to look only for symbol blocks; it won't matter if these
> values appear to point into a vector because you won't follow them in that case.

You mean it shouldn't be that bad with the existing code? You're probably right.

It would have been very bad with the code I posted though, so best ignore that.

> > However, I wonder what you mean by "what appear to be symbol-tagged
> > pointers"? Surely we need to look up all pointers twice, no matter
> > what their tag is, since they might be a reference to something inside
> > the struct Lisp_Symbol.
>
> What I was trying to say is that if a pointer lacks the symbol tag, then we
> needn't worry about it being offset by 'lispsym'. These pointers need to be
> looked up only once, even if they happen to be pointers into a struct
> Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
> is a symbol, and add a small offset to it without also adding 'lispsym'.

Oh! You're right, of course. How silly of me not to realize.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 20:26:01 GMT)

Message #263 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 13:24:55 -0700
On 5/28/20 11:19 PM, Eli Zaretskii wrote:
>> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
>> +  return (uintptr_t) p % GCALIGNMENT == 0;
>>  }
> ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
> right to me: by keeping the current value of LISP_ALIGNMENT, we
> basically declare that Lisp objects shall be aligned on that boundary,
> whereas that isn't really the case.  Why not change the value of
> LISP_ALIGNMENT instead?

There are really two bugs here.

1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
can point into the middle of (say) a pseudovector and not be
LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
this bug in general, because such a pointer might not be GCALIGNMENT-aligned
either. This bug can cause crashes because it causes GC to think an object is
garbage when it's not garbage.

2. LISP_ALIGNMENT is too large on MinGW and some other platforms.
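The difference between the two bugs can be shown with plain modular arithmetic. The sketch assumes GCALIGNMENT is 8 and LISP_ALIGNMENT is 16 on the MinGW build. It uses 0x1ffac2a8 (the low 32 bits of start_marker in the GDB session above) as an 8-aligned but not 16-aligned address, and that address + 4 as a hypothetical interior pointer. The function names are illustrative; the real check is the one in maybe_lisp_pointer quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for the MinGW build discussed in this thread.  */
#define GCALIGNMENT 8
#define LISP_ALIGNMENT 16

static_assert (LISP_ALIGNMENT % GCALIGNMENT == 0,
               "LISP_ALIGNMENT is a multiple of GCALIGNMENT");

/* The existing filter: ignore any word not LISP_ALIGNMENT-aligned.  */
static bool
passes_lisp_alignment_check (uintptr_t p)
{
  return p % LISP_ALIGNMENT == 0;
}

/* The narrower change under discussion: test GCALIGNMENT instead.  */
static bool
passes_gcalignment_check (uintptr_t p)
{
  return p % GCALIGNMENT == 0;
}
```

Switching the test to GCALIGNMENT accepts 0x1ffac2a8, which addresses (2), but still rejects 0x1ffac2ac, so interior pointers of the kind described in (1) would still be missed.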

The patch I sent earlier attempted to be the simplest patch that would fix the
bug you observed on MinGW, which is a special case of (1). It does not attempt
to fix all plausible cases of (1), nor does it address (2).

We can fix these two bugs separately, by installing the attached patches into
emacs-27. The first patch fixes (1) and thus fixes the crash along with other
plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
different way but does not fix the crash on other plausible platforms. (1)
probably has better performance than (2), though I doubt whether users will notice.
[0001-Remove-maybe_lisp_pointer.patch (text/x-patch, attachment)]
[0002-Don-t-overalign-Lisp-objects.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 20:28:02 GMT)

Message #266 received at 41321 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 16:26:59 -0400
> What I was trying to say is that if a pointer lacks the symbol tag, then we
> needn't worry about it being offset by 'lispsym'. These pointers need to be
> looked up only once, even if they happen to be pointers into a struct
> Lisp_Symbol. We can safely assume that a compiler won't take a Lisp_Object that
> is a symbol, and add a small offset to it without also adding 'lispsym'.

I don't think that's true.

The original problematic case is for wide-int where a 64bit Lisp_Object
containing a symbol is split into a 32bit tag saying "this is a symbol"
and a 32bit pointer to which an offset has been added.

So when we encounter a 32bit word on the stack, it may be a "plain
pointer" or it may be the 32bit of a pointer to a symbol with an
offset applied but we can't tell which it is because we don't have the
tag at that point.
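To make this concrete, the sketch below splits two XIL values from the session into the 32-bit halves a compiler might keep in separate stack slots: the vectorlike 0xa00000001ffac2a8, and 0xd5c0, which GDB printed as lispsym+54720 (syntax-ppss-flush-cache). It assumes a 32-bit host with 64-bit EMACS_INT and the tag in the top GCTYPEBITS = 3 bits; `split_lisp_object` and `tag_of_high_word` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit host, 64-bit EMACS_INT (wide-int),
   type tag in the top GCTYPEBITS bits of the 64-bit value.  */
#define GCTYPEBITS 3

static_assert (sizeof (uint64_t) == 2 * sizeof (uint32_t),
               "a wide-int Lisp_Object spans two 32-bit words");

/* The two 32-bit words a compiler might store separately.  */
struct halves
{
  uint32_t lo;   /* address bits, or a lispsym offset for a symbol */
  uint32_t hi;   /* carries the type tag */
};

static struct halves
split_lisp_object (uint64_t obj)
{
  struct halves h = { (uint32_t) obj, (uint32_t) (obj >> 32) };
  return h;
}

/* Only the high word says what type the object has.  */
static unsigned
tag_of_high_word (uint32_t hi)
{
  return hi >> (32 - GCTYPEBITS);
}
```

The low word of the symbol is just 0xd5c0, an offset from lispsym with no tag bits of its own, so a scanner looking at that word alone cannot tell whether it is a plain pointer or part of a symbol.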


        Stefan "looking forward to bignums replacing wide-ints"





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 20:41:02 GMT)

Message #269 received at 41321 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Fri, 29 May 2020 13:40:33 -0700
On 5/29/20 1:26 PM, Stefan Monnier wrote:

> The original problematic case is for wide-int where a 64bit Lisp_Object
> containing a symbol is split into a 32bit tag saying "this is a symbol"
> and a 32bit pointer to which an offset has been added.
> 
> So when we encounter a 32bit word on the stack, it may be a "plain
> pointer" or it may be the 32bit of a pointer to a symbol with an
> offset applied but we can't tell which it is because we don't have the
> tag at that point.

Oh, you're right. Thanks, I was thinking only of the USE_LSB_TAG case.

For the !USE_LSB_TAG case, we should check whether the word is aligned for
'struct Lisp_Symbol', not whether it has the Lisp_Symbol tag, when deciding
quickly whether to add 'lispsym' and then do the second rbtree lookup. Something
like this:

  (USE_LSB_TAG
   ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
   : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)

I'll fold this idea into the next iteration of the patch I'm working on.
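The condition above can be wrapped in a small predicate to show which words it lets through. In this hypothetical sketch `worth_symbol_lookup` is an invented name, GCALIGNMENT is assumed to be 8, the Lisp_Symbol tag is assumed to be 0, and alignof (struct Lisp_Symbol) is passed in as a parameter rather than taken from the real struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GCALIGNMENT 8      /* assumed */
enum { Lisp_Symbol = 0 };  /* assumed tag value for symbols */

/* Should WORD also be looked up as a lispsym-relative symbol offset?
   SYMBOL_ALIGN stands in for alignof (struct Lisp_Symbol).  */
static bool
worth_symbol_lookup (uintptr_t word, bool use_lsb_tag, uintptr_t symbol_align)
{
  /* Alignments are powers of two.  */
  assert (symbol_align != 0 && (symbol_align & (symbol_align - 1)) == 0);
  return (use_lsb_tag
          /* The tag is in the word itself: require the symbol tag.  */
          ? word % GCALIGNMENT == Lisp_Symbol
          /* The tag may be in another word: fall back on alignment.  */
          : word % symbol_align == 0);
}
```

With LSB tags a word carrying tag 3 is skipped outright; without them the filter can only discard words that are not symbol-aligned, such as 0x1ffac2ac. Either way it only decides whether the second, lispsym-offset lookup is worth doing.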




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Fri, 29 May 2020 21:03:02 GMT)

Message #272 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Fri, 29 May 2020 21:01:39 +0000
On Fri, May 29, 2020 at 8:24 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/28/20 11:19 PM, Eli Zaretskii wrote:
> >> -  return (uintptr_t) p % LISP_ALIGNMENT == 0;
> >> +  return (uintptr_t) p % GCALIGNMENT == 0;
> >>  }
> > ...replacing LISP_ALIGNMENT with GCALIGNMENT just here doesn't sound
> > right to me: by keeping the current value of LISP_ALIGNMENT, we
> > basically declare that Lisp objects shall be aligned on that boundary,
> > whereas that isn't really the case.  Why not change the value of
> > LISP_ALIGNMENT instead?
>
> There are really two bugs here.
>
> 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
> can point into the middle of (say) a pseudovector and not be
> LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
> this bug in general, because such a pointer might not be GCALIGNMENT-aligned
> either. This bug can cause crashes because it causes GC to think an object is
> garbage when it's not garbage.
>
> 2. LISP_ALIGNMENT is too large on MinGW and some other platforms.
>
> The patch I sent earlier attempted to be the simplest patch that would fix the
> bug you observed on MinGW, which is a special case of (1).

I'm not convinced. I think Eli only observed (2). There were no
pointers into the middle of pseudovectors in his backtrace or
disassembly...

> It does not attempt
> to fix all plausible cases of (1), nor does it address (2).

It does address (2). It doesn't address all cases of (1).

> We can fix these two bugs separately, by installing the attached patches into
> emacs-27. The first patch fixes (1) and thus fixes the crash along with other
> plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
> different way but does not fix the crash on other plausible platforms. (1)
> probably has better performance than (2), though I doubt whether users will notice.

(1) says:
It’s an invalid optimization, since pointers can address the
middle of Lisp_Object data.

That may be true (we still haven't observed it), but it's not what
happened in Eli's case: in that case, the "pointer" was actually the
lower half of a Lisp_Object, so it pointed at the beginning of a
struct Lisp_Vector. That just happened to be misaligned.

(2) has this comment:
+/* Alignment needed for memory blocks that are allocated via malloc
+   and that contain Lisp objects.  On typical hosts malloc already
+   aligns sufficiently, but extra work is needed on oddball hosts
+   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */

I can't make sense of that comment. It describes two problems that
don't happen, and omits the problem that does happen.
1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
2. A Lisp object requires greater alignment than malloc() gives it.
IIRC, there was at least one RISC architecture whose specification
supported atomic operations only on the first word in each
32-byte-aligned block, but that's such a rare case (and wasn't true
for the silicon implementations, I seem to recall) that it seems silly
to worry about it today.

I'm not saying it's the best solution, but I would prefer simply
defining LISP_ALIGNMENT to be 8 to either patch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 05:51:01 GMT)

Message #275 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 08:50:18 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Fri, 29 May 2020 13:24:55 -0700
> 
> There are really two bugs here.
> 
> 1. The idea of taking the address modulo LISP_ALIGNMENT is wrong, as a pointer
> can point into the middle of (say) a pseudovector and not be
> LISP_ALIGNMENT-aligned. Replacing LISP_ALIGNMENT with GCALIGNMENT does not fix
> this bug in general, because such a pointer might not be GCALIGNMENT-aligned
> either. This bug can cause crashes because it causes GC to think an object is
> garbage when it's not garbage.
> 
> 2. LISP_ALIGNMENT is too large on MinGW and some other platforms.
> 
> The patch I sent earlier attempted to be the simplest patch that would fix the
> bug you observed on MinGW, which is a special case of (1). It does not attempt
> to fix all plausible cases of (1), nor does it address (2).
> 
> We can fix these two bugs separately, by installing the attached patches into
> emacs-27. The first patch fixes (1) and thus fixes the crash along with other
> plausible crashes. The second one fixes (2), and this fixes the MinGW crash in a
> different way but does not fix the crash on other plausible platforms. (1)
> probably has better performance than (2), though I doubt whether users will notice.

Since (1) is for now purely theoretical (and rare even in that
theoretical case), I'd like to see (2) applied to emacs-27.  Let's do
that soon, as I'd like to have another pretest in the near future.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 05:53:02 GMT)

Message #278 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 08:51:49 +0300
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Pip Cet <pipcet <at> gmail.com>,  Eli Zaretskii <eliz <at> gnu.org>,
>   41321 <at> debbugs.gnu.org
> Date: Fri, 29 May 2020 16:26:59 -0400
> 
>         Stefan "looking forward to bignums replacing wide-ints"

Why? so that Emacs could be slower still?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 05:55:01 GMT)

Message #281 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 08:54:25 +0300
> Cc: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
>  41321 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Fri, 29 May 2020 13:40:33 -0700
> 
>   (USE_LSB_TAG
>    ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
>    : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)

I don't understand how this will work, given that a Lisp object on the
stack can be pushed as 2 non-contiguous 32-bit words.  Can you
explain?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 05:59:02 GMT)

Message #284 received at 41321 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 08:58:05 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 29 May 2020 21:01:39 +0000
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org, 
> 	Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> (2) has this comment:
> +/* Alignment needed for memory blocks that are allocated via malloc
> +   and that contain Lisp objects.  On typical hosts malloc already
> +   aligns sufficiently, but extra work is needed on oddball hosts
> +   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
> 
> I can't make sense of that comment. It describes two problems that
> don't happen, and omits the problem that does happen.
> 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
> 2. A Lisp object requires greater alignment than malloc() gives it.
> IIRC, there was at least one RISC architecture whose specification
> supported atomic operations only on the first word in each
> 32-byte-aligned block, but that's such a rare case (and wasn't true
> for the silicon implementations, I seem to recall) that it seems silly
> to worry about it today.
> 
> I'm not saying it's the best solution, but I would prefer simply
> defining LISP_ALIGNMENT to be 8 to either patch.

I agree, but patch 2 basically does that, so I'm okay with saying "8"
in so many words.

Btw, can someone remind me why we started requiring non-default
alignment from Lisp objects?

Also, given the fact that in the crashing case the 2 32-bit parts of a
Lisp object were pushed onto the stack non-contiguously, will fixing
the alignment alone cause those Lisp objects to be marked?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 07:21:01 GMT) Full text and rfc822 format available.

Message #287 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 07:19:18 +0000
On Sat, May 30, 2020 at 5:58 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 29 May 2020 21:01:39 +0000
> > Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
> >       Stefan Monnier <monnier <at> iro.umontreal.ca>
> >
> > (2) has this comment:
> > +/* Alignment needed for memory blocks that are allocated via malloc
> > +   and that contain Lisp objects.  On typical hosts malloc already
> > +   aligns sufficiently, but extra work is needed on oddball hosts
> > +   where Emacs would crash if malloc returned a non-GCALIGNED pointer.  */
> >
> > I can't make sense of that comment. It describes two problems that
> > don't happen, and omits the problem that does happen.
> > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
> > 2. A Lisp object requires greater alignment than malloc() gives it.
> > IIRC, there was at least one RISC architecture whose specification
> > supported atomic operations only on the first word in each
> > 32-byte-aligned block, but that's such a rare case (and wasn't true
> > for the silicon implementations, I seem to recall) that it seems silly
> > to worry about it today.
> >
> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
>
> I agree, but patch 2 basically does that, so I'm okay with saying "8"
> in so many words.

Okay.

> Btw, can someone remind me why we started requiring non-default
> alignment from Lisp objects?

max_align_t was changed to include a float128 type, and
alignof(float128) == 16 on x86, even though virtually all x86 systems
are configured to allow unaligned accesses.
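To check what a given toolchain actually reports, a minimal sketch like the one below can help. It queries only standard C11 alignments (`__float128` is deliberately left out because it is a GCC extension). The results depend on platform and ABI; the function names are illustrative only.

```c
#include <stdalign.h>
#include <stddef.h>

/* Alignments as the compiler sees them; values vary by platform and ABI.  */
static size_t align_of_double (void) { return alignof (double); }
static size_t align_of_long_double (void) { return alignof (long double); }

/* max_align_t must be at least as aligned as every scalar type, so any
   type the C library adds to it (such as a 128-bit float) can raise this.  */
static size_t align_of_max (void) { return alignof (max_align_t); }
```

Calling `align_of_max ()` on the builds discussed in this thread shows whether a widened `max_align_t` has pushed the value up to 16.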

If I understand Paul's concerns correctly, he believes it's possible a
system will once again come into use in which atomic accesses only
work for offsets aligned to, say, 32 bytes. Since pthread variables
require atomic accesses, such a platform would see weird crashes if a
pthread inside a Lisp_Vector wasn't aligned to 32 bytes.

Of course, it remains to be seen/checked whether any such system would
actually define max_align_t to have an alignment of 32, since it
covers only primitive types.

> Also, given the fact that in the crashing case the 2 32-bit parts of a
> Lisp object were pushed onto the stack non-contiguously, will fixing
> the alignment alone cause those Lisp objects to be marked?

Yes. The lower 32-bit part was ignored because its value wasn't
16-byte aligned, not because its stack location wasn't 8-byte aligned.
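The distinction being drawn is between the alignment of a stack slot's address and the alignment of the value stored there. A minimal sketch, assuming a simulated stack of 32-bit slots and a hypothetical filter function: every slot is visited, and a candidate is dropped solely because its value is not a multiple of the required alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pre-filter: every slot is examined (4-byte steps), and a
   candidate survives only if its VALUE is a multiple of ALIGN.  The
   address of the slot itself plays no role.  */
static size_t
count_candidates (const uint32_t *slots, size_t nslots, uint32_t align)
{
  size_t n = 0;
  for (size_t i = 0; i < nslots; i++)
    if (slots[i] % align == 0)
      n++;
  return n;
}
```

With slots `{0x1008, 0x1010, 0x1013}`, requiring 16-byte alignment keeps only `0x1010`, while requiring 8 also keeps the valid 8-aligned pointer `0x1008`.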




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 09:09:01 GMT) Full text and rfc822 format available.

Message #290 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 12:08:35 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 30 May 2020 07:19:18 +0000
> Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, 
> 	Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > Btw, can someone remind me why we started requiring non-default
> > alignment from Lisp objects?
> 
> max_align_t was changed to include a float128 type, and
> alignof(float128) == 16 on x86, even though virtually all x86 systems
> are configured to allow unaligned accesses.

I understand that part, but my question was why, even before the
change in max_align_t, did we start requiring 8-byte alignment on
systems where that is not automatically guaranteed?

> If I understand Paul's concerns correctly, he believes it's possible a
> system will once again come into use in which atomic accesses only
> work for offsets aligned to, say, 32 bytes. Since pthread variables
> require atomic accesses, such a platform would see weird crashes if a
> pthread inside a Lisp_Vector wasn't aligned to 32 bytes.

So this alignment requirement is only due to pthreads being used?  But
MinGW doesn't use pthreads.

> > Also, given the fact that in the crashing case the 2 32-bit parts of a
> > Lisp object were pushed onto the stack non-contiguously, will fixing
> > the alignment alone cause those Lisp objects to be marked?
> 
> Yes. The lower 32-bit part was ignored because its value wasn't
> 16-byte aligned, not because its stack location wasn't 8-byte aligned.

Right, but I'm talking about marking.  AFAIU, when scanning the stack
finds a value that looks like a Lisp object, we mark that object.  If
the two 32-bit parts of the object are non-contiguous, will we be able
to recognize such an object, and will we be able to mark it correctly,
and if so, how?  IOW, don't we need the upper 32 bits (which encode
the object type) for the purposes of marking it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 11:08:01 GMT) Full text and rfc822 format available.

Message #293 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 11:06:52 +0000
On Sat, May 30, 2020 at 9:08 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sat, 30 May 2020 07:19:18 +0000
> > Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
> >       Stefan Monnier <monnier <at> iro.umontreal.ca>
> >
> > > Btw, can someone remind me why we started requiring non-default
> > > alignment from Lisp objects?
> >
> > max_align_t was changed to include a float128 type, and
> > alignof(float128) == 16 on x86, even though virtually all x86 systems
> > are configured to allow unaligned accesses.
>
> I understand that part, but my question was why, even before the
> change in max_align_t, did we start requiring 8-byte alignment on
> systems where that is not automatically guaranteed?

I don't know. As I said, I think that was always buggy on pdumper
systems, though the bug was very subtle. My guess is it predates
pdumper, at which time it was a valid optimization.

> > If I understand Paul's concerns correctly, he believes it's possible a
> > system will once again come into use in which atomic accesses only
> > work for offsets aligned to, say, 32 bytes. Since pthread variables
> > require atomic accesses, such a platform would see weird crashes if a
> > pthread inside a Lisp_Vector wasn't aligned to 32 bytes.
>
> So this alignment requirement is only due to pthreads being used?

I'm not sure what you're asking. Obviously there are systems on which
unaligned accesses will fault or be very slow indeed, so we need to
make sure, say, pure space allocations are aligned somehow. That
requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
performance, pthreads, and SIMD types.
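To illustrate why bump-allocated areas such as pure space need an explicit alignment rule, here is a minimal sketch of a bump allocator that rounds every allocation up to 8 bytes. `PURE_ALIGN`, `pure_sketch`, and `pure_alloc_sketch` are hypothetical names; this is not Emacs's pure-space code.

```c
#include <stddef.h>
#include <stdint.h>

enum { PURE_ALIGN = 8 };   /* hypothetical stand-in for LISP_ALIGNMENT */

static _Alignas (PURE_ALIGN) unsigned char pure_sketch[1024];
static size_t pure_used;

/* Hand out SIZE bytes from the static area, starting each allocation at
   the next multiple of PURE_ALIGN so no object is ever misaligned.  */
static void *
pure_alloc_sketch (size_t size)
{
  size_t start = (pure_used + PURE_ALIGN - 1) & ~(size_t) (PURE_ALIGN - 1);
  if (start > sizeof pure_sketch || size > sizeof pure_sketch - start)
    return NULL;   /* area exhausted */
  pure_used = start + size;
  return pure_sketch + start;
}
```

After a 3-byte allocation, the next one starts at offset 8 rather than 3.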

> > > Also, given the fact that in the crashing case the 2 32-bit parts of a
> > > Lisp object were pushed onto the stack non-contiguously, will fixing
> > > the alignment alone cause those Lisp objects to be marked?
> >
> > Yes. The lower 32-bit part was ignored because its value wasn't
> > 16-byte aligned, not because its stack location wasn't 8-byte aligned.
>
> Right, but I'm talking about marking.  AFAIU, when scanning the stack
> finds a value that looks like a Lisp object, we mark that object.

And if we find a value that looks like a pointer to a Lisp structure,
as the lower half of a non-symbol Lisp_Object does, we mark the
corresponding Lisp object.

> If
> the two 32-bit parts of the object are non-contiguous, will we be able
> to recognize such an object, and will we be able to mark it correctly,
> and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> the object type) for the purposes of marking it?

For everything but symbols, we don't: calling mark_maybe_pointer on the
low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
changed to also check the pointer at <low 32-bit word> + &lispsym.
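For symbols, the low word is a byte offset from the start of the symbol array rather than a pointer, so it has to be added to that array's base before it can be recognized. A minimal sketch of that recovery, using a small hypothetical stand-in for `lispsym` (non-symbol low words would still go to mark_maybe_pointer directly):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Emacs's struct Lisp_Symbol and lispsym.  */
struct symbol_sketch { void *fields[4]; };
enum { NSYMBOLS_SKETCH = 8 };
static struct symbol_sketch lispsym_sketch[NSYMBOLS_SKETCH];

/* Interpret LOW as a byte offset into the symbol array and return the
   symbol it designates, or NULL if no symbol can start there.  Working
   with offsets avoids forming out-of-bounds pointers.  */
static struct symbol_sketch *
symbol_from_low_word (uint32_t low)
{
  size_t offset = low;
  if (offset >= sizeof lispsym_sketch
      || offset % sizeof (struct symbol_sketch) != 0)
    return NULL;
  return &lispsym_sketch[offset / sizeof (struct symbol_sketch)];
}
```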




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 11:32:02 GMT) Full text and rfc822 format available.

Message #296 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 14:31:02 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 30 May 2020 11:06:52 +0000
> Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, 
> 	Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > I understand that part, but my question was why, even before the
> > change in max_align_t, did we start requiring 8-byte alignment on
> > systems where that is not automatically guaranteed?
> 
> I don't know. As I said, I think that was always buggy on pdumper
> systems, though the bug was very subtle. My guess is it predates
> pdumper, at which time it was a valid optimization.

How is pdumper involved here?

> > So this alignment requirement is only due to pthreads being used?
> 
> I'm not sure what you're asking. Obviously there are systems on which
> unaligned accesses will fault or be very slow indeed, so we need to
> make sure, say, pure space allocations are aligned somehow. That
> requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> performance, pthreads, and SIMD types.

If the system guarantees 4-byte alignment from malloc (and/or a
similar alignment of the runtime C stack), then using that doesn't
trigger problems related to unaligned accesses, right?  So let me
rephrase: why isn't 4-byte alignment "good enough" for us on systems
where malloc and the runtime stack are guaranteed to be thus aligned?

> > If
> > the two 32-bit parts of the object are non-contiguous, will we be able
> > to recognize such an object, and will we be able to mark it correctly,
> > and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> > the object type) for the purposes of marking it?
> 
> For everything but symbols, we don't, mark_maybe_pointer called on the
> low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
> changed to also check the pointer at <low 32-bit word> + &lispsym.

Right, that's what I thought.  So this issue also has to be fixed on
emacs-27 in order for us to provide a stable Emacs 27.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 13:31:02 GMT) Full text and rfc822 format available.

Message #299 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 13:29:33 +0000
On Sat, May 30, 2020 at 11:31 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sat, 30 May 2020 11:06:52 +0000
> > Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
> >       Stefan Monnier <monnier <at> iro.umontreal.ca>
> >
> > > I understand that part, but my question was why, even before the
> > > change in max_align_t, did we start requiring 8-byte alignment on
> > > systems where that is not automatically guaranteed?
> >
> > I don't know. As I said, I think that was always buggy on pdumper
> > systems, though the bug was very subtle. My guess is it predates
> > pdumper, at which time it was a valid optimization.
>
> How is pdumper involved here?

See the pdumper issue I described above. I can't imagine this being a
significant bug, because it needs the sole surviving reference to a
pdumper object to be on the stack, while simultaneously being the key
in a weak-key hash table cell...

> > > So this alignment requirement is only due to pthreads being used?
> >
> > I'm not sure what you're asking. Obviously there are systems on which
> > unaligned accesses will fault or be very slow indeed, so we need to
> > make sure, say, pure space allocations are aligned somehow. That
> > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> > performance, pthreads, and SIMD types.
>
> If the system guarantees 4-byte alignment from malloc (and/or a
> similar alignment of the runtime C stack), then using that doesn't
> trigger problems related to unaligned accesses, right?  So let me
> rephrase: why isn't 4-byte alignment "good enough" for us on systems
> where malloc and the runtime stack are guaranteed to be thus aligned?

(The runtime stack isn't relevant, as far as I can tell, since we walk
that in 4-byte steps on such systems anyway.)

You're correct that on such a system, we could get away with a
LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either.

> > > If
> > > the two 32-bit parts of the object are non-contiguous, will we be able
> > > to recognize such an object, and will we be able to mark it correctly,
> > > and if so, how?  IOW, don't we need the upper 32-bit (which encodes
> > > the object type) for the purposes of marking it?
> >
> > For everything but symbols, we don't, mark_maybe_pointer called on the
> > low 32 bits suffices. For symbols, mark_maybe_pointer needs to be
> > changed to also check the pointer at <low 32-bit word> + &lispsym.
>
> Right, that's what I thought.  So this issue also has to be fixed on
> emacs-27 in order for us to provide a stable Emacs 27.

I'm surprised, but glad that you think so. Patch for emacs-27 attached.
[0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 14:28:01 GMT) Full text and rfc822 format available.

Message #302 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 10:26:53 -0400
>>         Stefan "looking forward to bignums replacing wide-ints"
> Why? so that Emacs could be slower still?

Well, if performance is a serious problem, then maybe "bignums replacing
wide-ints" will never happen.  IOW the above assumes that we can make
them work as fast if not faster (more specifically, using bignums
should(!?) result in better performance in buffers <512MB, while it will
indeed likely result in worse performance in buffers bigger than that).


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:32:02 GMT) Full text and rfc822 format available.

Message #305 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 09:31:49 -0700
On 5/29/20 2:01 PM, Pip Cet wrote:

> (1) says:
> It’s an invalid optimization, since pointers can address the
> middle of Lisp_Object data.
> 
> That may be true (we still haven't observed it),

I observed it earlier, in code that iterated through a Lisp vector; at the
machine level the only pointer was into the middle of that vector. Addresses of
Lisp_Vector elements are not GCALIGNED on x86 and other platforms.
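A conservative collector therefore has to map an interior pointer back to the object that contains it, without first discarding it for being misaligned. Emacs keeps its own lookup structure for this (alloc.c's `mem_find`); the sketch below substitutes a hypothetical sorted array of address ranges with binary search.

```c
#include <stddef.h>
#include <stdint.h>

/* [start, end) of one allocated object; hypothetical layout.  */
struct block_range { uintptr_t start, end; };

/* Return the index of the object containing ADDR, or -1 if none does.
   RANGES must be sorted by start and non-overlapping.  Note that ADDR is
   not required to be aligned: interior pointers are accepted.  */
static ptrdiff_t
find_containing_object (const struct block_range *ranges, size_t n,
                        uintptr_t addr)
{
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (addr < ranges[mid].start)
        hi = mid;
      else if (addr >= ranges[mid].end)
        lo = mid + 1;
      else
        return (ptrdiff_t) mid;
    }
  return -1;
}
```

An address such as `0x1014`, 4 bytes into an 8-byte slot, is still attributed to the object at `0x1000`; an `addr % 8 == 0` pre-filter would have thrown it away.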

> but it's not what
> happened in Eli's case:

Yes, that's right. That is, the patch for (1) fixed not only Eli's case, but
other plausible cases.

> 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.

Although that's true of all current Emacs porting targets as far as I know, I'd
rather not hardwire this into the code, as neither POSIX nor the C standard
requires it. This is why the comment refers to platforms where malloc() % 8 != 0
as "oddball hosts".
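The "extra work" on such oddball hosts can be pictured as follows: try plain malloc, and fall back to C11 `aligned_alloc` only if the result is misaligned. This is a minimal sketch under that assumption; `lisp_malloc_sketch` and `LISP_ALIGN_SKETCH` are hypothetical, and Emacs's own allocator differs in detail.

```c
#include <stdint.h>
#include <stdlib.h>

enum { LISP_ALIGN_SKETCH = 8 };   /* hypothetical stand-in for LISP_ALIGNMENT */

/* Allocate SIZE bytes aligned to LISP_ALIGN_SKETCH.  On typical hosts the
   first malloc already satisfies this; otherwise use aligned_alloc.  */
static void *
lisp_malloc_sketch (size_t size)
{
  void *p = malloc (size);
  if (p == NULL || (uintptr_t) p % LISP_ALIGN_SKETCH == 0)
    return p;
  free (p);
  if (size > SIZE_MAX - (LISP_ALIGN_SKETCH - 1))
    return NULL;
  /* C11 aligned_alloc expects SIZE to be a multiple of the alignment.  */
  size_t rounded = (size + LISP_ALIGN_SKETCH - 1)
                   / LISP_ALIGN_SKETCH * LISP_ALIGN_SKETCH;
  return aligned_alloc (LISP_ALIGN_SKETCH, rounded);
}
```

Blocks from either path can be released with `free`.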

> 2. A Lisp object requires greater alignment than malloc() gives it.
> IIRC, there was at least one RISC architecture whose specification

We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24.
On that platform __int128's alignment is 16, malloc's is 8.

> I'm not saying it's the best solution, but I would prefer simply
> defining LISP_ALIGNMENT to be 8 to either patch.

That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
alignment (there's no need to align objects to 8 because the tags are at the
high end).
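The difference between the two tagging schemes can be sketched with 64-bit words and 3 tag bits (a hypothetical width chosen for illustration). With low-order tags, a pointer's low bits must be zero, which forces 8-byte alignment; with high-order tags, as in a !USE_LSB_TAG wide-int build, the pointer keeps all of its low bits.

```c
#include <stdint.h>

enum { TAG_BITS = 3 };   /* hypothetical tag width */

/* LSB tagging: the tag occupies the low bits, so PTR must be a multiple of 8.  */
static uint64_t lsb_make (uint64_t ptr, unsigned tag) { return ptr | tag; }
static unsigned lsb_tag (uint64_t w) { return w & ((1u << TAG_BITS) - 1); }
static uint64_t lsb_ptr (uint64_t w)
{ return w & ~(uint64_t) ((1u << TAG_BITS) - 1); }

/* MSB tagging: the tag occupies the top bits (PTR's top bits assumed zero),
   so PTR needs no particular alignment.  */
static uint64_t msb_make (uint64_t ptr, unsigned tag)
{ return ptr | ((uint64_t) tag << (64 - TAG_BITS)); }
static unsigned msb_tag (uint64_t w) { return w >> (64 - TAG_BITS); }
static uint64_t msb_ptr (uint64_t w) { return w & (UINT64_MAX >> TAG_BITS); }
```

A 4-aligned address such as `0x1004` round-trips through the MSB scheme but is corrupted by the LSB scheme, because its bit 2 collides with the tag.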




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:33:01 GMT) Full text and rfc822 format available.

Message #308 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 19:32:46 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 30 May 2020 13:29:33 +0000
> Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, 
> 	Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> > > > So this alignment requirement is only due to pthreads being used?
> > >
> > > I'm not sure what you're asking. Obviously there are systems on which
> > > unaligned accesses will fault or be very slow indeed, so we need to
> > > make sure, say, pure space allocations are aligned somehow. That
> > > requires a LISP_ALIGNMENT of 8. Everything beyond that is only for
> > > performance, pthreads, and SIMD types.
> >
> > If the system guarantees 4-byte alignment from malloc (and/or a
> > similar alignment of the runtime C stack), then using that doesn't
> > trigger problems related to unaligned accesses, right?  So let me
> > rephrase: why isn't 4-byte alignment "good enough" for us on systems
> > where malloc and the runtime stack are guaranteed to be thus aligned?
> 
> (The runtime stack isn't relevant, as far as I can tell, since we walk
> that in 4-byte steps on such systems anyway.)

I think it might be relevant for stack-based Lisp objects (if we keep
requiring that Lisp objects are 8-byte aligned on 32-bit platforms).

> You're correct that on such a system, we could get away with a
> LISP_ALIGNMENT of 4, but a LISP_ALIGNMENT of 8 wouldn't hurt either.

That's for sure.  I just wondered why we started requiring 8-byte
alignment back when we did.  Perhaps someone still remembers the
reason.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:38:02 GMT) Full text and rfc822 format available.

Message #311 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 16:36:31 +0000
On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Sat, 30 May 2020 13:29:33 +0000
> > Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
> >       Stefan Monnier <monnier <at> iro.umontreal.ca>
> > (The runtime stack isn't relevant, as far as I can tell, since we walk
> > that in 4-byte steps on such systems anyway.)
>
> I think it might be relevant for stack-based Lisp objects (if we keep
> requiring that Lisp objects are 8-byte aligned on 32-bit platforms).

We should never mark stack-based Lisp objects, no matter how
well-aligned they are!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:43:02 GMT) Full text and rfc822 format available.

Message #314 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 19:42:41 +0300
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
>  Stefan Monnier <monnier <at> iro.umontreal.ca>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 09:31:49 -0700
> 
> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
> 
> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> alignment (there's no need to align objects to 8 because the tags are at the
> high end).

I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
What am I missing?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:46:02 GMT) Full text and rfc822 format available.

Message #317 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 19:45:24 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 30 May 2020 16:36:31 +0000
> Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, 
> 	Stefan Monnier <monnier <at> iro.umontreal.ca>
> 
> On Sat, May 30, 2020 at 4:32 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > > From: Pip Cet <pipcet <at> gmail.com>
> > > Date: Sat, 30 May 2020 13:29:33 +0000
> > > Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
> > >       Stefan Monnier <monnier <at> iro.umontreal.ca>
> > > (The runtime stack isn't relevant, as far as I can tell, since we walk
> > > that in 4-byte steps on such systems anyway.)
> >
> > I think it might be relevant for stack-based Lisp objects (if we keep
> > requiring that Lisp objects are 8-byte aligned on 32-bit platforms).
> 
> We should never mark stack-based Lisp objects, no matter how
> well-aligned they are!

But we do require them to be aligned, at least in the current
codebase.  We actually had crashes in the past when the Windows build
didn't force GCC to align the stack on an 8-byte boundary in callback
functions.  I don't remember if this was related to GC or not, but the
requirement is definitely there.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 16:55:01 GMT) Full text and rfc822 format available.

Message #320 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 16:53:53 +0000
On Sat, May 30, 2020 at 4:31 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/29/20 2:01 PM, Pip Cet wrote:
> > (1) says:
> > It’s an invalid optimization, since pointers can address the
> > middle of Lisp_Object data.
> >
> > That may be true (we still haven't observed it),
>
> I observed it earlier, in code that iterated through a Lisp vector;

Sorry, I must have missed that.

> at the
> machine level the only pointer was into the middle of that vector. Addresses of
> Lisp_Vector elements are not GCALIGNED on x86 and other platforms.

True.

> > 1. malloc() % GCALIGNMENT != 0. Never happens, as far as I can tell.
>
> Although that's true of all current Emacs porting targets as far as I know, I'd
> rather not hardwire this into the code, as neither POSIX nor the C standard
> require it. This is why the comment refers to platforms where malloc() % 8 != 0
> as "oddball hosts".

But we can't figure out what alignment malloc guarantees, on practical
hosts. To say we assume a malloc alignment of 8 is much better than to
say we assume one of alignof (max_align_t), which is false on many
systems.

> > 2. A Lisp object requires greater alignment than malloc() gives it.
> > IIRC, there was at least one RISC architecture whose specification
>
> We don't need anything that obscure. Just use __int128 on x86 with glibc 2.24.
> On that platform __int128's alignment is 16, malloc's is 8.

Sorry, but I think a type that is actually used by Emacs is less
obscure than __float128 (which I think you mean; __int128 doesn't
exist on x86), never mind the question of whether the alignment of that
should have been 16, since it works just fine misaligned (except when
AC is set, but that's no longer x86-as-we-know-and-hate-it).

> > I'm not saying it's the best solution, but I would prefer simply
> > defining LISP_ALIGNMENT to be 8 to either patch.
>
> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> alignment (there's no need to align objects to 8 because the tags are at the
> high end).

How is it incorrect? Suboptimal, maybe, though there's a performance
improvement from keeping things you access together in the same cache line.

There's no need to align anything (non-SIMD) to anything on x86
without AC set, it's just good for performance; and that performance
improvement applies whether or not Lisp_Objects are natively 64-bit or
2x32-bit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 17:07:01 GMT) Full text and rfc822 format available.

Message #323 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 10:06:35 -0700
On 5/30/20 9:42 AM, Eli Zaretskii wrote:
>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
>> alignment (there's no need to align objects to 8 because the tags are at the
>> high end).
> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.

That's true for your platform, since alignof (max_align_t) == 8 on your
platform. But neither the C standard nor POSIX guarantee that alignof
(max_align_t) is 8. Admittedly these days one would have to look hard to find a
platform where alignof (max_align_t) is 4 or less.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 17:24:01 GMT) Full text and rfc822 format available.

Message #326 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 20:22:55 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 10:06:35 -0700
> 
> On 5/30/20 9:42 AM, Eli Zaretskii wrote:
> >> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> >> alignment (there's no need to align objects to 8 because the tags are at the
> >> high end).
> > I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
> 
> That's true for your platform, since alignof (max_align_t) == 8 on your
> platform.

No, it's 16.  And I don't understand what that has to do with
LISP_ALIGNMENT on the master branch, since we all but removed
max_align_t from there.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 17:53:02 GMT) Full text and rfc822 format available.

Message #329 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 10:52:07 -0700
On 5/29/20 10:54 PM, Eli Zaretskii wrote:
>>    (USE_LSB_TAG
>>     ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
>>     : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)
> I don't understand how this will work, given that Lisp object on the
> stack can be pushed as 2 non-contiguous 32-bit words.  Can you
> explain?

On a --with-wide-int host where !USE_LSB_TAG, the above test will work 
correctly on the low-order word of a Lisp object that is a symbol, 
because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be 
true on such a word.

The test is only for symbols; it's not for other Lisp objects.
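Paul's test can be restated as a standalone predicate. In this minimal sketch the constants and the symbol struct are hypothetical stand-ins for the lisp.h definitions. Without LSB tags, a symbol's low word is a byte offset into the symbol array, hence a multiple of the struct's size and therefore of its alignment. The predicate is only a cheap pre-filter: a word that passes must still be checked against the actual symbol array.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for definitions in Emacs's lisp.h.  */
enum { GCALIGNMENT_SKETCH = 8, LISP_SYMBOL_TAG_SKETCH = 0 };
struct symbol_sketch { void *fields[6]; };

/* Could WORD be the low-order word of a symbol's Lisp_Object?  */
static bool
maybe_symbol_word (uintptr_t word, bool use_lsb_tag)
{
  return (use_lsb_tag
          ? word % GCALIGNMENT_SKETCH == LISP_SYMBOL_TAG_SKETCH
          : word % alignof (struct symbol_sketch) == 0);
}
```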




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:06:01 GMT) Full text and rfc822 format available.

Message #332 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 11:04:55 -0700
[Message part 1 (text/plain, inline)]
On 5/30/20 6:29 AM, Pip Cet wrote:

> I'm surprised, but glad that you think so. Patch for emacs-27 attached.
> 

That patch is on the right track but it's not clear whether it will 
cause GC to fail to mark some objects that it should, both because it 
omits mark_maybe_object on platforms like x86 --with-wide-int where 
alignof (void *) < sizeof (Lisp_Object), and because it skips 
mark_maybe_pointer on more-typical platforms where alignof (void *) == 
sizeof (Lisp_Object).
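The more conservative strategy Paul argues for amounts to a stack scan that applies both checks at every pointer-aligned position, instead of choosing one check per platform. The following is a minimal sketch under stated assumptions: `wide_lisp_object` is a hypothetical 64-bit Lisp_Object as in a --with-wide-int build, and the two comments mark where a real collector would call mark_maybe_pointer and mark_maybe_object. The sketch only counts candidates.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int64_t wide_lisp_object;   /* hypothetical wide-int Lisp_Object */

struct scan_counts { size_t pointers, objects; };

/* Conservatively scan [START, END) at pointer alignment.  Each position
   is tried as a pointer; if a whole Lisp_Object still fits there, it is
   also tried as an object.  On x86 --with-wide-int the stride (4) is
   smaller than the object (8), so object candidates overlap.  */
static void
scan_region (char const *start, char const *end, struct scan_counts *c)
{
  assert ((uintptr_t) start % alignof (void *) == 0);
  c->pointers = c->objects = 0;
  for (char const *pp = start;
       end - pp >= (ptrdiff_t) sizeof (void *);
       pp += alignof (void *))
    {
      void *p;
      memcpy (&p, pp, sizeof p);        /* mark_maybe_pointer (p) here */
      (void) p;
      c->pointers++;
      if (end - pp >= (ptrdiff_t) sizeof (wide_lisp_object))
        {
          wide_lisp_object obj;
          memcpy (&obj, pp, sizeof obj); /* mark_maybe_object (obj) here */
          (void) obj;
          c->objects++;
        }
    }
}
```

Performing both checks at each position costs extra time, which is the trade-off weighed later in this thread.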

For emacs-27 I propose the attached, more-conservative patch instead. 
This is a backport of part of a patch I've been working on for master. 
As part of that effort I've found some other obscure GC-related bugs 
that we've been lucky to avoid; this patch focuses only on the area Eli 
encountered.
[0001-Be-more-aggressive-in-marking-objects-during-GC.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:12:01 GMT) Full text and rfc822 format available.

Message #335 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 21:11:39 +0300
> Cc: monnier <at> iro.umontreal.ca, pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 10:52:07 -0700
> 
> On 5/29/20 10:54 PM, Eli Zaretskii wrote:
> >>    (USE_LSB_TAG
> >>     ? (uintptr_t) word % GCALIGNMENT == Lisp_Symbol
> >>     : (uintptr_t) word % alignof (struct Lisp_Symbol) == 0)
> > I don't understand how this will work, given that a Lisp object on the
> > stack can be pushed as 2 non-contiguous 32-bit words.  Can you
> > explain?
> 
> On a --with-wide-int host where !USE_LSB_TAG, the above test will work 
> correctly on the low-order word of a Lisp object that is a symbol, 
> because ((uintptr_t) word % alignof (struct Lisp_Symbol) == 0) must be 
> true on such a word.
> 
> The test is only for symbols; it's not for other Lisp objects.

So any pointer whose alignment is the same as 'struct Lisp_Symbol'
will pass the test, regardless of the tag bits?  That's basically most
of the struct pointers on those architectures, no?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:13:02 GMT) Full text and rfc822 format available.

Message #338 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 18:12:03 +0000
On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> For emacs-27 I propose the attached, more-conservative patch instead.

More conservative is good! So, yes, I prefer your patch.

> This is a backport of part of a patch I've been working on for master.
> As part of that effort I've found some other obscure GC-related bugs
> that we've been lucky to avoid; this patch focuses only on the area Eli
> encountered.

Looking forward to hearing about those :-)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:13:02 GMT) Full text and rfc822 format available.

Message #341 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 11:12:49 -0700
On 5/30/20 10:22 AM, Eli Zaretskii wrote:
>>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
>>>> alignment (there's no need to align objects to 8 because the tags are at the
>>>> high end).
>>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
>> That's true for your platform, since alignof (max_align_t) == 8 on your
>> platform.
> No, it's 16.  And I don't understand what that has to do with
> LISP_ALIGNMENT on the master branch, since we all but removed
> max_align_t from there.

Oh, I thought you were talking about the emacs-27 branch which is still 
using max_align_t.

You're right that LISP_ALIGNMENT is 8 on your platform on the master 
branch. However, my comment "That's not correct for !USE_LSB_TAG ..." 
(Bug#41321#305) was responding to Pip Cet's earlier comment "I would 
prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was 
talking about the emacs-27 branch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:18:01 GMT) Full text and rfc822 format available.

Message #344 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 21:16:52 +0300
> Cc: 41321 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 11:04:55 -0700
> 
> For emacs-27 I propose the attached, more-conservative patch instead. 
> This is a backport of part of a patch I've been working on for master. 
> As part of that effort I've found some other obscure GC-related bugs 
> that we've been lucky to avoid; this patch focuses only on the area Eli 
> encountered.

Please explain in comments why we are marking one more pointer in the
loop.  Also, I don't think I understand why this solves all of the
problems we were discussing; is this in addition to another patch that
you propose for emacs-27?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:18:02 GMT) Full text and rfc822 format available.

Message #347 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 11:17:31 -0700
On 5/30/20 11:11 AM, Eli Zaretskii wrote:
> So any pointer whose alignment is the same as 'struct Lisp_Symbol'
> will pass the test, regardless of the tag bits?  That's basically most
> of the struct pointers on those architectures, no?

Yes, pretty much.

This is an inevitable consequence of the problem at hand. For aligned 
pointers we must consult the red-black tree no matter what solution we 
pick, because the compiler may have aligned a pointer for us.

Just to make sure we're on the same page here. This stuff is only about 
how to improve performance (compared to the patch proposed for emacs-27 
in Bug#41321#332) by doing fast checks on words before giving them to 
the red-black search.
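The structure Paul describes is a fast rejection test followed by the authoritative lookup. Below is a minimal sketch: the sorted array with binary search is a swapped-in stand-in for Emacs's red-black tree of memory blocks, and `struct block`, `find_block`, and `maybe_heap_pointer` are hypothetical names. A counter records how often the slow path runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allocator bookkeeping: sorted, non-overlapping blocks.
   A sorted array plus binary search plays the role of the tree.  */
struct block { uintptr_t start, end; };   /* covers [start, end) */

static size_t slow_lookups;   /* number of times the slow search ran */

static struct block const *
find_block (struct block const *blocks, size_t n, uintptr_t addr)
{
  slow_lookups++;
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (addr < blocks[mid].start)
        hi = mid;
      else if (addr >= blocks[mid].end)
        lo = mid + 1;
      else
        return &blocks[mid];
    }
  return NULL;
}

/* Cheap filter first; only words that survive reach the search.
   Aligned words always survive, hence the cost Eli points out.  */
static bool
maybe_heap_pointer (struct block const *blocks, size_t n,
                    uintptr_t word, uintptr_t align)
{
  assert (align != 0);
  if (word % align != 0)
    return false;            /* rejected without a search */
  return find_block (blocks, n, word) != NULL;
}
```

The filter pays off only to the extent it rejects words. On !USE_LSB_TAG hosts, where almost every aligned word passes, most of the work still ends up in the search.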




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:22:01 GMT) Full text and rfc822 format available.

Message #350 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 21:21:15 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 11:12:49 -0700
> 
> On 5/30/20 10:22 AM, Eli Zaretskii wrote:
> >>>> That's not correct for !USE_LSB_TAG, where LISP_ALIGNMENT is merely the native
> >>>> alignment (there's no need to align objects to 8 because the tags are at the
> >>>> high end).
> >>> I'm using a !USE_LSB_TAG build, but LISP_ALIGNMENT is 8 nonetheless.
> >> That's true for your platform, since alignof (max_align_t) == 8 on your
> >> platform.
> > No, it's 16.  And I don't understand what that has to do with
> > LISP_ALIGNMENT on the master branch, since we all but removed
> > max_align_t from there.
> 
> Oh, I thought you were talking about the emacs-27 branch which is still 
> using max_align_t.
> 
> You're right that LISP_ALIGNMENT is 8 on your platform on the master 
> branch. However, my comment "That's not correct for !USE_LSB_TAG ..." 
> (Bug#41321#305) was responding to Pip Cet's earlier comment "I would 
> prefer simply defining LISP_ALIGNMENT to be 8" (Bug#41321#272) which was 
> talking about the emacs-27 branch.

I'm still confused, because on current emacs-27, both LISP_ALIGNMENT
and alignof(max_align_t) are 16 in my builds.  And I still don't
understand why using LISP_ALIGNMENT of 8 is not right in this case (on
emacs-27).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:41:02 GMT) Full text and rfc822 format available.

Message #353 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 18:39:36 +0000
[Message part 1 (text/plain, inline)]
On Sat, May 30, 2020 at 6:04 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/30/20 6:29 AM, Pip Cet wrote:
> > I'm surprised, but glad that you think so. Patch for emacs-27 attached.
> That patch is on the right track but it's not clear whether it will
> cause GC to fail to mark some objects that it should, both because it
> omits mark_maybe_object on platforms like x86 --with-wide-int where
> alignof (void *) < sizeof (Lisp_Object), and because it skips
> mark_maybe_pointer on more-typical platforms where alignof (void *) ==
> sizeof (Lisp_Object).

I've thought about this for a while, but I fail to see the problem
with my patch. mark_maybe_object is unnecessary on x86
--with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
on platforms that don't rip apart our precious Lisp_Objects. The other
call to mark_maybe_pointer isn't skipped.
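The `mark_maybe_pointer (off + lispsym)` call exists because a symbol's Lisp_Object holds its offset from the `lispsym` array, not its address. A stack word may therefore be a disguised symbol, and the scanner has to add the base back before treating it as a pointer. The sketch below shows the idea under assumptions: `fake_lispsym`, `encode_symbol`, and `undisguise` are hypothetical stand-ins, and tag bits are ignored (the symbol tag is taken to be 0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the static array of built-in symbols.  */
struct fake_symbol { int id; };
static struct fake_symbol fake_lispsym[16];

/* Represent a symbol by its byte offset from the array base, in the
   spirit of the lispsym disguise.  Unsigned arithmetic keeps any
   wraparound well defined.  */
static uintptr_t
encode_symbol (struct fake_symbol const *sym)
{
  assert (sym >= fake_lispsym && sym < fake_lispsym + 16);
  return (uintptr_t) sym - (uintptr_t) fake_lispsym;
}

/* Besides trying a stack word as a pointer, a conservative scanner
   also tries word + base, because the word may be a disguised symbol
   that otherwise points at nothing.  */
static uintptr_t
undisguise (uintptr_t word)
{
  return word + (uintptr_t) fake_lispsym;
}
```

For a built-in symbol, the encoded word is a small integer that does not look like a heap address. Without the second check, the symbol would never be recognized as referenced.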

I still think we ought to use yours (and accept a ~25% performance
penalty in this particular loop on Eli's platform), but include a
comment like the one I had in mine. It might hide further bugs, but
that's probably what we want to do on emacs-27.

Proposed patch attached.
[0001-Be-more-aggressive-in-marking-objects-during-GC-bug-.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:46:02 GMT) Full text and rfc822 format available.

Message #356 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 11:45:04 -0700
[Message part 1 (text/plain, inline)]
On 5/30/20 11:16 AM, Eli Zaretskii wrote:

> Please explain in comments why we are marking one more pointer in the
> loop.

Sure. I'm attaching the revised patch proposed for emacs-27. This is 
very similar to what Pip Cet just proposed in Bug#41321#353, but the 
code is simpler with fewer casts (and I like my comment better :-).

> I don't think I understand why this solves all of the
> problems we were discussing; is this in addition to another patch that
> you propose for emacs-27?

This replaces all the patches that I proposed for emacs-27 in this 
thread. Although this patch doesn't solve all the problems we have been 
discussing, it does solve the urgent ones:

* The problem you observed on MinGW for markers; it can also occur for 
many other object types. This problem can cause the GC to incorrectly 
reclaim storage for objects, causing the usual disasters.

* The similar problem that Pip Cet noted for symbols.

The patch does not solve less-urgent problems we've talked about, such 
as over-alignment of Lisp objects on MinGW (this is a relatively minor 
performance issue), or the more-obscure and unlikely GC bugs that we've 
been living with for a while (which I haven't had the time to think 
through entirely).
[0001-Be-more-aggressive-in-marking-objects-during-GC.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 18:59:01 GMT) Full text and rfc822 format available.

Message #359 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 11:57:52 -0700
On 5/30/20 11:39 AM, Pip Cet wrote:
> I fail to see the problem
> with my patch. mark_maybe_object is unnecessary on x86
> --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
> on platforms that don't rip apart our precious Lisp_Objects. The other
> call to mark_maybe_pointer isn't skipped.

The other alloc.c code is inconsistent with respect to the 
live_*_holding versus live_*_p functions. There is no live_float_holding 
function, which means we're relying entirely on mark_maybe_object to 
find roots that contain Lisp floats. So it's dicey that your earlier 
(Bug#41321#299) patch skips the call to mark_maybe_object on some platforms.
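The distinction here is between two kinds of liveness predicates. A live_*_p-style check accepts only a pointer to the exact start of an object, while a live_*_holding-style check accepts any address inside one. This is a minimal sketch over a hypothetical block of float cells; `fake_live_float_p`, `fake_live_float_holding`, and the block layout are illustrations loosely modeled on the alloc.c naming, not its code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block of fixed-size float cells.  */
struct fake_float { double data; };
enum { CELLS = 8 };
struct fake_float_block { struct fake_float cells[CELLS]; };

static_assert (sizeof (struct fake_float_block)
               == CELLS * sizeof (struct fake_float), "no padding");

/* live_*_p style: only the exact start of a cell counts.  */
static bool
fake_live_float_p (struct fake_float_block const *b, uintptr_t p)
{
  uintptr_t base = (uintptr_t) b->cells;
  if (p < base)
    return false;
  uintptr_t off = p - base;
  return off < sizeof b->cells && off % sizeof (struct fake_float) == 0;
}

/* live_*_holding style: any address inside a cell maps back to that
   cell; anything outside the block yields NULL.  */
static struct fake_float const *
fake_live_float_holding (struct fake_float_block const *b, uintptr_t p)
{
  uintptr_t base = (uintptr_t) b->cells;
  if (p < base || p - base >= sizeof b->cells)
    return NULL;
  return &b->cells[(p - base) / sizeof (struct fake_float)];
}
```

With only the exact-start form available, a pointer that has been offset into a float (for example, by adding a tag) is not recognized by the pointer path. It can be found only if some other check sees the properly formed Lisp_Object.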

I've been working on improving this for master.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 19:08:01 GMT) Full text and rfc822 format available.

Message #362 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 19:06:29 +0000
On Sat, May 30, 2020 at 6:57 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 5/30/20 11:39 AM, Pip Cet wrote:
> > I fail to see the problem
> > with my patch. mark_maybe_object is unnecessary on x86
> > --with-wide-int, and mark_maybe_pointer (off + lispsym) is unnecessary
> > on platforms that don't rip apart our precious Lisp_Objects. The other
> > call to mark_maybe_pointer isn't skipped.
>
> The other alloc.c code is inconsistent with respect to the
> live_*_holding versus live_*_p functions. There is no live_float_holding
> function,

Indeed. There's just live_float_p.

> which means we're relying entirely on mark_maybe_object to
> find roots that contain Lisp floats.

No, we're not. There's code in mark_maybe_pointer to handle the float
case, by calling live_float_p.

Is it misaligned pointers into floats you're worried about?

> So it's dicey that your earlier
> (Bug#41321#299) patch skips the call to mark_maybe_object on some platforms.

I still fail to see how.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 19:15:01 GMT) Full text and rfc822 format available.

Message #365 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 12:14:14 -0700
On 5/30/20 11:21 AM, Eli Zaretskii wrote:

> on current emacs-27, both LISP_ALIGNMENT
> and alignof(max_align_t) are 16 in my builds.  And I still don't
> understand why using LISP_ALIGNMENT of 8 is not right in this case (on
> emacs-27).

You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 
branch, because alignof (max_align_t) is 16 there. And you're also right 
that setting LISP_ALIGNMENT to be 8 on your host would fix the marker 
bug you observed there, because it would work around your host's bug 
where malloc returns a pointer that is not a multiple of 
alignof(max_align_t). However, C and POSIX allow platforms where 
LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be 
less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host 
that doesn't have your host's idiosyncrasies. And that specific 
workaround should not be needed anyway if we install the emacs-27 patch 
that I have most-recently suggested (or Pip Cet's very-similar recent 
patch), since this patch solves the problem in a more-general way that 
should help to prevent more bugs like this one.
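The host bug mentioned here is that malloc returns pointers that are not multiples of alignof (max_align_t). The following is a minimal probe for that condition. `is_aligned` and `count_misaligned_mallocs` are hypothetical helpers, and the sizes probed are an arbitrary choice. A result of 0 is what one would expect from a typical conforming C11 malloc, such as glibc's on x86-64.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if P is a multiple of ALIGN, which must be a power of two.  */
static bool
is_aligned (void const *p, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Allocate a range of sizes and count results that violate
   alignof (max_align_t).  A nonzero count indicates the kind of
   malloc/max_align_t mismatch discussed above.  */
static int
count_misaligned_mallocs (void)
{
  int bad = 0;
  for (size_t size = 1; size <= 4096; size *= 2)
    {
      void *p = malloc (size);
      if (!p)
        continue;
      if (!is_aligned (p, alignof (max_align_t)))
        bad++;
      free (p);
    }
  return bad;
}
```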




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 19:34:01 GMT) Full text and rfc822 format available.

Message #368 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 22:33:27 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 12:14:14 -0700
> 
> On 5/30/20 11:21 AM, Eli Zaretskii wrote:
> 
> > on current emacs-27, both LISP_ALIGNMENT
> > and alignof(max_align_t) are 16 in my builds.  And I still don't
> > understand why using LISP_ALIGNMENT of 8 is not right in this case (on
> > emacs-27).
> 
> You're right that LISP_ALIGNMENT is 16 on your host on the emacs-27 
> branch, because alignof (max_align_t) is 16 there. And you're also right 
> that setting LISP_ALIGNMENT to be 8 on your host would fix the marker 
> bug you observed there, because it would work around your host's bug 
> where malloc returns a pointer that is not a multiple of 
> alignof(max_align_t). However, C and POSIX allow platforms where 
> LISP_ALIGNMENT should be greater than 8, or (if !USE_LSB_TAG) should be 
> less than 8, so I'd be leery about changing LISP_ALIGNMENT on any host 
> that doesn't have your host's idiosyncrasies.

Posix may require it, but do we actually know of any supported
important platforms where this happens?  If not, let's worry about the
more general fix on master, where we still have time to try various
solutions, and settle for a simpler and easier fix on emacs-27.

> And that specific workaround should not be needed anyway if we
> install the emacs-27 patch that I have most-recently suggested (or
> Pip Cet's very-similar recent patch), since this patch solves the
> problem in a more-general way that should help to prevent more bugs
> like this one.

But your proposal is also less efficient, isn't it?   If so, its more
general nature comes at a price.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 21:28:01 GMT) Full text and rfc822 format available.

Message #371 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 14:27:10 -0700
On 5/30/20 12:06 PM, Pip Cet wrote:

> Is it misaligned pointers into floats you're worried about?

Yes, and it's plausible there will be pointers misaligned because 
Lisp_Float has been added to them.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 21:50:02 GMT) Full text and rfc822 format available.

Message #374 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 21:49:04 +0000
On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> > Is it misaligned pointers into floats you're worried about?
>
> Yes, and it's plausible there will be pointers misaligned because
> Lisp_Float has been added to them.

Sorry for being dense, but I still don't understand. This is on
!LSB_TAG machines, where Lisp_Float does not affect the representation
of the lower 32 bits. On LSB_TAG machines, the other code path is
taken.
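Pip's point depends on where the tag lives, and the two layouts can be compared directly. The following is a minimal sketch assuming 3 tag bits in a 64-bit word and an assumed float tag value of 7; `tag_lsb` and `tag_msb` are hypothetical helpers, not Emacs macros.

```c
#include <assert.h>
#include <stdint.h>

enum { GCTYPEBITS = 3 };    /* tag width */
enum { FLOAT_TAG = 7 };     /* assumed tag value for floats */
#define VALBITS (64 - GCTYPEBITS)

/* USE_LSB_TAG: the tag is added to the low bits, so the word is the
   object address plus the tag and is no longer 8-aligned.  */
static uint64_t
tag_lsb (uint64_t addr)
{
  assert (addr % (1u << GCTYPEBITS) == 0);
  return addr + FLOAT_TAG;
}

/* !USE_LSB_TAG (e.g. 32-bit x86 --with-wide-int): the tag sits in the
   high bits, so the low-order 32-bit half is the raw address.  */
static uint64_t
tag_msb (uint32_t addr)
{
  return ((uint64_t) FLOAT_TAG << VALBITS) | addr;
}
```

In the high-tag layout, a scanner that looks only at the low half sees an ordinary aligned pointer. In the low-tag layout, the tag offsets the pointer into the object.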




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 22:20:01 GMT) Full text and rfc822 format available.

Message #377 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 15:18:53 -0700
On 5/30/20 12:33 PM, Eli Zaretskii wrote:

> Posix may require it, but do we actually know of any supported
> important platforms where this happens?

That depends on what the question is. If the question is "Are there 
platforms where the lost-marker bug occurs?", then no, we don't know of 
any supported important platforms. But if the question is "Are there 
platforms where LISP_ALIGNMENT should be some value other than 8?", then 
yes, LISP_ALIGNMENT should be 4 on Ubuntu 18.04.3 i386 when Emacs is 
configured --with-wide-int (I just tested this, and it is indeed 4 on 
that platform in the Emacs master branch). This is because on this 
platform Lisp objects have a native alignment of 4, and this platform is 
!USE_LSB_TAG so the presence of tag bits imposes no extra alignment 
requirement.
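The rule Paul states can be written as a small function. With low tags, objects need at least GCALIGNMENT so that the tag bits are free. With high tags, the native alignment of a Lisp object suffices. This is a sketch of the reasoning only, not the actual lisp.h definition, and `sketch_lisp_alignment` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Alignment that Lisp objects need under the rule described above.  */
static size_t
sketch_lisp_alignment (bool use_lsb_tag, size_t native_alignment,
                       size_t gcalignment)
{
  assert (native_alignment != 0 && gcalignment != 0);
  if (!use_lsb_tag)
    return native_alignment;   /* tags are in the high bits */
  return native_alignment > gcalignment ? native_alignment : gcalignment;
}
```

For the i386 --with-wide-int case above (native alignment 4, !USE_LSB_TAG), this gives 4, not 8.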

> let's worry about the
> more general fix on master, where we still have time to try various
> solutions, and settle for a simpler and easier fix on emacs-27.

Yes, that's what we're trying to do, and it's what's in the latest patch 
that Pip Cet and I proposed very similar variants of.

> But your proposal is also less efficient, isn't it?   If so, its more
> general nature comes at a price.

Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
(which is not correct as noted above but which fixes the lost-marker 
bug), the proposed patch is about a 1% CPU-time hit in my usual 
benchmark (make compile-always) on a 32-bit platform compiled with 
--with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
can surely speed this up with some cost in complexity (that's what I was 
working on on the master branch), but for emacs-27 I thought that 
reliability took precedence over 1% performance improvements.

I expect that most of the performance hit is not due to the 
LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
times" thing. In my master-branch draft I'm working on getting this down 
to "you have to check pointers an average of 1+epsilon times" for some 
suitable value of epsilon. I don't know yet what epsilon will be. But 
anyway, this is only about improving that 1% performance hit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 22:24:01 GMT) Full text and rfc822 format available.

Message #380 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sat, 30 May 2020 15:23:22 -0700
On 5/30/20 2:49 PM, Pip Cet wrote:
> On Sat, May 30, 2020 at 9:27 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>> Is it misaligned pointers into floats you're worried about?
>>
>> Yes, and it's plausible there will be pointers misaligned because
>> Lisp_Float has been added to them.
> 
> Sorry for being dense, but I still don't understand. This is on
> !LSB_TAG machines, where Lisp_Float does not affect the representation
> of the lower 32 bits. On LSB_TAG machines, the other code path is
> taken.
> 

Oh, I see I am being the dense one. I was thinking based on some of my 
master-branch improvements. One option is to do away with 
mark_maybe_object entirely, so that one needn't deal with looking at 
each part of the stack twice (this is for efficiency).

In emacs-27 the patch you proposed earlier is probably OK, though I 
haven't had time to think through all the possibilities.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sat, 30 May 2020 22:55:02 GMT) Full text and rfc822 format available.

Message #383 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sat, 30 May 2020 22:54:13 +0000
On Sat, May 30, 2020 at 10:23 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Oh, I see I am being the dense one. I was thinking based on some of my
> master-branch improvements. One option is to do away with
> mark_maybe_object entirely, so that one needn't deal with looking at
> each part of the stack twice (this is for efficiency).

Yes, I thought you'd already done that on master. I must not have been
keeping up with the patches.

Much as I like thinking about putting symbols in the rbtree twice and
walking it smartly to retrieve up to two overlapping nodes, I suspect
there are much easier ways of fixing this, at least on 64-bit
architectures. We could make sure, for example, that all symbol blocks
come after lispsym in memory, and store lispsym - address in the
Lisp_Object. Those values would then fall outside the 48-bit space of
actually valid x86_64 addresses, so we could get away with
mark_maybe_pointer (word < 0 ? lispsym - word : word) on that
architecture.
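Pip's suggestion can be tried out with integer arithmetic. The sketch below assumes a 64-bit host, where user-space addresses are far below 2^47, and uses a hypothetical static arena `fake_arena` to stand in for `lispsym` plus the symbol blocks that would follow it. `encode_sym` and `word_to_candidate` are illustrative names, and converting an out-of-range unsigned value to intptr_t is assumed to wrap, as it does with GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for lispsym followed by the symbol blocks.  */
static char fake_arena[4096];
#define FAKE_LISPSYM ((uintptr_t) fake_arena)

/* Store lispsym - address.  For addresses after lispsym the result is
   negative, i.e. a word whose top bits are set.  An address exactly at
   lispsym encodes as 0, which the word < 0 test below does not catch;
   this sketch does not handle that case.  */
static intptr_t
encode_sym (uintptr_t addr)
{
  assert (addr >= FAKE_LISPSYM);
  return (intptr_t) (FAKE_LISPSYM - addr);
}

/* The proposed decoding: negative words are disguised symbol offsets,
   and anything else is tried as an ordinary pointer.  */
static uintptr_t
word_to_candidate (intptr_t word)
{
  return word < 0 ? FAKE_LISPSYM - (uintptr_t) word : (uintptr_t) word;
}
```

A single call then covers both the plain-pointer case and the disguised-symbol case, which is the point of the suggestion.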

> In emacs-27 the patch you proposed earlier is probably OK, though I
> haven't had time to think through all the possibilities.

I was just curious. I think we should go with your latest patch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 31 May 2020 15:49:02 GMT) Full text and rfc822 format available.

Message #386 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sun, 31 May 2020 18:48:28 +0300
> Cc: pipcet <at> gmail.com, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 30 May 2020 15:18:53 -0700
> 
> On 5/30/20 12:33 PM, Eli Zaretskii wrote:
> 
> > Posix may require it, but do we actually know of any supported
> > important platforms where this happens?
> 
> > But your proposal is also less efficient, isn't it?   If so, its more
> > general nature comes at a price.
> 
> Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
> (which is not correct as noted above but which fixes the lost-marker 
> bug), the proposed patch is about a 1% CPU-time hit in my usual 
> benchmark (make compile-always) on a 32-bit platform compiled with 
> --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
> can surely speed this up with some cost in complexity (that's what I was 
> working on on the master branch), but for emacs-27 I thought that 
> reliability took precedence over 1% performance improvements.
> 
> I expect that most of the performance hit is not due to the 
> LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
> times" thing. In my master-branch draft I'm working on getting this down 
> to "you have to check pointers an average of 1+epsilon times" for some 
> suitable value of epsilon. I don't know yet what epsilon will be. But 
> anyway, this is only about improving that 1% performance hit.

OK, then let's get this change into emacs-27, and thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Mon, 01 Jun 2020 14:49:02 GMT) Full text and rfc822 format available.

Message #389 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu
Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Mon, 01 Jun 2020 17:48:42 +0300
> Date: Sun, 31 May 2020 18:48:28 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca, pipcet <at> gmail.com
> 
> > Sure. Compared to simply making LISP_ALIGNMENT = 8 as a workaround 
> > (which is not correct as noted above but which fixes the lost-marker 
> > bug), the proposed patch is about a 1% CPU-time hit in my usual 
> > benchmark (make compile-always) on a 32-bit platform compiled with 
> > --with-wide-int (this is Ubuntu 18.04.4, gcc -m32, Xeon E3-1225 v2). We 
> > can surely speed this up with some cost in complexity (that's what I was 
> > working on on the master branch), but for emacs-27 I thought that 
> > reliability took precedence over 1% performance improvements.
> > 
> > I expect that most of the performance hit is not due to the 
> > LISP_ALIGNMENT thing, it's due to the "you have to check pointers three 
> > times" thing. In my master-branch draft I'm working on getting this down 
> > to "you have to check pointers an average of 1+epsilon times" for some 
> > suitable value of epsilon. I don't know yet what epsilon will be. But 
> > anyway, this is only about improving that 1% performance hit.
> 
> OK, then let's get this change into emacs-27, and thanks.

FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 27 Sep 2020 14:41:02 GMT) Full text and rfc822 format available.

Message #392 received at 41321 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sun, 27 Sep 2020 16:39:51 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> OK, then let's get this change into emacs-27, and thanks.
>
> FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.

I've just lightly skimmed this thread, but does this mean that the bug
was fixed and this bug report can be closed?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 27 Sep 2020 14:46:01 GMT)

Message #395 received at 41321 <at> debbugs.gnu.org:

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91;
 Emacs aborts due to invalid pseudovector objects
Date: Sun, 27 Sep 2020 14:45:06 +0000
On Sun, Sep 27, 2020 at 2:40 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> Eli Zaretskii <eliz <at> gnu.org> writes:
> >> OK, then let's get this change into emacs-27, and thanks.
> >
> > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> > in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.
>
> I've just lightly skimmed this thread, but does this mean that the bug
> was fixed and this bug report can be closed?

I believe it can be, yes, though I'm not sure I ever managed to
convince Eli that the bug I found was the bug he was seeing...

(Sorry for not getting to the other bug reports, BTW, I'm incredibly
busy with family business right now.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#41321; Package emacs. (Sun, 27 Sep 2020 15:03:02 GMT)

Message #398 received at 41321 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu, 41321 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sun, 27 Sep 2020 17:02:02 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> (Sorry for not getting to the other bug reports, BTW, I'm incredibly
> busy with family business right now.)

Sure; no problem.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sun, 27 Sep 2020 15:18:02 GMT)

Notification sent to Eli Zaretskii <eliz <at> gnu.org>:
bug acknowledged by developer. (Sun, 27 Sep 2020 15:18:02 GMT)

Message #403 received at 41321-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: eggert <at> cs.ucla.edu, 41321-done <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca,
 pipcet <at> gmail.com
Subject: Re: bug#41321: 27.0.91; Emacs aborts due to invalid pseudovector
 objects
Date: Sun, 27 Sep 2020 18:16:40 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: eggert <at> cs.ucla.edu,  41321 <at> debbugs.gnu.org,  monnier <at> iro.umontreal.ca,
>   pipcet <at> gmail.com
> Date: Sun, 27 Sep 2020 16:39:51 +0200
> 
> > FTR, I'm now running Emacs 27.0.91 pretest patched with Paul's changes
> > in commit  68b6dad1d8e22fe700871c9a5a18da3dd496cc8a.
> 
> I've just lightly skimmed this thread, but does this mean that the bug
> was fixed and this bug report can be closed?

Yes, done.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 26 Oct 2020 11:24:05 GMT)

This bug report was last modified 4 years and 254 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.