GNU bug report logs - #57789
Emacs 28.1 clone build with native compilation crashes on s390x

Package: emacs;

Reported by: Rob Browning <rlb <at> defaultvalue.org>

Date: Wed, 14 Sep 2022 01:05:01 UTC

Severity: normal

Tags: moreinfo

To reply to this bug, email your comments to 57789 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 01:05:01 GMT)

Acknowledgement sent to Rob Browning <rlb <at> defaultvalue.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 14 Sep 2022 01:05:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Rob Browning <rlb <at> defaultvalue.org>
To: bug-gnu-emacs <at> gnu.org
Subject: Emacs 28.1 clone build with native compilation crashes on s390x
Date: Tue, 13 Sep 2022 20:04:32 -0500
On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka
the build crashes with a segfault with current Debian sid (unstable).  I
can produce the crash like this:

  git clone --single-branch --branch emacs-28 .../emacs.git
  cd emacs
  ./autogen.sh
  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
  make check

The debian package produced a similar failure earlier:

  https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0

Here's the final bit of the clone build's log, and I'm happy to help
test on the machine if that'd be useful:

  Loading /home/rlb/emacs/lisp/electric.el (source)...
  Loading /home/rlb/emacs/lisp/paren.el (source)...
  Loading /home/rlb/emacs/lisp/emacs-lisp/shorthands.el (source)...
  Loading /home/rlb/emacs/lisp/emacs-lisp/eldoc.el (source)...
  Loading /home/rlb/emacs/lisp/cus-start.el (source)...
  Loading /home/rlb/emacs/lisp/tooltip.el (source)...
  Loading /home/rlb/emacs/lisp/international/iso-transl.el (source)...
  Finding pointers to doc strings...
  Finding pointers to doc strings...done
  Dumping under the name bootstrap-emacs.pdmp
  Dumping fingerprint: b4b1b9ac4d82ce4537c0e1eb6527b2b7f5831cb6de31c7f9b2fd2a1a0c4531c4
  Dump complete
  Byte counts: header=100 hot=14915588 discardable=175392 cold=10410424
  Reloc counts: hot=1048047 discardable=5080
  make -C ../lisp compile-first EMACS="../src/bootstrap-emacs"
  make[2]: Entering directory '/home/rlb/emacs/lisp'
    ELC+ELN  emacs-lisp/macroexp.elc
    ELC+ELN  emacs-lisp/cconv.elc
    ELC+ELN  emacs-lisp/byte-opt.elc
    ELC+ELN  emacs-lisp/bytecomp.elc
    ELC+ELN  emacs-lisp/comp.elc
    ELC+ELN  emacs-lisp/comp-cstr.elc
    ELC+ELN  emacs-lisp/cl-macs.elc
    ELC+ELN  emacs-lisp/rx.elc
    ELC+ELN  emacs-lisp/cl-seq.elc
  Fatal error 11: Segmentation fault
  Backtrace:
  ../src/bootstrap-emacs(+0x15deb6)[0x2aa0a7ddeb6]
  ../src/bootstrap-emacs(+0x4efc4)[0x2aa0a6cefc4]
  ../src/bootstrap-emacs(+0x4f1fe)[0x2aa0a6cf1fe]
  ../src/bootstrap-emacs(+0x15c240)[0x2aa0a7dc240]
  ../src/bootstrap-emacs(+0x15c2d2)[0x2aa0a7dc2d2]
  ../src/bootstrap-emacs(+0x6a47d8)[0x2aa0ad247d8]
  ../src/bootstrap-emacs(+0x1a7de0)[0x2aa0a827de0]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a8ee6)[0x2aa0a828ee6]
  ../src/bootstrap-emacs(+0x1a7c3e)[0x2aa0a827c3e]
  ../src/bootstrap-emacs(+0x1a9094)[0x2aa0a829094]
  ../src/bootstrap-emacs(eval_sub+0x410)[0x2aa0a84cc28]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(+0x1cdeb8)[0x2aa0a84deb8]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc]
  ../src/bootstrap-emacs(+0x1cd26a)[0x2aa0a84d26a]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(eval_sub+0x4ba)[0x2aa0a84ccd2]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1ce488)[0x2aa0a84e488]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cd8cc)[0x2aa0a84d8cc]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ../src/bootstrap-emacs(eval_sub+0x1b8)[0x2aa0a84c9d0]
  ../src/bootstrap-emacs(eval_sub+0x2c4)[0x2aa0a84cadc]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa0a84a202]
  ../src/bootstrap-emacs(+0x1cc6a4)[0x2aa0a84c6a4]
  ../src/bootstrap-emacs(+0x1ce26c)[0x2aa0a84e26c]
  ../src/bootstrap-emacs(eval_sub+0x638)[0x2aa0a84ce50]
  ../src/bootstrap-emacs(+0x1ce7ec)[0x2aa0a84e7ec]
  ../src/bootstrap-emacs(eval_sub+0x532)[0x2aa0a84cd4a]
  ../src/bootstrap-emacs(+0x1cdc2e)[0x2aa0a84dc2e]
  ../src/bootstrap-emacs(+0x1cdf10)[0x2aa0a84df10]
  ...
  make[2]: *** [Makefile:316: emacs-lisp/cl-seq.elc] Segmentation fault
  make[2]: Leaving directory '/home/rlb/emacs/lisp'
  make[1]: *** [Makefile:870: bootstrap-emacs.pdmp] Error 2
  make[1]: Leaving directory '/home/rlb/emacs/src'
  make: *** [Makefile:449: src] Error 2

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 02:43:02 GMT)

Message #8 received at 57789 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation crashes
 on s390x
Date: Wed, 14 Sep 2022 05:42:19 +0300
> From: Rob Browning <rlb <at> defaultvalue.org>
> Date: Tue, 13 Sep 2022 20:04:32 -0500
> 
> On zelenka.debian.org https://db.debian.org/machines.cgi?host=zelenka
> the build crashes with a segfault with current Debian sid (unstable).  I
> can produce the crash like this:
> 
>   git clone --single-branch --branch emacs-28 .../emacs.git

If you build the current emacs-28 branch, then it isn't Emacs 28.1,
it's Emacs 28.2.50, right?

>   cd emacs
>   ./autogen.sh
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
>   make check
> 
> The debian package produced a similar failure earlier:
> 
>   https://buildd.debian.org/status/fetch.php?pkg=emacs&arch=s390x&ver=1%3A28.1%2B1-3&stamp=1662863442&raw=0
> 
> Here's the final bit of the clone build's log, and I'm happy to help
> test on the machine if that'd be useful:

Please run the crashing command under GDB, and when it segfaults,
produce the C-level and Lisp-level backtrace, and post them here.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 03:07:01 GMT)

Message #11 received at 57789 <at> debbugs.gnu.org:

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Tue, 13 Sep 2022 22:06:41 -0500
Eli Zaretskii <eliz <at> gnu.org> writes:

> If you build the current emacs-28 branch, then it isn't Emacs 28.1,
> it's Emacs 28.2.50, right?

Right, sorry, the clone test was the current branch tip, and the buildd
log was for (Debian's partially altered) tree, derived from the
emacs-28.1 tag.  I can easily re-test the 28.1 tag if we like.

> Please run the crashing command under GDB, and when it segfaults,
> produce the C-level and Lisp-level backtrace, and post them here.

Will attempt.

Thanks
-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 03:21:01 GMT)

Message #14 received at 57789 <at> debbugs.gnu.org:

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Tue, 13 Sep 2022 22:20:46 -0500
Rob Browning <rlb <at> defaultvalue.org> writes:

> Will attempt.

Hmm, so I ran "make V=1" from the same tree and saw the command that
repeatably crashed, which was:

  EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
    -l comp -f batch-byte+native-compile international/titdic-cnv.el

I then ran that manually via 

  (cd lisp &&
   EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
      -l comp -f batch-byte+native-compile international/titdic-cnv.el)

which ran for a bit and succeeded.  After that a make worked fine until
bindings.el where it crashed again, this time with an "Aborted", and
running it manually didn't help.

In any case, I'm going to start over and try to get the backtraces for
the titdic-cnv.el failure.

-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 20:20:02 GMT)

Message #17 received at 57789 <at> debbugs.gnu.org:

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Wed, 14 Sep 2022 15:19:24 -0500
[Message part 1 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:

> Please run the crashing command under GDB, and when it segfaults,
> produce the C-level and Lisp-level backtrace, and post them here.

Starting from scratch with the emacs-28.1 commit I can reproduce the
failure when building via

  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation

It crashes with the same segfault repeatably, i.e. if you run make
again, it crashes again on the previously mentioned "... -l comp -f
batch-byte+native-compile international/titdic-cnv.el" invocation.  That
crash output is attached below.

After adjusting the Makefile.in invocation so I could run it with gdb in
exactly the same environment once it's failing on that command, I
captured the backtrace and included it below.

With respect to the Lisp-level backtrace, I imagined you probably meant
an xbacktrace?  If so (and assuming I'm guessing right about how I
should do that), I haven't figured out how to arrange sourcing the
src/.gdbinit from the src/Makefile.in command.  I'm likely doing
something wrong, but it doesn't seem to want to load the file.

It looked like it might be because there were no debug symbols, so I
tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
the crash to disappear entirely.

Finally (and this was just a random guess based on previous experiences,
particularly with programs like guile that play (normal, traditional)
tricks with pointers/coercions/etc.) I noticed that emacs doesn't
specify -fno-strict-aliasing, and unless all the C code has been written
with that in mind, I assume that might open a window allowing the
optimizer to introduce undesirable changes.  So I added a
CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
then the build and tests worked fine (twice in a row):

  ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
    CFLAGS=-fno-strict-aliasing

Of course that's not remotely conclusive, but if all of the C code
wasn't written with strict-aliasing in mind, then I wondered if it might
make sense to consider adding -fno-strict-aliasing as a default option.

Also, even if that ends up being desirable, I'm not sure it'll be
sufficient.  That is, I suspect I might want to run the full build/check
with -fno-strict-aliasing in a loop for a bit to make sure the clean
build/check is reliable, since I think I may have seen some test crashes
(not the build crash) on one earlier run with that option, but I'm not
sure that was a clean attempt.

The make crash:

[emacs-s390x-crash (text/plain, inline)]
make[2]: Entering directory '/home/rlb/emacs/lisp'
EMACSLOADPATH= '../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp --eval '(setq load-prefer-newer t)'  \
-l comp -f batch-byte+native-compile international/titdic-cnv.el
Fatal error 11: Segmentation fault
Backtrace:
../src/bootstrap-emacs(+0x15deb6)[0x2aa293ddeb6]
../src/bootstrap-emacs(+0x4efc4)[0x2aa292cefc4]
../src/bootstrap-emacs(+0x4f1fe)[0x2aa292cf1fe]
../src/bootstrap-emacs(+0x15c240)[0x2aa293dc240]
../src/bootstrap-emacs(+0x15c2d2)[0x2aa293dc2d2]
../src/bootstrap-emacs(+0x6a47d8)[0x2aa299247d8]
../src/bootstrap-emacs(+0x1a7fa8)[0x2aa29427fa8]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a8ee6)[0x2aa29428ee6]
../src/bootstrap-emacs(+0x1a7c3e)[0x2aa29427c3e]
../src/bootstrap-emacs(+0x1a9094)[0x2aa29429094]
../src/bootstrap-emacs(Ffuncall+0x2de)[0x2aa2944a2ee]
../src/bootstrap-emacs(+0x1ca42c)[0x2aa2944a42c]
../src/bootstrap-emacs(+0x1f0c72)[0x2aa29470c72]
../src/bootstrap-emacs(+0x1f7fb0)[0x2aa29477fb0]
../src/bootstrap-emacs(+0x1f8474)[0x2aa29478474]
../src/bootstrap-emacs(eval_sub+0x5e4)[0x2aa2944cdfc]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce8cc)[0x2aa2944e8cc]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1ce488)[0x2aa2944e488]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1cd824)[0x2aa2944d824]
../src/bootstrap-emacs(eval_sub+0x532)[0x2aa2944cd4a]
../src/bootstrap-emacs(+0x1cdc2e)[0x2aa2944dc2e]
../src/bootstrap-emacs(Ffuncall+0x1f2)[0x2aa2944a202]
../src/bootstrap-emacs(+0x1ca4b0)[0x2aa2944a4b0]
../src/bootstrap-emacs(+0x1f90e4)[0x2aa294790e4]
../src/bootstrap-emacs(+0x1f9462)[0x2aa29479462]
../src/bootstrap-emacs(+0x1c9ef0)[0x2aa29449ef0]
../src/bootstrap-emacs(Ffuncall+0x182)[0x2aa2944a192]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0+0x804)[0x3ff91d6b0d4]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0+0x1d2)[0x3ff91d6c592]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
/home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln(F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0+0x108)[0x3ff91d6c728]
../src/bootstrap-emacs(Ffuncall+0x23e)[0x2aa2944a24e]
...
make[2]: *** [Makefile:321: international/titdic-cnv.elc] Segmentation fault
make[2]: Leaving directory '/home/rlb/emacs/lisp'
make[1]: *** [Makefile:845: ../lisp/loaddefs.el] Error 2
make[1]: Leaving directory '/home/rlb/emacs/src'
make: *** [Makefile:449: src] Error 2
[Message part 3 (text/plain, inline)]
The gdb backtrace:

[emacs-s390x-backtrace (text/plain, inline)]
Program received signal SIGSEGV, Segmentation fault.
mark_object (arg=<optimized out>) at alloc.c:6809
6809            if (symbol_marked_p (ptr))
(gdb) backtrace
#0  mark_object (arg=<optimized out>) at alloc.c:6809
#1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
#2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
#3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
#4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
#5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
#6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
#7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
#8  0x000002aa001a9094 in garbage_collect () at alloc.c:6132
#9  0x000002aa001a9d0c in maybe_garbage_collect () at alloc.c:6045
#10 0x000002aa001ca2ee in maybe_gc () at lisp.h:5142
#11 Ffuncall (nargs=nargs@entry=3, args=args@entry=0x3ffffffa6a0) at eval.c:3007
#12 0x000002aa001ca42c in call2 (fn=fn@entry=0x155f3675830, arg1=arg1@entry=0x2aa00a75e43, arg2=arg2@entry=0x0) at eval.c:2890
#13 0x000002aa001f0c72 in readevalloop_eager_expand_eval (val=val@entry=0x2aa00a75e43, macroexpand=macroexpand@entry=0x155f3675830) at lread.c:2133
#14 0x000002aa001f7fb0 in readevalloop (readcharfun=readcharfun@entry=0x2aa00aa27b5, infile0=<optimized out>,
    infile0@entry=0x0, sourcename=sourcename@entry=0x2aa00a7fff4, printflag=printflag@entry=false, unibyte=unibyte@entry=0x0, readfun=0x0, start=0x0, end=<optimized out>) at lread.c:2324
#15 0x000002aa001f8474 in Feval_buffer (buffer=<optimized out>, printflag=0x0, filename=0x2aa00a7fff4, unibyte=0x0, do_allow_print=<optimized out>) at lread.c:2397
#16 0x000002aa001ccdfc in eval_sub (form=<optimized out>) at eval.c:2512
#17 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#18 Flet (args=0x3b) at eval.c:1051
#19 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#20 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#21 Flet (args=0x36) at eval.c:1051
#22 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#23 0x000002aa001ce8cc in Funwind_protect (args=0x3fff3cf7f0b) at lisp.h:1420
#24 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#25 0x000002aa001ce488 in Fprogn (body=0x3fff3cf7d6b) at eval.c:465
#26 Flet (args=0x2d) at eval.c:1051
#27 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#28 0x000002aa001cd824 in Fprogn (body=0x0) at eval.c:465
#29 Fif (args=<optimized out>) at eval.c:421
#30 Fif (args=<optimized out>) at eval.c:407
#31 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#32 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#33 funcall_lambda (fun=0x3fff3cf7c9b, nargs=nargs@entry=4, arg_vector=arg_vector@entry=0x3ffffffb650) at eval.c:3305
#34 0x000002aa001ca202 in Ffuncall (nargs=nargs@entry=5, args=args@entry=0x3ffffffb648) at eval.c:3039
#35 0x000002aa001ca4b0 in call4 (fn=<optimized out>, arg1=arg1@entry=0x2aa00a7fff4, arg2=arg2@entry=0x2aa00a7fff4, arg3=arg3@entry=0x0, arg4=arg4@entry=0x30) at eval.c:2905
#36 0x000002aa001f90e4 in Fload (file=file@entry=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=<optimized out>,
    must_suffix@entry=0x30) at lread.c:1473
#37 0x000002aa001f9462 in save_match_data_load (file=0x3fff362bcbc, noerror=noerror@entry=0x0, nomessage=nomessage@entry=0x30, nosuffix=nosuffix@entry=0x0, must_suffix=must_suffix@entry=0x30)
    at lread.c:1629
#38 0x000002aa001c9ef0 in Fautoload_do_load (fundef=0x3fff362bc4b, funname=funname@entry=0x155f2f7a340, macro_only=macro_only@entry=0x0) at eval.c:2295
#39 0x000002aa001ca192 in Ffuncall (nargs=2, args=0x3ffffffbba0) at eval.c:3042
#40 0x000003fff306b0d4 in F636f6d702d2d6e61746976652d636f6d70696c65_comp__native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#41 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#42 0x000003fff306c592 in F62617463682d6e61746976652d636f6d70696c65_batch_native_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#43 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#44 0x000003fff306c728 in F62617463682d627974652b6e61746976652d636f6d70696c65_batch_bytenative_compile_0 () at /home/rlb/emacs/native-lisp/28.2-87d45215/comp-7672a6ed-ac6bcf4e.eln
#45 0x000002aa001ca24e in Ffuncall (nargs=<optimized out>, args=<optimized out>) at lisp.h:2110
#46 0x000002aa001ccfc4 in eval_sub (form=<optimized out>) at eval.c:2470
#47 0x000002aa001cd824 in Fprogn (body=0x0) at eval.c:465
#48 Fif (args=<optimized out>) at eval.c:421
#49 Fif (args=<optimized out>) at eval.c:407
#50 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#51 0x000002aa001cd8cc in Fprogn (body=0x0) at eval.c:465
#52 Fcond (args=<optimized out>) at eval.c:445
#53 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#54 0x000002aa001ce732 in Fprogn (body=0x3fff36e1b43) at eval.c:465
#55 FletX (args=0x3fff36e1b03) at eval.c:983
#56 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#57 0x000002aa001cd6ae in Fprogn (body=0x0) at eval.c:465
#58 prog_ignore (body=<optimized out>) at eval.c:476
#59 Fwhile (args=<optimized out>) at eval.c:1072
#60 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#61 0x000002aa001ce732 in Fprogn (body=0x0) at eval.c:465
#62 FletX (args=0x3fff36e1a83) at eval.c:983
#63 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#64 0x000002aa001cd1d6 in Fprogn (body=0x0) at eval.c:465
#65 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#66 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#67 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#68 Flet (args=0x12) at eval.c:1051
#69 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#70 0x000002aa001ce488 in Fprogn (body=0x3fff35d3a73) at eval.c:465
#71 Flet (args=0xe) at eval.c:1051
#72 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#73 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#74 funcall_lambda (fun=0x3fff35d39e3, fun@entry=0x3fff35d39d3, nargs=nargs@entry=1, arg_vector=arg_vector@entry=0x3ffffffd280) at eval.c:3305
#75 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff35d39d3, args=<optimized out>, count=2929176661299, count@entry=15) at eval.c:3172
#76 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575
#77 0x000002aa001ce488 in Fprogn (body=0x3fff37a209b) at eval.c:465
#78 Flet (args=0x8) at eval.c:1051
#79 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#80 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#81 funcall_lambda (fun=0x3fff37a1e7b, fun@entry=0x3fff37a1e6b, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffd740) at eval.c:3305
#82 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff37a1e6b, args=<optimized out>, count=2929176221524, count@entry=11) at eval.c:3172
#83 0x000002aa001cc9d0 in eval_sub (form=<optimized out>) at eval.c:2575
#84 0x000002aa001ce8cc in Funwind_protect (args=0x3fff380e7a3) at lisp.h:1420
#85 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#86 0x000002aa001ce488 in Fprogn (body=0x0) at eval.c:465
#87 Flet (args=0x3ffffffe658) at eval.c:1051
#88 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#89 0x000002aa001cd824 in Fprogn (body=0x3fff380e233) at eval.c:465
#90 Fif (args=<optimized out>) at eval.c:421
#91 Fif (args=<optimized out>) at eval.c:407
#92 0x000002aa001ccd4a in eval_sub (form=<optimized out>) at eval.c:2451
#93 0x000002aa001cdc2e in Fprogn (body=0x0) at eval.c:465
#94 funcall_lambda (fun=0x3fff380e0e3, fun@entry=0x3fff380e0d3, nargs=nargs@entry=0, arg_vector=arg_vector@entry=0x3ffffffdf88) at eval.c:3305
#95 0x000002aa001cdf10 in apply_lambda (fun=fun@entry=0x3fff380e0d3, args=<optimized out>, count=4398046502696, count@entry=4) at eval.c:3172
#96 0x000002aa001cc9d0 in eval_sub (form=form@entry=0x3fff3f3ef1b) at eval.c:2575
#97 0x000002aa001cee52 in Feval (form=0x3fff3f3ef1b, lexical=<optimized out>) at eval.c:2327
#98 0x000002aa001c8fb6 in internal_condition_case (bfun=bfun@entry=0x2aa00142860 <top_level_2>, handlers=handlers@entry=0x90, hfun=hfun@entry=0x2aa00148ca8 <cmd_error>) at eval.c:1450
#99 0x000002aa001435d2 in top_level_1 (ignore=ignore@entry=0x0) at keyboard.c:1150
#100 0x000002aa001c8ed4 in internal_catch (tag=tag@entry=0xe850, func=func@entry=0x2aa001435a0 <top_level_1>, arg=arg@entry=0x0) at eval.c:1181
#101 0x000002aa001427e0 in command_loop () at keyboard.c:1110
#102 0x000002aa001487bc in recursive_edit_1 () at keyboard.c:720
#103 0x000002aa00148bcc in Frecursive_edit () at keyboard.c:803
#104 0x000002aa00051d7a in main (argc=<optimized out>, argv=0x3ffffffea28) at emacs.c:2358
[Message part 5 (text/plain, inline)]
Thanks
-- 
Rob Browning

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 14 Sep 2022 20:22:01 GMT)

Message #20 received at 57789 <at> debbugs.gnu.org:

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Wed, 14 Sep 2022 15:21:41 -0500
Rob Browning <rlb <at> defaultvalue.org> writes:

> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via

Oops, meant the emacs-28.2 commit for all of that testing.

-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Thu, 15 Sep 2022 07:12:02 GMT)

Message #23 received at 57789 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Rob Browning <rlb <at> defaultvalue.org>, Andrea Corallo <akrl <at> sdf.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Thu, 15 Sep 2022 10:10:59 +0300
> From: Rob Browning <rlb <at> defaultvalue.org>
> Cc: 57789 <at> debbugs.gnu.org
> Date: Wed, 14 Sep 2022 15:19:24 -0500
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Please run the crashing command under GDB, and when it segfaults,
> > produce the C-level and Lisp-level backtrace, and post them here.
> 
> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
> 
> It crashes with the same segfault repeatably, i.e. if you run make
> again, it crashes again on the previously mentioned "... -l comp -f
> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
> crash output is attached below.
> 
> After adjusting the Makefile.in invocation so I could run it with gdb in
> exactly the same environment once it's failing on that command, I
> captured the backtrace and included it below.

Thanks.  The backtrace indicates that the crash is in GC.  This
probably means we have some fundamental problem on that architecture.
Andrea, any advice for how to investigate?

Does the build of the same code with the same options sans
"--with-native-compilation" succeed, or does it also crash with
similar symptoms?  If the build without native-compilation succeeds,
my first question would be how mature and stable libgccjit is on that
platform.  Perhaps take this up with GCC's libgccjit developers.

> With respect to the Lisp-level backtrace, I imagined you probably meant
> an xbacktrace?  If so (and assuming I'm guessing right about how I
> should do that), I haven't figured out how to arrange sourcing the
> src/.gdbinit from the src/Makefile.in command.

You can source it manually from the GDB prompt, when the segfault
happens, and then invoke xbacktrace manually, can't you?

> It looked like it might be because there were no debug symbols, so I
> tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
> the crash to disappear entirely.

Too bad, it means we have a heisenbug on our hands, which will make it
even harder to debug (as if debugging crashes in GC were not hard
enough already).

What happens if you modify this variable:

  (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)

to have the value 1 or even zero, and then rebuild from scratch?  Does
the build succeed then?

> Finally (and this was just a random guess based on previous experiences,
> particularly with programs like guile that play (normal, traditional)
> tricks with pointers/coercions/etc.) I noticed that emacs doesn't
> specify -fno-strict-aliasing, and unless all the C code has been written
> with that in mind, I assume that might open a window allowing the
> optimizer to introduce undesirable changes.  So I added a
> CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
> then the build and tests worked fine (twice in a row):
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
>     CFLAGS=-fno-strict-aliasing
> 
> Of course that's not remotely conclusive, but if all of the C code
> wasn't written with strict-aliasing in mind, then I wondered if it might
> make sense to consider adding -fno-strict-aliasing as a default option.

I don't know enough about this.  Perhaps Andrea or Paul could comment.

> Also, even if that ends up being desirable, I'm not sure it'll be
> sufficient.  That is, I suspect I might want to run the full build/check
> with -fno-strict-aliasing in a loop for a bit to make sure the clean
> build/check is reliable, since I think I may have seen some test crashes
> (not the build crash) on one earlier run with that option, but I'm not
> sure that was a clean attempt.

Yes, running the full test suite would be the logical next step.

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=<optimized out>) at alloc.c:6809
> 6809            if (symbol_marked_p (ptr))
> (gdb) backtrace
> #0  mark_object (arg=<optimized out>) at alloc.c:6809

Any idea what caused the SIGSEGV here?  Was 'ptr' an invalid pointer for
some reason, and if so, what exactly makes it invalid?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Thu, 15 Sep 2022 14:53:01 GMT)

Message #26 received at 57789 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Rob Browning <rlb <at> defaultvalue.org>,
 Andrea Corallo <akrl <at> sdf.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation crashes
 on s390x
Date: Thu, 15 Sep 2022 09:51:54 -0500
On 9/15/22 02:10, Eli Zaretskii wrote:
>> Of course that's not remotely conclusive, but if all of the C code
>> wasn't written with strict-aliasing in mind, then I wondered if it might
>> make sense to consider adding -fno-strict-aliasing as a default option.
> I don't know enough about this.  Perhaps Andrea or Paul could comment.
>
Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into 
the mix. I'm not surprised it would cause a Heisenbug to vanish; it 
doesn't mean strict aliasing is the problem.

Emacs should work with strict aliasing. At least, that's true in the 
default build. I suppose it could be possible there's a strict aliasing 
bug in the native compiler - I'm not that familiar with that code.
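Some background on the term: under C's strict-aliasing rule, the compiler may assume that lvalues of incompatible types (say `float` and `uint32_t`) never refer to the same object. Reading an object through the other type is undefined behaviour, and optimization can expose it. The standalone illustration below is not Emacs code, and its function names are hypothetical. It contrasts the undefined pattern with the well-defined `memcpy` alternative, and it assumes IEEE-754 32-bit floats:

```c
#include <stdint.h>
#include <string.h>

/* Undefined under strict aliasing: reads a float object through a
   uint32_t lvalue.  At -O2 the compiler may reorder or drop accesses
   it assumes cannot alias; -fno-strict-aliasing removes that
   assumption.  Shown only for contrast.  */
uint32_t
float_bits_punned (float *f)
{
  return *(uint32_t *) f;
}

/* Well defined: copy the object representation instead.  Compilers
   typically reduce this memcpy to a single register move.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  _Static_assert (sizeof u == sizeof f, "float must be 32 bits");
  memcpy (&u, &f, sizeof u);
  return u;
}
```

Because the flag changes what the optimizer may assume throughout the program, a crash that disappears under `-fno-strict-aliasing` is weak evidence that an aliasing violation caused it.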





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Thu, 15 Sep 2022 16:27:02 GMT) Full text and rfc822 format available.

Message #29 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Rob Browning <rlb <at> defaultvalue.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Eli Zaretskii <eliz <at> gnu.org>, Andrea
 Corallo <akrl <at> sdf.org>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Thu, 15 Sep 2022 11:26:51 -0500
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1 into 
> the mix. I'm not surprised it would cause a Heisenbug to vanish; it 
> doesn't mean strict aliasing is the problem.

Agreed.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Fri, 16 Sep 2022 06:05:02 GMT) Full text and rfc822 format available.

Message #32 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: 57789 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Fri, 16 Sep 2022 08:04:06 +0200
Rob Browning <rlb <at> defaultvalue.org> writes:

> Rob Browning <rlb <at> defaultvalue.org> writes:
>
>> Starting from scratch with the emacs-28.1 commit I can reproduce the
>> failure when building via
>
> Oops, meant the emacs-28.2 commit for all of that testing.

Looking at Rob's backtrace, 

#0  mark_object (arg=<optimized out>) at alloc.c:6809
#1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
#2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
#3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
#4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
#5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
#6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
#7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926

and seeing frame#7, would it be a way forward to determine which
staticpro (I assume it is a staticpro) that is?  Maybe that can give a
clue which one can then use together with a bisect, perhaps?

WDYT?
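In frames #1 to #6, mark_objects and mark_vectorlike alternate. That pattern comes from the marker recursing into the slots of nested vector-like objects. The model below is heavily simplified and hypothetical: the `toy_` names and `struct vobj` are not alloc.c's. It shows why one bad slot several levels down faults in the innermost call, far from the root that led to it:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector-like object: a mark bit plus child slots.  */
struct vobj
{
  bool marked;
  size_t n;
  struct vobj **slots;
};

void toy_mark_object (struct vobj *obj);

/* Same shape as mark_vectorlike -> mark_objects: set the mark bit,
   then mark every slot in turn.  */
void
toy_mark_vectorlike (struct vobj *v)
{
  v->marked = true;
  for (size_t i = 0; i < v->n; i++)
    toy_mark_object (v->slots[i]);
}

/* Same role as mark_object: skip objects already marked, which also
   terminates cycles.  NULL stands in for an empty slot in this toy.
   If OBJ were a garbage pointer, reading obj->marked is where the
   fault would occur, as in frame #0.  */
void
toy_mark_object (struct vobj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  toy_mark_vectorlike (obj);
}
```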




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Fri, 16 Sep 2022 08:40:01 GMT) Full text and rfc822 format available.

Message #35 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 Rob Browning <rlb <at> defaultvalue.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Fri, 16 Sep 2022 08:39:35 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Rob Browning <rlb <at> defaultvalue.org>
>> Cc: 57789 <at> debbugs.gnu.org
>> Date: Wed, 14 Sep 2022 15:19:24 -0500
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> > Please run the crashing command under GDB, and when it segfaults,
>> > produce the C-level and Lisp-level backtrace, and post them here.
>> 
>> Starting from scratch with the emacs-28.1 commit I can reproduce the
>> failure when building via
>> 
>>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
>> 
>> It crashes with the same segfault repeatably, i.e. if you run make
>> again, it crashes again on the previously mentioned "... -l comp -f
>> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
>> crash output is attached below.
>> 
>> After adjusting the Makefile.in invocation so I could run it with gdb in
>> exactly the same environment once it's failing on that command, I
>> captured the backtrace and included it below.
>
> Thanks.  The backtrace indicates that the crash is in GC.  This
> probably means we have some fundamental problem on that architecture.
> Andrea, any advice for how to investigate?

Mmmh one cheap way to maybe gather more info is to have a run under
valgrind.

Other than that I typically start debugging with GDB and possibly
rr. Like what is (or was) the object the GC is crashing on?  Why?
What's the last piece of code that touched it? Why?  IIUC here we have
no debug symbols so this makes it very difficult.

BTW the fact that -g has an impact on the crash is very odd.

  Andrea





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Fri, 16 Sep 2022 08:44:02 GMT) Full text and rfc822 format available.

Message #38 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <akrl <at> sdf.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 57789 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Rob Browning <rlb <at> defaultvalue.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Fri, 16 Sep 2022 08:43:37 +0000
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> On 9/15/22 02:10, Eli Zaretskii wrote:
>>> Of course that's not remotely conclusive, but if all of the C code
>>> wasn't written with strict-aliasing in mind, then I wondered if it might
>>> make sense to consider adding -fno-strict-aliasing as a default option.
>> I don't know enough about this.  Perhaps Andrea or Paul could comment.
>>
> Throwing -fno-strict-aliasing in the mix is a bit like throwing -O1
> into the mix. I'm not surprised it would cause a Heisenbug to vanish;
> it doesn't mean strict aliasing is the problem.

Hi Paul,

totally agree with you.  The fact that even -g has an impact here
clearly shows that the initial conditions are not necessarily directly
connected with the final symptom we observe.

Best Regards

  Andrea




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sat, 17 Sep 2022 21:01:02 GMT) Full text and rfc822 format available.

Message #41 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <akrl <at> sdf.org>, Paul Eggert
 <eggert <at> cs.ucla.edu>
Cc: 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sat, 17 Sep 2022 16:00:17 -0500
Eli Zaretskii <eliz <at> gnu.org> writes:

> Rob Browning <rlb <at> defaultvalue.org> writes:

> Does the build of the same code with the same options sans
> "--with-native-compilation" succeed, or does it also crash with
> similar symptoms?

Works fine.

> You can source it manually from the GDB prompt, when the segfault
> happens, and then invoke xbacktrace manually, can't you?

Yep.

  Breakpoint 1 at 0x2aa0004ef30: file emacs.c, line 400.
  Breakpoint 2 at 0x2aa0010f168: file xterm.c, line 10291.
  (gdb) xbacktrace
  "Automatic GC" (0x0)
  "internal-macroexpand-for-load" (0xffffa6a8)
  "eval-buffer" (0xffffaa28)
  "let" (0xffffac10)
  "let" (0xffffae28)
  "unwind-protect" (0xffffaff0)
  "let" (0xffffb1f8)
  "if" (0xffffb3c8)
  "load-with-code-conversion" (0xffffb650)
  "time-since" (0xffffbba8)
  "comp--native-compile" (0xffffbd38)
  "batch-native-compile" (0xffffbef0)
  "batch-byte+native-compile" (0xffffc080)
  "funcall" (0xffffc078)
  "if" (0xffffc268)
  "cond" (0xffffc438)
  "let*" (0xffffc618)
  "while" (0xffffc7e8)
  "let*" (0xffffc9c8)
  "progn" (0xffffcb98)
  "if" (0xffffccc0)
  "let" (0xffffceb8)
  "let" (0xffffd0b0)
  "command-line-1" (0xffffd280)
  "let" (0xffffd570)
  "command-line" (0xffffd740)
  "unwind-protect" (0xffffd9f0)
  "let" (0xffffdbe8)
  "if" (0xffffddb8)
  "normal-top-level" (0xffffdf88)

> Too bad, it means we have a heisenbug on our hands, which will make it
> even harder to debug (as if debugging crashes in GC were not hard
> enough already).
>
> What happens if you modify this variable:
>
>   (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)
>
> to have the value 1 or even zero, and then rebuild from scratch? does
> the build succeed then?

No, appears to crash in the same way.

> Yes, running the full test suite would be the logical next step.

Oh, I had run it, I just meant that I'd likely want to double-check via
testing in a loop to try to see if it might be an intermittent failure.

Thanks
-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sat, 17 Sep 2022 21:05:02 GMT) Full text and rfc822 format available.

Message #44 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Rob Browning <rlb <at> defaultvalue.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 57789 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sat, 17 Sep 2022 16:04:31 -0500
Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:

> Looking at Rob's backtrace, 
>
> #0  mark_object (arg=<optimized out>) at alloc.c:6809
> #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
> #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
> #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
> #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
> #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
> #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
> #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
>
> and seeing frame#7, would it be a way forward to determine which
> staticpro (I assume it is a staticpro) that is?  Maybe that can give a
> clue which one can then use together with a bisect, perhaps?

Not completely sure I followed, but moving up to that frame and printing
visitor didn't work: "optimized out".

-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sun, 18 Sep 2022 05:23:02 GMT) Full text and rfc822 format available.

Message #47 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: 57789 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sun, 18 Sep 2022 07:22:45 +0200
Rob Browning <rlb <at> defaultvalue.org> writes:

> Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:
>
>> Looking at Rob's backtrace, 
>>
>> #0  mark_object (arg=<optimized out>) at alloc.c:6809
>> #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
>> #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
>> #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
>> #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
>> #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
>> #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
>> #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
>>
>> and seeing frame#7, would it be a way forward to determine which
>> staticpro (I assume it is a staticpro) that is?  Maybe that can give a
>> clue which one can then use together with a bisect, perhaps?
>
> Not completely sure I followed, but moving up to that frame and printing
> visitor didn't work: "optimized out".

Sorry, I thought another Emacs developer would chime in, when I wrote
that.

Let me try to explain what I'm after.  Frame#7, the call to
visit_static_gc_roots shows that we are at the very beginning of a GC,
recursively marking everything that we know must survive the GC.

void
visit_static_gc_roots (struct gc_root_visitor visitor)
{
  visit_buffer_root (visitor,
                     &buffer_defaults,
                     GC_ROOT_BUFFER_LOCAL_DEFAULT);
  visit_buffer_root (visitor,
                     &buffer_local_symbols,
                     GC_ROOT_BUFFER_LOCAL_NAME);

  for (int i = 0; i < ARRAYELTS (lispsym); i++)
    {
      Lisp_Object sptr = builtin_lisp_symbol (i);
      visitor.visit (&sptr, GC_ROOT_C_SYMBOL, visitor.data);
    }

  for (int i = 0; i < staticidx; i++)
    visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);
}
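The `visitor` argument bundles a callback with an opaque data pointer. Each root is passed as a pointer to the slot holding the object, together with a tag saying what kind of root it is. The model below is stripped down and hypothetical: `struct root_visitor`, the `ROOT_*` tags and the `toy_` arrays only imitate `gc_root_visitor`, the `GC_ROOT_*` values and `staticvec`. It illustrates that calling convention:

```c
typedef long Lisp_Object;  /* stand-in; the real type lives in lisp.h */

enum gc_root_type { ROOT_C_SYMBOL, ROOT_STATICPRO };

struct root_visitor
{
  void (*visit) (Lisp_Object const *root, enum gc_root_type type,
                 void *data);
  void *data;
};

/* Stand-ins for staticvec and staticidx.  */
static Lisp_Object var_a = 10, var_b = 20;
static Lisp_Object *toy_staticvec[] = { &var_a, &var_b };
static int toy_staticidx = 2;

/* Same loop shape as the staticpro part of visit_static_gc_roots.  */
void
toy_visit_static_roots (struct root_visitor visitor)
{
  for (int i = 0; i < toy_staticidx; i++)
    visitor.visit (toy_staticvec[i], ROOT_STATICPRO, visitor.data);
}

/* Example visitor: adds up the values it is shown, using DATA as an
   accumulator.  */
void
sum_visitor (Lisp_Object const *root, enum gc_root_type type, void *data)
{
  (void) type;
  *(long *) data += *root;
}
```

In the GC the callback marks each object instead of summing values. The separation means other code can walk the same roots with a different visitor.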

First interesting thing would be where in this function we are when the
crash happens.  I was assuming it is somewhere in the last for-loop, for
reasons, but that doesn't have to be the case.

If I'm right, we are currently in the process of marking Lisp objects
referenced from C variables that are known to contain Lisp objects.
Such variables are added to staticvec with a call to staticpro.  That's
what I meant by "staticpro" in my last mail.

But let's first see where in visit_... we are.
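In outline, staticpro records the address of a C variable in a fixed-size table, and the GC later treats every recorded slot as a root. The miniature below is hypothetical: the `toy_` names are not alloc.c's, and the name table is an added debugging aid the real code does not keep. It shows how an index `i` maps back to the registered variable, which is the question raised here:

```c
#include <stdio.h>
#include <stdlib.h>

typedef long Lisp_Object;  /* stand-in for the real type */

#define TOY_NSTATICS 8

static Lisp_Object const *toy_staticvec[TOY_NSTATICS];
static const char *toy_staticname[TOY_NSTATICS];  /* hypothetical aid */
static int toy_staticidx;

/* Record the address of a C variable that holds a Lisp object.  */
void
toy_staticpro (Lisp_Object const *varaddress, const char *name)
{
  if (toy_staticidx >= TOY_NSTATICS)
    {
      fputs ("toy_staticpro: table full\n", stderr);
      abort ();
    }
  toy_staticname[toy_staticidx] = name;
  toy_staticvec[toy_staticidx++] = varaddress;
}

/* Map an index seen in a debugger or a log back to a variable name,
   or NULL if the index is out of range.  */
const char *
toy_staticpro_name (int i)
{
  return 0 <= i && i < toy_staticidx ? toy_staticname[i] : NULL;
}
```

A real build has no name table. One way to try to recover the variable is GDB's `info symbol`, applied to the address stored in `staticvec[i]`, provided the symbol information survived.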






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sun, 18 Sep 2022 05:34:02 GMT) Full text and rfc822 format available.

Message #50 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: gerd.moellmann <at> gmail.com, 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sun, 18 Sep 2022 08:33:08 +0300
> From: Rob Browning <rlb <at> defaultvalue.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 57789 <at> debbugs.gnu.org
> Date: Sat, 17 Sep 2022 16:04:31 -0500
> 
> Gerd Möllmann <gerd.moellmann <at> gmail.com> writes:
> 
> > Looking at Rob's backtrace, 
> >
> > #0  mark_object (arg=<optimized out>) at alloc.c:6809
> > #1  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa00ac54a8) at alloc.c:6607
> > #2  mark_vectorlike (header=0x2aa00ac54a0) at alloc.c:6382
> > #3  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007f4ca8) at alloc.c:6607
> > #4  mark_vectorlike (header=0x2aa007f4ca0) at alloc.c:6382
> > #5  0x000002aa001a8ee6 in mark_objects (n=<optimized out>, obj=0x2aa007c3b10) at alloc.c:6607
> > #6  mark_vectorlike (header=0x2aa007c3b08) at alloc.c:6382
> > #7  0x000002aa001a7c3e in visit_static_gc_roots (visitor=...) at alloc.c:5926
> >
> > and seeing frame#7, would it be a way forward to determine which
> > staticpro (I assume it is a staticpro) that is?  Maybe that can give a
> > clue which one can then use together with a bisect, perhaps?
> 
> Not completely sure I followed, but moving up to that frame and printing
> visitor didn't work: "optimized out".

The code where this happens is this:

  for (int i = 0; i < staticidx; i++)
    visitor.visit (staticvec[i], GC_ROOT_STATICPRO, visitor.data);

So one way of knowing which staticpro is being handled here is to see
what is the value of 'i' and look at staticvec[i].  I'm guessing that
'i' is also "optimized out", though, so 2 possible ways forward:

  . disassemble visit_static_gc_roots, find where 'i' or staticvec[i]
    is stored (in a register, on the stack, or in memory), and go
    from there; or
  . add a printf to the above loop to show the value of 'i', and
    re-run the build, fingers crossed, hoping that the additional
    printf won't make the crash go away.
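The second option can be sketched roughly as follows. This is a self-contained model, not a patch against alloc.c: `staticvec`, `staticidx` and `mark_root` here are hypothetical stand-ins for the real table and for `visitor.visit`. Output goes to stderr and is flushed on every iteration, so the last line printed before a SIGSEGV should name the root being marked at the time:

```c
#include <stdio.h>

typedef long Lisp_Object;  /* stand-in for the real type */

/* Hypothetical stand-ins for staticvec and staticidx.  */
static Lisp_Object var_a = 1, var_b = 2, var_c = 3;
static Lisp_Object *staticvec[] = { &var_a, &var_b, &var_c };
static int staticidx = 3;

/* Hypothetical marker standing in for visitor.visit.  */
static long toy_marked_sum;

static void
mark_root (Lisp_Object *root)
{
  toy_marked_sum += *root;
}

void
visit_staticpro_roots_traced (void)
{
  for (int i = 0; i < staticidx; i++)
    {
      /* Print before marking, and flush so nothing is lost if the
         next call crashes.  */
      fprintf (stderr, "staticpro %d: %p\n", i, (void *) staticvec[i]);
      fflush (stderr);
      mark_root (staticvec[i]);
    }
}
```

Writing to stderr rather than stdout matters here. stdout is usually fully buffered when redirected to a build log, so its last lines could be lost in the crash.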

Once you know which staticpro is being processed here, we'd need to
examine its contents and try to figure out which parts cause the crash
in GC.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sun, 18 Sep 2022 05:50:01 GMT) Full text and rfc822 format available.

Message #53 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gerd Möllmann <gerd.moellmann <at> gmail.com>
Cc: 57789 <at> debbugs.gnu.org, rlb <at> defaultvalue.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sun, 18 Sep 2022 08:49:02 +0300
> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  57789 <at> debbugs.gnu.org
> Date: Sun, 18 Sep 2022 07:22:45 +0200
> 
> But let's first see where in visit_... we are.

I think the backtrace tells that, if you look at the sources from the
emacs-28 branch.  See my other message.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sun, 18 Sep 2022 05:56:02 GMT) Full text and rfc822 format available.

Message #56 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57789 <at> debbugs.gnu.org, rlb <at> defaultvalue.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation crashes
 on s390x
Date: Sun, 18 Sep 2022 07:55:19 +0200
On 22-09-18 7:49 , Eli Zaretskii wrote:
>> From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,  57789 <at> debbugs.gnu.org
>> Date: Sun, 18 Sep 2022 07:22:45 +0200
>>
>> But let's first see where in visit_... we are.
> 
> I think the backtrace tells that, if you look at the sources from the
> emacs-28 branch.  See my other message.

Ah, right, visit_buffer_root.  EINSUFFICIENTCOFFEE.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Sat, 24 Sep 2022 21:07:02 GMT) Full text and rfc822 format available.

Message #59 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Rob Browning <rlb <at> defaultvalue.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: gerd.moellmann <at> gmail.com, 57789 <at> debbugs.gnu.org
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Sat, 24 Sep 2022 16:06:06 -0500
Eli Zaretskii <eliz <at> gnu.org> writes:

> Once you know which staticpro is being processed here, we'd need to
> examine its contents and try to figure out which parts cause the crash
> in GC.

Thanks, and I'll try to look into this further when I have time.  For
now I'm changing the debian packages to avoid native compilation on some
architectures (currently mips64el[1] and s390x).

[1] There ./configure fails at the moment with "Error: -march=mips1 is
    not compatible with the selected ABI" when testing libgccjit.
    That's on eller.debian.org (mipsel host in a mips64el schroot).

-- 
Rob Browning




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Wed, 07 Jun 2023 21:16:02 GMT) Full text and rfc822 format available.

Message #62 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Andrea Corallo <acorallo <at> gnu.org>
To: Rob Browning <rlb <at> defaultvalue.org>
Cc: gerd.moellmann <at> gmail.com, 57789 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation
 crashes on s390x
Date: Wed, 07 Jun 2023 17:15:38 -0400
Rob Browning <rlb <at> defaultvalue.org> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>> Once you know which staticpro is being processed here, we'd need to
>> examine its contents and try to figure out which parts cause the crash
>> in GC.
>
> Thanks, and I'll try to look in to this further when I have time.  For
> now I'm changing the debian packages to avoid native compilation on some
> architectures (currently mips64el[1] and s390x).
>
> [1] There ./configure fails at the moment with "Error: -march=mips1 is
>     not compatible with the selected ABI" when testing libgccjit.
>     That's on eller.debian.org (mipsel host in a mips64el schroot).

Hi Rob,

any progress with this investigation?  Is the bug still reproducible
with a recent codebase?

Thanks

  Andrea




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#57789; Package emacs. (Mon, 11 Sep 2023 18:09:02 GMT) Full text and rfc822 format available.

Message #65 received at 57789 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Andrea Corallo <acorallo <at> gnu.org>, Rob Browning <rlb <at> defaultvalue.org>
Cc: gerd.moellmann <at> gmail.com, 57789 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#57789: Emacs 28.1 clone build with native compilation crashes
 on s390x
Date: Mon, 11 Sep 2023 11:08:13 -0700
tags 57789 + moreinfo
thanks

Andrea Corallo <acorallo <at> gnu.org> writes:

> Rob Browning <rlb <at> defaultvalue.org> writes:
>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>>> Once you know which staticpro is being processed here, we'd need to
>>> examine its contents and try to figure out which parts cause the crash
>>> in GC.
>>
>> Thanks, and I'll try to look in to this further when I have time.  For
>> now I'm changing the debian packages to avoid native compilation on some
>> architectures (currently mips64el[1] and s390x).
>>
>> [1] There ./configure fails at the moment with "Error: -march=mips1 is
>>     not compatible with the selected ABI" when testing libgccjit.
>>     That's on eller.debian.org (mipsel host in a mips64el schroot).
>
> Hi Rob,
>
> any progress with this investigation?  Is the bug still reproducible
> with a recent codebase?

Ping.  Rob, any updates here?




Added tag(s) moreinfo. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Mon, 11 Sep 2023 18:09:02 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.