GNU bug report logs - #58956
mark_object, mark_objects(?) crash


Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 2 Nov 2022 01:34:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58956 in the body.
You can then email your comments to 58956 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Wed, 02 Nov 2022 01:34:02 GMT)

Acknowledgement sent to Sean Whitton <spwhitton <at> spwhitton.name>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 02 Nov 2022 01:34:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: bug-gnu-emacs <at> gnu.org
Subject: mark_object, mark_objects(?) crash
Date: Tue, 01 Nov 2022 18:33:38 -0700
Hello,

A Debian user has reported a crash with Emacs 28.  I'm attaching the
backtrace he provided.  We currently have the two recent trampoline fork
bomb patches from Andreas applied; I don't think any of our other
patches are relevant.

<https://bugs.debian.org/1017711>

-- 
Sean Whitton
[gdb.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Wed, 02 Nov 2022 12:26:02 GMT)

Message #8 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Sean Whitton <spwhitton <at> spwhitton.name>
Cc: 58956 <at> debbugs.gnu.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Wed, 02 Nov 2022 14:24:51 +0200
> From: Sean Whitton <spwhitton <at> spwhitton.name>
> Date: Tue, 01 Nov 2022 18:33:38 -0700
> 
> Thread 1 (Thread 0x7f5b914cb380 (LWP 35005)):
> #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
>         tid = <optimized out>
>         ret = 0
>         pd = <optimized out>
>         old_mask = {__val = {0 <repeats 16 times>}}
>         ret = <optimized out>
> #1  0x00007f5b922895df in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2  0x00007f5b9223da02 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
>         ret = <optimized out>
> #3  0x000055fdfd114864 in terminate_due_to_signal (sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40) at ./debian/build-src/src/emacs.c:437
> #4  0x000055fdfd114d27 in emacs_abort () at ./debian/build-src/src/sysdep.c:2282
> #5  0x000055fdfd111e99 in check_message_stack () at ./debian/build-src/src/xdisp.c:12157
> #6  0x000055fdfd20492a in shut_down_emacs (sig=0, stuff=0x0) at ./debian/build-src/src/emacs.c:2789
>         tpgrp = <optimized out>
> #7  0x000055fdfd114765 in Fkill_emacs (arg=arg@entry=0x6) at ./debian/build-src/src/emacs.c:2692
>         exit_code = <optimized out>
> #8  0x000055fdfd114827 in terminate_due_to_signal (sig=sig@entry=1, backtrace_limit=backtrace_limit@entry=40) at ./debian/build-src/src/emacs.c:417
> #9  0x000055fdfd114d00 in handle_fatal_signal (sig=sig@entry=1) at ./debian/build-src/src/sysdep.c:1762
> #10 0x000055fdfd114d17 in deliver_process_signal (handler=0x55fdfd114cf5 <handle_fatal_signal>, sig=1) at ./debian/build-src/src/sysdep.c:1720
>         old_errno = 2
>         on_main_thread = true
> #11 deliver_fatal_signal (sig=1) at ./debian/build-src/src/sysdep.c:1768
> #12 0x00007f5b9223daa0 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
> #13 0x000055fdfd2641c3 in mark_object (arg=0x295d90329a68) at ./debian/build-src/src/alloc.c:6628
>         obj = 0x295d90329a68
>         po = <optimized out>
>         cdr_count = 0

Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
middle of GC, I have no idea.  Maybe ask the user what was he doing at
that time.  E.g., could that be a remote Emacs session?

Other than that, I don't see what can be done with this data: this is
an optimized build, and there's no real data in the backtrace to begin
considering why would SIGHUP happen.  Sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Wed, 02 Nov 2022 21:44:02 GMT)

Message #11 received at 58956 <at> debbugs.gnu.org:

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
Subject: Re: Bug#1017711: emacs-gtk: terminated with signal SIGABRT, 137 MB
 coredump
Date: Wed, 02 Nov 2022 14:43:41 -0700
Hello Vincent,

Upstream says there isn't enough information in the backtrace to say
anything helpful about this.  Could you take a look at
<https://debbugs.gnu.org/58956> and consider supplying more information
over there, please?

Also, are you able to reproduce this with 'emacs -q' (not -Q)?

Currently we have no way to reproduce this, and you're the only person
who's seen anything like this, so we'll probably have to move the
severity back down to 'important'.

-- 
Sean Whitton

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Thu, 03 Nov 2022 03:01:02 GMT)

Message #14 received at 58956 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Sean Whitton <spwhitton <at> spwhitton.name>
Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 3 Nov 2022 04:00:46 +0100
On 2022-11-02 14:24:51 +0200, Eli Zaretskii wrote:
> Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
> middle of GC, I have no idea.  Maybe ask the user what was he doing at
> that time.  E.g., could that be a remote Emacs session?

No, it is on my local machine.

On 2022-11-02 14:43:41 -0700, Sean Whitton wrote:
> Upstream says there isn't enough information in the backtrace to say
> anything helpful about this.  Could you take a look at
> <https://debbugs.gnu.org/58956> and consider supplying more information
> over there, please?
> 
> Also, are you able to reproduce this with 'emacs -q' (not -Q)?

This is not reproducible with "emacs -q".

I can reproduce it in a firejail private directory[*] (so that the
behavior doesn't depend on my own config files), where there is no
.emacs file. There is a .emacs.d directory with just a eln-cache
subdirectory:

zira% ls -la .emacs.d 
total 12
drwx------ 3 vinc17 vinc17 4096 2022-11-01 00:40:05 .
drwx------ 4 vinc17 vinc17 4096 2022-11-03 03:53:23 ..
drwxr-xr-x 3 vinc17 vinc17 4096 2022-11-01 00:40:05 eln-cache
zira% ls -la .emacs.d/eln-cache 
total 12
drwxr-xr-x 3 vinc17 vinc17 4096 2022-11-01 00:40:05 .
drwx------ 3 vinc17 vinc17 4096 2022-11-01 00:40:05 ..
drwxr-xr-x 2 vinc17 vinc17 4096 2022-11-01 00:40:05 28.2-43f520ab
zira% ls -la .emacs.d/eln-cache/28.2-43f520ab 
total 8
drwxr-xr-x 2 vinc17 vinc17 4096 2022-11-01 00:40:05 .
drwxr-xr-x 3 vinc17 vinc17 4096 2022-11-01 00:40:05 ..
zira% 

[*] firejail --ignore=read-only --ignore='noexec ${HOME}' --noblacklist='${HOME}/*' --private=fj-dir zsh

I run emacs, and quit it immediately. The generation of the core dump
is almost 100% reproducible. Ditto with "emacs -nw".

But note that the bug is also reproducible without firejail, but
harder to reproduce.

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Thu, 03 Nov 2022 06:48:06 GMT)

Message #17 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org, spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 03 Nov 2022 08:47:06 +0200
> Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
> Date: Thu, 3 Nov 2022 04:00:46 +0100
> From: Vincent Lefevre <vincent <at> vinc17.net>
> 
> On 2022-11-02 14:24:51 +0200, Eli Zaretskii wrote:
> > Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
> > middle of GC, I have no idea.  Maybe ask the user what was he doing at
> > that time.  E.g., could that be a remote Emacs session?
> 
> No, it is on my local machine.

So how come Emacs gets a SIGHUP?  This is the crucial detail that is
missing here.  Basically, if SIGHUP is delivered to Emacs, Emacs is
supposed to die a violent death.

> I run emacs, and quit it immediately. The generation of the core dump
> is almost 100% reproducible. Ditto with "emacs -nw".

Wait, you mean the crash is during exiting Emacs?  That could mean
Emacs receives some input event when it's half-way through the
shutdown process, and the input descriptor is already closed.

But the backtrace you posted shows SIGHUP during GC, which is AFAIU a
very different case.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Thu, 03 Nov 2022 10:14:01 GMT)

Message #20 received at 58956 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org, spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 3 Nov 2022 11:13:08 +0100
On 2022-11-03 08:47:06 +0200, Eli Zaretskii wrote:
> > On 2022-11-02 14:24:51 +0200, Eli Zaretskii wrote:
> > > Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
> > > middle of GC, I have no idea.  Maybe ask the user what was he doing at
> > > that time.  E.g., could that be a remote Emacs session?
> > 
> > No, it is on my local machine.
> 
> So how come Emacs gets a SIGHUP?  This is the crucial detail that is
> missing here.  Basically, if SIGHUP is delivered to Emacs, Emacs is
> supposed to die a violent death.

I suspect the SIGHUP comes from Emacs itself. According to strace
output, the only processes started by Emacs are "/usr/bin/emacs"
(there are many of them). I don't see what other process could be
aware of the situation. Unfortunately, I couldn't reproduce the
issue with strace (I suspect some race condition).

> > I run emacs, and quit it immediately. The generation of the core dump
> > is almost 100% reproducible. Ditto with "emacs -nw".
> 
> Wait, you mean the crash is during exiting Emacs?

For this test, yes. In general, I don't know.

> That could mean Emacs receives some input event when it's half-way
> through the shutdown process, and the input descriptor is already
> closed.

Note that the process that crashes is not the Emacs I started,
but a subprocess run by Emacs itself, since it has arguments like
"-no-comp-spawn --batch -l /tmp/emacs-async-comp-url.el-FGov4z.el".
However, it also happened that the Emacs I started immediately
crashed (this occurred only once, though).

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Thu, 03 Nov 2022 10:28:01 GMT)

Message #23 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Vincent Lefevre <vincent <at> vinc17.net>, Andrea Corallo <akrl <at> sdf.org>
Cc: 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org, spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 03 Nov 2022 12:27:01 +0200
> Date: Thu, 3 Nov 2022 11:13:08 +0100
> From: Vincent Lefevre <vincent <at> vinc17.net>
> Cc: spwhitton <at> spwhitton.name, 58956 <at> debbugs.gnu.org,
> 	1017711 <at> bugs.debian.org
> 
> On 2022-11-03 08:47:06 +0200, Eli Zaretskii wrote:
> > > On 2022-11-02 14:24:51 +0200, Eli Zaretskii wrote:
> > > > Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
> > > > middle of GC, I have no idea.  Maybe ask the user what was he doing at
> > > > that time.  E.g., could that be a remote Emacs session?
> > > 
> > > No, it is on my local machine.
> > 
> > So how come Emacs gets a SIGHUP?  This is the crucial detail that is
> > missing here.  Basically, if SIGHUP is delivered to Emacs, Emacs is
> > supposed to die a violent death.
> 
> I suspect the SIGHUP comes from Emacs itself. According to strace
> output, the only processes started by Emacs are "/usr/bin/emacs"
> (there are many of them). I don't see what other process could be
> aware of the situation. Unfortunately, I couldn't reproduce the
> issue with strace (I suspect some race condition).
> 
> > > I run emacs, and quit it immediately. The generation of the core dump
> > > is almost 100% reproducible. Ditto with "emacs -nw".
> > 
> > Wait, you mean the crash is during exiting Emacs?
> 
> For this test, yes. In general, I don't know.
> 
> > That could mean Emacs receives some input event when it's half-way
> > through the shutdown process, and the input descriptor is already
> > closed.
> 
> Note that the process that crashes is not the Emacs I started,
> but a subprocess run by Emacs itself, since it has arguments like
> "-no-comp-spawn --batch -l /tmp/emacs-async-comp-url.el-FGov4z.el".

Andrea, could you please look into this?  The SIGHUP could be because
the parent process exits, but that shouldn't cause a crash in the
sub-process that performs native compilation?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Thu, 03 Nov 2022 21:26:02 GMT)

Message #26 received at 58956 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58956 <at> debbugs.gnu.org, Vincent Lefevre <vincent <at> vinc17.net>,
 1017711 <at> bugs.debian.org, spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 03 Nov 2022 21:25:08 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Thu, 3 Nov 2022 11:13:08 +0100
>> From: Vincent Lefevre <vincent <at> vinc17.net>
>> Cc: spwhitton <at> spwhitton.name, 58956 <at> debbugs.gnu.org,
>> 	1017711 <at> bugs.debian.org
>> 
>> On 2022-11-03 08:47:06 +0200, Eli Zaretskii wrote:
>> > > On 2022-11-02 14:24:51 +0200, Eli Zaretskii wrote:
>> > > > Signal 1 is SIGHUP, AFAIU.  Why should Emacs receive SIGHUP in the
>> > > > middle of GC, I have no idea.  Maybe ask the user what was he doing at
>> > > > that time.  E.g., could that be a remote Emacs session?
>> > > 
>> > > No, it is on my local machine.
>> > 
>> > So how come Emacs gets a SIGHUP?  This is the crucial detail that is
>> > missing here.  Basically, if SIGHUP is delivered to Emacs, Emacs is
>> > supposed to die a violent death.
>> 
>> I suspect the SIGHUP comes from Emacs itself. According to strace
>> output, the only processes started by Emacs are "/usr/bin/emacs"
>> (there are many of them). I don't see what other process could be
>> aware of the situation. Unfortunately, I couldn't reproduce the
>> issue with strace (I suspect some race condition).
>> 
>> > > I run emacs, and quit it immediately. The generation of the core dump
>> > > is almost 100% reproducible. Ditto with "emacs -nw".
>> > 
>> > Wait, you mean the crash is during exiting Emacs?
>> 
>> For this test, yes. In general, I don't know.
>> 
>> > That could mean Emacs receives some input event when it's half-way
>> > through the shutdown process, and the input descriptor is already
>> > closed.
>> 
>> Note that the process that crashes is not the Emacs I started,
>> but a subprocess run by Emacs itself, since it has arguments like
>> "-no-comp-spawn --batch -l /tmp/emacs-async-comp-url.el-FGov4z.el".
>
> Andrea, could you please look into this?  The SIGHUP could be because
> the parent process exits, but that shouldn't cause a crash in the
> sub-process that performs native compilation?

Hi Eli,

AFAIU the Emacs subprocess we use to compile should behave like a
regular Emacs.

Now, the only option that comes to my mind is that libgccjit (being
strictly derived from the GCC codebase) might be registering a signal
handler of some kind that alters the behaviour we expect.  But if this
is the case we should find a trace of it in the strace output, or we can
use gdb, setting a breakpoint on 'signal', to check as well.

Indeed if this theory is true I think it should be classified as a
libgccjit bug.

  Andrea




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Fri, 04 Nov 2022 07:02:01 GMT)

Message #29 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andrea Corallo <akrl <at> sdf.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58956 <at> debbugs.gnu.org, vincent <at> vinc17.net, 1017711 <at> bugs.debian.org,
 spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Fri, 04 Nov 2022 09:00:54 +0200
> From: Andrea Corallo <akrl <at> sdf.org>
> Cc: Vincent Lefevre <vincent <at> vinc17.net>, spwhitton <at> spwhitton.name,
>         58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
> Date: Thu, 03 Nov 2022 21:25:08 +0000
> 
> AFAIU the Emacs subprocess we use to compile should behave like a
> regular Emacs.

Basically, you are saying that if the sub-process that runs
async-compilation gets SIGHUP, it should abort and dump core, like a
normal Emacs session does, right?

The backtrace posted to the Debian bug tracker, here:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017711;filename=gdb.txt;msg=5

indicates that Emacs was in the middle of comp-copy-insn which was
called from comp-fwprop.  Then Emacs performed GC, and SIGHUP was
received during GC.  IOW, we were in our Lisp code, not in a libgccjit
code, when the signal arrived.

Another backtrace, posted here:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017711;filename=gdb.txt;msg=45

tells a somewhat different story: it doesn't show Emacs in the middle
of a native compilation, but just inside substitute-command-keys that
was called from command-line.

> Now, the only option that comes to my mind is that libgccjit (being
> strictly derived from the GCC codebase) might be registering a signal
> handler of some kind that alters the behaviour we expect.  But if this
> is the case we should find a trace of it in the strace output, or we can
> use gdb, setting a breakpoint on 'signal', to check as well.
> 
> Indeed if this theory is true I think it should be classified as a
> libgccjit bug.

I don't think it's true, see above.

Paul, can you help here, please?  We need to establish what is the
source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
situations where Emacs launched an async subprocess to do native
compilation (which is another Emacs process in a --batch session), and
the parent Emacs session is terminated by the user before the async
compilation runs to completion.  Would the child Emacs process get
SIGHUP in this scenario?  If yes, then I think we should treat SIGHUP
differently in non-interactive invocations: instead of dumping core,
we should catch the signal and exit with a non-zero exit status.

Does this make sense?

Andrea, if we do the above as I suggest, is there any cleanup that we
need to do before exiting?  For example, what if the subprocess that
does the async compilation already started writing the .eln file when
the signal arrives?  What do we do today when the parent interactive
Emacs is terminated by the user?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Fri, 04 Nov 2022 21:05:01 GMT)

Message #32 received at 58956 <at> debbugs.gnu.org:

From: Andrea Corallo <akrl <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58956 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Fri, 04 Nov 2022 21:03:47 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Andrea Corallo <akrl <at> sdf.org>
>> Cc: Vincent Lefevre <vincent <at> vinc17.net>, spwhitton <at> spwhitton.name,
>>         58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
>> Date: Thu, 03 Nov 2022 21:25:08 +0000
>> 
>> AFAIU the Emacs subprocess we use to compile should behave like a
>> regular Emacs.
>
> Basically, you are saying that if the sub-process that runs
> async-compilation gets SIGHUP, it should abort and dump core, like a
> normal Emacs session does, right?
>
> The backtrace posted to the Debian bug tracker, here:
>
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017711;filename=gdb.txt;msg=5
>
> indicates that Emacs was in the middle of comp-copy-insn which was
> called from comp-fwprop.  Then Emacs performed GC, and SIGHUP was
> received during GC.  IOW, we were in our Lisp code, not in a libgccjit
> code, when the signal arrived.
>
> Another backtrace, posted here:
>
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1017711;filename=gdb.txt;msg=45
>
> tells a somewhat different story: it doesn't show Emacs in the middle
> of a native compilation, but just inside substitute-command-keys that
> was called from command-line.

Sorry I missed those traces.  Okay so for both cases if libgccjit is not
involved the behaviour of Emacs here is just the plain one and should
not be related to native compilation.  It's just that native compilation
makes this condition more likely to be noticed.

>> Now, the only option that comes to my mind is that libgccjit (being
>> strictly derived from the GCC codebase) might be registering a signal
>> handler of some kind that alters the behaviour we expect.  But if this
>> is the case we should find a trace of it in the strace output, or we can
>> use gdb, setting a breakpoint on 'signal', to check as well.
>> 
>> Indeed if this theory is true I think it should be classified as a
>> libgccjit bug.
>
> I don't think it's true, see above.
>
> Paul, can you help here, please?  We need to establish what is the
> source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
> situations where Emacs launched an async subprocess to do native
> compilation (which is another Emacs process in a --batch session), and
> the parent Emacs session is terminated by the user before the async
> compilation runs to completion.  Would the child Emacs process get
> SIGHUP in this scenario?  If yes, then I think we should treat SIGHUP
> differently in non-interactive invocations: instead of dumping core,
> we should catch the signal and exit with a non-zero exit status.
>
> Does this make sense?

To me yes.

> Andrea, if we do the above as I suggest, is there any cleanup that we
> need to do before exiting?  For example, what if the subprocess that
> does the async compilation already started writing the .eln file when
> the signal arrives?  What do we do today when the parent interactive
> Emacs is terminated by the user?

I think we have no special handling for this case, so yeah we might
leave some traces of the compilation.  Other than the .eln we should
also remove the lisp file we write to be loaded by the async compilation
process.  I'm not sure where and how would be best to handle all of this
tho.

Best Regards

  Andrea




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Sat, 05 Nov 2022 20:56:01 GMT)

Message #35 received at 58956 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Andrea Corallo <akrl <at> sdf.org>
Cc: 58956 <at> debbugs.gnu.org, vincent <at> vinc17.net, 1017711 <at> bugs.debian.org,
 spwhitton <at> spwhitton.name
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sat, 5 Nov 2022 13:54:54 -0700
On 2022-11-04 00:00, Eli Zaretskii wrote:
> We need to establish what is the
> source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
> situations where Emacs launched an async subprocess to do native
> compilation (which is another Emacs process in a --batch session), and
> the parent Emacs session is terminated by the user before the async
> compilation runs to completion.  Would the child Emacs process get
> SIGHUP in this scenario?

Hard for me to say. It's a messy area, with kernels (and Emacs itself) 
sending SIGHUP on various whims.

Does the attached patch fix things? It builds on your commit 
190a6853708ab22072437f6ebd93beb3ec1a9ce6 dated 2020-12-04; I don't know 
why that earlier patch was installed, but it would seem to apply to 
SIGHUP and SIGTERM as well as it applies to SIGINT.
[sighup.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Sun, 06 Nov 2022 05:53:02 GMT)

Message #38 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58956 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 06 Nov 2022 07:51:18 +0200
> Date: Sat, 5 Nov 2022 13:54:54 -0700
> Cc: vincent <at> vinc17.net, spwhitton <at> spwhitton.name, 58956 <at> debbugs.gnu.org,
>  1017711 <at> bugs.debian.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> On 2022-11-04 00:00, Eli Zaretskii wrote:
> > We need to establish what is the
> > source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
> > situations where Emacs launched an async subprocess to do native
> > compilation (which is another Emacs process in a --batch session), and
> > the parent Emacs session is terminated by the user before the async
> > compilation runs to completion.  Would the child Emacs process get
> > SIGHUP in this scenario?
> 
> Hard for me to say. It's a messy area, with kernels (and Emacs itself) 
> sending SIGHUP on various whims.

But is it possible for a program like Emacs to get SIGHUP in such a
situation, or is that highly improbable?  We have standard streams of
the inferior Emacs process connected via PTYs to the parent process, I
believe -- does that deliver SIGHUP or SIGPIPE when the parent exits?

> Does the attached patch fix things? It builds on your commit 
> 190a6853708ab22072437f6ebd93beb3ec1a9ce6 dated 2020-12-04; I don't know 
> why that earlier patch was installed, but it would seem to apply to 
> SIGHUP and SIGTERM as well as it applies to SIGINT.

I was trying to be conservative, that's all.  I'm okay with doing the
same for SIGHUP.  Vincent, can you try this patch, please?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Sun, 06 Nov 2022 19:19:02 GMT)

Message #41 received at 58956 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58956 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 6 Nov 2022 11:18:03 -0800
On 2022-11-05 22:51, Eli Zaretskii wrote:

> But is it possible for a program like Emacs to get SIGHUP in such a
> situation, or is that highly improbable?  We have standard streams of
> the inferior Emacs process connected via PTYs to the parent process, I
> believe -- does that deliver SIGHUP or SIGPIPE when the parent exits?

It depends on the OS and the app that invokes Emacs and how that app 
itself was invoked. It's a hairy area.

On a POSIX platform it's certainly *possible* for Emacs to get SIGHUP in 
that situation, because a user can invoke the shell command 'kill -s HUP 
P', where P is the process ID of the inferior Emacs. Whether it's 
*likely* is a bit harder to say. I ran a few little experiments on 
Fedora 36 and Ubuntu 22.10 and found SIGHUP being sent in a few 
situations and not others and didn't have the time or patience to suss 
out exactly why or when.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Sun, 06 Nov 2022 19:34:02 GMT)

Message #44 received at 58956 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58956 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 06 Nov 2022 21:32:35 +0200
> Date: Sun, 6 Nov 2022 11:18:03 -0800
> Cc: akrl <at> sdf.org, vincent <at> vinc17.net, spwhitton <at> spwhitton.name,
>  58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> On 2022-11-05 22:51, Eli Zaretskii wrote:
> 
> > But is it possible for a program like Emacs to get SIGHUP in such a
> > situation, or is that highly improbable?  We have standard streams of
> > the inferior Emacs process connected via PTYs to the parent process, I
> > believe -- does that deliver SIGHUP or SIGPIPE when the parent exits?
> 
> It depends on the OS and the app that invokes Emacs and how that app 
> itself was invoked. It's a hairy area.
> 
> On a POSIX platform it's certainly *possible* for Emacs to get SIGHUP in 
> that situation, because a user can invoke the shell command 'kill -s HUP 
> P', where P is the process ID of the inferior Emacs. Whether it's 
> *likely* is a bit harder to say. I ran a few little experiments on 
> Fedora 36 and Ubuntu 22.10 and found SIGHUP being sent in a few 
> situations and not others and didn't have the time or patience to suss 
> out exactly why or when.

Thanks.  The scenario that is of primary interest in this case is the
following:

 . user starts Emacs
 . Emacs loads some Lisp package and as a result starts a subordinate
   Emacs process in batch mode to native-compile the loaded Lisp
 . user exits Emacs

My question was whether in this scenario, since the parent Emacs
exits, the child Emacs can get SIGHUP, simply because its parent
exited and the read end of the PTY no longer exists.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58956; Package emacs. (Sun, 06 Nov 2022 19:45:01 GMT)

Message #47 received at 58956 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 58956 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 6 Nov 2022 11:44:43 -0800
On 2022-11-06 11:32, Eli Zaretskii wrote:
> My question was whether in this scenario, since the parent Emacs
> exits, the child Emacs can get SIGHUP, simply because its parent
> exited and the read end of the PTY no longer exists.

Yes, my sense from the few experiments I tried is that it's a plausible 
scenario, though I never observed it actually happening for Emacs doing 
a subprocess compile.





Message #50 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58956 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 06 Nov 2022 21:58:26 +0200
> Date: Sun, 6 Nov 2022 11:44:43 -0800
> Cc: akrl <at> sdf.org, vincent <at> vinc17.net, spwhitton <at> spwhitton.name,
>  58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> On 2022-11-06 11:32, Eli Zaretskii wrote:
> > My question was whether in this scenario, since the parent Emacs
> > exits, the child Emacs can get SIGHUP, simply because its parent
> > exited and the read end of the PTY no longer exists.
> 
> Yes, my sense from the few experiments I tried is that it's a plausible 
> scenario, though I never observed it actually happening for Emacs doing 
> a subprocess compile.

OK, thanks.  So I hope your suggested patch will solve this issue.





Message #53 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> arizona.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 1017711 <at> bugs.debian.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 58956 <at> debbugs.gnu.org,
 Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Tue, 08 Nov 2022 12:44:08 -0700
Hello Vincent,

Are you able to test the patch?  Let me know if you need help getting an
installable .deb.  Thanks.

-- 
Sean Whitton

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 10 Nov 2022 10:15:01 GMT) Full text and rfc822 format available.

Notification sent to Sean Whitton <spwhitton <at> spwhitton.name>:
bug acknowledged by developer. (Thu, 10 Nov 2022 10:15:01 GMT) Full text and rfc822 format available.

Message #58 received at 58956-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58956-done <at> debbugs.gnu.org, spwhitton <at> spwhitton.name, vincent <at> vinc17.net,
 1017711 <at> bugs.debian.org, akrl <at> sdf.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 10 Nov 2022 12:14:31 +0200
> Date: Sat, 5 Nov 2022 13:54:54 -0700
> Cc: vincent <at> vinc17.net, spwhitton <at> spwhitton.name, 58956 <at> debbugs.gnu.org,
>  1017711 <at> bugs.debian.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> On 2022-11-04 00:00, Eli Zaretskii wrote:
> > We need to establish what is the
> > source of SIGHUP in these cases.  "These cases" mean, AFAIU, the
> > situations where Emacs launched an async subprocess to do native
> > compilation (which is another Emacs process in a --batch session), and
> > the parent Emacs session is terminated by the user before the async
> > compilation runs to completion.  Would the child Emacs process get
> > SIGHUP in this scenario?
> 
> Hard for me to say. It's a messy area, with kernels (and Emacs itself) 
> sending SIGHUP on various whims.
> 
> Does the attached patch fix things? It builds on your commit 
> 190a6853708ab22072437f6ebd93beb3ec1a9ce6 dated 2020-12-04; I don't know 
> why that earlier patch was installed, but it would seem to apply to 
> SIGHUP and SIGTERM as well as it applies to SIGINT.

No further comments, so I've now installed this on the master branch,
and I'm marking this bug done.
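
As a generic illustration of giving SIGHUP and SIGTERM the same treatment as SIGINT, here is a minimal Python sketch. It is not the patch installed on master and says nothing about how Emacs's C code handles these signals. The names `install_shared_handler`, `note_signal`, `send_to_self` and `received` are hypothetical, and a POSIX system is assumed.

```python
import os
import signal

# Names of the signals handled so far, in order of handling.
received = []


def note_signal(signum, frame):
    """Shared handler: record which signal arrived."""
    received.append(signal.Signals(signum).name)


def install_shared_handler():
    """Route SIGINT, SIGHUP and SIGTERM through the same handler.

    Must be called from the main thread, as Python requires for
    signal.signal().
    """
    for sig in (signal.SIGINT, signal.SIGHUP, signal.SIGTERM):
        signal.signal(sig, note_signal)


def send_to_self(sig):
    """Send SIG to this process, much like 'kill -s HUP P' from a shell."""
    os.kill(os.getpid(), sig)
```

After `install_shared_handler()`, a call such as `send_to_self(signal.SIGHUP)` goes through the same code path as an interrupt would, which is the property the sketch is meant to show.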

Thanks.





Message #61 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Sean Whitton <spwhitton <at> arizona.edu>
Cc: 1017711 <at> bugs.debian.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 58956 <at> debbugs.gnu.org,
 Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Thu, 10 Nov 2022 11:23:03 +0100
On 2022-11-08 12:44:08 -0700, Sean Whitton wrote:
> Are you able to test the patch?  Let me know if you need help getting an
> installable .deb.  Thanks.

Sorry, I couldn't test it yet, first because of an uninstallable
package needed for the build because I couldn't upgrade libc6 yet
and I couldn't get the previous version from snapshot.debian.org
(bug 1023540). Now that I could upgrade libc6, I'll be able to
test when I have some time, but perhaps not before the week-end.

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Message #64 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> arizona.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: Andrea Corallo <akrl <at> sdf.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 1017711 <at> bugs.debian.org,
 58956 <at> debbugs.gnu.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Fri, 11 Nov 2022 11:32:33 -0700
Hello,

On Thu 10 Nov 2022 at 11:23AM +01, Vincent Lefevre wrote:

> On 2022-11-08 12:44:08 -0700, Sean Whitton wrote:
>> Are you able to test the patch?  Let me know if you need help getting an
>> installable .deb.  Thanks.
>
> Sorry, I couldn't test it yet, first because of an uninstallable
> package needed for the build because I couldn't upgrade libc6 yet
> and I couldn't get the previous version from snapshot.debian.org
> (bug 1023540). Now that I could upgrade libc6, I'll be able to
> test when I have some time, but perhaps not before the week-end.

Okay, do let me know if I can help -- this is blocking Emacs from migrating.

-- 
Sean Whitton


Message #67 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Sean Whitton <spwhitton <at> arizona.edu>
Cc: Andrea Corallo <akrl <at> sdf.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 1017711 <at> bugs.debian.org,
 58956 <at> debbugs.gnu.org
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sat, 12 Nov 2022 02:55:00 +0100
Hi,

On 2022-11-11 11:32:33 -0700, Sean Whitton wrote:
> On Thu 10 Nov 2022 at 11:23AM +01, Vincent Lefevre wrote:
> > On 2022-11-08 12:44:08 -0700, Sean Whitton wrote:
> >> Are you able to test the patch?  Let me know if you need help getting an
> >> installable .deb.  Thanks.
> >
> > Sorry, I couldn't test it yet, first because of an uninstallable
> > package needed for the build because I couldn't upgrade libc6 yet
> > and I couldn't get the previous version from snapshot.debian.org
> > (bug 1023540). Now that I could upgrade libc6, I'll be able to
> > test when I have some time, but perhaps not before the week-end.
> 
> Okay, do let me know if I can help -- this is blocking Emacs from migrating.

I've rebuilt the packages with the patch and couldn't reproduce
the bug yet. So it may be the correct fix.

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)





Message #70 received at 58956 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> arizona.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 58956 <at> debbugs.gnu.org, 1017711 <at> bugs.debian.org, Andrea Corallo <akrl <at> sdf.org>
Subject: Re: bug#58956: mark_object, mark_objects(?) crash
Date: Sun, 13 Nov 2022 13:51:26 -0700
Hello,

On Sat 12 Nov 2022 at 02:55AM +01, Vincent Lefevre wrote:

> Hi,
>
> On 2022-11-11 11:32:33 -0700, Sean Whitton wrote:
>> On Thu 10 Nov 2022 at 11:23AM +01, Vincent Lefevre wrote:
>> > On 2022-11-08 12:44:08 -0700, Sean Whitton wrote:
>> >> Are you able to test the patch?  Let me know if you need help getting an
>> >> installable .deb.  Thanks.
>> >
>> > Sorry, I couldn't test it yet, first because of an uninstallable
>> > package needed for the build because I couldn't upgrade libc6 yet
>> > and I couldn't get the previous version from snapshot.debian.org
>> > (bug 1023540). Now that I could upgrade libc6, I'll be able to
>> > test when I have some time, but perhaps not before the week-end.
>>
>> Okay, do let me know if I can help -- this is blocking Emacs from migrating.
>
> I've rebuilt the packages with the patch and couldn't reproduce
> the bug yet. So it may be the correct fix.

Many thanks for testing, and Eli and Paul for the patch.

-- 
Sean Whitton

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 12 Dec 2022 12:24:13 GMT) Full text and rfc822 format available.




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.