GNU bug report logs - #63368
Build coordiantor "Signals delivery fails constantly" crashes

Previous Next

Package: guix;

Reported by: Christopher Baines <mail <at> cbaines.net>

Date: Mon, 8 May 2023 10:55:02 UTC

Severity: normal

To reply to this bug, email your comments to 63368 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Mon, 08 May 2023 10:55:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Christopher Baines <mail <at> cbaines.net>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 08 May 2023 10:55:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: bug-guix <at> gnu.org
Subject: Build coordiantor "Signals delivery fails constantly" crashes
Date: Mon, 08 May 2023 11:45:21 +0100
[Message part 1 (text/plain, inline)]
Since the recent core-updates merge, I've seen the build coordinator
using less memory, but it's also been crashing in a new way, up to 10
times a day.

In the log, you see something like:

  2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
  2023-05-07 09:15:42 Signals delivery fails constantly

I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
do with this.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Wed, 10 May 2023 12:50:03 GMT) Full text and rfc822 format available.

Message #8 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Wed, 10 May 2023 13:47:11 +0100
[Message part 1 (text/plain, inline)]
Christopher Baines <mail <at> cbaines.net> writes:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

I think I've found a workaround. I found a list of environment variables
[1] you can set to affect the GC behaviour, and the first one I tried
(GC_RETRY_SIGNALS=0) seems to have had the desired affect, in that the
crashes/restarts have stopped.

1: https://github.com/ivmai/bdwgc/blob/master/docs/README.environment

I've sent a patch [2] to apply this setting as part of the service.

2: https://issues.guix.gnu.org/63417
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Thu, 25 May 2023 15:26:01 GMT) Full text and rfc822 format available.

Message #11 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Thu, 25 May 2023 17:24:56 +0200
Hi,

Christopher Baines <mail <at> cbaines.net> skribis:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

Normally on GNU/Linux libgc has:

  #define SIG_SUSPEND SIGPWR

The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
which should normally be fine.

Is there anything else that might interfere with libgc?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Thu, 25 May 2023 15:42:01 GMT) Full text and rfc822 format available.

Message #14 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Thu, 25 May 2023 16:26:34 +0100
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Christopher Baines <mail <at> cbaines.net> skribis:
>
>> Since the recent core-updates merge, I've seen the build coordinator
>> using less memory, but it's also been crashing in a new way, up to 10
>> times a day.
>>
>> In the log, you see something like:
>>
>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>
>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>> do with this.
>
> Normally on GNU/Linux libgc has:
>
>   #define SIG_SUSPEND SIGPWR
>
> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
> which should normally be fine.
>
> Is there anything else that might interfere with libgc?

I've seen this issue in both the build coordinator and nar-herder, both
of which use guile-sqlite, so I wonder if that could have something to
do with it.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Fri, 02 Jun 2023 17:13:01 GMT) Full text and rfc822 format available.

Message #17 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: 63368 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Fri, 02 Jun 2023 18:07:16 +0100
[Message part 1 (text/plain, inline)]
Christopher Baines <mail <at> cbaines.net> writes:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Christopher Baines <mail <at> cbaines.net> skribis:
>>
>>> Since the recent core-updates merge, I've seen the build coordinator
>>> using less memory, but it's also been crashing in a new way, up to 10
>>> times a day.
>>>
>>> In the log, you see something like:
>>>
>>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>>
>>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>>> do with this.
>>
>> Normally on GNU/Linux libgc has:
>>
>>   #define SIG_SUSPEND SIGPWR
>>
>> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
>> which should normally be fine.
>>
>> Is there anything else that might interfere with libgc?
>
> I've seen this issue in both the build coordinator and nar-herder, both
> of which use guile-sqlite, so I wonder if that could have something to
> do with it.

I've seen this happen with the build coordinator agent now (on
milano-guix-1):

  2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
  2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
  2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
  2023-06-02 19:01:22 Signals delivery fails constantly
  2023-06-02 19:01:29 locale is en_US.utf8
  2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)

Which is a bit more concerning, since the build coordinator agent is
intentionally quite simple (no SQLite for example).
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Tue, 06 Jun 2023 15:10:02 GMT) Full text and rfc822 format available.

Message #20 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Tue, 06 Jun 2023 17:09:03 +0200
Christopher Baines <mail <at> cbaines.net> skribis:

> I've seen this happen with the build coordinator agent now (on
> milano-guix-1):
>
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>   2023-06-02 19:01:22 Signals delivery fails constantly
>   2023-06-02 19:01:29 locale is en_US.utf8
>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>
> Which is a bit more concerning, since the build coordinator agent is
> intentionally quite simple (no SQLite for example).

The closure of (guix-build-coordinator agent) seems to be quite large
still.

Could you check what .so files are loaded by that code, perhaps via
/proc/PID/maps?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Tue, 06 Jun 2023 15:21:02 GMT) Full text and rfc822 format available.

Message #23 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Baines <mail <at> cbaines.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Tue, 06 Jun 2023 16:19:39 +0100
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:

> Christopher Baines <mail <at> cbaines.net> skribis:
>
>> I've seen this happen with the build coordinator agent now (on
>> milano-guix-1):
>>
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG):
>> fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of
>> build inputs
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ):
>> fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building:
>> /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>   2023-06-02 19:01:29 locale is en_US.utf8
>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>
>> Which is a bit more concerning, since the build coordinator agent is
>> intentionally quite simple (no SQLite for example).
>
> The closure of (guix-build-coordinator agent) seems to be quite large
> still.
>
> Could you check what .so files are loaded by that code, perhaps via
> /proc/PID/maps?

I think I see these (that's on milano-guix-1 currently):

/gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
/gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
/gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
/gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
/gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
/gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
/gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
/gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
/gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
/gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
/gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
/gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
/gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#63368; Package guix. (Fri, 09 Jun 2023 13:15:01 GMT) Full text and rfc822 format available.

Message #26 received at 63368 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 63368 <at> debbugs.gnu.org
Subject: Re: bug#63368: Build coordiantor "Signals delivery fails
 constantly" crashes
Date: Fri, 09 Jun 2023 15:14:22 +0200
Christopher Baines <mail <at> cbaines.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Christopher Baines <mail <at> cbaines.net> skribis:

[...]

>>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>>   2023-06-02 19:01:29 locale is en_US.utf8
>>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>>
>>> Which is a bit more concerning, since the build coordinator agent is
>>> intentionally quite simple (no SQLite for example).
>>
>> The closure of (guix-build-coordinator agent) seems to be quite large
>> still.
>>
>> Could you check what .so files are loaded by that code, perhaps via
>> /proc/PID/maps?
>
> I think I see these (that's on milano-guix-1 currently):
>
> /gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
> /gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
> /gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
> /gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
> /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
> /gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
> /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
> /gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
> /gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
> /gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
> /gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
> /gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
> /gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
> /gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
> /gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1


Hmm no idea.  I’ve never seen “Signals delivery fails” before so I
really wonder what could be causing this.  Would be great if you could
come up with a reduced test case, but I guess that won’t be easy.

Or perhaps you could run a Coordinator agent under ‘strace -f’ to see if
we get hints?

Ludo’.




This bug report was last modified 330 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.