GNU bug report logs - #39266
Heap corruption leads to random crashes

Previous Next

Package: guile;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Fri, 24 Jan 2020 15:15:02 UTC

Severity: important

Merged with 36811, 36812, 39208, 39241, 39988

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39266 in the body.
You can then email your comments to 39266 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Fri, 24 Jan 2020 15:15:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Fri, 24 Jan 2020 15:15:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: bug-Guile <at> gnu.org
Subject: Finalization thread hits wrong-type-arg on weak vector (AArch64)
Date: Fri, 24 Jan 2020 16:14:29 +0100
Hello!

While building the “guix-system.drv” derivation on AArch64, I got this
crash (not fully deterministic but quite frequent).  Here the
finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
accessing a one-element weak vector):

--8<---------------cut here---------------start------------->8---
$ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
[…]
loading 'gnu/services/cups.scm'...
Backtrace:
[Switching to Thread 0xffffbebec1d0 (LWP 22464)]

Thread 2 "guile" hit Breakpoint 3, scm_display_backtrace_with_highlights (
    stack=stack <at> entry="#<struct stack>" = {...}, port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first <at> entry=#f, depth=depth <at> entry=#f, highlights=highlights <at> entry=()) at backtrace.c:269
269     {
(gdb) bt
#0  scm_display_backtrace_with_highlights (stack=stack <at> entry="#<struct stack>" = {...}, 
    port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, first=first <at> entry=#f, depth=depth <at> entry=#f, 
    highlights=highlights <at> entry=()) at backtrace.c:269
#1  0x0000ffffbf5ef8c4 in print_exception_and_backtrace (
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60, tag=wrong-type-arg, 
    port=#<port #<port-type file 4c3b40> 510040>) at continuations.c:409
#2  pre_unwind_handler (error_port=0x510040, tag=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cef60) at continuations.c:453
#3  0x0000ffffbf672588 in catch_pre_unwind_handler (data=0xffffbebeb850, 
    exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70ced40) at throw.c:135
#4  0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#5  0x0000ffffbf67d10c in scm_call_n (proc=proc <at> entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
    at vm.c:1589
#6  0x0000ffffbf5f3c10 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#7  0x0000ffffbf5f4654 in scm_apply_1 (proc=<optimized out>, arg1=arg1 <at> entry=wrong-type-arg, 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at eval.c:609
#8  0x0000ffffbf6729e0 in scm_throw (key=key <at> entry=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70caf80) at throw.c:262
#9  0x0000ffffbf672b44 in scm_ithrow (key=key <at> entry=wrong-type-arg, args=<optimized out>, no_return=no_return <at> entry=1)
    at throw.c:457
#10 0x0000ffffbf5f1dec in scm_error_scm (key=key <at> entry=wrong-type-arg, subr=subr <at> entry="weak-vector-ref", 
    message=<optimized out>, 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    data=data <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:90
#11 0x0000ffffbf5f1ea0 in scm_error (key=key <at> entry=wrong-type-arg, 
    subr=subr <at> entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", 
    message=message <at> entry=0xffffbf696e98 "Wrong type argument in position ~A (expecting ~A): ~S", 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafd0, 
    rest=<error reading variable: ERROR: Cannot access memory at address 0x0>0x70cafc0) at error.c:62
#12 0x0000ffffbf5f22cc in scm_wrong_type_arg_msg (
    subr=subr <at> entry=0xffffbf6a52d8 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos <at> entry=1, 
    bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30ff880, 
    szMessage=szMessage <at> entry=0xffffbf6a5300 "weak vector") at error.c:282
#13 0x0000ffffbf680050 in scm_c_weak_vector_ref (wv=<optimized out>, k=k <at> entry=0) at weak-vector.c:193
#14 0x0000ffffbf67eff4 in scm_i_weak_car (
    pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830) at weak-list.h:39
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#16 vacuum_all_weak_tables () at weak-table.c:494
#17 0x0000ffffbf5fda44 in async_gc_finalizer (ptr=0x494ec0, data=0x0) at finalizers.c:316
#18 0x0000ffffbf549f74 in GC_invoke_finalizers ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#19 0x0000ffffbf5fdf64 in scm_run_finalizers () at finalizers.c:398
#20 0x0000ffffbf5fdff4 in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#21 0x0000ffffbf5ef6e0 in c_body (d=0xffffbebeb918) at continuations.c:430
#22 0x0000ffffbf67bdf8 in vm_regular_engine (thread=0x475b40) at vm-engine.c:972
#23 0x0000ffffbf67d10c in scm_call_n (proc=#<unmatched-tag 10045>, argv=argv <at> entry=0xffffbebeb660, 
    nargs=nargs <at> entry=2) at vm.c:1589
#24 0x0000ffffbf5f3930 in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at eval.c:503
#25 0x0000ffffbf5f4f38 in scm_c_with_exception_handler (type=type <at> entry=#t, handler=0xffffbebeb670, 
    handler <at> entry=0xffffbf6724b0 <catch_post_unwind_handler>, handler_data=0x510040, 
    handler_data <at> entry=0xffffbebeb850, thunk=0x0, thunk <at> entry=0xffffbf6725f8 <catch_body>, 
    thunk_data=0x1dce42683dff4d67, thunk_data <at> entry=0xffffbebeb850) at exceptions.c:170
#26 0x0000ffffbf672850 in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data <at> entry=0xffffbebeb918, handler=handler <at> entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data <at> entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0xffffbf5ef7b8 <pre_unwind_handler>, 
    pre_unwind_handler_data=pre_unwind_handler_data <at> entry=0x510040) at throw.c:168
#27 0x0000ffffbf5efbf4 in scm_i_with_continuation_barrier (body=body <at> entry=0xffffbf5ef6c8 <c_body>, 
    body_data=body_data <at> entry=0xffffbebeb918, handler=handler <at> entry=0xffffbf5ef970 <c_handler>, 
    handler_data=handler_data <at> entry=0xffffbebeb918, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0xffffbf5ef7b8 <pre_unwind_handler>, pre_unwind_handler_data=0x510040)
    at continuations.c:368
#28 0x0000ffffbf5efca0 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
    at continuations.c:464
#29 0x0000ffffbf671148 in with_guile (base=0xffffbebeb988, data=0xffffbebeb9a8) at threads.c:645
#30 0x0000ffffbf551618 in GC_call_with_stack_base ()
   from /gnu/store/wsqzmim7m23gskpibrpqzx4djadhjz8y-libgc-7.6.12/lib/libgc.so.1
#31 0x0000ffffbf6714a8 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, func=<optimized out>)
    at threads.c:688
#32 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#33 0x0000ffffbf50e7f4 in start_thread ()
   from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libpthread.so.0
#34 0x0000ffffbf136edc in thread_start () from /gnu/store/nr1aw4i32h7rmxwmq7d2da0mwcwg551j-glibc-2.29/lib/libc.so.6
(gdb) info threads
  Id   Target Id                                 Frame 
  1    Thread 0xffffbf6f4010 (LWP 22463) "guile" resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
* 2    Thread 0xffffbebec1d0 (LWP 22464) "guile" scm_display_backtrace_with_highlights (
    stack=stack <at> entry="#<struct stack>" = {...}, port=port <at> entry=#<port #<port-type file 4c3b40> 510040>, 
    first=first <at> entry=#f, depth=depth <at> entry=#f, highlights=highlights <at> entry=()) at backtrace.c:269
--8<---------------cut here---------------end--------------->8---

The problem appears to be that the type tag of the weak-vector got
zeroed:

--8<---------------cut here---------------start------------->8---
(gdb) frame 15
#15 scm_i_visit_weak_list (list_loc=0xffffbf6c81b0 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
49            SCM car = scm_i_weak_car (in);
(gdb) p in
$48 = <error reading variable: ERROR: Cannot access memory at address 0x0>(SCM) <error reading variable: ERROR: Cannot access memory at address 0x0>0x30f8830
(gdb) p *(void**)in
$49 = (void *) 0x30ff880
(gdb) p ((void**)$49)[0]@2
$50 = {0x0, 0x30f8840}
(gdb) p (void**)in
$51 = (void **) 0x30f8830
--8<---------------cut here---------------end--------------->8---

There’s normally no disappearing link registered on the first element of
the weak vector (type tag + length) so I don’t know how this can happen.
Here’s the other thread (with surprisingly broken stack frames):

--8<---------------cut here---------------start------------->8---
(gdb) thread 1
[Switching to thread 1 (Thread 0xffffbf6f4010 (LWP 22463))]
#0  resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
272               scm_t_weak_entry *next = entry->next;
(gdb) bt
#0  resize_table (table=table <at> entry=0x4a2cb0) at weak-table.c:272
#1  0x0000ffffbf67ef00 in vacuum_weak_table (table=0x4a2cb0) at weak-table.c:318
#2  0x0000ffffbf67f374 in scm_c_weak_table_ref (table=<optimized out>, raw_hash=2016212919028049524, 
    pred=0xffffbf67eb10 <assq_predicate>, closure=0x72b6a80, dflt=()) at weak-table.c:533
#3  0x0000ffffbf653824 in scm_source_properties (obj=<optimized out>) at srcprop.c:195
#4  0x0000ffffbeebcf14 in ?? ()
#5  0x0000ffffbf03c130 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
--8<---------------cut here---------------end--------------->8---

Thoughts?

This code is the same as in 2.2.

Ludo’.




Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 15 Feb 2020 16:38:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Wed, 19 Feb 2020 13:52:02 GMT) Full text and rfc822 format available.

Message #10 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 39266 <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Wed, 19 Feb 2020 14:50:51 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent).  Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):
>
> $ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
> […]
> loading 'gnu/services/cups.scm'...
> Backtrace:

Apparently this bug does not occur with v3.0.0-23-g7dc90a17e¹.  It may
be that 00fbdfa7345765168e14438eed0b0b8c64c27ab9 reduces GC pressure,
which as a side effect makes the problem vanish.

It’s not satisfactory, but as a stop-gap measure, we could release 3.0.1
like this, which could make Guile 3 usable for Guix on AArch64.

Thoughts?

Ludo’.

¹ Specifically, I tested by (1) building a tarball with “make dist”, (2)
  running “guix build guile-next
  --with-source=guile-next=the-tarball.tar.gz”, and (3) running that
  Guile in the code above.  For some reason, Guile 3.0.0 built “by hand”
  would not reproduce the original bug, which is why I built it through
  Guix.




Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Wed, 19 Feb 2020 14:22:02 GMT) Full text and rfc822 format available.

Message #13 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: Brian Woodcox <bw <at> InSkyData.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 39266 <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Wed, 19 Feb 2020 07:19:57 -0700
Well, I would be happy, because right now I can’t do guix pull.

I tried multiple timed on aarch64 with no success.

Brian.


> On Feb 19, 2020, at 6:50 AM, Ludovic Courtès <ludo <at> gnu.org> wrote:
> 
> Ludovic Courtès <ludo <at> gnu.org> skribis:
> 
>> While building the “guix-system.drv” derivation on AArch64, I got this
>> crash (not fully deterministic but quite frequent).  Here the
>> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
>> accessing a one-element weak vector):
>> 
>> $ ( export out=$PWD/build; unset GUILE_LOAD_PATH; unset GUILE_LOAD_COMPILED_PATH; gdb --args "/gnu/store/p8in2npgl5yhliy25ikz7shjbq0gii95-guile-next-3.0.0/bin/guile" "--no-auto-compile" "-L" "/gnu/store/3qg8l6kr4wa9sbgwy00z1mb3p88xf455-module-import" "-C" "/gnu/store/h9qcvg71bmx735fsndagll9y7s72k9n9-module-import-compiled" guix-system-builder )
>> […]
>> loading 'gnu/services/cups.scm'...
>> Backtrace:
> 
> Apparently this bug does not occur with v3.0.0-23-g7dc90a17e¹.  It may
> be that 00fbdfa7345765168e14438eed0b0b8c64c27ab9 reduces GC pressure,
> which as a side effect makes the problem vanish.
> 
> It’s not satisfactory, but as a stop-gap measure, we could release 3.0.1
> like this, which could make Guile 3 usable for Guix on AArch64.
> 
> Thoughts?
> 
> Ludo’.
> 
> ¹ Specifically, I tested by (1) building a tarball with “make dist”, (2)
>  running “guix build guile-next
>  --with-source=guile-next=the-tarball.tar.gz”, and (3) running that
>  Guile in the code above.  For some reason, Guile 3.0.0 built “by hand”
>  would not reproduce the original bug, which is why I built it through
>  Guix.
> 
> 
> 




Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Sat, 29 Feb 2020 15:10:02 GMT) Full text and rfc822 format available.

Message #16 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: shtwzrd <shtwzrd <at> protonmail.com>
To: "39266 <at> debbugs.gnu.org" <39266 <at> debbugs.gnu.org>,
 Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Sat, 29 Feb 2020 15:09:42 +0000
[Message part 1 (text/plain, inline)]
Seconding what Brian said, I would also be happy with this solution.

Right now, guix users on aarch64 simply aren't using guile 3 because they can't pull, so this one bug probably acts as a blocker to encountering other potential bugs.

So even though it's not satisfactory, getting aarch64 working again could result in more and varied bug reports for guile on that platform, so that there may eventually be enough information to find the actual root cause of the problem.
[Message part 2 (text/html, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Mon, 09 Mar 2020 09:13:01 GMT) Full text and rfc822 format available.

Message #19 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 39988 <at> debbugs.gnu.org, 39266 <at> debbugs.gnu.org
Subject: Re: bug#39988: [3.0.1] Segfault in GC
Date: Mon, 09 Mar 2020 10:11:59 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Apparently a deadlock on ‘all_weak_tables_lock’:

I reproduced the deadlock:

--8<---------------cut here---------------start------------->8---
(gdb) thread 1
[Switching to thread 1 (Thread 0x7faee3633b80 (LWP 5809))]
#0  0x00007faee3bcb0bc in __lll_lock_wait ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
(gdb) bt
#0  0x00007faee3bcb0bc in __lll_lock_wait ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#1  0x00007faee3bc4674 in pthread_mutex_lock ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#2  0x00007faee3d22a2f in scm_c_make_weak_table (k=<optimized out>, kind=SCM_WEAK_TABLE_KIND_KEY)
    at weak-table.c:505
#3  0x00007faee139814b in ?? ()
#4  0x00007faee330fd80 in ?? ()
#5  0x00007faee3d89860 in ?? ()
   from /gnu/store/s5p2yja08zcg6j56y1wfvnm6nxiyllz1-guile-next-3.0.1/lib/libguile-3.0.so.1
#6  0x00007faee330fd80 in ?? ()
#7  0x00007faee3cc46eb in scm_jit_enter_mcode (thread=0x7faee330fd80, 
    mcode=0x7faedb2a87f0 "I\211\314I)\304I\203\374\020\017\217", <incomplete sequence \344>) at jit.c:5725
#8  0x00007faee3d1ff88 in vm_regular_engine (thread=0x7faed63eab30) at vm-engine.c:374
#9  0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7fff3f56ce98, nargs=nargs <at> entry=1)
    at vm.c:1600
#10 0x00007faee3c9d1e7 in scm_primitive_eval (exp=<optimized out>) at eval.c:671
#11 0x00007faee3cc63ab in scm_primitive_load (filename=<optimized out>) at load.c:131
#12 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fd80) at vm-engine.c:972
#13 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7fff3f56d068, nargs=nargs <at> entry=1)
    at vm.c:1600
#14 0x00007faee3c9d1e7 in scm_primitive_eval (exp=<optimized out>, 
    exp <at> entry=((@ (ice-9 control) %) (begin (set! %load-path (cons "." %load-path)) (set! %load-path (cons "." %load-path)) ((@@ (ice-9 command-line) load/lang) "./build-aux/compile-all.scm") (quit)))) at eval.c:671
#15 0x00007faee3c9d243 in scm_eval (
    exp=((@ (ice-9 control) %) (begin (set! %load-path (cons "." %load-path)) (set! %load-path (cons "." %load-path)) ((@@ (ice-9 command-line) load/lang) "./build-aux/compile-all.scm") (quit))), 
    module_or_state=module_or_state <at> entry="#<struct module>" = {...}) at eval.c:705
#16 0x00007faee3cf6860 in scm_shell (argc=834, argv=0x7fff3f56d6c8) at script.c:357
#17 0x00007faee3cb4bed in invoke_main_func (body_data=0x7fff3f56d570) at init.c:308
#18 0x00007faee3c97e3a in c_body (d=0x7fff3f56d4b0) at continuations.c:430
#19 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fd80) at vm-engine.c:972
#20 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7fff3f56d270, nargs=nargs <at> entry=2)
    at vm.c:1600
#21 0x00007faee3c9c07a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
    at eval.c:503
#22 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type <at> entry=#t, 
    handler=handler <at> entry=0x7faee3d15d60 <catch_post_unwind_handler>, 
    handler_data=handler_data <at> entry=0x7fff3f56d3e0, thunk=thunk <at> entry=0x7faee3d15ea0 <catch_body>, 
    thunk_data=thunk_data <at> entry=0x7fff3f56d3e0) at exceptions.c:170
#23 0x00007faee3d1609d in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0x7faee3c97e30 <c_body>, 
    body_data=body_data <at> entry=0x7fff3f56d4b0, handler=handler <at> entry=0x7faee3c980d0 <c_handler>, 
    handler_data=handler_data <at> entry=0x7fff3f56d4b0, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7faee3c97f30 <pre_unwind_handler>, 
    pre_unwind_handler_data=0x7faee15e63c0) at throw.c:168
#24 0x00007faee3c983e3 in scm_i_with_continuation_barrier (body=body <at> entry=0x7faee3c97e30 <c_body>, 
    body_data=body_data <at> entry=0x7fff3f56d4b0, handler=handler <at> entry=0x7faee3c980d0 <c_handler>, 
    handler_data=handler_data <at> entry=0x7fff3f56d4b0, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7faee3c97f30 <pre_unwind_handler>, 
    pre_unwind_handler_data=0x7faee15e63c0) at continuations.c:368
#25 0x00007faee3c98475 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
    at continuations.c:464
#26 0x00007faee3d14b3f in with_guile (base=base <at> entry=0x7fff3f56d518, data=data <at> entry=0x7fff3f56d540)
    at threads.c:645
#27 0x00007faee3bf9a68 in GC_call_with_stack_base (fn=fn <at> entry=0x7faee3d14af0 <with_guile>, 
    arg=arg <at> entry=0x7fff3f56d540) at misc.c:1941
#28 0x00007faee3d14e58 in scm_i_with_guile (dynamic_state=<optimized out>, data=data <at> entry=0x7fff3f56d540, 
    func=func <at> entry=0x7faee3cb4bd0 <invoke_main_func>) at threads.c:688
#29 scm_with_guile (func=func <at> entry=0x7faee3cb4bd0 <invoke_main_func>, data=data <at> entry=0x7fff3f56d570)
    at threads.c:694
#30 0x00007faee3cb4d62 in scm_boot_guile (argc=argc <at> entry=834, argv=argv <at> entry=0x7fff3f56d6c8, 
    main_func=main_func <at> entry=0x401240 <inner_main>, closure=closure <at> entry=0x0) at init.c:291
#31 0x0000000000401100 in main (argc=834, argv=0x7fff3f56d6c8) at guile.c:95
(gdb) thread 5
[Switching to thread 5 (Thread 0x7faee0f93700 (LWP 5815))]
#0  0x00007faee3bc7efc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
(gdb) bt
#0  0x00007faee3bc7efc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#1  0x00007faee3d15355 in scm_pthread_cond_wait (cond=<optimized out>, mutex=<optimized out>) at threads.c:1605
#2  0x00007faee3d15523 in block_self (queue=((#<unmatched-tag 277>) #<unmatched-tag 277>), 
    mutex=mutex <at> entry=0x7faee1201f80, waittime=waittime <at> entry=0x0) at threads.c:312
#3  0x00007faee3d15657 in lock_mutex (current_thread=0x7faee330fb40, waittime=0x0, m=0x7faee1201f80, 
    kind=SCM_MUTEX_RECURSIVE) at threads.c:1021
#4  scm_timed_lock_mutex (mutex=#<unmatched-tag 10377>, timeout=<optimized out>) at threads.c:1085
#5  0x00007faee13a663f in ?? ()
#6  0x00007faee330fb40 in ?? ()
#7  0x00007faee3d89860 in ?? ()
   from /gnu/store/s5p2yja08zcg6j56y1wfvnm6nxiyllz1-guile-next-3.0.1/lib/libguile-3.0.so.1
#8  0x00007faee330fb40 in ?? ()
#9  0x00007faee3cc46eb in scm_jit_enter_mcode (thread=0x7faee330fb40, 
    mcode=0x7faee1396410 "I\211\314I)\304I\203\374\020\017\214\272\002") at jit.c:5725
#10 0x00007faee3d1fc99 in vm_regular_engine (thread=0x7faee155ace8) at vm-engine.c:360
#11 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7faee0f92090, nargs=nargs <at> entry=3)
    at vm.c:1600
#12 0x00007faee3c9c09f in scm_call_3 (proc=<optimized out>, arg1=arg1 <at> entry=(guile), arg2=<optimized out>, 
    arg3=arg3 <at> entry=#f) at eval.c:510
#13 0x00007faee3ccbf2f in scm_maybe_resolve_module (name=name <at> entry=(guile)) at modules.c:195
#14 0x00007faee3cb8898 in resolve_module (name=(guile), public_p=<optimized out>) at intrinsics.c:317
#15 0x00007faee3d1ef94 in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:1583
#16 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7faee0f92278, nargs=nargs <at> entry=1)
    at vm.c:1600
#17 0x00007faee3c9c058 in scm_call_1 (proc=<optimized out>, arg1=<optimized out>) at eval.c:496
#18 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#19 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7faee0f92420, nargs=nargs <at> entry=4)
    at vm.c:1600
#20 0x00007faee3c9c0d4 in scm_call_4 (proc=<optimized out>, arg1=arg1 <at> entry="#<vector>" = {...}, 
    arg2=arg2 <at> entry=#<port #<port-type file 7faee159ab40> 7faee15e63c0>, arg3=arg3 <at> entry=#:count, 
    arg4=arg4 <at> entry=20) at eval.c:517
#21 0x00007faee3c8f5f9 in display_backtrace_body (a=<optimized out>) at backtrace.c:239
#22 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type <at> entry=#t, 
    handler=handler <at> entry=0x7faee3d15d60 <catch_post_unwind_handler>, 
    handler_data=handler_data <at> entry=0x7faee0f925d0, thunk=thunk <at> entry=0x7faee3d15ea0 <catch_body>, 
    thunk_data=thunk_data <at> entry=0x7faee0f925d0) at exceptions.c:170
#23 0x00007faee3d1609d in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0x7faee3c8f4d0 <display_backtrace_body>, 
    body_data=body_data <at> entry=0x7faee0f92640, handler=handler <at> entry=0x7faee3c8f8b0 <error_during_backtrace>, 
    handler_data=handler_data <at> entry=0x7faee15e63c0, pre_unwind_handler=pre_unwind_handler <at> entry=0x0, 
    pre_unwind_handler_data=0x0) at throw.c:168
#24 0x00007faee3d160be in scm_internal_catch (tag=tag <at> entry=#t, 
    body=body <at> entry=0x7faee3c8f4d0 <display_backtrace_body>, body_data=body_data <at> entry=0x7faee0f92640, 
    handler=handler <at> entry=0x7faee3c8f8b0 <error_during_backtrace>, handler_data=handler_data <at> entry=0x7faee15e63c0)
    at throw.c:177
#25 0x00007faee3c8f4c5 in scm_display_backtrace_with_highlights (stack=stack <at> entry="#<struct stack>" = {...}, 
    port=port <at> entry=#<port #<port-type file 7faee159ab40> 7faee15e63c0>, first=first <at> entry=#f, 
    depth=depth <at> entry=#f, highlights=highlights <at> entry=()) at backtrace.c:277
#26 0x00007faee3c9801f in print_exception_and_backtrace (
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7f50, tag=wrong-type-arg, 
    port=#<port #<port-type file 7faee159ab40> 7faee15e63c0>) at continuations.c:409
#27 pre_unwind_handler (error_port=0x7faee15e63c0, tag=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7f50)
    at continuations.c:453
#28 0x00007faee3d15e1b in catch_pre_unwind_handler (data=0x7faee0f92d80, 
    exn=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee11f7980) at throw.c:135
#29 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#30 0x00007faee3d20935 in scm_call_n (proc=proc <at> entry=#<unmatched-tag 10045>, argv=<optimized out>, nargs=5)
    at vm.c:1600
#31 0x00007faee3c9c3d4 in scm_apply_0 (proc=#<unmatched-tag 10045>, args=()) at eval.c:603
#32 0x00007faee3c9d07d in scm_apply_1 (proc=<optimized out>, arg1=arg1 <at> entry=wrong-type-arg, 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5610)
    at eval.c:609
#33 0x00007faee3d16259 in scm_throw (key=key <at> entry=wrong-type-arg, 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5610) at throw.c:262
#34 0x00007faee3d163a9 in scm_ithrow (key=key <at> entry=wrong-type-arg, args=<optimized out>, 
    no_return=no_return <at> entry=1) at throw.c:457
#35 0x00007faee3c9a585 in scm_error_scm (key=key <at> entry=wrong-type-arg, subr=<optimized out>, 
    message=message <at> entry="Wrong type argument in position ~A (expecting ~A): ~S", 
    args=args <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d59a0, 
    data=data <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5ab0)
    at error.c:90
#36 0x00007faee3c9a61f in scm_error (key=wrong-type-arg, 
    subr=subr <at> entry=0x7faee3d4bf60 <s_scm_weak_vector_ref> "weak-vector-ref", 
    message=message <at> entry=0x7faee3d3d490 "Wrong type argument in position ~A (expecting ~A): ~S", 
    args=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d59a0, 
    rest=rest <at> entry=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faee10d5ab0)
    at error.c:62
#37 0x00007faee3c9a9e0 in scm_wrong_type_arg_msg (
    subr=subr <at> entry=0x7faee3d4bf60 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos <at> entry=1, 
    bad_value=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faec8cc8980, 
    szMessage=szMessage <at> entry=0x7faee3d4bee0 "weak vector") at error.c:282
#38 0x00007faee3d23716 in scm_c_weak_vector_ref (wv=<optimized out>, k=k <at> entry=0) at weak-vector.c:193
#39 0x00007faee3d22838 in scm_i_weak_car (
    pair=<error reading variable: ERROR: Cannot access memory at address 0x0>0x7faec7c7d7e0) at weak-list.h:39
#40 scm_i_visit_weak_list (list_loc=0x7faee3d8a868 <all_weak_tables>, visit=<optimized out>) at weak-list.h:49
#41 vacuum_all_weak_tables () at weak-table.c:494
#42 0x00007faee3ca5f2e in async_gc_finalizer (ptr=0x7faee3312ea0, data=0x0) at finalizers.c:316
#43 0x00007faee3bf26ef in GC_invoke_finalizers () at finalize.c:1276
#44 0x00007faee3ca63c9 in scm_run_finalizers () at finalizers.c:398
#45 0x00007faee3ca643d in finalization_thread_proc (unused=<optimized out>) at finalizers.c:233
#46 0x00007faee3c97e3a in c_body (d=0x7faee0f92e50) at continuations.c:430
#47 0x00007faee3d1f4fc in vm_regular_engine (thread=0x7faee330fb40) at vm-engine.c:972
#48 0x00007faee3d20935 in scm_call_n (proc=<optimized out>, argv=argv <at> entry=0x7faee0f92c10, nargs=nargs <at> entry=2)
    at vm.c:1600
#49 0x00007faee3c9c07a in scm_call_2 (proc=<optimized out>, arg1=<optimized out>, arg2=<optimized out>)
    at eval.c:503
#50 0x00007faee3c9d87a in scm_c_with_exception_handler (type=type <at> entry=#t, 
    handler=handler <at> entry=0x7faee3d15d60 <catch_post_unwind_handler>, 
    handler_data=handler_data <at> entry=0x7faee0f92d80, thunk=thunk <at> entry=0x7faee3d15ea0 <catch_body>, 
    thunk_data=thunk_data <at> entry=0x7faee0f92d80) at exceptions.c:170
#51 0x00007faee3d1609d in scm_c_catch (tag=tag <at> entry=#t, body=body <at> entry=0x7faee3c97e30 <c_body>, 
    body_data=body_data <at> entry=0x7faee0f92e50, handler=handler <at> entry=0x7faee3c980d0 <c_handler>, 
    handler_data=handler_data <at> entry=0x7faee0f92e50, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7faee3c97f30 <pre_unwind_handler>, 
    pre_unwind_handler_data=0x7faee15e63c0) at throw.c:168
#52 0x00007faee3c983e3 in scm_i_with_continuation_barrier (body=body <at> entry=0x7faee3c97e30 <c_body>, 
    body_data=body_data <at> entry=0x7faee0f92e50, handler=handler <at> entry=0x7faee3c980d0 <c_handler>, 
    handler_data=handler_data <at> entry=0x7faee0f92e50, 
    pre_unwind_handler=pre_unwind_handler <at> entry=0x7faee3c97f30 <pre_unwind_handler>, 
    pre_unwind_handler_data=0x7faee15e63c0) at continuations.c:368
#53 0x00007faee3c98475 in scm_c_with_continuation_barrier (func=<optimized out>, data=<optimized out>)
    at continuations.c:464
#54 0x00007faee3d14b3f in with_guile (base=base <at> entry=0x7faee0f92eb8, data=data <at> entry=0x7faee0f92ee0)
    at threads.c:645
#55 0x00007faee3bf9a68 in GC_call_with_stack_base (fn=fn <at> entry=0x7faee3d14af0 <with_guile>, 
    arg=arg <at> entry=0x7faee0f92ee0) at misc.c:1941
#56 0x00007faee3d14e58 in scm_i_with_guile (dynamic_state=<optimized out>, data=<optimized out>, 
    func=<optimized out>) at threads.c:688
#57 scm_with_guile (func=<optimized out>, data=<optimized out>) at threads.c:694
#58 0x00007faee3bc2015 in start_thread ()
   from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libpthread.so.0
#59 0x00007faee372891f in clone () from /gnu/store/ahqgl4h89xqj695lgqvsaf6zh2nhy4pj-glibc-2.29/lib/libc.so.6
--8<---------------cut here---------------end--------------->8---

It stems from the bug described in
<https://issues.guix.gnu.org/issue/39266>, this time on x86_64.

Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Mon, 09 Mar 2020 14:39:01 GMT) Full text and rfc822 format available.

Message #22 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 39266 <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Mon, 09 Mar 2020 15:38:45 +0100
Ludovic Courtès <ludo <at> gnu.org> skribis:

> While building the “guix-system.drv” derivation on AArch64, I got this
> crash (not fully deterministic but quite frequent).  Here the
> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
> accessing a one-element weak vector):

With 3.0.1, I can reproduce the bug on x86_64.  With rr (thanks, Andy!),
I found this (starting from the point where the type cell of the weak
vector is zeroed, and reverse-continuing until its gets its original
value of 0x10f):

--8<---------------cut here---------------start------------->8---
(rr) frame 40
#40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
190	  return SCM_CAR (x);
(rr) down
#39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k <at> entry=0) at weak-vector.c:193
193	  SCM_VALIDATE_WEAK_VECTOR (1, wv);
(rr) 
#38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
    subr=subr <at> entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos <at> entry=1, 
    bad_value=0x7fffec472b90, szMessage=szMessage <at> entry=0x7ffff7f56e80 "weak vector") at error.c:282
282	      scm_error (scm_arg_type_key,
(rr) p *((void**)0x7fffec472b90)
$1 = (void *) 0x0
(rr) watch *((void**)0x7fffec472b90)
Hardware watchpoint 1: *((void**)0x7fffec472b90)
(rr) reverse-cont
Continuing.

Thread 1 received signal SIGCONT, Continued.
[Switching to Thread 27074.27074]
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
101	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Dosiero aŭ dosierujo ne ekzistas.
(rr) 
Continuing.

Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)

Old value = (void *) 0x0
New value = (void *) 0x10f
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
259	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Dosiero aŭ dosierujo ne ekzistas.
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
#1  0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable <at> entry=0x7fffeb48ee80) at struct.c:143
#2  0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable <at> entry=0x7ffff4e32fa0, 
    obj=obj <at> entry=0x7fffeb48ee80) at struct.c:215
#3  0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8, 
    init=0x7fffffff50d0) at struct.c:364
#4  0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491
--8<---------------cut here---------------end--------------->8---

Bingo!  There’s a mismatch in struct.c:

--8<---------------cut here---------------start------------->8---
  bitmask_size = (nfields + 31U) / 32U;
  unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
  memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
--8<---------------cut here---------------end--------------->8---

Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.

Thanks, rr!  You made my day!  :-)

Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
that addresses seemingly related issues.

Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Mon, 09 Mar 2020 22:20:01 GMT) Full text and rfc822 format available.

Message #25 received at 39266 <at> debbugs.gnu.org (full text, mbox):

From: Pierre Langlois <pierre.langlois <at> gmx.com>
To: bug-guile <at> gnu.org
Cc: 39266 <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Mon, 09 Mar 2020 22:19:49 +0000
Hi Ludo,

Ludovic Courtès writes:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> While building the “guix-system.drv” derivation on AArch64, I got this
>> crash (not fully deterministic but quite frequent).  Here the
>> finalization thread gets a wrong-type-arg in ‘scm_i_weak_car’ (i.e.,
>> accessing a one-element weak vector):
>
> With 3.0.1, I can reproduce the bug on x86_64.  With rr (thanks, Andy!),
> I found this (starting from the point where the type cell of the weak
> vector is zeroed, and reverse-continuing until its gets its original
> value of 0x10f):
>
> --8<---------------cut here---------------start------------->8---
> (rr) frame 40
> #40 0x00007ffff7f2e66d in scm_i_weak_car (pair=0x7fffe15af690) at ../libguile/pairs.h:190
> 190	  return SCM_CAR (x);
> (rr) down
> #39 0x00007ffff7f2f576 in scm_c_weak_vector_ref (wv=<optimized out>, k=k <at> entry=0) at weak-vector.c:193
> 193	  SCM_VALIDATE_WEAK_VECTOR (1, wv);
> (rr) 
> #38 0x00007ffff7ea7ba0 in scm_wrong_type_arg_msg (
>     subr=subr <at> entry=0x7ffff7f56f00 <s_scm_weak_vector_ref> "weak-vector-ref", pos=pos <at> entry=1, 
>     bad_value=0x7fffec472b90, szMessage=szMessage <at> entry=0x7ffff7f56e80 "weak vector") at error.c:282
> 282	      scm_error (scm_arg_type_key,
> (rr) p *((void**)0x7fffec472b90)
> $1 = (void *) 0x0
> (rr) watch *((void**)0x7fffec472b90)
> Hardware watchpoint 1: *((void**)0x7fffec472b90)
> (rr) reverse-cont
> Continuing.
>
> Thread 1 received signal SIGCONT, Continued.
> [Switching to Thread 27074.27074]
> __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:101
> 101	../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Dosiero aŭ dosierujo ne ekzistas.
> (rr) 
> Continuing.
>
> Thread 1 hit Hardware watchpoint 1: *((void**)0x7fffec472b90)
>
> Old value = (void *) 0x0
> New value = (void *) 0x10f
> __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
> 259	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Dosiero aŭ dosierujo ne ekzistas.
> (rr) bt
> #0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:259
> #1  0x00007ffff7f1d499 in set_vtable_access_fields (vtable=vtable <at> entry=0x7fffeb48ee80) at struct.c:143
> #2  0x00007ffff7f1dd8d in scm_i_struct_inherit_vtable_magic (vtable=vtable <at> entry=0x7ffff4e32fa0, 
>     obj=obj <at> entry=0x7fffeb48ee80) at struct.c:215
> #3  0x00007ffff7f1dfea in scm_c_make_structv (vtable=0x7ffff4e32fa0, n_tail=<optimized out>, n_init=8, 
>     init=0x7fffffff50d0) at struct.c:364
> #4  0x00007ffff7f1e0b9 in scm_make_struct_no_tail (vtable=0x7ffff4e32fa0, init=0x304) at struct.c:491
> --8<---------------cut here---------------end--------------->8---
>
> Bingo!  There’s a mismatch in struct.c:
>
> --8<---------------cut here---------------start------------->8---
>   bitmask_size = (nfields + 31U) / 32U;
>   unboxed_fields = scm_gc_malloc_pointerless (bitmask_size, "unboxed fields");
>   memset (unboxed_fields, 0, bitmask_size * sizeof(*unboxed_fields));
> --8<---------------cut here---------------end--------------->8---

Oh wow, scary! That was some nice debugging, these types of bugs can be
really hard to get to the bottom of.

>
> Pushed a fix as 7c17655cd3d859bf0c5a86d9782a7788205fc05a.
>
> Thanks, rr!  You made my day!  :-)
>
> Now testing Guix builds on x86_64, i686, ARMv7, and AArch64 to see if
> that addresses seemingly related issues.

I've tested it on AArch64 and it's looking good, I'm running Guile 3
finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
a rockpro64 running the Guix system, I've then reconfigured and rebooted
and it's all good.

Thanks so much for the fix! Hopefully it'll work on every platform and
that can be the end of it :-).

Pierre




Information forwarded to bug-guile <at> gnu.org:
bug#39266; Package guile. (Mon, 09 Mar 2020 22:21:02 GMT) Full text and rfc822 format available.

Merged 39266 39988. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 10 Mar 2020 17:24:02 GMT) Full text and rfc822 format available.

Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 10 Mar 2020 17:27:01 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Tue, 10 Mar 2020 17:27:02 GMT) Full text and rfc822 format available.

Message #35 received at 39266-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Pierre Langlois <pierre.langlois <at> gmx.com>
Cc: 39266-done <at> debbugs.gnu.org
Subject: Re: bug#39266: Finalization thread hits wrong-type-arg on weak vector
 (AArch64)
Date: Tue, 10 Mar 2020 18:25:55 +0100
Hi Pierre,

Pierre Langlois <pierre.langlois <at> gmx.com> skribis:

> I've tested it on AArch64 and it's looking good, I'm running Guile 3
> finally! I've tested by running 'guix pull --branch=wip-guile-3.0.1' on
> a rockpro64 running the Guix system, I've then reconfigured and rebooted
> and it's all good.

Thanks for testing!

> Thanks so much for the fix! Hopefully it'll work on every platform and
> that can be the end of it :-).

Yup, I’ve tested ‘guix pull --branch=wip-guile-3.0.1’ and ‘guix build
guile3.0-guix’ on all 4 architectures that Guix supports, and everything
is fine.

I’ve now pushed the upgrade to 3.0.1 + patch to Guix.

Closing!  \o/

The bug appears to be rare for Guile workloads not as intensive as a
Guix build (never reported, never seen), but we should still probably do
a bug-fix 3.0.2 release in the coming weeks, I guess.

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 10 Mar 2020 17:27:02 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Tue, 10 Mar 2020 17:27:02 GMT) Full text and rfc822 format available.

Changed bug title to 'Heap corruption leads to random crashes' from 'Finalization thread hits wrong-type-arg on weak vector (AArch64)' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 11 Mar 2020 20:22:02 GMT) Full text and rfc822 format available.

Merged 36811 36812 39241 39266 39988. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 12 Mar 2020 16:00:02 GMT) Full text and rfc822 format available.

Merged 36811 36812 39208 39241 39266 39988. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 12 Mar 2020 16:02:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 10 Apr 2020 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 354 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.