GNU bug report logs - #58732
installer: finalizers & device destroy segfault

Previous Next

Package: guix;

Reported by: Mathieu Othacehe <othacehe <at> gnu.org>

Date: Sun, 23 Oct 2022 09:08:01 UTC

Severity: important

Done: Mathieu Othacehe <othacehe <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58732 in the body.
You can then email your comments to 58732 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Sun, 23 Oct 2022 09:08:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mathieu Othacehe <othacehe <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 23 Oct 2022 09:08:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: bug-guix <at> gnu.org
Subject: installer: finalizers & device destroy segfault
Date: Sun, 23 Oct 2022 11:07:31 +0200
Hello,

I found a segfault in the installer by running those steps:

- Run an automatic partitioning with separate home and no encryption
- In the final configuration page, come back to partitioning
- Remove all partitions but the ESP one, create a new btrfs root
- partition
- Repeat until the crash occurs

Using Josselin's instructions here: https://issues.guix.gnu.org/57513, I
was able to get the following backtrace:

--8<---------------cut here---------------start------------->8---
Reading symbols from /gnu/store/b0ymz7vjfkcvhbci49q5yk1fi0l9lq49-parted-3.5/lib/libparted.so...
(gdb) bt
#0  linux_destroy (dev=0x1dc89e0) at arch/linux.c:1615
#1  0x00007f8941aecd37 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#2  0x00007f8941a45e3f in GC_invoke_finalizers () from /gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#3  0x00007f8941aed429 in scm_run_finalizers () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#4  0x00007f8941af4482 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#5  0x00007f8941ae085a in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#6  0x00007f8941b6d336 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#7  0x00007f8941b7a5e9 in scm_call_n () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#8  0x00007f8941ae209a in scm_call_2 () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#9  0x00007f8941b98752 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#10 0x00007f8941b6a88f in scm_c_catch () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#11 0x00007f8941ae2e66 in scm_c_with_continuation_barrier () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#12 0x00007f8941b69b39 in ?? () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#13 0x00007f8941a400ba in GC_call_with_stack_base () from /gnu/store/2lczkxbdbzh4gk7wh91bzrqrk7h5g1dl-libgc-8.0.4/lib/libgc.so.1
#14 0x00007f8941b628b8 in scm_with_guile () from /gnu/store/1jgcbdzx2ss6xv59w55g3kr3x4935dfb-guile-3.0.8/lib/libguile-3.0.so.1
#15 0x00007f8941a16d7e in ?? () from /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libpthread.so.0
#16 0x00007f8941614eff in clone () from /gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/lib/libc.so.6
--8<---------------cut here---------------end--------------->8---

linux_destroy is the PedDevice destruction function. The crash occurs
when dereferencing the arch_specific pointer which is ...

--8<---------------cut here---------------start------------->8---
(gdb) p dev
$1 = (PedDevice *) 0x1dc89e0
(gdb) p *dev
$2 = {next = 0x1, model = 0x1645d50 "", path = 0x0, type = PED_DEVICE_UNKNOWN, sector_size = 0, phys_sector_size = 1, length = 23272720, open_count = 0, read_only = 1, external_mode = 0, dirty = 0, boot_dirty = 0, hw_geom = {
    cylinders = 0, heads = 2, sectors = 0}, bios_geom = {cylinders = 23259184, heads = 0, sectors = 0}, host = 1, did = 0, arch_specific = 0x0}
(gdb) p dev->arch_specific 
$3 = (void *) 0x0
--8<---------------cut here---------------end--------------->8---

null! I guess this has to deal with device pointer finalizers. I'm a bit
disappointed because I thought we had overcome those mistakes.

Thanks,

Mathieu




Added indication that bug 58732 blocks53214 Request was from Mathieu Othacehe <mathieu <at> meije.mail-host-address-is-not-set> to control <at> debbugs.gnu.org. (Sun, 23 Oct 2022 10:31:01 GMT) Full text and rfc822 format available.

Severity set to 'important' from 'normal' Request was from Mathieu Othacehe <mathieu <at> meije.mail-host-address-is-not-set> to control <at> debbugs.gnu.org. (Sun, 23 Oct 2022 10:32:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Wed, 02 Nov 2022 10:56:02 GMT) Full text and rfc822 format available.

Message #12 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Wed, 02 Nov 2022 11:55:37 +0100
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> null! I guess this has to deal with device pointer finalizers. I'm a bit
> disappointed because I thought we had overcome those mistakes.

There are several things we should audit in Guile-Parted regarding
object lifecycles.

Common issues when writing bindings that could cause problems like what
you report:

  1. Bindings create wrappers for C pointers—e.g., with
     ‘pointer->device’.  If several C functions return a pointer P, you
     must make sure to return always the same wrapper and not create a
     new one.

     ‘pointer->device!’ attempts to do that but I think it’s bogus: it
     uses a weak-value hash table, where the value is the wrapper.  So
     if the wrapper disappears before the underlying C object, then the
     pointer is called and bad things ensue.

     ‘define-wrapped-pointer-type’ in Guile is meant to help with these
     things (info "(guile) Void Pointers and Byte Access").  We can’t
     use it directly here because we’re using bytestructures and all
     that.

  2. Scheme wrappers must mirror the aggregation relations of the C
     objects they wrap.

     For instance, let’s say a PedDisk aggregates PedPartition
     objects—i.e., that PedDisk “owns” them and outlives them.  Then the
     Scheme <disk> wrappers must outlive the Scheme <partition> wrappers
     too, so that we don’t end up calling partition finalizers while the
     disk that owns them is still alive.  This is done using a weak-key
     hash table keyed by <disk> objects in this case and populated
     anytime a <disk> getter returns a <partition>.  (See for example
     ‘%commit-owners’ in Guile-Git.)

I think we should audit Guile-Parted with these issues in mind.

WDYT?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Thu, 03 Nov 2022 11:10:01 GMT) Full text and rfc822 format available.

Message #15 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Thu, 03 Nov 2022 12:09:16 +0100
Hey,

Thanks for your help :)

>   1. Bindings create wrappers for C pointers—e.g., with
>      ‘pointer->device’.  If several C functions return a pointer P, you
>      must make sure to return always the same wrapper and not create a
>      new one.

Agreed.

>
>      ‘pointer->device!’ attempts to do that but I think it’s bogus: it
>      uses a weak-value hash table, where the value is the wrapper.  So
>      if the wrapper disappears before the underlying C object, then the
>      pointer is called and bad things ensue.

I'm not sure to understand how could the wrapper disappear before the
underlying C object? We are only exposing <device> records to the
Guile-Parted users so my assumption is that when <device> goes out of
scope, the pointer it wraps can be freed, but I'm maybe missing
something?

>      ‘define-wrapped-pointer-type’ in Guile is meant to help with these
>      things (info "(guile) Void Pointers and Byte Access").  We can’t
>      use it directly here because we’re using bytestructures and all
>      that.

Turns out, the "wrap" procedure defined in define-wrapped-pointer-type
is a clone of pointer->device! except that it doesn't set a
finalizer.

Regarding object lifetime, I wrote a small memo in 2019 here:
https://issues.guix.gnu.org/36402#11.

We have three weak hash tables in Guile-Parted:

%devices: To make sure that we do not set multiple finalizers on the
same pointers.

%disk-devices: So that a device always outlives its disks.

%partition-disks: So that a disk always outlives its partitions.

This means that as far as I can tell we are OK regarding your second
point about "aggregation relations".

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Thu, 03 Nov 2022 11:26:02 GMT) Full text and rfc822 format available.

Message #18 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Thu, 03 Nov 2022 12:25:39 +0100
Hi!

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

>>      ‘pointer->device!’ attempts to do that but I think it’s bogus: it
>>      uses a weak-value hash table, where the value is the wrapper.  So
>>      if the wrapper disappears before the underlying C object, then the
>>      pointer is called and bad things ensue.
>
> I'm not sure to understand how could the wrapper disappear before the
> underlying C object? We are only exposing <device> records to the
> Guile-Parted users so my assumption is that when <device> goes out of
> scope, the pointer it wraps can be freed, but I'm maybe missing
> something?

Hmm you’re right (and yes it’s the same as ‘define-wrapped-pointer-type’
does).  So that should be fine.

> Regarding object lifetime, I wrote a small memo in 2019 here:
> https://issues.guix.gnu.org/36402#11.

Nice, though it does feel like we’re running in circles.  :-)

> We have three weak hash tables in Guile-Parted:
>
> %devices: To make sure that we do not set multiple finalizers on the
> same pointers.
>
> %disk-devices: So that a device always outlives its disks.
>
> %partition-disks: So that a disk always outlives its partitions.
>
> This means that as far as I can tell we are OK regarding your second
> point about "aggregation relations".

OK.

Another thing to keep in mind: finalizers run in a separate thread, so
finalization can happen concurrently.  That can be problematic is there
is shared global state in the library that’s being access when an
benign-looking free function is called.

Could you show the backtrace of the other threads as well, preferably
with debugging info?

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Sun, 06 Nov 2022 17:18:02 GMT) Full text and rfc822 format available.

Message #21 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Sun, 06 Nov 2022 18:17:08 +0100
[Message part 1 (text/plain, inline)]
Hey,

I made some progress on that one. I think, this is what's going on:

1. Two new PedDevice A and B are malloc'ed by the libparted when opening
the installer partitioning page.

2. They are added to the %devices weak hash table by pointer->device!
and their respective finalizers are registered.

3. The partitioning ends and A goes out of scope. It is eventually
removed from %devices but it does not mean its finalizer will be run
immediately.

4. The partitioning is restarted using the installer menu. B is still in
the %devices hash table. However, A is now gone and is added again to
the %devices hash table by the pointer->device! procedure. Another
finalizer is registered for A.

That's because set-pointer-finalizer! does not *set* a finalizer it
*adds* one.

5. The partitioning ends and both A and B goes out of scope. They are
removed from %devices and their finalizers are called. The A finalizer
is called twice resulting in a double free.

This race condition is created by the fact that there is a time window
where the device is removed from the %devices hash table but its
finalizer is not immediately called.

If set-pointer-finalizer! actually called scm_i_set_finalizer instead of
scm_i_add_finalizer the A finalizer would be set twice but called only
once. Do you think it would be an option?

I attached the instrumentation patches (good old printf's) as well as
the syslog I based my analysis upon.

Thanks,

Mathieu
[diff_guix.txt (text/plain, inline)]
diff --git a/gnu/installer/parted.scm b/gnu/installer/parted.scm
index 82375d29e3..381e1b3ce7 100644
--- a/gnu/installer/parted.scm
+++ b/gnu/installer/parted.scm
@@ -1502,6 +1502,7 @@ (define (user-partitions->configuration user-partitions)
 
 (define (init-parted)
   "Initialize libparted support."
+  (%parted-syslog-port (syslog-port))
   (probe-all-devices!)
   ;; Remove all logical devices, otherwise "device-is-busy?" will report true
   ;; on all devices containaing active logical volumes.
[diff_parted.txt (text/plain, inline)]
diff -aur parted/libparted/arch/linux.c tmp/parted-3.5/libparted/arch/linux.c
--- parted/libparted/arch/linux.c	2022-11-04 10:14:33.551737324 +0100
+++ tmp/parted-3.5/libparted/arch/linux.c	2022-04-18 20:38:45.000000000 +0200
@@ -17,7 +17,6 @@
 
 #define PROC_DEVICES_BUFSIZ 16384
 
-#include <syslog.h>
 #include <config.h>
 #include <arch/linux.h>
 #include <linux/blkpg.h>
@@ -44,7 +43,6 @@
 #include <assert.h>
 #include <sys/sysmacros.h>
 #ifdef ENABLE_DEVICE_MAPPER
-
 #include <libdevmapper.h>
 #endif
 
@@ -89,8 +87,6 @@
 #define WR_MODE (O_WRONLY)
 #define RW_MODE (O_RDWR)
 
-int syslog_init;
-
 struct hd_geometry {
         unsigned char heads;
         unsigned char sectors;
@@ -1600,11 +1596,6 @@
                                 _("ped_device_new()  Unsupported device type"));
                 goto error_free_arch_specific;
         }
-        if (!syslog_init) {
-                openlog("parted", LOG_PID, LOG_USER);
-                syslog_init = 1;
-        }
-        syslog(LOG_INFO, "parted: new: %p\n", dev);
         return dev;
 
 error_free_arch_specific:
@@ -1620,8 +1611,6 @@
 static void
 linux_destroy (PedDevice* dev)
 {
-        syslog(LOG_INFO, "parted: destroy: %p\n", dev);
-
         LinuxSpecific *arch_specific = LINUX_SPECIFIC(dev);
         void *p = arch_specific->dmtype;
 
[diff_guile_parted.txt (text/plain, inline)]
diff --git a/parted/device.scm b/parted/device.scm
index 9f688dd..36d83f4 100644
--- a/parted/device.scm
+++ b/parted/device.scm
@@ -23,7 +23,7 @@
   #:use-module (parted geom)
   #:use-module (parted natmath)
   #:use-module (parted structs)
-  #:export (parted-syslog-port
+  #:export (%parted-syslog-port
             probe-all-devices!
             get-device
             get-device-next
@@ -44,8 +44,8 @@
             device-get-minimum-alignment
             device-get-optimum-alignment))
 
-(define parted-syslog-port
-  (make-parameter #f))
+(define %parted-syslog-port
+  (make-parameter #t))
 
 ;; Record all devices, so that pointer finalizers are only set once,
 ;; even if get-device returns an already known pointer.  Use the
@@ -58,22 +58,22 @@
 (define (pointer->device! pointer)
   ;; Check if a finalizer is already registered for this pointer.
   (format (%parted-syslog-port)
-          "guile-parted: pointer->device!: ~a" pointer)
+          "guile-parted: pointer->device!: ~a~%" pointer)
 
   (format (%parted-syslog-port)
-          "guile-parted: hash begin")
+          "guile-parted: hash begin~%")
   (hash-for-each (lambda (k v)
                    (format (%parted-syslog-port)
-                           "guile-parted: hash: ~a -> ~a" k v))
+                           "guile-parted: hash: ~a -> ~a~%" k v))
                  %devices)
   (format (%parted-syslog-port)
-          "guile-parted: hash end")
+          "guile-parted: hash end~%")
 
   (or (hash-ref %devices pointer)
       (let ((device (pointer->device pointer)))
 
         (format (%parted-syslog-port)
-          "guile-parted: finalizer!: ~a" pointer)
+          "guile-parted: finalizer!: ~a~%" pointer)
 
         ;; Contrary to its name, this "adds" a finalizer.
         (set-pointer-finalizer! pointer %device-destroy)
[messages (application/octet-stream, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Mon, 07 Nov 2022 13:30:02 GMT) Full text and rfc822 format available.

Message #24 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Mon, 07 Nov 2022 14:29:45 +0100
Hi Mathieu,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> I made some progress on that one. I think, this is what's going on:
>
> 1. Two new PedDevice A and B are malloc'ed by the libparted when opening
> the installer partitioning page.
>
> 2. They are added to the %devices weak hash table by pointer->device!
> and their respective finalizers are registered.
>
> 3. The partitioning ends and A goes out of scope. It is eventually
> removed from %devices but it does not mean its finalizer will be run
> immediately.
>
> 4. The partitioning is restarted using the installer menu. B is still in
> the %devices hash table. However, A is now gone and is added again to
> the %devices hash table by the pointer->device! procedure. Another
> finalizer is registered for A.
>
> That's because set-pointer-finalizer! does not *set* a finalizer it
> *adds* one.

Oh, I think I see what you mean.  You’re right about
‘set-pointer-finalizer!’ adding a finalizer, but I don’t think that’s
what’s happening here.

Finalizers are set on pointer objects, so they’re invoked when the
pointer object goes out of scope.  But:

  (eq? (make-pointer 123) (make-pointer 123))
  => #f

So a possible mistake is to add one finalizer on each pointer object and
have several pointer objects aliasing the same C object; that’s how you
can get the same “free” function called several times on the same C
object.

> 5. The partitioning ends and both A and B goes out of scope. They are
> removed from %devices and their finalizers are called. The A finalizer
> is called twice resulting in a double free.
>
> This race condition is created by the fact that there is a time window
> where the device is removed from the %devices hash table but its
> finalizer is not immediately called.

What if we create an extra hashv table that maps pointer values
(integers) to pointer objects?

  (define %pointers (make-hash-table))

  (define (canonical-pointer ptr)
    (or (hashv-ref %pointers (pointer-address ptr))
        (begin
          (hashv-set! %pointers (pointer-address ptr) ptr)
          ptr)))

This is kinda terrible but it would allow us to test the above
hypothesis.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Mon, 07 Nov 2022 16:38:02 GMT) Full text and rfc822 format available.

Message #27 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Mon, 07 Nov 2022 17:37:24 +0100
[Message part 1 (text/plain, inline)]
Hola,

> Finalizers are set on pointer objects, so they’re invoked when the
> pointer object goes out of scope.  But:
>
>   (eq? (make-pointer 123) (make-pointer 123))
>   => #f

I agree, but somehow this works:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use (parted)
scheme@(guile-user)> (eq? (get-device "/tmp/test.img") (get-device "/tmp/test.img"))
$3 = #t
--8<---------------cut here---------------end--------------->8---

denoting that the "pointer->device!" procedure is working correctly and
the underlying pointer object returned by pointer->procedure is the
same.

> So a possible mistake is to add one finalizer on each pointer object and
> have several pointer objects aliasing the same C object; that’s how you
> can get the same “free” function called several times on the same C
> object.

I don't think that what's happening. I have monitored closely the
%devices weak hash table and it never exceeds the total device count.

We have multiple finalizers registered for the same C pointer but that's
because the weak hash table may be cleaned by (gc) calls, leaving the
opportunity for multiple finalizers registration on the same C pointer.

I attached a reproducer that exposes the double free issue.

--8<---------------cut here---------------start------------->8---
sudo -E guile ~/tmp/parted-bug.scm
double free or corruption (!prev)
Aborted
--8<---------------cut here---------------end--------------->8---

We could save up somewhere which pointers have registered finalizers but
that would prevent the devices garbage collection, in the same way as if
%device was a plain hash table and not a weak one.

That could well be a solution, as I cannot see at the moment how we
could preserve this mechanism and avoid multiple finalization.

Thanks,

Mathieu
[parted-bug.scm (application/octet-stream, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Wed, 09 Nov 2022 15:27:02 GMT) Full text and rfc822 format available.

Message #30 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Wed, 09 Nov 2022 16:25:55 +0100
[Message part 1 (text/plain, inline)]
Hey,

I ran further tests and my understanding is that the weak hash-table /
finalizer mechanism is not compatible with a C function that can return
multiple times the same allocated object.

Even if we were to introduce a set-pointer-unique-finalizer! procedure
that calls scm_i_set_finalizer instead of scm_i_add_finalizer we would
still have double free errors because the finalizers are registered on
SCM pointers and not on libparted C pointers when calling
GC_REGISTER_FINALIZER_NO_ORDER.

I tested it out and I had several SCM pointers encapsulating the same
libparted C pointer, thus multiple finalizers on the same underlying C
pointer.

Anyway, here is a patch that solves the issue by removing the device
finalizer. It also means that all devices are persisted until the end of
the program which doesn't feel right, but I cannot think of a better
solution.

Let me know if you agree with my reasoning :)

Thanks,

Mathieu
[0001-Remove-the-finalizer-on-device-pointers.patch (text/x-patch, inline)]
From 066220a75c020b818aab9c2f5c3a7db835fa871a Mon Sep 17 00:00:00 2001
From: Mathieu Othacehe <othacehe <at> gnu.org>
Date: Wed, 9 Nov 2022 16:12:52 +0100
Subject: [PATCH 1/1] Remove the finalizer on device pointers.

Fixes: <https://issues.guix.gnu.org/58732>

* parted/device.scm (%device-destroy): Remove it.
(pointer->device!): Do not set a finalizer.
---
 parted/device.scm | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/parted/device.scm b/parted/device.scm
index 56a774b..be7f0ac 100644
--- a/parted/device.scm
+++ b/parted/device.scm
@@ -43,20 +43,23 @@
             device-get-minimum-alignment
             device-get-optimum-alignment))
 
-;; Record all devices, so that pointer finalizers are only set once,
-;; even if get-device returns an already known pointer.  Use the
-;; pointer as key and the associated <device> as value.
-(define %devices (make-weak-value-hash-table))
-
-(define %device-destroy
-  (libparted->pointer "ped_device_destroy"))
-
+;; Record all devices, so that we do not end up with different <device>
+;; objects aliasing the same underlying C pointer. Use the pointer as key and
+;; the associated <device> as value.
+(define %devices (make-hash-table))
+
+;; %DEVICES was a weak hash-table and we used to set a finalizer on POINTER.
+;; This is inevitably causing double free issues for the following reason:
+;;
+;; When <device> goes out of scope and is removed from the %DEVICES table, the
+;; finalizer that is set on the underlying C pointer is still registered but
+;; possibly not called as finalization happens is a separate thread.  If a
+;; subsequent call to ped_device_get returns the same C pointer, another
+;; finalizer will be registered.  This means that the finalization function
+;; can be called twice on the same pointer, causing a double free issue.
 (define (pointer->device! pointer)
-  ;; Check if a finalizer is already registered for this pointer.
   (or (hash-ref %devices pointer)
       (let ((device (pointer->device pointer)))
-        ;; Contrary to its name, this "adds" a finalizer.
-        (set-pointer-finalizer! pointer %device-destroy)
         (hash-set! %devices pointer device)
         device)))
 
-- 
2.38.0


Information forwarded to bug-guix <at> gnu.org:
bug#58732; Package guix. (Thu, 10 Nov 2022 11:43:02 GMT) Full text and rfc822 format available.

Message #33 received at 58732 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: 58732 <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Thu, 10 Nov 2022 12:42:20 +0100
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> I tested it out and I had several SCM pointers encapsulating the same
> libparted C pointer, thus multiple finalizers on the same underlying C
> pointer.

Yes, that’s the idea I tried to convey.

> Anyway, here is a patch that solves the issue by removing the device
> finalizer. It also means that all devices are persisted until the end of
> the program which doesn't feel right, but I cannot think of a better
> solution.

Looking at device.c in Parted, that’s probably the right thing because
PedDevice objects are kept in a linked list whose head is stored in the
‘devices’ global variable of device.c.  So you cannot just free them
asynchronously from a finalizer thread because they might still be
accessed from other parts of the library.  This is the explanation that
should go in the comment, and it’s clearly a good reason not to free
those PedDevice objects.

Now, we could provide bindings for ‘ped_device_destroy’ that users could
explicitly call if they want to (this would be similar to explicit calls
to ‘close-port’).  We’d arrange to make it idempotent.

Thanks,
Ludo’.




Reply sent to Mathieu Othacehe <othacehe <at> gnu.org>:
You have taken responsibility. (Thu, 10 Nov 2022 12:30:02 GMT) Full text and rfc822 format available.

Notification sent to Mathieu Othacehe <othacehe <at> gnu.org>:
bug acknowledged by developer. (Thu, 10 Nov 2022 12:30:02 GMT) Full text and rfc822 format available.

Message #38 received at 58732-done <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 58732-done <at> debbugs.gnu.org
Subject: Re: bug#58732: installer: finalizers & device destroy segfault
Date: Thu, 10 Nov 2022 13:29:10 +0100
Hey,

> Looking at device.c in Parted, that’s probably the right thing because
> PedDevice objects are kept in a linked list whose head is stored in the
> ‘devices’ global variable of device.c.  So you cannot just free them
> asynchronously from a finalizer thread because they might still be
> accessed from other parts of the library.  This is the explanation that
> should go in the comment, and it’s clearly a good reason not to free
> those PedDevice objects.

If the finalizer was run synchronously when a device is removed from the
weak hash table then things would be OK. The device would be removed
from the global linked list by _device_register. get_device would malloc
a new structure and so on. However finalizers are not run synchronously
so here we are.

> Now, we could provide bindings for ‘ped_device_destroy’ that users could
> explicitly call if they want to (this would be similar to explicit calls
> to ‘close-port’).  We’d arrange to make it idempotent.

Sure.

Thanks for your help on that one. I pushed the proposed patch and updated
Guile-Parted to 0.0.7 in Guix.

Mathieu




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 09 Dec 2022 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 138 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.