GNU bug report logs - #10225
Lock ordering mismatch

Previous Next

Package: guile;

Reported by: Andy Wingo <wingo <at> pobox.com>

Date: Mon, 5 Dec 2011 20:44:02 UTC

Severity: normal

Done: Andy Wingo <wingo <at> pobox.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 10225 in the body.
You can then email your comments to 10225 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guile <at> gnu.org:
bug#10225; Package guile. (Mon, 05 Dec 2011 20:44:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Andy Wingo <wingo <at> pobox.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Mon, 05 Dec 2011 20:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> pobox.com>
To: bug-guile <bug-guile <at> gnu.org>
Cc: ludo <at> gnu.org
Subject: Lock ordering mismatch
Date: Mon, 05 Dec 2011 21:42:59 +0100
[Message part 1 (text/plain, inline)]
This message from Ludovic on 1 July indicates a problem in our threading
code that we need to fix.

Andy

[Message part 2 (message/rfc822, inline)]
From: ludo <at> gnu.org (Ludovic Courtès)
To: guile-devel <at> gnu.org
Subject: Lock ordering mismatch
Date: Fri, 01 Jul 2011 23:11:00 +0200
[Message part 3 (text/plain, inline)]
Hello,

As seen in ccb80964cd7cd112e300c34d32f67125a6d6da9a, there’s a lock
ordering mismatch between ‘do_thread exit’ and ‘fat_mutex_lock’
wrt. ‘t->admin_mutex’ and ‘m->lock’.

I thought this commit solved the problem, but now I think it doesn’t
because it leaves a small window during which a mutex could be held by a
thread without being part of its `mutexes' list, thereby violating the
invariant tested at line 667:

        /* Since MUTEX is in `t->mutexes', T must be its owner.  */
        assert (scm_is_eq (m->owner, t->handle));

So I reverted the patch.

(The situation isn’t better without the patch since
“while ./check-guile srfi-18.test threads.test ; do : ; done”
eventually hits the assertion failure.)

I tried the attached patch, which is inelegant and leads to deadlocks
with canceled threads (namely the “cancel succeeds” test in
threads.test.)

IOW I would welcome fresh ideas to approach the problem.  :-)

Thanks,
Ludo’.

[Message part 4 (text/x-patch, inline)]
	Modified libguile/threads.c
diff --git a/libguile/threads.c b/libguile/threads.c
index cbacfca..d537e0e 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1353,12 +1353,24 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
   fat_mutex *m = SCM_MUTEX_DATA (mutex);
 
   SCM new_owner = SCM_UNBNDP (owner) ? scm_current_thread() : owner;
+  scm_i_thread *t =
+    scm_is_true (new_owner) ? SCM_I_THREAD_DATA (new_owner) : NULL;
   SCM err = SCM_BOOL_F;
 
   struct timeval current_time;
 
-  scm_i_scm_pthread_mutex_lock (&m->lock);
+#define LOCK					\
+  if (t != NULL)				\
+    scm_i_pthread_mutex_lock (&t->admin_mutex);	\
+  scm_i_pthread_mutex_lock (&m->lock)
+
+#define UNLOCK					\
+  scm_i_pthread_mutex_unlock (&m->lock);	\
+  if (t != NULL)				\
+    scm_i_pthread_mutex_unlock (&t->admin_mutex)
 
+
+  LOCK;
   while (1)
     {
       if (m->level == 0)
@@ -1367,22 +1379,12 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 	  m->level++;
 
 	  if (SCM_I_IS_THREAD (new_owner))
-	    {
-	      scm_i_thread *t = SCM_I_THREAD_DATA (new_owner);
-
-	      scm_i_pthread_mutex_unlock (&m->lock);
-	      scm_i_pthread_mutex_lock (&t->admin_mutex);
-
-	      /* Only keep a weak reference to MUTEX so that it's not
-		 retained when not referenced elsewhere (bug #27450).
-		 The weak pair itself is eventually removed when MUTEX
-		 is unlocked.  Note that `t->mutexes' lists mutexes
-		 currently held by T, so it should be small.  */
-	      t->mutexes = scm_weak_car_pair (mutex, t->mutexes);
-
-	      scm_i_pthread_mutex_unlock (&t->admin_mutex);
-	      scm_i_pthread_mutex_lock (&m->lock);
-	    }
+	    /* Only keep a weak reference to MUTEX so that it's not
+	       retained when not referenced elsewhere (bug #27450).
+	       The weak pair itself is eventually removed when MUTEX
+	       is unlocked.  Note that `t->mutexes' lists mutexes
+	       currently held by T, so it should be small.  */
+	    t->mutexes = scm_weak_car_pair (mutex, t->mutexes);
 	  *ret = 1;
 	  break;
 	}
@@ -1425,13 +1427,18 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 		}
 	    }
 	  block_self (m->waiting, mutex, &m->lock, timeout);
-	  scm_i_pthread_mutex_unlock (&m->lock);
-	  SCM_TICK;
-	  scm_i_scm_pthread_mutex_lock (&m->lock);
+
+	  /* UNLOCK; */
+	  /* SCM_TICK; */
+	  /* LOCK; */
 	}
     }
-  scm_i_pthread_mutex_unlock (&m->lock);
+
+  UNLOCK;
+
   return err;
+#undef LOCK
+#undef UNLOCK
 }
 
 SCM scm_lock_mutex (SCM mx)

[Message part 5 (text/plain, inline)]

-- 
http://wingolog.org/

Reply sent to Andy Wingo <wingo <at> pobox.com>:
You have taken responsibility. (Thu, 23 Feb 2017 18:57:01 GMT) Full text and rfc822 format available.

Notification sent to Andy Wingo <wingo <at> pobox.com>:
bug acknowledged by developer. (Thu, 23 Feb 2017 18:57:01 GMT) Full text and rfc822 format available.

Message #10 received at 10225-done <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> pobox.com>
To: 10225-done <at> debbugs.gnu.org
Cc: ludo <at> gnu.org
Subject: Re: bug#10225: Lock ordering mismatch
Date: Thu, 23 Feb 2017 19:56:25 +0100
> As seen in ccb80964cd7cd112e300c34d32f67125a6d6da9a, there’s a lock
> ordering mismatch between ‘do_thread exit’ and ‘fat_mutex_lock’
> wrt. ‘t->admin_mutex’ and ‘m->lock’.

Happily with the thread-safety rewrite in 2.1.5 this is fixed!

Andy




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 24 Mar 2017 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 44 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.