GNU bug report logs - #72137
[PATCH 0/2] Avoid cache cleanup storms

Previous Next

Package: guix-patches;

Reported by: Ludovic Courtès <ludo <at> gnu.org>

Date: Tue, 16 Jul 2024 09:12:01 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

To reply to this bug, email your comments to 72137 AT debbugs.gnu.org.
There is no need to reopen the bug first.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#72137; Package guix-patches. (Tue, 16 Jul 2024 09:12:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>:
New bug report received and forwarded. Copy sent to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org. (Tue, 16 Jul 2024 09:12:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: guix-patches <at> gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 0/2] Avoid cache cleanup storms
Date: Tue, 16 Jul 2024 11:10:56 +0200
Hello!

This fixes “cache cleanup storms” reported by Chris Baines whereby
typically multiple ‘guix substitute’ processes would start cleaning
up /var/guix/substitute/cache concurrently, leading to poor performance
(in particular on build farms with many such processes running in
parallel, even worse when on spinning disks).

Thoughts?

Ludo’.

Ludovic Courtès (2):
  syscalls: Add ‘mode’ parameter to ‘lock-file’.
  cache: Avoid cache cleanup storms from concurrent processes.

 guix/build/syscalls.scm | 14 +++++++++-----
 guix/cache.scm          | 27 ++++++++++++++++++---------
 tests/cache.scm         | 30 +++++++++++++++++++++++++++++-
 tests/syscalls.scm      | 13 +++++++++++++
 4 files changed, 69 insertions(+), 15 deletions(-)


base-commit: eb508e32d2d359c94d2cabebfe90dc32ca5dcf4f
-- 
2.45.2





Information forwarded to guix-patches <at> gnu.org:
bug#72137; Package guix-patches. (Tue, 16 Jul 2024 09:19:01 GMT) Full text and rfc822 format available.

Message #8 received at 72137 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72137 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 1/2] syscalls: Add ‘mode’ parameter to ‘lock-file’.
Date: Tue, 16 Jul 2024 11:15:36 +0200
* guix/build/syscalls.scm (lock-file): Add ‘mode’ parameter and honor it.
* tests/syscalls.scm ("lock-file + unlock-file"): New test.

Change-Id: I113fb4a8b35dd8782b9c0991574e39a4b4393333
---
 guix/build/syscalls.scm | 14 +++++++++-----
 tests/syscalls.scm      | 13 +++++++++++++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm
index 39bcffd516..2c20edf058 100644
--- a/guix/build/syscalls.scm
+++ b/guix/build/syscalls.scm
@@ -1398,14 +1398,18 @@ (define fcntl-flock
           ;; Presumably we got EAGAIN or so.
           (throw 'flock-error err))))))
 
-(define* (lock-file file #:key (wait? #t))
-  "Wait and acquire an exclusive lock on FILE.  Return an open port."
-  (let ((port (open-file file "w0")))
-    (fcntl-flock port 'write-lock #:wait? wait?)
+(define* (lock-file file #:optional (mode "w0")
+                    #:key (wait? #t))
+  "Wait and acquire an exclusive lock on FILE.  Return an open port according
+to MODE."
+  (let ((port (open-file file mode)))
+    (fcntl-flock port
+                 (if (output-port? port) 'write-lock 'read-lock)
+                 #:wait? wait?)
     port))
 
 (define (unlock-file port)
-  "Unlock PORT, a port returned by 'lock-file'."
+  "Unlock PORT, a port returned by 'lock-file', and close it."
   (fcntl-flock port 'unlock)
   (close-port port)
   #t)
diff --git a/tests/syscalls.scm b/tests/syscalls.scm
index 7cf67c060d..13f4f11721 100644
--- a/tests/syscalls.scm
+++ b/tests/syscalls.scm
@@ -383,6 +383,19 @@ (define perform-container-tests?
                (close-port file)
                result)))))))))
 
+(test-equal "lock-file + unlock-file"
+  'hello
+  (call-with-temporary-directory
+   (lambda (directory)
+     (let* ((file (in-vicinity directory "lock"))
+            (out (lock-file file #:wait? #f)))
+       (display "hello" out)
+       (unlock-file out)
+       (let* ((in (lock-file file "r0"))
+              (content (read in)))
+         (unlock-file in)
+         content)))))
+
 (test-equal "set-thread-name"
   "Syscall Test"
   (let ((name (thread-name)))
-- 
2.45.2





Information forwarded to guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org:
bug#72137; Package guix-patches. (Tue, 16 Jul 2024 09:19:02 GMT) Full text and rfc822 format available.

Message #11 received at 72137 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72137 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 2/2] cache: Avoid cache cleanup storms from concurrent
 processes.
Date: Tue, 16 Jul 2024 11:15:37 +0200
Reported by Christopher Baines <guix <at> cbaines.net>.

* guix/cache.scm (maybe-remove-expired-cache-entries): Define
‘expiry-port’; create it with ‘lock-file’.  Change ‘last-expiry-date’
accordingly.  Write timestamp straight to ‘expiry-port’.
* tests/cache.scm ("maybe-remove-expired-cache-entries, cleanup needed
but lock taken"): New test.

Change-Id: I22441d9d2c4a339d3d3878de131864db5a0ae826
---
 guix/cache.scm  | 27 ++++++++++++++++++---------
 tests/cache.scm | 30 +++++++++++++++++++++++++++++-
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/guix/cache.scm b/guix/cache.scm
index 6a91c7d3ef..8b12312c77 100644
--- a/guix/cache.scm
+++ b/guix/cache.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013-2017, 2020-2021, 2023 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2013-2017, 2020-2021, 2023-2024 Ludovic Courtès <ludo <at> gnu.org>
 ;;; Copyright © 2022 Simon Tournier <zimon.toutoune <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -19,6 +19,7 @@
 
 (define-module (guix cache)
   #:use-module ((guix utils) #:select (with-atomic-file-output))
+  #:autoload   (guix build syscalls) (lock-file unlock-file)
   #:use-module (srfi srfi-19)
   #:use-module (srfi srfi-26)
   #:use-module (ice-9 match)
@@ -93,13 +94,19 @@ (define* (maybe-remove-expired-cache-entries cache
   (define expiry-file
     (string-append cache "/last-expiry-cleanup"))
 
+  (define expiry-port
+    ;; Get exclusive access to EXPIRY-FILE to avoid "cleanup storms" where
+    ;; several processes would concurrently decide that time has come to clean
+    ;; up the same cache.  'lock-file' might throw to 'system-error' or to
+    ;; 'flock-error'; in either case, assume that we lost the race.
+    (false-if-exception
+     (lock-file expiry-file "a+0" #:wait? #f)))
+
   (define last-expiry-date
-    (catch 'system-error
-      (lambda ()
-        (or (string->number
-             (call-with-input-file expiry-file get-string-all))
-            0))
-      (const 0)))
+    (if expiry-port
+        (or (string->number (get-string-all expiry-port))
+            0)
+        +inf.0))
 
   (when (obsolete? last-expiry-date now cleanup-period)
     (remove-expired-cache-entries (cache-entries cache)
@@ -108,8 +115,10 @@ (define* (maybe-remove-expired-cache-entries cache
                                   #:delete-entry delete-entry)
     (catch 'system-error
       (lambda ()
-        (with-atomic-file-output expiry-file
-          (cute write (time-second now) <>)))
+        (seek expiry-port 0 SEEK_SET)
+        (truncate-file expiry-port 0)
+        (write (time-second now) expiry-port)
+        (unlock-file expiry-port))
       (lambda args
         ;; ENOENT means CACHE does not exist.
         (unless (= ENOENT (system-error-errno args))
diff --git a/tests/cache.scm b/tests/cache.scm
index d495ace2bd..e8ad083d40 100644
--- a/tests/cache.scm
+++ b/tests/cache.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2017, 2020 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2017, 2020, 2024 Ludovic Courtès <ludo <at> gnu.org>
 ;;; Copyright © 2022 Simon Tournier <zimon.toutoune <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -22,7 +22,9 @@ (define-module (test-cache)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-19)
   #:use-module (srfi srfi-64)
+  #:use-module ((guix build syscalls) #:select (lock-file))
   #:use-module ((guix utils) #:select (call-with-temporary-directory))
+  #:use-module ((rnrs io ports) #:select (get-string-all))
   #:use-module (ice-9 match))
 
 (test-begin "cache")
@@ -75,6 +77,32 @@ (define-syntax-rule (test-cache-cleanup cache exp ...)
       (lambda (port)
         (display 0 port)))))
 
+(let ((pid #f))
+  (test-equal "maybe-remove-expired-cache-entries, cleanup needed but lock taken"
+    '()
+    (test-cache-cleanup cache
+      (let ((in+out (pipe)))
+        (match (primitive-fork)
+          (0 (dynamic-wind
+               (const #t)
+               (lambda ()
+                 (close-port (car in+out))
+                 (let ((port (lock-file
+                              (string-append cache "/last-expiry-cleanup"))))
+                   (display 0 port)
+                   (display "done!\n" (cdr in+out))
+                   (close-port (cdr in+out))
+                   (sleep 100)))
+               (lambda ()
+                 (primitive-exit 0))))
+          (n
+           (set! pid n)
+           (close-port (cdr in+out))
+           (pk 'chr (get-string-all (car in+out)))
+           (close-port (car in+out)))))))
+
+  (when pid (kill pid SIGKILL)))
+
 (test-equal "maybe-remove-expired-cache-entries, empty cache"
   '("a" "b" "c")
   (test-cache-cleanup cache
-- 
2.45.2





Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Tue, 20 Aug 2024 22:56:02 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Tue, 20 Aug 2024 22:56:02 GMT) Full text and rfc822 format available.

Message #16 received at 72137-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 72137-done <at> debbugs.gnu.org
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, Christopher Baines <guix <at> cbaines.net>,
 Josselin Poiret <dev <at> jpoiret.xyz>, Simon Tournier <zimon.toutoune <at> gmail.com>,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: [bug#72137] [PATCH 0/2] Avoid cache cleanup storms
Date: Wed, 21 Aug 2024 00:54:45 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> skribis:

> This fixes “cache cleanup storms” reported by Chris Baines whereby
> typically multiple ‘guix substitute’ processes would start cleaning
> up /var/guix/substitute/cache concurrently, leading to poor performance
> (in particular on build farms with many such processes running in
> parallel, even worse when on spinning disks).
>
> Thoughts?
>
> Ludo’.
>
> Ludovic Courtès (2):
>   syscalls: Add ‘mode’ parameter to ‘lock-file’.
>   cache: Avoid cache cleanup storms from concurrent processes.

I went ahead and pushed it as d921c742b774a9f0a016f3db6442d5c58a330c92.

Ludo’.




This bug report was last modified 18 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.