GNU bug report logs - #55499
excessively large manifests due to propagation

Previous Next

Package: guix;

Reported by: Ricardo Wurmus <rekado <at> elephly.net>

Date: Wed, 18 May 2022 14:02:02 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 55499 in the body.
You can then email your comments to 55499 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Wed, 18 May 2022 14:02:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ricardo Wurmus <rekado <at> elephly.net>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Wed, 18 May 2022 14:02:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: bug-guix <at> gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: excessively large manifests due to propagation
Date: Wed, 18 May 2022 15:53:06 +0200
Packages of some languages rely heavily on propagation.  R is one
example.  Since the generated “manifest” file of a Guix profile records
entries for all propagated packages, this can get really big really
quickly.

A profile consisting only of four R packages (r-seurat, r-cistopic,
r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
are exceeding 24MB.

Simply reading that big a file with

    (call-with-input-file "huge-manifest" read)

takes several seconds.  On the MDC cluster I observed a delay of about
27 seconds.

Disabling read positions with (read-disable 'positions) significantly
speeds this up (18s vs 27s), but it’s still very slow.

We may be able to speed things up by supporting definitions or
references in manifest files, so that we don’t need to repeat a sub-tree
when generating the file.

-- 
Ricardo




Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 23 May 2022 15:57:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 24 May 2022 15:31:02 GMT) Full text and rfc822 format available.

Message #10 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 55499 <at> debbugs.gnu.org
Subject: Re: bug#55499: excessively large manifests due to propagation
Date: Tue, 24 May 2022 17:30:33 +0200
Hi!

Ricardo Wurmus <rekado <at> elephly.net> skribis:

> A profile consisting only of four R packages (r-seurat, r-cistopic,
> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
> weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
> are exceeding 24MB.

Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
reduction on this example, and it’s backward-compatible and cheap.

I’ll try and follow up with changes along the lines you describe.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Wed, 25 May 2022 04:37:01 GMT) Full text and rfc822 format available.

Message #13 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 55499 <at> debbugs.gnu.org
Subject: Re: bug#55499: excessively large manifests due to propagation
Date: Wed, 25 May 2022 06:35:32 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi!
>
> Ricardo Wurmus <rekado <at> elephly.net> skribis:
>
>> A profile consisting only of four R packages (r-seurat, r-cistopic,
>> r-monocle3, and r-cicero-monocle3) results in a “manifest” file that
>> weighs 7.1MB.  At the MDC I repeatedly encountered manifest files that
>> are exceeding 24MB.
>
> Commit 93f601d97ca2d9b82c41afeb86879ee37eae39e6 provides a 12% size
> reduction on this example, and it’s backward-compatible and cheap.

Excellent!  This is a great first step.

> I’ll try and follow up with changes along the lines you describe.

Thank you!

-- 
Ricardo




Information forwarded to rekado <at> elephly.net, bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 31 May 2022 16:10:02 GMT) Full text and rfc822 format available.

Message #16 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 0/3] Make 'manifest' files more compact
Date: Tue, 31 May 2022 18:09:13 +0200
Hello,

These patches implement what you suggested on IRC: not repeating
entire manifest entries and their propagated inputs.  This has a
dramatic impact on the size of the ‘manifest’ file and on the memory
and processing time to read it for the the use case you gave.

The second patch goes a tiny bit further by making the ‘search-paths’
and ‘propagated-inputs’ fields optional, shaving another ~10% on the
size of ‘manifest’ in this example.

The second patch should be squashed with the first one (so we don’t
bump version formats a second time and duplicate code).  It’s kinda
optional because it doesn’t bring much compared to the first patch and
causes a bit of extra complexity, but maybe it’s still worth keeping?

Could you try this on your larger use cases and tell me how it goes?

Thanks,
Ludo’.

Ludovic Courtès (3):
  tests: Augment profile collision test.
  profiles: Do not repeat entries in 'manifest' file.
  squash! profiles: Make all entry fields optional.

 guix/build/profiles.scm |  32 ++++++++--
 guix/profiles.scm       | 137 ++++++++++++++++++++++++++++++++--------
 tests/profiles.scm      |  52 ++++++++++++++-
 3 files changed, 187 insertions(+), 34 deletions(-)


base-commit: fed51b26141548a5bae349a5e1d8d6f681320f4f
-- 
2.36.1





Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 31 May 2022 16:10:03 GMT) Full text and rfc822 format available.

Message #19 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 2/3] profiles: Do not repeat entries in 'manifest' file.
Date: Tue, 31 May 2022 18:09:15 +0200
Fixes <https://issues.guix.gnu.org/55499>.
Reported by Ricardo Wurmus <rekado <at> elephly.net>.

With this change, the manifest file created for:

  guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat

goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:

  GUIX_PROFILING=gc guix package -I

goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

* guix/profiles.scm (manifest->gexp)[entry->gexp]: Turn into a monadic
procedure.  Return a 'repeated' sexp if ENTRY was already visited
before.
Adjust caller accordingly.  Bump manifest version.
(sexp->manifest)[sexp->manifest-entry]: Turn into a monadic procedure.
Add case for 'repeated' nodes.  Add each entry to the current state
vhash.
Add clause for version 4 manifests.
* tests/profiles.scm ("deduplication of repeated entries"): New test.
* guix/build/profiles.scm (manifest-sexp->inputs+search-paths): Expect
version 4.  Add clause for 'repeated' nodes.
---
 guix/build/profiles.scm |   4 +-
 guix/profiles.scm       | 127 ++++++++++++++++++++++++++++------------
 tests/profiles.scm      |  42 +++++++++++++
 3 files changed, 134 insertions(+), 39 deletions(-)

diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index f9875ca92e..c4460f624b 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -150,7 +150,7 @@ (define (manifest-sexp->inputs+search-paths manifest)
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
   (match manifest                            ;this must match 'manifest->gexp'
-    (('manifest ('version 3)
+    (('manifest ('version 4)
                 ('packages (entries ...)))
      (let loop ((entries entries)
                 (inputs '())
@@ -162,6 +162,8 @@ (define (manifest-sexp->inputs+search-paths manifest)
           (loop (append rest deps)                ;breadth-first traversal
                 (cons item inputs)
                 (append paths search-paths)))
+         ((('repeated name version item) . rest)
+          (loop rest inputs search-paths))
          (()
           (values (reverse inputs)
                   (delete-duplicates
diff --git a/guix/profiles.scm b/guix/profiles.scm
index bf50c00a1e..44ff37e75b 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -455,31 +455,53 @@ (define (inferior->entry)
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
   (define (entry->gexp entry)
-    (match entry
-      (($ <manifest-entry> name version output (? string? path)
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output #$path
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))
-      (($ <manifest-entry> name version output package
-                           (deps ...) (search-paths ...) _ (properties ...))
-       #~(#$name #$version #$output
-                 (ungexp package (or output "out"))
-                 (propagated-inputs #$(map entry->gexp deps))
-                 (search-paths #$(map search-path-specification->sexp
-                                      search-paths))
-                 #$@(if (null? properties)
-                        #~()
-                        #~((properties . #$properties)))))))
+    ;; Maintain in state monad a vhash of visited entries, indexed by their
+    ;; item, usually package objects (we cannot use the entry itself as an
+    ;; index since identical entries are usually not 'eq?').  Use that vhash
+    ;; to avoid repeating duplicate entries.  This is particularly useful in
+    ;; the presence of propagated inputs, where we could otherwise end up
+    ;; repeating large trees.
+    (mlet %state-monad ((visited (current-state)))
+      (if (match (vhash-assq (manifest-entry-item entry) visited)
+            ((_ . previous-entry)
+             (manifest-entry=? previous-entry entry))
+            (#f #f))
+          (return #~(repeated #$(manifest-entry-name entry)
+                              #$(manifest-entry-version entry)
+                              #$(manifest-entry-item entry)))
+          (mbegin %state-monad
+            (set-current-state (vhash-consq (manifest-entry-item entry)
+                                            entry visited))
+            (mlet %state-monad ((deps (mapm %state-monad entry->gexp
+                                            (manifest-entry-dependencies entry))))
+              (return
+               (match entry
+                 (($ <manifest-entry> name version output (? string? path)
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output #$path
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties)))))
+                 (($ <manifest-entry> name version output package
+                                      (_deps ...) (search-paths ...) _ (properties ...))
+                  #~(#$name #$version #$output
+                            (ungexp package (or output "out"))
+                            (propagated-inputs #$deps)
+                            (search-paths #$(map search-path-specification->sexp
+                                                 search-paths))
+                            #$@(if (null? properties)
+                                   #~()
+                                   #~((properties . #$properties))))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
-     #~(manifest (version 3)
-                 (packages #$(map entry->gexp entries))))))
+     #~(manifest (version 4)
+                 (packages #$(run-with-state
+                                 (mapm %state-monad entry->gexp entries)
+                               vlist-null))))))
 
 (define (find-package name version)
   "Return a package from the distro matching NAME and possibly VERSION.  This
@@ -522,25 +544,44 @@ (define (infer-dependency item parent)
 
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
+      (('repeated name version path)
+       ;; This entry is the same as another one encountered earlier; look it
+       ;; up and return it.
+       (mlet %state-monad ((visited (current-state))
+                           (key -> (list name version path)))
+         (match (vhash-assoc key visited)
+           (#f
+            (raise (formatted-message
+                    (G_ "invalid repeated entry in profile: ~s")
+                    sexp)))
+           ((_ . entry)
+            (return entry)))))
       ((name version output path
              ('propagated-inputs deps)
              ('search-paths search-paths)
              extra-stuff ...)
-       ;; For each of DEPS, keep a promise pointing to ENTRY.
-       (letrec* ((deps* (map (cut sexp->manifest-entry <> (delay entry))
-                             deps))
-                 (entry (manifest-entry
-                          (name name)
-                          (version version)
-                          (output output)
-                          (item path)
-                          (dependencies deps*)
-                          (search-paths (map sexp->search-path-specification
-                                             search-paths))
-                          (parent parent)
-                          (properties (or (assoc-ref extra-stuff 'properties)
-                                          '())))))
-         entry))))
+       (mlet* %state-monad
+           ((entry -> #f)
+            (deps*    (mapm %state-monad
+                            (cut sexp->manifest-entry <> (delay entry))
+                            deps))
+            (visited  (current-state))
+            (key ->   (list name version path)))
+         (set! entry                              ;XXX: emulate 'letrec*'
+               (manifest-entry
+                 (name name)
+                 (version version)
+                 (output output)
+                 (item path)
+                 (dependencies deps*)
+                 (search-paths (map sexp->search-path-specification
+                                    search-paths))
+                 (parent parent)
+                 (properties (or (assoc-ref extra-stuff 'properties)
+                                 '()))))
+         (mbegin %state-monad
+           (set-current-state (vhash-cons key entry visited))
+           (return entry))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -608,7 +649,17 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (map sexp->manifest-entry entries)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
+
+    ;; Version 4 deduplicates repeated entries, as can happen with deep
+    ;; propagated input trees.
+    (('manifest ('version 4 minor-version ...)
+                ('packages (entries ...)))
+     (manifest (run-with-state
+                   (mapm %state-monad sexp->manifest-entry entries)
+                 vlist-null)))
     (_
      (raise (condition
              (&message (message "unsupported manifest format")))))))
diff --git a/tests/profiles.scm b/tests/profiles.scm
index 7e51d37ab9..3838d971c9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -586,6 +586,48 @@ (define (entry->sexp entry)
                                                     #:locales? #f)))
         (return #f)))))
 
+(test-assertm "deduplication of repeated entries"
+  ;; Make sure the 'manifest' file does not duplicate identical entries.
+  ;; See <https://issues.guix.gnu.org/55499>.
+  (mlet* %store-monad ((p0 -> (dummy-package "p0"
+                                (build-system trivial-build-system)
+                                (arguments
+                                 `(#:guile ,%bootstrap-guile
+                                   #:builder (mkdir (assoc-ref %outputs "out"))))
+                                (propagated-inputs
+                                 `(("guile" ,%bootstrap-guile)))))
+                       (p1 -> (package
+                                (inherit p0)
+                                (name "p1")))
+                       (drv (profile-derivation (packages->manifest
+                                                 (list p0 p1))
+                                                #:hooks '()
+                                                #:locales? #f)))
+    (mbegin %store-monad
+      (built-derivations (list drv))
+      (let ((file     (string-append (derivation->output-path drv)
+                                     "/manifest"))
+            (manifest (profile-manifest (derivation->output-path drv))))
+        (define (contains-repeated? sexp)
+          (match sexp
+            (('repeated _ ...) #t)
+            ((lst ...) (any contains-repeated? sexp))
+            (_ #f)))
+
+        (return (and (contains-repeated? (call-with-input-file file read))
+
+                     ;; MANIFEST has two entries for %BOOTSTRAP-GUILE since
+                     ;; it's propagated both from P0 and from P1.  When
+                     ;; reading a 'repeated' node, 'read-manifest' should
+                     ;; reuse the previously-read entry so the two
+                     ;; %BOOTSTRAP-GUILE entries must be 'eq?'.
+                     (match (manifest-entries manifest)
+                       (((= manifest-entry-dependencies (dep0))
+                         (= manifest-entry-dependencies (dep1)))
+                        (and (string=? (manifest-entry-name dep0)
+                                       (package-name %bootstrap-guile))
+                             (eq? dep0 dep1))))))))))
+
 (test-assertm "no collision"
   ;; Here we have an entry that is "lowered" (its 'item' field is a store file
   ;; name) and another entry (its 'item' field is a package) that is
-- 
2.36.1





Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 31 May 2022 16:10:03 GMT) Full text and rfc822 format available.

Message #22 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 1/3] tests: Augment profile collision test.
Date: Tue, 31 May 2022 18:09:14 +0200
* tests/profiles.scm ("collision of propagated inputs"): Check the
parents of ENTRY1 and ENTRY2.
---
 tests/profiles.scm | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tests/profiles.scm b/tests/profiles.scm
index d59d75985f..7e51d37ab9 100644
--- a/tests/profiles.scm
+++ b/tests/profiles.scm
@@ -556,14 +556,20 @@ (define (entry->sexp entry)
         (return #f)))))
 
 (test-equal "collision of propagated inputs"
-  '(("guile-bootstrap" "2.0") ("guile-bootstrap" "42"))
+  '(("guile-bootstrap" "2.0") "p1"
+    <> ("guile-bootstrap" "42") "p2")
   (guard (c ((profile-collision-error? c)
              (let ((entry1 (profile-collision-error-entry c))
                    (entry2 (profile-collision-error-conflict c)))
                (list (list (manifest-entry-name entry1)
                            (manifest-entry-version entry1))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry1)))
+                     '<>
                      (list (manifest-entry-name entry2)
-                           (manifest-entry-version entry2))))))
+                           (manifest-entry-version entry2))
+                     (manifest-entry-name
+                      (force (manifest-entry-parent entry2)))))))
     (run-with-store %store
       (mlet* %store-monad ((p0 -> (package
                                     (inherit %bootstrap-guile)
-- 
2.36.1





Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 31 May 2022 16:10:04 GMT) Full text and rfc822 format available.

Message #25 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 3/3] squash! profiles: Make all entry fields optional.
Date: Tue, 31 May 2022 18:09:16 +0200
This is meant to be squashed with the previous patch.

This makes the 'search-paths' and 'propagated-inputs' fields of each
entry optional, shaving a bit more space and reading time, down to 180K
instead of 192K.

* guix/build/profiles.scm (manifest-sexp->inputs+search-paths)[let-fields]:
New macro.
Use it.
* guix/profiles.scm (manifest->gexp)[optional]: New procedure.  Use it.
[sexp->manifest-entry]: Rename to...
[sexp->manifest-entry/v3]: ... this.
---
 guix/build/profiles.scm |  28 ++++++++--
 guix/profiles.scm       | 120 ++++++++++++++++++++++++++--------------
 2 files changed, 100 insertions(+), 48 deletions(-)

diff --git a/guix/build/profiles.scm b/guix/build/profiles.scm
index c4460f624b..2ab76bde74 100644
--- a/guix/build/profiles.scm
+++ b/guix/build/profiles.scm
@@ -149,6 +149,18 @@ (define (manifest-sexp->inputs+search-paths manifest)
   "Parse MANIFEST, an sexp as produced by 'manifest->gexp', and return two
 values: the list of store items of its manifest entries, and the list of
 search path specifications."
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (match manifest                            ;this must match 'manifest->gexp'
     (('manifest ('version 4)
                 ('packages (entries ...)))
@@ -156,12 +168,12 @@ (define (manifest-sexp->inputs+search-paths manifest)
                 (inputs '())
                 (search-paths '()))
        (match entries
-         (((name version output item
-                 ('propagated-inputs deps)
-                 ('search-paths paths) _ ...) . rest)
-          (loop (append rest deps)                ;breadth-first traversal
-                (cons item inputs)
-                (append paths search-paths)))
+         (((name version output item fields ...) . rest)
+          (let ((paths search-paths))
+            (let-fields fields (propagated-inputs search-paths properties)
+              (loop (append rest propagated-inputs) ;breadth-first traversal
+                    (cons item inputs)
+                    (append search-paths paths)))))
          ((('repeated name version item) . rest)
           (loop rest inputs search-paths))
          (()
@@ -214,4 +226,8 @@ (define manifest-file
     ;; Write 'OUTPUT/etc/profile'.
     (build-etc/profile output search-paths)))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profile.scm ends here
diff --git a/guix/profiles.scm b/guix/profiles.scm
index 44ff37e75b..d694ac07da 100644
--- a/guix/profiles.scm
+++ b/guix/profiles.scm
@@ -454,6 +454,11 @@ (define (inferior->entry)
 
 (define (manifest->gexp manifest)
   "Return a representation of MANIFEST as a gexp."
+  (define (optional name value)
+    (if (null? value)
+        #~()
+        #~((#$name #$value))))
+
   (define (entry->gexp entry)
     ;; Maintain in state monad a vhash of visited entries, indexed by their
     ;; item, usually package objects (we cannot use the entry itself as an
@@ -477,24 +482,22 @@ (define (entry->gexp entry)
               (return
                (match entry
                  (($ <manifest-entry> name version output (? string? path)
-                                      (_deps ...) (search-paths ...) _ (properties ...))
+                                      (_ ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output #$path
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties)))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties)))
                  (($ <manifest-entry> name version output package
                                       (_deps ...) (search-paths ...) _ (properties ...))
                   #~(#$name #$version #$output
                             (ungexp package (or output "out"))
-                            (propagated-inputs #$deps)
-                            (search-paths #$(map search-path-specification->sexp
-                                                 search-paths))
-                            #$@(if (null? properties)
-                                   #~()
-                                   #~((properties . #$properties))))))))))))
+                            #$@(optional 'propagated-inputs deps)
+                            #$@(optional 'search-paths
+                                         (map search-path-specification->sexp
+                                              search-paths))
+                            #$@(optional 'properties properties))))))))))
 
   (match manifest
     (($ <manifest> (entries ...))
@@ -542,6 +545,40 @@ (define (infer-dependency item parent)
         (item item)
         (parent parent))))
 
+  (define* (sexp->manifest-entry/v3 sexp #:optional (parent (delay #f)))
+    (match sexp
+      ((name version output path
+             ('propagated-inputs deps)
+             ('search-paths search-paths)
+             extra-stuff ...)
+       ;; For each of DEPS, keep a promise pointing to ENTRY.
+       (letrec* ((deps* (map (cut sexp->manifest-entry/v3 <> (delay entry))
+                             deps))
+                 (entry (manifest-entry
+                          (name name)
+                          (version version)
+                          (output output)
+                          (item path)
+                          (dependencies deps*)
+                          (search-paths (map sexp->search-path-specification
+                                             search-paths))
+                          (parent parent)
+                          (properties (or (assoc-ref extra-stuff 'properties)
+                                          '())))))
+         entry))))
+
+  (define-syntax let-fields
+    (syntax-rules ()
+      ;; Bind the fields NAME of LST to same-named variables in the lexical
+      ;; scope of BODY.
+      ((_ lst (name rest ...) body ...)
+       (let ((name (match (assq 'name lst)
+                     ((_ value) value)
+                     (#f '()))))
+         (let-fields lst (rest ...) body ...)))
+      ((_ lst () body ...)
+       (begin body ...))))
+
   (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     (match sexp
       (('repeated name version path)
@@ -556,32 +593,29 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
                     sexp)))
            ((_ . entry)
             (return entry)))))
-      ((name version output path
-             ('propagated-inputs deps)
-             ('search-paths search-paths)
-             extra-stuff ...)
-       (mlet* %state-monad
-           ((entry -> #f)
-            (deps*    (mapm %state-monad
-                            (cut sexp->manifest-entry <> (delay entry))
-                            deps))
-            (visited  (current-state))
-            (key ->   (list name version path)))
-         (set! entry                              ;XXX: emulate 'letrec*'
-               (manifest-entry
-                 (name name)
-                 (version version)
-                 (output output)
-                 (item path)
-                 (dependencies deps*)
-                 (search-paths (map sexp->search-path-specification
-                                    search-paths))
-                 (parent parent)
-                 (properties (or (assoc-ref extra-stuff 'properties)
-                                 '()))))
-         (mbegin %state-monad
-           (set-current-state (vhash-cons key entry visited))
-           (return entry))))))
+      ((name version output path fields ...)
+       (let-fields fields (propagated-inputs search-paths properties)
+         (mlet* %state-monad
+             ((entry -> #f)
+              (deps     (mapm %state-monad
+                              (cut sexp->manifest-entry <> (delay entry))
+                              propagated-inputs))
+              (visited  (current-state))
+              (key ->   (list name version path)))
+           (set! entry                             ;XXX: emulate 'letrec*'
+                 (manifest-entry
+                   (name name)
+                   (version version)
+                   (output output)
+                   (item path)
+                   (dependencies deps)
+                   (search-paths (map sexp->search-path-specification
+                                      search-paths))
+                   (parent parent)
+                   (properties properties)))
+           (mbegin %state-monad
+             (set-current-state (vhash-cons key entry visited))
+             (return entry)))))))
 
   (match sexp
     (('manifest ('version 0)
@@ -649,9 +683,7 @@ (define* (sexp->manifest-entry sexp #:optional (parent (delay #f)))
     ;; Version 3 represents DEPS as full-blown manifest entries.
     (('manifest ('version 3 minor-version ...)
                 ('packages (entries ...)))
-     (manifest (run-with-state
-                   (mapm %state-monad sexp->manifest-entry entries)
-                 vlist-null)))
+     (manifest (map sexp->manifest-entry/v3 entries)))
 
     ;; Version 4 deduplicates repeated entries, as can happen with deep
     ;; propagated input trees.
@@ -2368,4 +2400,8 @@ (define (user-friendly-profile profile)
             %known-shorthand-profiles)
       profile))
 
+;;; Local Variables:
+;;; eval: (put 'let-fields 'scheme-indent-function 2)
+;;; End:
+
 ;;; profiles.scm ends here
-- 
2.36.1





Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 31 May 2022 17:37:02 GMT) Full text and rfc822 format available.

Message #28 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Maxime Devos <maximedevos <at> telenet.be>
To: Ludovic Courtès <ludo <at> gnu.org>, 55499 <at> debbugs.gnu.org
Subject: Re: bug#55499: [PATCH 2/3] profiles: Do not repeat entries in
 'manifest' file.
Date: Tue, 31 May 2022 19:35:50 +0200
[Message part 1 (text/plain, inline)]
Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
> With this change, the manifest file created for:
> 
>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
> 
> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
> 
>   GUIX_PROFILING=gc guix package -I
> 
> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.

I'm not familiar enough with this part of Guix to evaluate the patches,
but the time, disk memory and heap memory decreases sound great!

Greetings,
Maxime.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Wed, 01 Jun 2022 09:40:02 GMT) Full text and rfc822 format available.

Message #31 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxime Devos <maximedevos <at> telenet.be>
Cc: 55499 <at> debbugs.gnu.org
Subject: Re: bug#55499: [PATCH 2/3] profiles: Do not repeat entries in
 'manifest' file.
Date: Wed, 01 Jun 2022 11:38:54 +0200
Hi,

Maxime Devos <maximedevos <at> telenet.be> skribis:

> Ludovic Courtès schreef op di 31-05-2022 om 18:09 [+0200]:
>> With this change, the manifest file created for:
>> 
>>   guix install r r-seurat r-cistopic r-monocle3 r-cicero-monocle3 r-assertthat
>> 
>> goes from 5.6M to 192K.  Likewise, on this profile, wall-clock time of:
>> 
>>   GUIX_PROFILING=gc guix package -I
>> 
>> goes from 0.7s to 0.1s, with heap usage going from 55M to 9M.
>
> I'm not familiar enough with this part of Guix to evaluate the patches,
> but the time, disk memory and heap memory decreases sound great!

Yup!  The difference is significant primarily for profiles with lots of
propagated inputs, so typically profiles with R or Python packages.

For my home profile (300+ packages but no R and no Python), it goes from
316K to 112K, heap usage for ‘guix package -I’ goes from 12M to 9M and
wall-clock time is almost unchanged.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#55499; Package guix. (Tue, 14 Jun 2022 07:08:01 GMT) Full text and rfc822 format available.

Message #34 received at 55499 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>
Subject: Re: bug#55499: excessively large manifests due to propagation
Date: Tue, 14 Jun 2022 09:06:48 +0200
Hey Ricardo,

Ludovic Courtès <ludo <at> gnu.org> skribis:

> These patches implement what you suggested on IRC: not repeating
> entire manifest entries and their propagated inputs.  This has a
> dramatic impact on the size of the ‘manifest’ file and on the memory
> and processing time to read it for the the use case you gave.
>
> The second patch goes a tiny bit further by making the ‘search-paths’
> and ‘propagated-inputs’ fields optional, shaving another ~10% on the
> size of ‘manifest’ in this example.
>
> The second patch should be squashed with the first one (so we don’t
> bump version formats a second time and duplicate code).  It’s kinda
> optional because it doesn’t bring much compared to the first patch and
> causes a bit of extra complexity, but maybe it’s still worth keeping?
>
> Could you try this on your larger use cases and tell me how it goes?

Did you have a chance to give it a try?

Maybe I can double-check that everything’s alright and go ahead.

Thanks,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Fri, 01 Jul 2022 21:55:02 GMT) Full text and rfc822 format available.

Notification sent to Ricardo Wurmus <rekado <at> elephly.net>:
bug acknowledged by developer. (Fri, 01 Jul 2022 21:55:02 GMT) Full text and rfc822 format available.

Message #39 received at 55499-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 55499-done <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>
Subject: Re: bug#55499: excessively large manifests due to propagation
Date: Fri, 01 Jul 2022 23:54:41 +0200
Hi!

I went ahead and pushed it as 4ff12d1de7cd617b791996ee7ca1240660b4c20e.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 30 Jul 2022 11:24:09 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 242 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.