GNU bug report logs -
#66650
[PATCH] git: Shell out to ‘git gc’ when necessary.
Previous Next
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Fri, 20 Oct 2023 16:17:01 UTC
Severity: normal
Tags: patch
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66650 in the body.
You can then email your comments to 66650 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, rekado <at> elephly.net, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Fri, 20 Oct 2023 16:17:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
New bug report received and forwarded. Copy sent to
guix <at> cbaines.net, dev <at> jpoiret.xyz, ludo <at> gnu.org, othacehe <at> gnu.org, rekado <at> elephly.net, zimon.toutoune <at> gmail.com, me <at> tobias.gr, guix-patches <at> gnu.org
.
(Fri, 20 Oct 2023 16:17:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Fixes <https://issues.guix.gnu.org/65720>.
This fixes a bug whereby libgit2-managed checkouts would keep growing as
we fetch.
* guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
procedures.
(update-cached-checkout): Use it.
---
guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
1 file changed, 36 insertions(+), 3 deletions(-)
Hi!
This is a radical fix/workaround for the unbounded Git checkout growth
problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
pack files around).
I thought we might be able to implement a ‘git gc’ approximation using
the libgit2 “packbuilder” interface, but I haven’t got around to doing
it: <https://libgit2.org/libgit2/#HEAD/search/pack>.
Once again, shelling out is not my favorite option, but it’s a bug we
should fix sooner rather than later, hence this compromise.
Thoughts?
Ludo’.
diff --git a/guix/git.scm b/guix/git.scm
index b7182305cf..d704b62333 100644
--- a/guix/git.scm
+++ b/guix/git.scm
@@ -1,6 +1,6 @@
;;; GNU Guix --- Functional package management for GNU
;;; Copyright © 2017, 2020 Mathieu Othacehe <m.othacehe <at> gmail.com>
-;;; Copyright © 2018-2022 Ludovic Courtès <ludo <at> gnu.org>
+;;; Copyright © 2018-2023 Ludovic Courtès <ludo <at> gnu.org>
;;; Copyright © 2021 Kyle Meyer <kyle <at> kyleam.com>
;;; Copyright © 2021 Marius Bakke <marius <at> gnu.org>
;;; Copyright © 2022 Maxime Devos <maximedevos <at> telenet.be>
@@ -29,15 +29,16 @@ (define-module (guix git)
#:use-module (guix cache)
#:use-module (gcrypt hash)
#:use-module ((guix build utils)
- #:select (mkdir-p delete-file-recursively))
+ #:select (mkdir-p delete-file-recursively invoke/quiet))
#:use-module (guix store)
#:use-module (guix utils)
#:use-module (guix records)
#:use-module (guix gexp)
#:autoload (guix git-download)
(git-reference-url git-reference-commit git-reference-recursive?)
+ #:autoload (guix config) (%git)
#:use-module (guix sets)
- #:use-module ((guix diagnostics) #:select (leave warning))
+ #:use-module ((guix diagnostics) #:select (leave warning info))
#:use-module (guix progress)
#:autoload (guix swh) (swh-download commit-id?)
#:use-module (rnrs bytevectors)
@@ -428,6 +429,35 @@ (define (delete-checkout directory)
(rename-file directory trashed)
(delete-file-recursively trashed)))
+(define (packs-in-git-repository directory)
+ "Return the number of pack files under DIRECTORY, a Git checkout."
+ (catch 'system-error
+ (lambda ()
+ (let ((directory (opendir (in-vicinity directory ".git/objects/pack"))))
+ (let loop ((count 0))
+ (match (readdir directory)
+ ((? eof-object?)
+ (closedir directory)
+ count)
+ (str
+ (loop (if (string-suffix? ".pack" str)
+ (+ 1 count)
+ count)))))))
+ (const 0)))
+
+(define (maybe-run-git-gc directory)
+ "Run 'git gc' in DIRECTORY if needed."
+ ;; XXX: As of libgit2 1.3.x (used by Guile-Git), there's no support for GC.
+ ;; Each time a checkout is pulled, a new pack is created, which eventually
+ ;; takes up a lot of space (lots of small, poorly-compressed packs). As a
+ ;; workaround, shell out to 'git gc' when the number of packs in a
+ ;; repository has become "too large", potentially wasting a lot of space.
+ ;; See <https://issues.guix.gnu.org/65720>.
+ (when (> (packs-in-git-repository directory) 25)
+ (info (G_ "compressing cached Git repository at '~a'...~%")
+ directory)
+ (invoke/quiet %git "-C" directory "gc")))
+
(define* (update-cached-checkout url
#:key
(ref '())
@@ -515,6 +545,9 @@ (define* (update-cached-checkout url
seconds seconds
nanoseconds nanoseconds))))
+ ;; Run 'git gc' if needed.
+ (maybe-run-git-gc cache-directory)
+
;; When CACHE-DIRECTORY is a sub-directory of the default cache
;; directory, remove expired checkouts that are next to it.
(let ((parent (dirname cache-directory)))
base-commit: 6b0a32196982a0a2f4dbb59d35e55833a5545ac6
--
2.41.0
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Mon, 30 Oct 2023 12:05:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 66650 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:
> Fixes <https://issues.guix.gnu.org/65720>.
>
> This fixes a bug whereby libgit2-managed checkouts would keep growing as
> we fetch.
>
> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
> procedures.
> (update-cached-checkout): Use it.
> ---
> guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 36 insertions(+), 3 deletions(-)
>
> Hi!
>
> This is a radical fix/workaround for the unbounded Git checkout growth
> problem, shelling out to ‘git gc’ when it’s likely needed (“too many”
> pack files around).
>
> I thought we might be able to implement a ‘git gc’ approximation using
> the libgit2 “packbuilder” interface, but I haven’t got around to doing
> it: <https://libgit2.org/libgit2/#HEAD/search/pack>.
>
> Once again, shelling out is not my favorite option, but it’s a bug we
> should fix sooner rather than later, hence this compromise.
>
> Thoughts?
This sounds good to me, the data service has this problem as well of
cached checkouts that grow to be too large and this sounds like it'll
address it.
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Tue, 14 Nov 2023 09:20:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hello,
Christopher Baines <mail <at> cbaines.net> skribis:
> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Fixes <https://issues.guix.gnu.org/65720>.
>>
>> This fixes a bug whereby libgit2-managed checkouts would keep growing as
>> we fetch.
[...]
> This sounds good to me, the data service has this problem as well of
> cached checkouts that grow to be too large and this sounds like it'll
> address it.
Thanks for your input, Chris.
Any other comments? I’d like to push the patch within a few days if
there are no objections.
https://issues.guix.gnu.org/66650
Ludo’.
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Tue, 14 Nov 2023 12:05:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hi,
On Tue, 14 Nov 2023 at 10:19, Ludovic Courtès <ludo <at> gnu.org> wrote:
> Any other comments? I’d like to push the patch within a few days if
> there are no objections.
As mentioned in [1],
>> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
>> procedures.
>> (update-cached-checkout): Use it.
>> ---
>> guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 36 insertions(+), 3 deletions(-)
LGTM. Just two colors for the bikeshed. :-)
>> + (when (> (packs-in-git-repository directory) 25)
Why 25? And not 10 or 50 or 100?
>> (define* (update-cached-checkout url
>> #:key
>> (ref '())
>> @@ -515,6 +545,9 @@ (define* (update-cached-checkout url
>> seconds seconds
>> nanoseconds nanoseconds))))
>>
>> + ;; Run 'git gc' if needed.
>> + (maybe-run-git-gc cache-directory)
Why not trigger it by “guix gc”?
Well, I expect “guix gc” to take some time and I choose when. However,
I want “guix pull” or “guix time-machine” to be as fast as possible and
here some extra time is added, and I cannot control exactly when.
Cheers,
simon
1: bug#65720: [PATCH] git: Shell out to ‘git gc’ when necessary.
Simon Tournier <zimon.toutoune <at> gmail.com>
Mon, 23 Oct 2023 12:08:07 +0200
id:87il6xlkhk.fsf <at> gmail.com
https://issues.guix.gnu.org/65720
https://issues.guix.gnu.org/msgid/87il6xlkhk.fsf <at> gmail.com
https://yhetil.org/guix/87il6xlkhk.fsf <at> gmail.com
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Thu, 16 Nov 2023 12:13:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hi,
Simon Tournier <zimon.toutoune <at> gmail.com> skribis:
>>> * guix/git.scm (packs-in-git-repository, maybe-run-git-gc): New
>>> procedures.
>>> (update-cached-checkout): Use it.
>>> ---
>>> guix/git.scm | 39 ++++++++++++++++++++++++++++++++++++---
>>> 1 file changed, 36 insertions(+), 3 deletions(-)
>
> LGTM.
Thanks!
> Just two colors for the bikeshed. :-)
>
>
>>> + (when (> (packs-in-git-repository directory) 25)
>
> Why 25? And not 10 or 50 or 100?
Totally arbitrary. :-) I sampled the checkouts I had on my laptop and
that seems like a reasonable heuristic. In particular, it seems that
Git-managed checkouts never have this many packs; only libgit2-managed
checkouts do, precisely because libgit2 doesn’t repack/GC.
>>> + ;; Run 'git gc' if needed.
>>> + (maybe-run-git-gc cache-directory)
>
> Why not trigger it by “guix gc”?
Because so far the idea is that ~/.cache/guix/checkouts is automatically
managed without user intervention; it’s really a cache in that sense.
> Well, I expect “guix gc” to take some time and I choose when. However,
> I want “guix pull” or “guix time-machine” to be as fast as possible and
> here some extra time is added, and I cannot control exactly when.
Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow
path; so for example, it’s not called on a ‘time-machine’ cache hit, but
only on a cache miss, which is already expensive anyway.
Does that make sense?
Thanks,
Ludo’.
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Thu, 16 Nov 2023 13:26:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hi,
On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo <at> gnu.org> wrote:
> > Well, I expect “guix gc” to take some time and I choose when. However,
> > I want “guix pull” or “guix time-machine” to be as fast as possible and
> > here some extra time is added, and I cannot control exactly when.
>
> Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow
> path; so for example, it’s not called on a ‘time-machine’ cache hit, but
> only on a cache miss, which is already expensive anyway.
What you mean as "only called on the slow path" is each time
'update-cached-checkout' is called, right? So, somehow when
'maybe-run-git-gc' is called appears to me "unpredictable". But
anyway. :-)
Let move it elsewhere if I am really annoyed.
Cheers,
simon
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Wed, 22 Nov 2023 11:19:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hi,
Simon Tournier <zimon.toutoune <at> gmail.com> skribis:
> On Thu, 16 Nov 2023 at 13:12, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>> > Well, I expect “guix gc” to take some time and I choose when. However,
>> > I want “guix pull” or “guix time-machine” to be as fast as possible and
>> > here some extra time is added, and I cannot control exactly when.
>>
>> Yes, I see. The thing is ‘maybe-run-git-gc’ is only called on the slow
>> path; so for example, it’s not called on a ‘time-machine’ cache hit, but
>> only on a cache miss, which is already expensive anyway.
>
> What you mean as "only called on the slow path" is each time
> 'update-cached-checkout' is called, right?
Yes, which usually indicates we’re on a cache miss (for example a cache
miss of ‘guix time-machine’) and thus are going to do potentially more
work (updating a Git repo, building things, etc.). That’s why I think
it’s on the “slow path” and shouldn’t make much of a difference. More
importantly, unless I’m mistaken, it’s rarely going to fire.
> So, somehow when 'maybe-run-git-gc' is called appears to me
> "unpredictable". But anyway. :-)
Sure, but the way I see it, that’s the nature of caches.
> Let move it elsewhere if I am really annoyed.
:-/
Ludo’.
Information forwarded
to
guix-patches <at> gnu.org
:
bug#66650
; Package
guix-patches
.
(Wed, 22 Nov 2023 11:58:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 66650 <at> debbugs.gnu.org (full text, mbox):
Hi Ludo,
Thanks for explaining.
On Wed, 22 Nov 2023 at 12:17, Ludovic Courtès <ludo <at> gnu.org> wrote:
> it’s rarely going to fire.
[...]
>> Let move it elsewhere if I am really annoyed.
>
> :-/
Sorry, I poorly worded my last comment. :-)
Somehow I was expressing: my view probably falls into the “Premature
optimization is the root of all evil” category. Other said, I have no
objection and I will revisit the issue when I will be on fire, if I am,
or annoyed for real.
Cheers,
simon
PS:
Aside this patch:
>> So, somehow when 'maybe-run-git-gc' is called appears to me
>> "unpredictable". But anyway. :-)
>
> Sure, but the way I see it, that’s the nature of caches.
What makes cache unpredictable is their current state. However, this
does not imply that *all* the actions modifying from one state to
another must also be triggered in unpredictable moment.
For instance, I choose when I wash family’s clothes and the wash-machine
does not start by itself when the unpredictable stack of family’s dirty
clothes is enough. Because, maybe today it’s rainy so drying is
difficult and tomorrow will be sunny so it will be a better moment. :-)
For me, “guix gc” should be the driver for cleaning all the various Guix
caches. Anyway. :-D
Reply sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
You have taken responsibility.
(Wed, 22 Nov 2023 16:01:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Ludovic Courtès <ludo <at> gnu.org>
:
bug acknowledged by developer.
(Wed, 22 Nov 2023 16:01:02 GMT)
Full text and
rfc822 format available.
Message #31 received at 66650-done <at> debbugs.gnu.org (full text, mbox):
Hi,
Simon Tournier <zimon.toutoune <at> gmail.com> skribis:
> Somehow I was expressing: my view probably falls into the “Premature
> optimization is the root of all evil” category. Other said, I have no
> objection and I will revisit the issue when I will be on fire, if I am,
> or annoyed for real.
Alright!
Pushed as b150c546b04c9ebb09de9f2c39789221054f5eea.
Let’s see how it behaves and if there are problems we had overlooked…
Ludo’.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 21 Dec 2023 12:24:09 GMT)
Full text and
rfc822 format available.
This bug report was last modified 1 year and 167 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.