GNU bug report logs - #70877
guix-daemon fails to copy 4+GB file to store


Package: guix;

Reported by: Ricardo Wurmus <rekado <at> elephly.net>

Date: Sat, 11 May 2024 10:54:01 UTC

Severity: important

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 70877 in the body.
You can then email your comments to 70877 AT debbugs.gnu.org in the normal way.



Report forwarded to ludo <at> gnu.org, bug-guix <at> gnu.org:
bug#70877; Package guix. (Sat, 11 May 2024 10:54:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ricardo Wurmus <rekado <at> elephly.net>:
New bug report received and forwarded. Copy sent to ludo <at> gnu.org, bug-guix <at> gnu.org. (Sat, 11 May 2024 10:54:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: bug-guix <at> gnu.org
Subject: guix-daemon fails to copy 4+GB file to store
Date: Sat, 11 May 2024 12:52:53 +0200
[Message part 1 (text/plain, inline)]
The guix-daemon's libutil/util.cc uses copy_file_range to copy a
downloaded file into the store.  copy_file_range fails on files larger
than 4GB with an error like this:

    guix build: error: short write in copy_file_range `15' to `16': No such file or directory

The man page for copy_file_range says that it could return EFBIG when
the range exceeds the maximum range.  The daemon code does not check any
limits and will attempt to copy the whole file.

I believe our code ought to check the value of st.size and fall back to
a boring copy if it exceeds some "reasonable" value.

This is where copy_file_range is used:
https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libutil/util.cc#n382

Here is a little reproducer:

[bug.scm (text/plain, inline)]
(use-modules (guix download)
             (guix packages)
             (guix build-system trivial))

(package
  (name "chungus")
  (version "1")
  (source
   (origin
     (method url-fetch)
     (uri "http://localhost:1111/chungus")
     (sha256
      (base32 "0nx67d4ls2nfwcfdmg81vf240z6lpwpdqypssr1wzn3hyz4szci4"))))
  (build-system trivial-build-system)
  (home-page "")
  (synopsis "")
  (description "")
  (license #f))
[Message part 3 (text/plain, inline)]
--8<---------------cut here---------------start------------->8---
# generate a big file
dd bs=1M count=4096 if=/dev/zero of=/tmp/chungus
# serve it
guix shell woof -- woof -i 127.0.0.1 -p 1111 -c 1 /tmp/chungus
# build the source derivation
guix build --no-grafts -Sf bug.scm
# observe the error
# guix build: error: short write in copy_file_range `15' to `16': No such file or directory
--8<---------------cut here---------------end--------------->8---

-- 
Ricardo

Information forwarded to bug-guix <at> gnu.org:
bug#70877; Package guix. (Sun, 12 May 2024 07:14:01 GMT) Full text and rfc822 format available.

Message #8 received at 70877 <at> debbugs.gnu.org (full text, mbox):

From: Efraim Flashner <efraim <at> flashner.co.il>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: ludo <at> gnu.org, 70877 <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Sun, 12 May 2024 10:12:11 +0300
[Message part 1 (text/plain, inline)]
On Sat, May 11, 2024 at 12:52:53PM +0200, Ricardo Wurmus wrote:
> The guix-daemon's libutil/util.cc uses copy_file_range to copy a
> downloaded file into the store.  copy_file_range fails on files larger
> than 4GB with an error like this:
> 
>     guix build: error: short write in copy_file_range `15' to `16': No such file or directory
> 
> The man page for copy_file_range says that it could return EFBIG when
> the range exceeds the maximum range.  The daemon code does not check any
> limits and will attempt to copy the whole file.
> 
> I believe our code ought to check the value of st.size and fall back to
> a boring copy if it exceeds some "reasonable" value.
> 
> This is where copy_file_range is used:
> https://git.savannah.gnu.org/cgit/guix.git/tree/nix/libutil/util.cc#n382
> 
> Here is a little reproducer:
> 

> (use-modules (guix download)
>              (guix packages)
>              (guix build-system trivial))
> 
> (package
>   (name "chungus")
>   (version "1")
>   (source
>    (origin
>      (method url-fetch)
>      (uri "http://localhost:1111/chungus")
>      (sha256
>       (base32 "0nx67d4ls2nfwcfdmg81vf240z6lpwpdqypssr1wzn3hyz4szci4"))))
>   (build-system trivial-build-system)
>   (home-page "")
>   (synopsis "")
>   (description "")
>   (license #f))

> 
> --8<---------------cut here---------------start------------->8---
> # generate a big file
> dd bs=1M count=4096 if=/dev/zero of=/tmp/chungus
> # serve it
> guix shell woof -- woof -i 127.0.0.1 -p 1111 -c 1 /tmp/chungus
> # build the source derivation
> guix build --no-grafts -Sf bug.scm
> # observe the error
> # guix build: error: short write in copy_file_range `15' to `16': No such file or directory
> --8<---------------cut here---------------end--------------->8---
> 

This sounds like a similar failure to bug 65714 that I ran into with
guix copy, but I wasn't able to diagnose it.

https://issues.guix.gnu.org/65714


-- 
Efraim Flashner   <efraim <at> flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
[signature.asc (application/pgp-signature, inline)]

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 13 May 2024 09:06:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#70877; Package guix. (Mon, 13 May 2024 10:11:01 GMT) Full text and rfc822 format available.

Message #13 received at 70877 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 70877 <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Mon, 13 May 2024 12:10:34 +0200
[Message part 1 (text/plain, inline)]
Hi,

Thanks for the bug report and nice reproducer!

Ricardo Wurmus <rekado <at> elephly.net> skribis:

> The guix-daemon's libutil/util.cc uses copy_file_range to copy a
> downloaded file into the store.  copy_file_range fails on files larger
> than 4GB with an error like this:
>
>     guix build: error: short write in copy_file_range `15' to `16': No such file or directory
>
> The man page for copy_file_range says that it could return EFBIG when
> the range exceeds the maximum range.  The daemon code does not check any
> limits and will attempt to copy the whole file.
>
> I believe our code ought to check the value of st.size and fall back to
> a boring copy if it exceeds some "reasonable" value.

The system call leading to this error message looks like this:

  copy_file_range(15, NULL, 16, NULL, 4294967297, 0) = 2147479552

… which is precisely 2 GiB - 4 KiB.
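
As a quick check of that arithmetic, the figure can be recomputed at compile time. This is only an illustrative sketch; the constant names are made up for this note and do not come from the Guix sources. The same value, 0x7ffff000, is what the write(2) man page gives as the most that Linux transfers in a single call, so it may well be the limit involved here, though that is an assumption and not something the trace proves.

```cpp
#include <cstdint>

// Illustrative names, not taken from the Guix sources.
constexpr std::int64_t kGiB = std::int64_t{1} << 30;
constexpr std::int64_t kKiB = std::int64_t{1} << 10;

// Value returned by the copy_file_range call in the trace above.
constexpr std::int64_t kReturned = 2147479552;

// Both checks are evaluated by the compiler; a mismatch fails the build.
static_assert(kReturned == 2 * kGiB - 4 * kKiB, "2 GiB - 4 KiB");
static_assert(kReturned == 0x7ffff000, "same value in hexadecimal");
```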

Reading the man page, this behavior is entirely legitimate: like ‘write’,
‘copy_file_range’ might copy less than asked for, so it’s really a
mistake of mine to assume that short writes can’t happen.  Presumably
there’s an internal limit here we’re reaching that explains why it won’t
copy more than 2 GiB at once.

With the following change, we get:

  newfstatat(15, "", {st_mode=S_IFREG|0644, st_size=4294967297, ...}, AT_EMPTY_PATH) = 0
  copy_file_range(15, NULL, 16, NULL, 4294967297, 0) = 2147479552
  copy_file_range(15, NULL, 16, NULL, 2147487745, 0) = 2147479552
  copy_file_range(15, NULL, 16, NULL, 8193, 0) = 8193
  fchown(16, 30001, 30000)          = 0
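
The request sizes in this trace follow from subtracting what each call copied from what was still left. A small sketch that replays the arithmetic without doing any I/O, assuming a per-call cap of 2147479552 bytes as seen in the trace (the function name `requested_sizes` is hypothetical):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Replay the sizes a retry loop would request when every call copies at most
// `cap` bytes.  Hypothetical helper for illustration only; it performs no I/O.
std::vector<std::int64_t> requested_sizes(std::int64_t total, std::int64_t cap)
{
    std::vector<std::int64_t> requests;
    if (cap <= 0)
        return requests;                      // no progress possible, avoid looping

    std::int64_t copied = 0;
    while (copied < total) {
        std::int64_t remaining = total - copied;
        requests.push_back(remaining);        // what the loop asks for
        copied += std::min(remaining, cap);   // what one call delivers
    }
    return requests;
}
```

Calling `requested_sizes(4294967297, 2147479552)` gives 4294967297, 2147487745 and 8193, matching the three `copy_file_range` calls above.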

Could you confirm that it works for you?

Thanks,
Ludo’.

[0001-daemon-Loop-over-copy_file_range-upon-short-writes.patch (text/x-patch, inline)]
From efd9f3383756df9959651125c0f2e2e769630851 Mon Sep 17 00:00:00 2001
Message-ID: <efd9f3383756df9959651125c0f2e2e769630851.1715594931.git.ludo <at> gnu.org>
From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org>
Date: Mon, 13 May 2024 12:02:30 +0200
Subject: [PATCH] =?UTF-8?q?daemon:=20Loop=20over=20=E2=80=98copy=5Ffile=5F?=
 =?UTF-8?q?range=E2=80=99=20upon=20short=20writes.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes <https://issues.guix.gnu.org/70877>.

* nix/libutil/util.cc (copyFile): Loop over ‘copy_file_range’ instead of
throwing upon short write.

Reported-by: Ricardo Wurmus <rekado <at> elephly.net>
Change-Id: Id7b8a65ea59006c2d91bc23732309a68665b9ca0
---
 nix/libutil/util.cc | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/nix/libutil/util.cc b/nix/libutil/util.cc
index 578d6572934..3206dea11b1 100644
--- a/nix/libutil/util.cc
+++ b/nix/libutil/util.cc
@@ -397,9 +397,14 @@ static void copyFile(int sourceFd, int destinationFd)
     } else {
 	if (result < 0)
 	    throw SysError(format("copy_file_range `%1%' to `%2%'") % sourceFd % destinationFd);
-	if (result < st.st_size)
-	    throw SysError(format("short write in copy_file_range `%1%' to `%2%'")
-			   % sourceFd % destinationFd);
+
+	/* If 'copy_file_range' copied less than requested, try again.  */
+	for (ssize_t copied = result; copied < st.st_size; copied += result) {
+	    result = copy_file_range(sourceFd, NULL, destinationFd, NULL,
+				     st.st_size - copied, 0);
+	    if (result < 0)
+		throw SysError(format("copy_file_range `%1%' to `%2%'") % sourceFd % destinationFd);
+	}
     }
 }
 

base-commit: 89cd778f6a45cd9b43a4dc1f236dcd0a87af955c
-- 
2.41.0


Information forwarded to bug-guix <at> gnu.org:
bug#70877; Package guix. (Mon, 13 May 2024 12:11:02 GMT) Full text and rfc822 format available.

Message #16 received at 70877 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 70877 <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Mon, 13 May 2024 14:09:47 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Could you confirm that it works for you?

I've applied this locally, started the new daemon, and used it to build
the 4+GB source code derivation of a big package that used to fail
before.  It works now.  Thank you!

-- 
Ricardo




Information forwarded to bug-guix <at> gnu.org:
bug#70877; Package guix. (Mon, 13 May 2024 14:35:02 GMT) Full text and rfc822 format available.

Message #19 received at 70877 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 70877 <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Mon, 13 May 2024 16:34:26 +0200
Ricardo Wurmus <rekado <at> elephly.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Could you confirm that it works for you?
>
> I've applied this locally, started the new daemon, and used it to build
> the 4+GB source code derivation of a big package that used to fail
> before.  It works now.  Thank you!

Pushed as 7757fdd491862fa5c33f1f894503346b89898a01.

I’ll update the ‘guix’ package to make the fix available.

Thanks for testing!

Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Mon, 13 May 2024 16:25:02 GMT) Full text and rfc822 format available.

Notification sent to Ricardo Wurmus <rekado <at> elephly.net>:
bug acknowledged by developer. (Mon, 13 May 2024 16:25:02 GMT) Full text and rfc822 format available.

Message #24 received at 70877-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Ricardo Wurmus <rekado <at> elephly.net>
Cc: 70877-done <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Mon, 13 May 2024 18:24:22 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Pushed as 7757fdd491862fa5c33f1f894503346b89898a01.
>
> I’ll update the ‘guix’ package to make the fix available.

Done in 58be9a79e2862d5fa9842d73f498ce2e5442b9ce.

Ludo'.




Information forwarded to bug-guix <at> gnu.org:
bug#70877; Package guix. (Tue, 14 May 2024 22:27:02 GMT) Full text and rfc822 format available.

Message #27 received at 70877 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 70877 <at> debbugs.gnu.org, 70398 <at> debbugs.gnu.org
Subject: Re: bug#70877: guix-daemon fails to copy 4+GB file to store
Date: Wed, 15 May 2024 00:26:34 +0200
BTW, the newly updated ‘guix’ package is 8% smaller, as a result of
<https://issues.guix.gnu.org/70398>:

--8<---------------cut here---------------start------------->8---
$ guix describe 
Generation 302  May 12 2024 23:29:11    (current)
  guix 89cd778
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 89cd778f6a45cd9b43a4dc1f236dcd0a87af955c
$ guix size guix |head -2
store item                                                       total    self
/gnu/store/r96xq0064nqf43ygcr7z9lgb18vrd1wa-guix-1.4.0-18.4c94b9e   705.8   400.6  56.8%
$ ./pre-inst-env guix size guix |head -2
store item                                                       total    self
/gnu/store/mcw1d2zy96is5ymjj903i3bi5a0qdwr5-guix-1.4.0-19.7ca9809   673.8   368.7  54.7%
$ git log |head -1
commit 58be9a79e2862d5fa9842d73f498ce2e5442b9ce
--8<---------------cut here---------------end--------------->8---

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 12 Jun 2024 11:24:18 GMT) Full text and rfc822 format available.

This bug report was last modified 5 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.