GNU bug report logs - #74203
Coreutils test failure on Btrfs

Package: guix

Reported by: "Collin J. Doering" <collin <at> rekahsoft.ca>

Date: Mon, 4 Nov 2024 15:39:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 74203 in the body.
You can then email your comments to 74203 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#74203; Package guix. (Mon, 04 Nov 2024 15:39:02 GMT)

Acknowledgement sent to "Collin J. Doering" <collin <at> rekahsoft.ca>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 04 Nov 2024 15:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Collin J. Doering" <collin <at> rekahsoft.ca>
To: bug-guix <at> gnu.org
Subject: coreutils fails to build
Date: Mon, 04 Nov 2024 10:37:53 -0500
Hi lovely maintainers of Guix!

Some time ago I announced the availability of a Guix build farm running out of the University of Tennessee[1]. Builds on it have since started failing because coreutils fails to build[2]; investigation showed an unexpected failing test:

--8<---------------cut here---------------start------------->8---
FAIL tests/cp/reflink-auto.sh (exit status: 1)
--8<---------------cut here---------------end--------------->8---

I found that this does not occur on other Guix systems. After some online sleuthing, it appears that the Nix folks have seen this before[3]. They opted to disable the test 'tests/cp/reflink-auto.sh' as it can fail when using btrfs. On the affected Guix system, disabling the coreutils tests makes the package build.

For reference, coreutils was building on cuirass.genenetwork.org on guix commit `0c908518375aea50be6dec703367c01944c8c721` and stopped building on `66611696975409a52478b95a862a464daeaefe2a`.

I suggest we follow what the Nix folks did (disable `tests/cp/reflink-auto.sh`). In a following email you will find a patch that does so. However, because it changes coreutils, it will cause many packages to be rebuilt, so I'm unsure what the best way is to correct this without having to wait for core-updates to be merged.

Any advice or insight is appreciated.

[1]: https://lists.gnu.org/archive/html/guix-devel/2024-07/msg00033.html
[2]: https://cuirass.genenetwork.org/eval/157119/log/raw
[3]: https://github.com/NixOS/nixpkgs/pull/190211

-- 
Collin J. Doering

http://rekahsoft.ca
http://blog.rekahsoft.ca
http://git.rekahsoft.ca

Information forwarded to andreas <at> enge.fr, ludo <at> gnu.org, bug-guix <at> gnu.org:
bug#74203; Package guix. (Mon, 04 Nov 2024 15:44:02 GMT)

Message #8 received at 74203 <at> debbugs.gnu.org:

From: "Collin J. Doering" <collin <at> rekahsoft.ca>
To: 74203 <at> debbugs.gnu.org
Cc: "Collin J. Doering" <collin <at> rekahsoft.ca>
Subject: [PATCH] gnu: coreutils: Disable cp/reflink-auto.sh as it can fail on
 btrfs
Date: Mon,  4 Nov 2024 10:42:21 -0500
* gnu/packages/base.scm: Similarly to
nix (https://github.com/NixOS/nixpkgs/pull/190211), disable
tests/cp/reflink-auto.sh test as it can fail on btrfs. This was discovered by
the cuirass.genenetwork.org build farm.

Change-Id: If1cc3d516c5807e580ec64ab93670e30090581a7
---
 gnu/packages/base.scm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index 4e8121ae2c..bed708fc27 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -506,6 +506,8 @@ (define-public coreutils
                                    "tests/split/fail.sh"
                                    ;; These tests error
                                    "tests/dd/nocache.sh"
+                                   ;; These tests can intermitently fail on btrfs
+                                   "tests/cp/reflink-auto.sh"
                                    ;; These tests fail
                                    "tests/cp/sparse.sh"
                                    "tests/cp/special-f.sh"

base-commit: 915f807ce61c48c34141f0300ea7623170f4148a
-- 
2.46.0





Information forwarded to bug-guix <at> gnu.org:
bug#74203; Package guix. (Thu, 14 Nov 2024 02:52:02 GMT)

Message #11 received at 74203 <at> debbugs.gnu.org:

From: "Collin J. Doering" <collin <at> rekahsoft.ca>
To: 74203 <at> debbugs.gnu.org
Subject: Further investigation and workaround
Date: Wed, 13 Nov 2024 21:50:47 -0500
Hi again,

I wanted to follow up on my previous report and patch. I still think it's useful to consider disabling the coreutils test I previously suggested; however, I found a way to work around the issue and wanted to make note of it, as well as provide some details of my investigation.

To work around the coreutils test `tests/cp/reflink-auto.sh` failing on guix commit `66611696975409a52478b95a862a464daeaefe2a`, I temporarily mounted a tmpfs to replace /tmp (which was on btrfs).

--8<---------------cut here---------------start------------->8---
mv /tmp /tmp.old
mkdir /tmp
mount -t tmpfs tmpfs /tmp
chmod 1777 /tmp
mv /tmp.old/{.*,*} /tmp/
--8<---------------cut here---------------end--------------->8---
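
To confirm which file system backs /tmp before and after such a remount, something like the following could be used. This is only a sketch: `fs_type` is a hypothetical helper name, and it assumes GNU `stat`, whose `-f -c %T` prints the file system type in human-readable form.

```shell
# fs_type PATH: print the type of the file system that holds PATH.
# Assumes GNU stat; "fs_type" is an illustrative name, not part of
# Guix or coreutils.
fs_type() {
  stat -f -c %T "$1"
}

# Usage: fs_type /tmp
```

On the affected machine one would expect `fs_type /tmp` to report btrfs before the remount and tmpfs afterwards.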

Now, what made me do this? Well let me explain!

In `tests/cp/reflink-auto.sh` (https://github.com/coreutils/coreutils/blob/v9.1/tests/cp/reflink-auto.sh), the failing part of the test is:

--8<---------------cut here---------------start------------->8---
# we shouldn't be able to reflink() files on separate partitions
. "$abs_srcdir/tests/other-fs-tmpdir"
a_other="$other_partition_tmpdir/a"
<..>
returns_ 1 cp --reflink "$a_other" b || fail=1
--8<---------------cut here---------------end--------------->8---
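
The last line of that excerpt only passes when `cp --reflink` exits with status 1, that is, when the clone is refused. If the clone unexpectedly succeeds, `cp` exits 0 and the test sets `fail=1`. As a rough illustration of that pattern, here is a simplified stand-in for the `returns_` helper as I understand it; `expect_status` is a hypothetical name and this is not the coreutils implementation.

```shell
# expect_status EXPECTED COMMAND [ARGS...]
# Run COMMAND and succeed only if it exits with exactly EXPECTED.
# Hypothetical, simplified stand-in for the test suite's returns_ helper.
expect_status() {
  expected=$1
  shift
  "$@"
  [ "$?" -eq "$expected" ]
}

fail=0
# 'false' exits 1, so the expectation is met and fail stays 0.
expect_status 1 false || fail=1

unexpected=0
# 'true' exits 0, so the expectation is not met and unexpected becomes 1.
# This mirrors a cp --reflink that succeeds when it was expected to fail.
expect_status 1 true || unexpected=1
```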

'$other_partition_tmpdir' is defined in 'tests/other-fs-tmpdir' (https://github.com/coreutils/coreutils/blob/v9.1/tests/other-fs-tmpdir) by looking through a list of candidate directories and, for each one, checking that it has a different device ID from the current working directory (as given by 'stat -c %d <path>') and that the current user can create directories there. Once it finds a suitable candidate, it sets '$other_partition_tmpdir' to the temporary directory it created there. The candidate directories that are considered are as follows:

--8<---------------cut here---------------start------------->8---
test "${CANDIDATE_TMP_DIRS+set}" = set \
  || CANDIDATE_TMP_DIRS="$TMPDIR /tmp /dev/shm /var/tmp /usr/tmp $HOME"
--8<---------------cut here---------------end--------------->8---
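
The selection described above can be approximated as follows. This is a minimal sketch rather than the actual 'tests/other-fs-tmpdir' script: `dev_id` and `pick_other_fs_dir` are hypothetical names, the sketch only checks that a candidate is a writable directory instead of creating a temporary directory in it, and it assumes GNU `stat`.

```shell
# dev_id PATH: print the device ID of PATH as reported by GNU stat.
# (Hypothetical helper name, used for illustration only.)
dev_id() {
  stat -c %d "$1" 2>/dev/null
}

# pick_other_fs_dir REFERENCE CANDIDATE...
# Print the first CANDIDATE that is a writable directory and whose
# device ID differs from that of REFERENCE. Return 1 if none qualifies.
pick_other_fs_dir() {
  ref_dev=$(dev_id "$1") || return 1
  shift
  for dir in "$@"; do
    # Skip candidates that do not exist or cannot be written to.
    [ -d "$dir" ] && [ -w "$dir" ] || continue
    cand_dev=$(dev_id "$dir") || continue
    if [ "$cand_dev" != "$ref_dev" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Usage, with the candidate list quoted above:
#   pick_other_fs_dir . "$TMPDIR" /tmp /dev/shm /var/tmp /usr/tmp "$HOME"
```

Note that the decision rests entirely on the device IDs: two directories that report different IDs are treated as separate partitions, whether or not a reflink between them would actually be refused.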

Looking at a leftover failed build of coreutils (kept by building with `--keep-failed`), I see that in 'top/environment-variables', 'TMPDIR' is set to '/tmp/guix-build-guix-1.4.0-26.5ab3c4c.drv-0'. This directory is where the build takes place, so I would expect it to be on the same partition and therefore be rejected. The next candidate is /tmp, where the same reasoning applies; after that comes /dev/shm. From my tests simulating the coreutils guix shell build environment, /dev/shm would meet the conditions and be selected. However, if this were the case, I wouldn't expect the coreutils reflink test to fail.

My suspicion is that, for some reason, using 'stat -c %d <path>' to check whether two files are on the same partition doesn't play well with btrfs subvolumes in some instances within guix-daemon sandboxed builds. When trying to test this in a simulated coreutils guix shell build environment, I found that paths outside of the environment on different subvolumes, which do show different device IDs outside of the guix shell container (as per 'stat -c %d <path>'), show the same IDs within it. I suspect this is related to why the coreutils test fails on btrfs but not when I use a tmpfs for /tmp. It's worth noting that on the affected system, /gnu/store is a btrfs subvolume.
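
For anyone wanting to reproduce the comparison, a small helper along these lines could be run once on the host and once inside the container (for example a `guix shell --container` session). It is only a sketch: `same_device` and `report_devices` are hypothetical names, and GNU `stat` is assumed.

```shell
# same_device PATH_A PATH_B
# Succeed if both paths report the same device ID; return 2 if either
# path cannot be examined. (Hypothetical helper, for illustration only.)
same_device() {
  a=$(stat -c %d "$1") || return 2
  b=$(stat -c %d "$2") || return 2
  [ "$a" = "$b" ]
}

# report_devices PATH...
# Print "<device-id> <path>" for each path, or "? <path>" when stat fails.
report_devices() {
  for p in "$@"; do
    printf '%s %s\n' "$(stat -c %d "$p" 2>/dev/null || echo '?')" "$p"
  done
}

# Usage:
#   report_devices "$TMPDIR" /tmp /dev/shm /gnu/store
#   same_device /tmp /gnu/store && echo "same device ID"
```

Comparing the two outputs side by side shows which candidate directories appear to share a device ID in one environment but not in the other.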

I am not yet satisfied with my partial explanation, and am very curious whether anyone spots something I'm missing (e.g. someone with a better understanding of the guix build environment and of why the reflink coreutils test could be failing like this).

Thanks for your time and attention.

-- 
Collin J. Doering

http://rekahsoft.ca
http://blog.rekahsoft.ca
http://git.rekahsoft.ca

Changed bug title to 'Coreutils test failure on Btrfs' from 'coreutils fails to build'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 20 Nov 2024 21:50:01 GMT)

Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sat, 22 Mar 2025 22:33:01 GMT)

Notification sent to "Collin J. Doering" <collin <at> rekahsoft.ca>:
bug acknowledged by developer. (Sat, 22 Mar 2025 22:33:02 GMT)

Message #18 received at 74203-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: "Collin J. Doering" <collin <at> rekahsoft.ca>
Cc: Andreas Enge <andreas <at> enge.fr>, 74203-done <at> debbugs.gnu.org
Subject: Re: bug#74203: [PATCH] gnu: coreutils: Disable cp/reflink-auto.sh
 as it can fail on btrfs
Date: Sat, 22 Mar 2025 23:32:40 +0100
Hi Collin,

"Collin J. Doering" <collin <at> rekahsoft.ca> skribis:

> * gnu/packages/base.scm: Similarly to
> nix (https://github.com/NixOS/nixpkgs/pull/190211), disable
> tests/cp/reflink-auto.sh test as it can fail on btrfs. This was discovered by
> the cuirass.genenetwork.org build farm.
>
> Change-Id: If1cc3d516c5807e580ec64ab93670e30090581a7

Finally applied, thanks!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 20 Apr 2025 11:24:08 GMT)
