GNU bug report logs - #65979
incorrect “guix hash” for FastQC

Previous Next

Package: guix;

Reported by: Simon Tournier <zimon.toutoune <at> gmail.com>

Date: Thu, 14 Sep 2023 16:48:01 UTC

Severity: normal

To reply to this bug, email your comments to 65979 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Thu, 14 Sep 2023 16:48:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Simon Tournier <zimon.toutoune <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 14 Sep 2023 16:48:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: bug-guix <at> gnu.org
Subject: incorrect “guix hash” for FastQC
Date: Thu, 14 Sep 2023 18:46:05 +0200
Hi,

Currently Guix stores in its source the hash
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk and this has not
changed since February (Guix revision
b6a4fbb488f1f539ae45ed7924c9af8905fa0d8b).

Well, because the origin contains a snippet, “guix build -S” does not
return the result of the fixed derivation.  Instead, it is one or the other:

--8<---------------cut here---------------start------------->8---
$ guix gc -R $(guix build -S fastqc -d) | grep checkout.drv
/gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv
/gnu/store/7rdcyl7iq00xfdwh6azgmc3i7mr2769b-fastqc-0.11.9-checkout.drv
--8<---------------cut here---------------end--------------->8---

Let pick the correct one and I get:

--8<---------------cut here---------------start------------->8---
$ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---

Oh, it is not the expected result!  Instead, I have to omit the option
’-x’:

--8<---------------cut here---------------start------------->8---
$ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
--8<---------------cut here---------------end--------------->8---

Hum, I am missing something…

Let clone from upstream.

--8<---------------cut here---------------start------------->8---
$ git clone https://github.com/s-andrews/FastQC
$ git -C FastQC checkout v0.11.9
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---

Ah… incorrect but consistent with previously.  The question is why is
this following,

--8<---------------cut here---------------start------------->8---
$ mv FastQC/.git bye-bye
$ guix hash -r FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk
--8<---------------cut here---------------end--------------->8---

is correct?  And obviously without .git we get the same hash just as
above,

--8<---------------cut here---------------start------------->8---
$ guix hash -rx FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
--8<---------------cut here---------------end--------------->8---

Any idea?

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Tue, 26 Sep 2023 13:01:01 GMT) Full text and rfc822 format available.

Message #8 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Tue, 26 Sep 2023 15:00:25 +0200
Hi,

On Thu, 14 Sep 2023 at 18:46, Simon Tournier <zimon.toutoune <at> gmail.com> wrote:

> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
> Oh, it is not the expected result!  Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

[...]

> $ git clone https://github.com/s-andrews/FastQC
> $ git -C FastQC checkout v0.11.9
> $ guix hash -rx FastQC
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

[...]

> $ mv FastQC/.git bye-bye
> $ guix hash -r FastQC
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

[...]

> Any idea?

The issue is from .svn folders.

--8<---------------cut here---------------start------------->8---
$ find $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) \
       -type d -name ".svn" -print
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn
--8<---------------cut here---------------end--------------->8---

The option ’-x’ excludes them; see vcs-file? from (guix hash) module,
and let confirm:

    $ git clone https://github.com/s-andrews/FastQC
    $ git -C FastQC checkout v0.11.9

    $ guix hash -rx FastQC
    0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

    $ find FastQC -type d -name ".svn" -exec rm -r "{}" \;
    $ mv FastQC/.git bye-bye
    $ guix hash -r FastQC
    0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

Therefore, we have a discrepancy between “guix hash” command-line and
checksum verification from ’source’ field.

The package definition reads:

    (source
     (origin
       (method git-fetch)
       (uri (git-reference
             (url "https://github.com/s-andrews/FastQC")
             (commit (string-append "v" version))))
       (file-name (git-file-name name version))
       (sha256
        (base32
         "00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))

when this hash contains .svn subfolders that are excluded by ’-x’
command-line option.

What do we fix?  The procedure vcs-file? or fixed-output computation?

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Tue, 26 Sep 2023 13:31:01 GMT) Full text and rfc822 format available.

Message #11 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Simon Tournier <zimon.toutoune <at> gmail.com>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Tue, 26 Sep 2023 15:29:53 +0200
Hi!

Simon Tournier <zimon.toutoune <at> gmail.com> skribis:

> Let pick the correct one and I get:
>
> $ guix hash -rx $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479
>
>
> Oh, it is not the expected result!  Instead, I have to omit the option
> ’-x’:
>
> $ guix hash -r $(guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv)
> 00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

This is because:

--8<---------------cut here---------------start------------->8---
$ find $( guix build /gnu/store/8gxx74w8pa0fx74lx1ka654rhyyr51cs-fastqc-0.11.9-checkout.drv) -name .svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/3 Analysis Modules/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/1 Introduction/.svn
/gnu/store/5dyvjgylwiixhp8z9a08rnddh53h4cdx-fastqc-0.11.9-checkout/Help/2 Basic Operations/.svn
--8<---------------cut here---------------end--------------->8---

Not a bug, IMO (‘vcs-file?’ shouldn’t try to be smart in this case.)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Tue, 26 Sep 2023 13:36:01 GMT) Full text and rfc822 format available.

Message #14 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Simon Tournier <zimon.toutoune <at> gmail.com>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Tue, 26 Sep 2023 15:34:47 +0200
(Oops, now I see you had already found the reason.)

Simon Tournier <zimon.toutoune <at> gmail.com> skribis:

> What do we fix?  The procedure vcs-file? or fixed-output computation?

There are two things are:

  1. ‘vcs-file?’, used by ‘guix hash -rx’;

  2. ‘git-fetch’, which does (delete-file-recursively ".git").

Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
a Git repo), whereas #1 is an approximation that, in corner cases like
this one, gives the wrong answer.

My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
would be to add complicated heuristics in the hope corner cases like
this one would be better dealt with, but it wouldn’t be bullet-proof
anyway.

WDYT?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Tue, 26 Sep 2023 16:24:02 GMT) Full text and rfc822 format available.

Message #17 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Tue, 26 Sep 2023 18:23:00 +0200
Hi,

On Tue, 26 Sept 2023 at 15:34, Ludovic Courtès <ludo <at> gnu.org> wrote:

> There are two things are:
>
>   1. ‘vcs-file?’, used by ‘guix hash -rx’;
>
>   2. ‘git-fetch’, which does (delete-file-recursively ".git").

Yes.

> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
> a Git repo), whereas #1 is an approximation that, in corner cases like
> this one, gives the wrong answer.

These corner cases matter for some SWH loader implementing Nar hashes
in Python.  Since they load sources.json (conversion of hash checksum
from package recipe), read the Nar hash and compare it with the one
they internal compute, then they need to implement in Python the
correct behavior.

See https://gitlab.softwareheritage.org/swh/devel/swh-loader-git/-/issues/4751#note_149180
and all the thread for context.


> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.

Well, the question is what other VCS as 'svn-fetch', etc. are doing?

Maybe, we can just have a special case for Git repository.

Somehow, since the problem needs to be solved for SWH, the same
solution applies for vcs-file? and "guix hash", no?

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Sun, 01 Oct 2023 15:08:02 GMT) Full text and rfc822 format available.

Message #20 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Simon Tournier <zimon.toutoune <at> gmail.com>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Sun, 01 Oct 2023 17:07:01 +0200
Hi,

Simon Tournier <zimon.toutoune <at> gmail.com> skribis:

[...]

>> Clearly #2 is correct (it’s perfectly fine to have a ‘.svn’ directory in
>> a Git repo), whereas #1 is an approximation that, in corner cases like
>> this one, gives the wrong answer.
>
> These corner cases matter for some SWH loader implementing Nar hashes
> in Python.  Since they load sources.json (conversion of hash checksum
> from package recipe), read the Nar hash and compare it with the one
> they internal compute, then they need to implement in Python the
> correct behavior.

I agree, SWH needs something more robust than ‘vcs-file?’.

>> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
>> would be to add complicated heuristics in the hope corner cases like
>> this one would be better dealt with, but it wouldn’t be bullet-proof
>> anyway.
>
> Well, the question is what other VCS as 'svn-fetch', etc. are doing?

Remove the relevant metadata directory/directories.

> Maybe, we can just have a special case for Git repository.
>
> Somehow, since the problem needs to be solved for SWH, the same
> solution applies for vcs-file? and "guix hash", no?

Yes, ‘guix hash’ uses ‘vcs-file?’.

Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
that on their side (in Python code I guess).

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Sun, 01 Oct 2023 15:26:01 GMT) Full text and rfc822 format available.

Message #23 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Sun, 1 Oct 2023 17:24:47 +0200
Hi,

On Sun, 1 Oct 2023 at 17:07, Ludovic Courtès <ludo <at> gnu.org> wrote:

> >> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> >> would be to add complicated heuristics in the hope corner cases like
> >> this one would be better dealt with, but it wouldn’t be bullet-proof
> >> anyway.
> >
> > Well, the question is what other VCS as 'svn-fetch', etc. are doing?
>
> Remove the relevant metadata directory/directories.

What does it mean "relevant"? :-)

Where do I have to look to implement the exact same behaviour of what
do the fetch methods?


> Now, SWH does not use ‘guix hash’ (I suppose) so they’ll have to address
> that on their side (in Python code I guess).

Yes for sure.  It does not change the fact that "guix hash" has a bug.

Cheers,
simon




Information forwarded to bug-guix <at> gnu.org:
bug#65979; Package guix. (Tue, 17 Oct 2023 17:58:02 GMT) Full text and rfc822 format available.

Message #26 received at 65979 <at> debbugs.gnu.org (full text, mbox):

From: Simon Tournier <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 65979 <at> debbugs.gnu.org
Subject: Re: bug#65979: incorrect “guix hash” for FastQC
Date: Tue, 17 Oct 2023 19:56:28 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

On Tue, 26 Sep 2023 at 15:34, Ludovic Courtès <ludo <at> gnu.org> wrote:

> My take is that it’s OK to keep ‘vcs-file?’ as is: the best we could do
> would be to add complicated heuristics in the hope corner cases like
> this one would be better dealt with, but it wouldn’t be bullet-proof
> anyway.

Hum, I am probably pig-headed here. :-)

I propose the change below.  Well, it does not appear to me a
complicated heuristics but I am biased.  The idea is to detect at the
repository root what is the kind of VCS, just by checking if .git (or
.hg or .svn or etc.) exists or not.  Then, only exclude this kind… and
not the whole list.

And I think this way is almost bullet-proof, no?  Or could you provide
some contrived examples?

WDYT?

Obviously, this fixes the issue:

--8<---------------cut here---------------start------------->8---
$ guix hash -rx /tmp/FastQC
0jyk90kg6s62w3dn6qjx9nrawjs12qx172lii0yxbvsfylhnx479

$ ./pre-inst-env guix hash -rx /tmp/FastQC
00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk

$ grep -A 15 'define-public fastqc' gnu/packages/bioinformatics.scm | grep -C 1 base32
       (sha256
        (base32
         "00y9drm0bkpxw8xfl8ysss18jmnhj8blgqgr6fpa58rkpfcbg8qk"))
--8<---------------cut here---------------end--------------->8---

And I do not observe a regression using a couple of tests.

Cheers,
simon




[0001-scripts-hash-Handle-repository-several-VCS-folders.patch (text/x-diff, inline)]
From 1bf6555e6a89248f079437b842dbaff9c107de9a Mon Sep 17 00:00:00 2001
Message-Id: <1bf6555e6a89248f079437b842dbaff9c107de9a.1697562942.git.zimon.toutoune <at> gmail.com>
From: Simon Tournier <zimon.toutoune <at> gmail.com>
Date: Tue, 17 Oct 2023 18:53:04 +0200
Subject: [PATCH] scripts: hash: Handle repository several VCS folders.

Fixes <https://issues.guix.gnu.org/issue/65979>.
Reported by Simon Tournier <zimon.toutoune <at> gmail.com>

* guix/hash.scm (vcs-file?): Add optional argument for passing VCS kind of the
file/repository.
(file-hash*): Adjust accordingly.
* guix/scripts/hash.scm (guix-hash)[file-hash]: Detect VCS kind of the
file/repository and passes it.
---
 guix/hash.scm         | 18 ++++++++++++------
 guix/scripts/hash.scm | 18 +++++++++++++-----
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/guix/hash.scm b/guix/hash.scm
index 3cb68e5c44..8fff51e8f1 100644
--- a/guix/hash.scm
+++ b/guix/hash.scm
@@ -1,6 +1,7 @@
 ;;; GNU Guix --- Functional package management for GNU
 ;;; Copyright © 2021 Sarah Morgensen <iskarian <at> mgsn.dev>
 ;;; Copyright © 2022 Maxime Devos <maximedevos <at> telenet.be>
+;;; Copyright © 2023 Simon Tournier <zimon.toutoune <at> gmail.com>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -25,21 +26,26 @@ (define-module (guix hash)
   #:export (vcs-file?
             file-hash*))
 
-(define (vcs-file? file stat)
-  "Returns true if FILE is a version control system file."
+(define* (vcs-file? file stat
+                    #:optional
+                    (vcses (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+  "Returns true if FILE matches a version control system from the list VCSES."
   (case (stat:type stat)
     ((directory)
-     (member (basename file) '(".bzr" ".git" ".hg" ".svn" "CVS")))
+     (member (basename file) vcses))
     ((regular)
-     ;; Git sub-modules have a '.git' file that is a regular text file.
-     (string=? (basename file) ".git"))
+     (if (member ".git" vcses)
+         ;; Git sub-modules have a '.git' file that is a regular text file.
+         (string=? (basename file) ".git")
+         #f))
     (else
      #f)))
 
 (define* (file-hash* file #:key
                      (algorithm (hash-algorithm sha256))
                      (recursive? 'auto)
-                     (select? (negate vcs-file?)))
+                     (select? (negate (lambda (file stat)
+                                        (vcs-file? file stat)))))
   "Compute the hash of FILE with ALGORITHM.
 
 Symbolic links are only dereferenced if RECURSIVE? is false.
diff --git a/guix/scripts/hash.scm b/guix/scripts/hash.scm
index 7197d3965c..ed96e6a7e1 100644
--- a/guix/scripts/hash.scm
+++ b/guix/scripts/hash.scm
@@ -3,7 +3,7 @@
 ;;; Copyright © 2013 Nikita Karetnikov <nikita <at> karetnikov.org>
 ;;; Copyright © 2016 Jan Nieuwenhuizen <janneke <at> gnu.org>
 ;;; Copyright © 2018 Tim Gesthuizen <tim.gesthuizen <at> yahoo.de>
-;;; Copyright © 2021 Simon Tournier <zimon.toutoune <at> gmail.com>
+;;; Copyright © 2021, 2023 Simon Tournier <zimon.toutoune <at> gmail.com>
 ;;; Copyright © 2021 Sarah Morgensen <iskarian <at> mgsn.dev>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -181,9 +181,6 @@ (define-command (guix-hash . args)
                             (_ #f))
                            (reverse opts)))
          (fmt  (assq-ref opts 'format))
-         (select? (if (assq-ref opts 'exclude-vcs?)
-                      (negate vcs-file?)
-                      (const #t)))
          (algorithm (assoc-ref opts 'hash-algorithm))
          (serializer (assoc-ref opts 'serializer)))
 
@@ -193,7 +190,18 @@ (define-command (guix-hash . args)
       (catch 'system-error
         (lambda _
           (with-error-handling
-            (serializer file algorithm select?)))
+            (let* ((vcses (fold (lambda (vcs result)
+                                  (if (file-exists? (string-append file "/" vcs))
+                                      (cons vcs result)
+                                      result))
+                                '()
+                                (list ".bzr" ".git" ".hg" ".svn" "CVS")))
+                   (select? (if (assq-ref opts 'exclude-vcs?)
+                                (negate (lambda (file stat)
+                                          (vcs-file? file stat
+                                                     vcses)))
+                                (const #t))))
+              (serializer file algorithm select?))))
         (lambda args
           (leave (G_ "~a ~a~%")
                  file

base-commit: dcc5c34504c94732c135a85fb4db40ca9796270e
-- 
2.38.1


This bug report was last modified 199 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.