GNU bug report logs - #42019
[PATCH 0/1] sources.json compliant with SWH loader

Package: guix-patches;

Reported by: zimoun <zimon.toutoune <at> gmail.com>

Date: Tue, 23 Jun 2020 15:14:01 UTC

Severity: normal

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 42019 in the body.
You can then email your comments to 42019 AT debbugs.gnu.org in the normal way.


Report forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Tue, 23 Jun 2020 15:14:02 GMT)

Acknowledgement sent to zimoun <zimon.toutoune <at> gmail.com>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Tue, 23 Jun 2020 15:14:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: guix-patches <at> gnu.org
Cc: zimoun <zimon.toutoune <at> gmail.com>
Subject: [PATCH 0/1] sources.json compliant with SWH loader
Date: Tue, 23 Jun 2020 17:13:23 +0200
Dear,

This patch adds the "integrity" field.  It uses the SRI format, i.e., the
'origin-hash' value is encoded as base64 and prefixed with the algorithm name.
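
As a rough illustration of the SRI format, the sketch below builds an
"integrity" string from a raw digest.  It assumes it is run from 'guix repl',
so that (guix base64) and the guile-gcrypt module (gcrypt hash) are available;
'hash->sri' is a hypothetical helper name and is not part of the patch.  The
digest of the empty input is used only because it is a well-known test vector.

```scheme
;; Sketch only: run inside `guix repl`, where (guix base64) and
;; (gcrypt hash) are on the load path.
(use-modules ((guix base64) #:select (base64-encode))
             ((gcrypt hash) #:select (sha256))
             (rnrs bytevectors))

;; Hypothetical helper, not part of the patch.
(define (hash->sri algorithm hash-value)
  "Return the SRI string for HASH-VALUE, a bytevector holding the raw digest
computed with ALGORITHM, a symbol such as 'sha256."
  (string-append (symbol->string algorithm) "-"
                 (base64-encode hash-value)))

;; SHA-256 of the empty input, a well-known test vector.
(display (hash->sri 'sha256 (sha256 (string->utf8 ""))))
(newline)
;; prints: sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
```

The algorithm prefix comes straight from the symbol name, so a 'sha512 digest
would give a "sha512-..." string in the same way.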

The "revision" field is the Guix commit.  It should be used by SWH; for example
SWH could fetch several sources.json.

Currently, the SWH loader only supports the formats [1]
".tar.gz$|.zip$|tar.bz2$|.tbz$|.tar.xz$|.tgz$|.tar$" and their advice is
to filter out any other files (e.g., Gem).  For now, there is no filter;
one could be added later if it is really an issue for them.
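
If such a filter turns out to be needed, it could look roughly like the sketch
below.  This is only an illustration: the names '%supported-archive-rx',
'supported-url?' and 'supported-urls' are hypothetical, the regular expression
is a tightened variant of the pattern quoted above (dots escaped, one common
prefix), and the ".gem" URL is made up for the example.

```scheme
;; Sketch only: plain Guile, no Guix modules required.
(use-modules (ice-9 regex)
             (srfi srfi-1))

;; Assumption: a stricter rewrite of the loader's pattern, with literal dots.
(define %supported-archive-rx
  (make-regexp "\\.(tar\\.gz|zip|tar\\.bz2|tbz|tar\\.xz|tgz|tar)$"))

(define (supported-url? url)
  "Return #t if URL ends with one of the archive extensions listed above."
  (->bool (regexp-exec %supported-archive-rx url)))

(define (supported-urls urls)
  "Return the elements of URLS that the loader is expected to handle."
  (filter supported-url? urls))

;; The second URL is a made-up example of an unsupported format.
(write (supported-urls
        '("https://ftpmirror.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz"
          "https://example.org/foo-1.0.gem")))
(newline)
```

Only the first URL is kept; an origin whose URLs are all filtered out could
then be skipped entirely.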


1: https://forge.softwareheritage.org/T1352#45459

All the best,
simon


zimoun (1):
  website: Add integrity to JSON sources.

 website/apps/packages/builder.scm | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)


base-commit: 36fdde5b3efad445291588a5bc17a11802eb7ff8
-- 
2.26.2





Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Tue, 23 Jun 2020 15:22:01 GMT)

Message #8 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: 42019 <at> debbugs.gnu.org
Cc: zimoun <zimon.toutoune <at> gmail.com>
Subject: [PATCH 1/1] website: Add integrity to JSON sources.
Date: Tue, 23 Jun 2020 17:21:39 +0200
* website/apps/packages/builder.scm (origin->json): Add integrity field using
SRI format.
---
 website/apps/packages/builder.scm | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/website/apps/packages/builder.scm b/website/apps/packages/builder.scm
index d2bccd7..e20d672 100644
--- a/website/apps/packages/builder.scm
+++ b/website/apps/packages/builder.scm
@@ -46,6 +46,8 @@
   #:use-module (guix hg-download)
   #:use-module (guix utils)                       ;location
   #:use-module ((guix build download) #:select (maybe-expand-mirrors))
+  #:use-module ((guix base64) #:select (base64-encode))
+  #:use-module ((guix config) #:select (%guix-version))
   #:use-module (json)
   #:use-module (ice-9 match)
   #:use-module ((web uri) #:select (string->uri uri->string))
@@ -114,7 +116,7 @@
     ,@(cond ((or (eq? url-fetch method)
                  (eq? url-fetch/tarbomb method)
                  (eq? url-fetch/zipbomb method))
-             `(("url" . ,(list->vector
+             `(("urls" . ,(list->vector
                           (resolve
                            (match uri
                              ((? string? url) (list url))
@@ -128,6 +130,16 @@
             ((eq? hg-fetch method)
              `(("hg_url" . ,(hg-reference-url uri))))
             (else '()))
+    ,@(if (or (eq? url-fetch method)
+              (eq? url-fetch/tarbomb method)
+              (eq? url-fetch/zipbomb method))
+          (let* ((content-hash (origin-hash origin))
+                 (hash-value (content-hash-value content-hash))
+                 (hash-algorithm (content-hash-algorithm content-hash))
+                 (algorithm-string (symbol->string hash-algorithm)))
+            `(("integrity" . ,(string-append algorithm-string "-"
+                                             (base64-encode hash-value)))))
+          '())
     ,@(if (eq? method git-fetch)
           `(("git_ref" . ,(git-reference-commit uri)))
           '())
@@ -174,9 +186,11 @@
              scm->json))
 
 (define (sources-json-builder)
-  "Return a JSON page listing all the sources.
-
-See <https://forge.softwareheritage.org/D2025#51269>."
+  "Return a JSON page listing all the sources."
+  ;; The Software Heritage format is described here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/tests/data/https_nix-community.github.io/nixpkgs-swh_sources.json
+  ;; And the loader is implemented here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/
   (define (package->json package)
     `(,@(if (origin? (package-source package))
             (origin->json (package-source package))
@@ -185,7 +199,8 @@ See <https://forge.softwareheritage.org/D2025#51269>."
 
   (make-page "sources.json"
              `(("sources" . ,(list->vector (map package->json (all-packages))))
-               ("version" . "1"))
+               ("version" . "1")
+               ("revision" . ,%guix-version))
              scm->json))
 
 (define (index-builder)
-- 
2.26.2





Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Sat, 27 Jun 2020 17:06:02 GMT)

Message #11 received at 42019 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 42019 <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Sat, 27 Jun 2020 19:05:50 +0200
Hi!

zimoun <zimon.toutoune <at> gmail.com> skribis:

> * website/apps/packages/builder.scm (origin->json): Add integrity field using
> SRI format.

[...]

> -             `(("url" . ,(list->vector
> +             `(("urls" . ,(list->vector
>                            (resolve
>                             (match uri
>                               ((? string? url) (list url))

Is this change OK for Repology?  Or should we keep “url” in addition to
“urls”?

>    (make-page "sources.json"
>               `(("sources" . ,(list->vector (map package->json (all-packages))))
> -               ("version" . "1"))
> +               ("version" . "1")
> +               ("revision" . ,%guix-version))

There’s no guarantee that ‘%guix-version’ is a commit ID, so perhaps we
should do something like:

  (match (current-profile)
    (#f %guix-version)   ;for lack of a better ID
    (profile
     (let ((channel (find guix-channel? (profile-channels profile))))
       (channel-commit channel))))
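
For reference, a self-contained variant of the snippet above might look like
the following sketch.  It assumes a 'guix repl' session and assumes that
'profile-channels', 'guix-channel?' and 'channel-commit' are exported by
(guix channels) and 'current-profile' by (guix describe); 'guix-revision' is a
hypothetical name.  It also falls back to '%guix-version' when the profile
has no 'guix' channel, a case the snippet above does not handle.

```scheme
;; Sketch only: meant for `guix repl`, where the Guix modules are available.
(use-modules ((guix describe) #:select (current-profile))
             ((guix channels)
              #:select (profile-channels guix-channel? channel-commit))
             ((guix config) #:select (%guix-version))
             (ice-9 match)
             (srfi srfi-1))

;; Hypothetical helper name.
(define (guix-revision)
  "Return the commit of the 'guix' channel of the current profile, falling
back to %GUIX-VERSION when there is no profile or no 'guix' channel."
  (match (current-profile)
    (#f %guix-version)                  ;for lack of a better ID
    (profile
     (match (find guix-channel? (profile-channels profile))
       (#f %guix-version)               ;no 'guix' channel in this profile
       (channel (channel-commit channel))))))

(display (guix-revision))
(newline)
```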

Otherwise LGTM, thank you!

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Sat, 27 Jun 2020 17:43:02 GMT)

Message #14 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 42019 <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Sat, 27 Jun 2020 19:41:59 +0200
Hi Ludo,

Thank you for the review.

On Sat, 27 Jun 2020 at 19:05, Ludovic Courtès <ludo <at> gnu.org> wrote:

>> -             `(("url" . ,(list->vector
>> +             `(("urls" . ,(list->vector
>>                            (resolve
>>                             (match uri
>>                               ((? string? url) (list url))
>
> Is this change OK for Repology?  Or should we keep “url” in addition to
> “urls”?

From what I understood of their API [1] when I checked it, I would say yes. :-)
Well, I do not think that Repology parses the 'sources' field.

1: https://repology.org/addrepo


> There’s no guarantee that ‘%guix-version’ is a commit ID, so perhaps we
> should do something like:

Thanks for the tip, I did not know.  I will send a v2 with your
suggestion, or feel free to update the patch and push it. :-)


Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Mon, 29 Jun 2020 16:52:01 GMT)

Message #17 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: 42019 <at> debbugs.gnu.org
Cc: zimoun <zimon.toutoune <at> gmail.com>
Subject: [PATCH v2] website: Add integrity to JSON sources.
Date: Mon, 29 Jun 2020 18:50:57 +0200
* website/apps/packages/builder.scm (origin->json): Add integrity field using
SRI format.
---
 website/apps/packages/builder.scm | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/website/apps/packages/builder.scm b/website/apps/packages/builder.scm
index d2bccd7..fa488a5 100644
--- a/website/apps/packages/builder.scm
+++ b/website/apps/packages/builder.scm
@@ -46,6 +46,9 @@
   #:use-module (guix hg-download)
   #:use-module (guix utils)                       ;location
   #:use-module ((guix build download) #:select (maybe-expand-mirrors))
+  #:use-module ((guix base64) #:select (base64-encode))
+  #:use-module ((guix describe) #:select (current-profile))
+  #:use-module ((guix config) #:select (%guix-version))
   #:use-module (json)
   #:use-module (ice-9 match)
   #:use-module ((web uri) #:select (string->uri uri->string))
@@ -114,7 +117,7 @@
     ,@(cond ((or (eq? url-fetch method)
                  (eq? url-fetch/tarbomb method)
                  (eq? url-fetch/zipbomb method))
-             `(("url" . ,(list->vector
+             `(("urls" . ,(list->vector
                           (resolve
                            (match uri
                              ((? string? url) (list url))
@@ -128,6 +131,16 @@
             ((eq? hg-fetch method)
              `(("hg_url" . ,(hg-reference-url uri))))
             (else '()))
+    ,@(if (or (eq? url-fetch method)
+              (eq? url-fetch/tarbomb method)
+              (eq? url-fetch/zipbomb method))
+          (let* ((content-hash (origin-hash origin))
+                 (hash-value (content-hash-value content-hash))
+                 (hash-algorithm (content-hash-algorithm content-hash))
+                 (algorithm-string (symbol->string hash-algorithm)))
+            `(("integrity" . ,(string-append algorithm-string "-"
+                                             (base64-encode hash-value)))))
+          '())
     ,@(if (eq? method git-fetch)
           `(("git_ref" . ,(git-reference-commit uri)))
           '())
@@ -174,9 +187,11 @@
              scm->json))
 
 (define (sources-json-builder)
-  "Return a JSON page listing all the sources.
-
-See <https://forge.softwareheritage.org/D2025#51269>."
+  "Return a JSON page listing all the sources."
+  ;; The Software Heritage format is described here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/tests/data/https_nix-community.github.io/nixpkgs-swh_sources.json
+  ;; And the loader is implemented here:
+  ;; https://forge.softwareheritage.org/source/swh-loader-core/browse/master/swh/loader/package/nixguix/
   (define (package->json package)
     `(,@(if (origin? (package-source package))
             (origin->json (package-source package))
@@ -185,7 +200,13 @@ See <https://forge.softwareheritage.org/D2025#51269>."
 
   (make-page "sources.json"
              `(("sources" . ,(list->vector (map package->json (all-packages))))
-               ("version" . "1"))
+               ("version" . "1")
+               ("revision" .
+                ,(match (current-profile)
+                   (#f %guix-version)   ;for lack of a better ID
+                   (profile
+                    (let ((channel (find guix-channel? (profile-channels profile))))
+                      (channel-commit channel))))))
              scm->json))
 
 (define (index-builder)
-- 
2.26.2





Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Mon, 29 Jun 2020 17:02:02 GMT)

Message #20 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 42019 <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Mon, 29 Jun 2020 19:01:04 +0200
Hi Ludo,

On Sat, 27 Jun 2020 at 19:42, zimoun <zimon.toutoune <at> gmail.com> wrote:

> Thanks for the tip, I did not know.  I will send a v2 with your
> suggestion, or feel free to update the patch and push it. :-)

v2 is sent.

BTW, in the SWH picture and after a video chat with lewo, I do not
think that the website is the right place.  Instead, it should go to
ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
publish".  Well, the next step is to have a collection of sources.json
-- that is the point of "revision".  WDYT?

Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Mon, 29 Jun 2020 20:43:02 GMT)

Message #23 received at 42019 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 42019 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net>
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Mon, 29 Jun 2020 22:41:43 +0200
Hi,

zimoun <zimon.toutoune <at> gmail.com> skribis:

> BTW, in the SWH picture and after a video chat with lewo, I do not
> think that the website is the right place.  Instead, it should go to
> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
> publish".  Well, the next step is to have a collection of sources.json
> -- that is the point of "revision".  WDYT?

The Guix Data Service would be a natural place for ‘sources.json’ IMO.
Thoughts, Chris?

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Mon, 29 Jun 2020 23:29:01 GMT)

Message #26 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 42019 <at> debbugs.gnu.org, Christopher Baines <mail <at> cbaines.net>
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Tue, 30 Jun 2020 01:28:45 +0200
Hi Chris,

On Mon, 29 Jun 2020 at 22:41, Ludovic Courtès <ludo <at> gnu.org> wrote:
> zimoun <zimon.toutoune <at> gmail.com> skribis:
>
>> BTW, in the SWH picture and after a video chat with lewo, I do not
>> think that the website is the right place.  Instead, it should go to
>> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
>> publish".  Well, the next step is to have a collection of sources.json
>> -- that is the point of "revision".  WDYT?
>
> The Guix Data Service would be a natural place for ‘sources.json’ IMO.
> Thoughts, Chris?

If it goes to the GDS, then first, please point me to where to start. :-)

And second, it would be nice in the "near" future to have at least 2
sources.json: one for the last commit, refreshed every X minutes (or
hours), and another one containing the concatenation of all the sources
of Guix (at least those reachable by guix time-machine, i.e., after the
big overhaul of Inferiors).  I will go on #swh-devel or reach lewo to
know how "near" it is on the SWH side.


Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Wed, 01 Jul 2020 19:36:01 GMT)

Message #29 received at 42019 <at> debbugs.gnu.org:

From: Christopher Baines <mail <at> cbaines.net>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 42019 <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Wed, 01 Jul 2020 20:35:24 +0100
zimoun <zimon.toutoune <at> gmail.com> writes:

> Hi Chris,
>
> On Mon, 29 Jun 2020 at 22:41, Ludovic Courtès <ludo <at> gnu.org> wrote:
>> zimoun <zimon.toutoune <at> gmail.com> skribis:
>>
>>> BTW, in the SWH picture and after a video chat with lewo, I do not
>>> think that the website is the right place.  Instead, it should go to
>>> ci.guix.gnu.org or data.guix.gnu.org.  Or maybe be integrated with "guix
>>> publish".  Well, the next step is to have a collection of sources.json
>>> -- that is the point of "revision".  WDYT?
>>
>> The Guix Data Service would be a natural place for ‘sources.json’ IMO.
>> Thoughts, Chris?
>
> If it goes to the GDS, then first, please point me to where to start. :-)

I think this does sound like a good use of the Guix Data
Service. Unfortunately, the sources of packages aren't currently stored
in the Guix Data Service database, so I'm guessing this will require
storing some new data, then working out how to present it.

A question maybe for you, Simon: what would be the perfect data for this
particular use case? I gather it's something about the (source ...)
field in packages, probably for all the exported (plus maybe
non-exported) packages.

> And second, it would be nice in the "near" future to have at least 2
> sources.json: one for the last commit, refreshed every X minutes (or
> hours), and another one containing the concatenation of all the sources
> of Guix (at least those reachable by guix time-machine, i.e., after the
> big overhaul of Inferiors).  I will go on #swh-devel or reach lewo to
> know how "near" it is on the SWH side.

Once you can get the data for an individual revision in the Guix Data
Service, it should be reasonably easy to just get the data for multiple
revisions, say all for the last week.

Chris

Information forwarded to guix-patches <at> gnu.org:
bug#42019; Package guix-patches. (Wed, 01 Jul 2020 20:30:01 GMT)

Message #32 received at 42019 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Christopher Baines <mail <at> cbaines.net>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 42019 <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH 1/1] website: Add integrity to JSON sources.
Date: Wed, 01 Jul 2020 22:29:00 +0200
Hi Chris,

On Wed, 01 Jul 2020 at 20:35, Christopher Baines <mail <at> cbaines.net> wrote:

> A question maybe for you, Simon: what would be the perfect data for this
> particular use case? I gather it's something about the (source ...)
> field in packages, probably for all the exported (plus maybe
> non-exported) packages.

Currently the website builds sources.json by using 'fold-packages'
(traversing all the modules and returning all the public variables, if I
read correctly), then excluding 'package-superseded' and
'package-replacement'.

Well, maybe an example is simpler than a lot of words.  The resulting
JSON looks like:

--8<---------------cut here---------------start------------->8---
    {
      "type": "url",
      "urls": [
        "https://ftpmirror.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
        "ftp://ftp.cs.tu-berlin.de/pub/gnu/a2ps/a2ps-4.14.tar.gz",
        "ftp://ftp.funet.fi/pub/mirrors/ftp.gnu.org/gnu/a2ps/a2ps-4.14.tar.gz",
        "http://ftp.gnu.org/pub/gnu/a2ps/a2ps-4.14.tar.gz"
      ],
      "integrity": "sha256-866NPUVkpBtuKiHyN9LysQT0gQhZHouDSXUAGCo6s6Q="
    },
    {
      "type": "git",
      "git_url": "https://github.com/opencog/agi-bio.git",
      "git_ref": "b5c6f3d99e8cca3798bf0cdf2c32f4bdb8098efb"
    },
--8<---------------cut here---------------end--------------->8---

So basically, the data are: origin-method, origin-uri (implies reference
URLs and {git,hg,svn}-{commit,revision}), origin-hash (implies
content-hash-{value,algorithm}).  Note that the list of mirrors is
necessary too.
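
On the consumer side, the "integrity" string can be split back into its two
parts.  The sketch below is only an illustration: it assumes a 'guix repl'
session so that (guix base64) is available, and 'sri->hash' is a hypothetical
helper name.  The input is the "integrity" value from the a2ps entry above.

```scheme
;; Sketch only: run inside `guix repl`, where (guix base64) is available.
(use-modules ((guix base64) #:select (base64-decode))
             (rnrs bytevectors))

;; Hypothetical helper name.
(define (sri->hash sri)
  "Split SRI, a string such as \"sha256-...\", and return two values: the
algorithm as a symbol and the raw digest as a bytevector."
  ;; The base64 alphabet contains no dash, so the first dash is the separator.
  (let ((dash (string-index sri #\-)))
    (values (string->symbol (substring sri 0 dash))
            (base64-decode (substring sri (+ dash 1))))))

(call-with-values
    (lambda ()
      (sri->hash "sha256-866NPUVkpBtuKiHyN9LysQT0gQhZHouDSXUAGCo6s6Q="))
  (lambda (algorithm digest)
    (format #t "~a: ~a bytes~%" algorithm (bytevector-length digest))))
;; prints: sha256: 32 bytes
```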

I had a look at

  http://git.savannah.gnu.org/cgit/guix/data-service.git/tree/

but I am not sure I understand where the SQL tables are defined.


Thanks,
simon




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Mon, 06 Jul 2020 10:21:01 GMT)

Notification sent to zimoun <zimon.toutoune <at> gmail.com>:
bug acknowledged by developer. (Mon, 06 Jul 2020 10:21:01 GMT)

Message #37 received at 42019-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 42019-done <at> debbugs.gnu.org
Subject: Re: [bug#42019] [PATCH v2] website: Add integrity to JSON sources.
Date: Mon, 06 Jul 2020 12:20:20 +0200
Hi,

zimoun <zimon.toutoune <at> gmail.com> skribis:

> * website/apps/packages/builder.scm (origin->json): Add integrity field using
> SRI format.

I added missing bits to the commit log and pushed as
35bb77108fc7f2339da0b5be139043a5f3f21493 to guix-artwork.git.

Thanks, and apologies for the delay!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 03 Aug 2020 11:24:03 GMT)