GNU bug report logs - #54723
'guix publish' without '--cache' can send garbled narinfo responses

Previous Next

Package: guix;

Reported by: Guillaume Le Vaillant <glv <at> posteo.net>

Date: Tue, 5 Apr 2022 10:16:02 UTC

Severity: important

Tags: patch

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 54723 in the body.
You can then email your comments to 54723 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Tue, 05 Apr 2022 10:16:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Guillaume Le Vaillant <glv <at> posteo.net>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Tue, 05 Apr 2022 10:16:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: guix-patches <at> gnu.org
Subject: [PATCH] Check URI when verifying narinfo validity.
Date: Tue, 05 Apr 2022 09:58:18 +0000
[Message part 1 (text/plain, inline)]
When trying to upgrade a machine using a substitute server on the same
LAN, I get this crash a lot:

--8<---------------cut here---------------start------------->8---
# guix system reconfigure --substitute-urls="http://192.168.0.22:8080 https://ci.guix.gnu.org" /etc/guix/config.scm
substitute: mise à jour des substituts depuis « http://192.168.0.22:8080 »... 100.0 %
substitute: Backtrace:
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 17 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In unknown file:
substitute:           16 (apply-smob/0 #<thunk 7fe08afb72e0>)
substitute: In ice-9/boot-9.scm:
substitute:     724:2 15 (call-with-prompt _ _ #<procedure default-prompt-handle…>)
substitute: In ice-9/eval.scm:
substitute:     619:8 14 (_ #(#(#<directory (guile-user) 7fe08afbcc80>)))
substitute: In guix/ui.scm:
substitute:    2209:7 13 (run-guix . _)
substitute:   2172:10 12 (run-guix-command _ . _)
substitute: In ice-9/boot-9.scm:
substitute:   1752:10 11 (with-exception-handler _ _ #:unwind? _ # _)
substitute:   1752:10 10 (with-exception-handler _ _ #:unwind? _ # _)
substitute: In guix/scripts/substitute.scm:
substitute:    757:18  9 (_)
substitute:    348:26  8 (process-query #<output: file 4> _ #:cache-urls _ #:acl _)
substitute: In guix/substitutes.scm:
substitute:    369:45  7 (lookup-narinfos/diverse _ _ #<procedure 7fe088c9cbc0 …> …)
substitute: In unknown file:
substitute:            6 (filter #<procedure 7fe088c9cbc0 at guix/scripts/subst…> …)
substitute: In guix/narinfo.scm:
substitute:    215:32  5 (valid-narinfo? _ _ #:verbose? _)
substitute: In ice-9/boot-9.scm:
substitute:   1685:16  4 (raise-exception _ #:continuable? _)
substitute:   1685:16  3 (raise-exception _ #:continuable? _)
substitute:   1780:13  2 (_ #<&compound-exception components: (#<&assertion-fail…>)
substitute:   1685:16  1 (raise-exception _ #:continuable? _)
substitute:   1685:16  0 (raise-exception _ #:continuable? _)
substitute: 
substitute: ice-9/boot-9.scm:1685:16: In procedure raise-exception:
substitute: In procedure car: Wrong type argument in position 1 (expecting pair): ()
guix system: erreur : `/gnu/store/wgygsxcdy1z3pfvwhpgyl5vjp4xvwhhh-guix-1.3.0-23.a27e47f/bin/guix substitute' died unexpectedly
--8<---------------cut here---------------end--------------->8---

It looks like the 'narinfo-uri' field is an empty list instead of a list
of URIs. Is that supposed to be possible?
Does the the attached patch adding a check for the validity of
this field in the 'valid-narinfo?' function make sense?


The substitute server configuration is:

--8<---------------cut here---------------start------------->8---
(service guix-publish-service-type
         (guix-publish-configuration
          (host "0.0.0.0")
          (port 8080)
          (compression '(("zstd" 3)))
          (advertise? #t)))
--8<---------------cut here---------------end--------------->8---
[0001-narinfo-Check-URI-when-verifying-narinfo-validity.patch (text/x-patch, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Tue, 05 Apr 2022 17:09:01 GMT) Full text and rfc822 format available.

Message #8 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Tue, 05 Apr 2022 19:08:12 +0200
Hi,

Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> When trying to upgrade a machine using a substitute server on the same
> LAN, I get this crash a lot:
>
> # guix system reconfigure --substitute-urls="http://192.168.0.22:8080 https://ci.guix.gnu.org" /etc/guix/config.scm
> substitute: mise à jour des substituts depuis « http://192.168.0.22:8080 »... 100.0 %

[...]

> It looks like the 'narinfo-uri' field is an empty list instead of a list
> of URIs. Is that supposed to be possible?

I don’t think so.  Could you grab a narinfo and share it?

  wget -qO - http://192.168.0.22:8080/HASH.narinfo

where HASH is the hash part of a store item.

What could happen though is a situation where ‘guix publish’ only offers
a compression method not supported by the client.  In that case,
‘narinfo-best-uri’ throws a match-error because ‘choices’ is the empty
list.  We should fix that.

> Does the the attached patch adding a check for the validity of
> this field in the 'valid-narinfo?' function make sense?

Maybe, but I’d like to make sure we understand the issue.

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Tue, 05 Apr 2022 18:04:01 GMT) Full text and rfc822 format available.

Message #11 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Tue, 05 Apr 2022 17:51:47 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Hi,
>
> [...]
>
> Could you grab a narinfo and share it?
>
>   wget -qO - http://192.168.0.22:8080/HASH.narinfo
>
> where HASH is the hash part of a store item.
>
> What could happen though is a situation where ‘guix publish’ only offers
> a compression method not supported by the client.  In that case,
> ‘narinfo-best-uri’ throws a match-error because ‘choices’ is the empty
> list.  We should fix that.

I tried downloading a few narinfos and they don't look broken (2 of them
in attachment).

However for one of them it looks like the guix-publish server got stuck
on the request for several minutes (the second attempt worked):

--8<---------------cut here---------------start------------->8---
wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
--2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
Connexion à 192.168.0.22:8080… connecté.
requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
Nouvel essai.

--2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
Connexion à 192.168.0.22:8080… connecté.
requête HTTP transmise, en attente de la réponse… 200 OK
Taille : 7428 (7,3K) [application/x-nix-narinfo]
Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »

184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo        100%[=====================================================================================================>]   7,25K  --.-KB/s    ds 0,02s   

2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
--8<---------------cut here---------------end--------------->8---

Could that be the cause of the issue?
[ly615k3xjapk0wq7y2n4ld6msi7307ba.narinfo (application/octet-stream, attachment)]
[184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo (application/octet-stream, attachment)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Sat, 09 Apr 2022 20:34:01 GMT) Full text and rfc822 format available.

Message #14 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Sat, 09 Apr 2022 22:32:38 +0200
Hi,

Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> However for one of them it looks like the guix-publish server got stuck
> on the request for several minutes (the second attempt worked):
>
> wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> --2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> Connexion à 192.168.0.22:8080… connecté.
> requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
> Nouvel essai.
>
> --2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
> Connexion à 192.168.0.22:8080… connecté.
> requête HTTP transmise, en attente de la réponse… 200 OK
> Taille : 7428 (7,3K) [application/x-nix-narinfo]
> Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »
>
> 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo        100%[=====================================================================================================>]   7,25K  --.-KB/s    ds 0,02s   
>
> 2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
>
> Could that be the cause of the issue?

Yes, it could be the issue.  Is there any clue as to why ‘guix publish’
hung up first?  Something in its log maybe?

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Sat, 09 Apr 2022 21:32:01 GMT) Full text and rfc822 format available.

Message #17 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Sat, 09 Apr 2022 21:06:44 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Hi,
>
> Guillaume Le Vaillant <glv <at> posteo.net> skribis:
>
>> However for one of them it looks like the guix-publish server got stuck
>> on the request for several minutes (the second attempt worked):
>>
>> wget http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> --2022-04-05 19:33:56--  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> Connexion à 192.168.0.22:8080… connecté.
>> requête HTTP transmise, en attente de la réponse… Erreur de lecture (Connexion ré-initialisée par le correspondant) dans les en-têtes.
>> Nouvel essai.
>>
>> --2022-04-05 19:36:40--  (essai :  2)  http://192.168.0.22:8080/184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> Connexion à 192.168.0.22:8080… connecté.
>> requête HTTP transmise, en attente de la réponse… 200 OK
>> Taille : 7428 (7,3K) [application/x-nix-narinfo]
>> Sauvegarde en : « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo »
>>
>> 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo
>> 100%[=====================================================================================================>]
>> 7,25K --.-KB/s ds 0,02s
>>
>> 2022-04-05 19:36:40 (391 KB/s) — « 184b50qkcrkchc6dpwwcd7ypqb5yvrm3.narinfo » sauvegardé [7428/7428]
>>
>> Could that be the cause of the issue?
>
> Yes, it could be the issue.  Is there any clue as to why ‘guix publish’
> hung up first?  Something in its log maybe?
>
> Ludo’.

There are 2 errors that occur a lot in the guix-publish log files:

 - "In procedure fport_write: Broken pipe"
    It happens when trying to write to a socket apparently.

 - "In procedure sign: gcrypt: Cannot allocate memory"
    The machine has 64 GiB of RAM, of which at least 50 GiB is free, so
    gcrypt should have enough to make a signature...

Note: full log file with complete error messages in attachment.
[guix-publish.log.gz (application/octet-stream, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Mon, 11 Apr 2022 14:12:02 GMT) Full text and rfc822 format available.

Message #20 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Mon, 11 Apr 2022 13:31:31 +0000
[Message part 1 (text/plain, inline)]
Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> There are 2 errors that occur a lot in the guix-publish log files:
>
>  - "In procedure fport_write: Broken pipe"
>     It happens when trying to write to a socket apparently.
>
>  - "In procedure sign: gcrypt: Cannot allocate memory"
>     The machine has 64 GiB of RAM, of which at least 50 GiB is free, so
>     gcrypt should have enough to make a signature...

I captured the network traffic between the "guix publish" server and
a "guix upgrade" client to see if the "broken pipe" errors could
come from real networking issues.
According to wireshark the problem doesn't come from there, the TCP
stream didn't have any error.

However, looking at the full TCP stream in wireshark I saw that the
"guix publish" server sends some bad narinfo responses.
Sometimes some parts of the response are missing (here, Signature
incomplete, URL and Compression fields missing):
--8<---------------cut here---------------start------------->8---
HTTP/1.1 200 OK
Content-Length: 959
Content-Type: application/x-nix-narinfo;charset=UTF-8

StorePath: /gnu/store/dxpaqmix7zixm8pwcvvmq8q969q50jpp-pngload-2.0.0-2.91f1d70-checkout
NarHash: sha256:0s94fdbrbqj12qvgyn2g4lfwvz7qhhzbclrpz5ni7adwxgrmvxl1
NarSize: 245224
References: 
Deriver: ybdimrfjs090kzmimf5j1x5hs8y4d24p-pngload-2.0.0-2.91f1d70-checkout.drv
Signature: 1;kitej;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjNDY3NDk2RTJEOTZBMzc0QzFGN0M1MzJCNjc3MTM1NzVFOTkyRjQ0Qzc3MzQwRDUwQTcyRTkyMDJGRURDQkQxMyMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDZEQTAwMkQyNjE3MEQ3ODVDNkM3NkMyMUEwM0UzNDlCMkUwMDc4MTUyQzFBQURFNjhFMEZGOUJDRkUyMUFDNSMpCiAgIChzICMwNjNDM0UyNjg2MEU2OTIzNDdEMjNGNTQ4RUM3RDJGRUZGQjc0Q0I4NjNEMjlDMUE3QjA4REFCQjEzQjZDRjAxIykKICAgKQogICkKIChwdWJsaWMta2V5IAogIC
--8<---------------cut here---------------end--------------->8---

Sometimes the response looks like almost complete garbage:
--8<---------------cut here---------------start------------->8---
HTTP/1.1 200 OK
Content-Length: 970
Content-Type: application/x-nix-narinfo;charsetcharsetHTTP/=UTF-8

1
1

1
.S
--8<---------------cut here---------------end--------------->8---

When the client receives these bad narinfos, it often makes it crash
with errors like:
 - Wrong type (expecting exact integer): #f
 - unmatched line "1\r"
 - Wrong type argument in position 1 (expecting pair): ()

So it looks like the broken pipe problem comes from the "guix publish"
server, or from Guile... And making the code reconstructing narinfos
from HTTP responses more robust in case of bad input would be useful.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Tue, 12 Apr 2022 07:48:01 GMT) Full text and rfc822 format available.

Message #23 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Tue, 12 Apr 2022 09:47:12 +0200
Hi,

Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> However, looking at the full TCP stream in wireshark I saw that the
> "guix publish" server sends some bad narinfo responses.
> Sometimes some parts of the response are missing (here, Signature
> incomplete, URL and Compression fields missing):
>
> HTTP/1.1 200 OK
> Content-Length: 959
> Content-Type: application/x-nix-narinfo;charset=UTF-8
>
> StorePath: /gnu/store/dxpaqmix7zixm8pwcvvmq8q969q50jpp-pngload-2.0.0-2.91f1d70-checkout
> NarHash: sha256:0s94fdbrbqj12qvgyn2g4lfwvz7qhhzbclrpz5ni7adwxgrmvxl1
> NarSize: 245224
> References: 
> Deriver: ybdimrfjs090kzmimf5j1x5hs8y4d24p-pngload-2.0.0-2.91f1d70-checkout.drv
> Signature: 1;kitej;KHNpZ25hdHVyZSAKIChkYXRhIAogIChmbGFncyByZmM2OTc5KQogIChoYXNoIHNoYTI1NiAjNDY3NDk2RTJEOTZBMzc0QzFGN0M1MzJCNjc3MTM1NzVFOTkyRjQ0Qzc3MzQwRDUwQTcyRTkyMDJGRURDQkQxMyMpCiAgKQogKHNpZy12YWwgCiAgKGVjZHNhIAogICAociAjMDZEQTAwMkQyNjE3MEQ3ODVDNkM3NkMyMUEwM0UzNDlCMkUwMDc4MTUyQzFBQURFNjhFMEZGOUJDRkUyMUFDNSMpCiAgIChzICMwNjNDM0UyNjg2MEU2OTIzNDdEMjNGNTQ4RUM3RDJGRUZGQjc0Q0I4NjNEMjlDMUE3QjA4REFCQjEzQjZDRjAxIykKICAgKQogICkKIChwdWJsaWMta2V5IAogIC
>
>
> Sometimes the response looks like almost complete garbage:
>
> HTTP/1.1 200 OK
> Content-Length: 970
> Content-Type: application/x-nix-narinfo;charsetcharsetHTTP/=UTF-8
>
> 1
> 1
>
> 1
> .S
>
> When the client receives these bad narinfos, it often makes it crash
> with errors like:
>  - Wrong type (expecting exact integer): #f
>  - unmatched line "1\r"
>  - Wrong type argument in position 1 (expecting pair): ()

Woow.  How do you build and run ‘guix publish’?  Is it a distro package
or is it coming straight from Guix?  What command-line options are you
passing?

I’ve never seen this, although we have it running on several servers,
notably ci.guix.  I wonder what could cause this.

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Tue, 12 Apr 2022 09:27:01 GMT) Full text and rfc822 format available.

Message #26 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Simon Streit <simon <at> netpanic.org>, 54723 <at> debbugs.gnu.org,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Tue, 12 Apr 2022 08:54:00 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Woow.  How do you build and run ‘guix publish’?  Is it a distro package
> or is it coming straight from Guix?  What command-line options are you
> passing?
>
> I’ve never seen this, although we have it running on several servers,
> notably ci.guix.  I wonder what could cause this.
>
> Thanks,
> Ludo’.

I'm using guix-publish-service-type in the operating-system definition
to manage the "guix publish" server, using the on-the-fly mode and
fast Zstandard compression:

--8<---------------cut here---------------start------------->8---
(service guix-publish-service-type
         (guix-publish-configuration
          (host "0.0.0.0")
          (port 8080)
          (compression '(("zstd" 3)))
          (advertise? #t)))
--8<---------------cut here---------------end--------------->8---

When booting the machine, shepherd starts the server with the following
command-line options:

--8<---------------cut here---------------start------------->8---
/gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/libexec/guix/guile \ /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/bin/guix publish -u guix-publish -p 8080 -C zstd:3 --nar-path=nar --listen=0.0.0.0 --advertise
--8<---------------cut here---------------end--------------->8---

There's another report about this at <https://issues.guix.gnu.org/53668>
I had forgotten about, where Simon Streit and Maxim Cournoyer indicate
that they have seen this issue too.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Thu, 14 Apr 2022 12:31:02 GMT) Full text and rfc822 format available.

Message #29 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Simon Streit <simon <at> netpanic.org>, 54723 <at> debbugs.gnu.org,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Thu, 14 Apr 2022 12:18:51 +0000
[Message part 1 (text/plain, inline)]
Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> Woow.  How do you build and run ‘guix publish’?  Is it a distro package
>> or is it coming straight from Guix?  What command-line options are you
>> passing?
>>
>> I’ve never seen this, although we have it running on several servers,
>> notably ci.guix.  I wonder what could cause this.
>>
>> Thanks,
>> Ludo’.
>
> I'm using guix-publish-service-type in the operating-system definition
> to manage the "guix publish" server, using the on-the-fly mode and
> fast Zstandard compression:
>
> (service guix-publish-service-type
>          (guix-publish-configuration
>           (host "0.0.0.0")
>           (port 8080)
>           (compression '(("zstd" 3)))
>           (advertise? #t)))
>
>
> When booting the machine, shepherd starts the server with the following
> command-line options:
>
> /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/libexec/guix/guile \ /gnu/store/059svbd32i4s0l9s5i7z0krcnl666bjy-guix-1.3.0-24.2fb4304/bin/guix publish -u guix-publish -p 8080 -C zstd:3 --nar-path=nar --listen=0.0.0.0 --advertise
>
> There's another report about this at <https://issues.guix.gnu.org/53668>
> I had forgotten about, where Simon Streit and Maxim Cournoyer indicate
> that they have seen this issue too.

I tested a few things, and it looks like there is a concurrency issue in
the 'http-write' function of the publish server.
After removing the 'call-with-new-thread' in the part of the function
sending narinfos, I can't reproduce the "broken pipe" error anymore.

--8<---------------cut here---------------start------------->8---
diff --git a/guix/scripts/publish.scm b/guix/scripts/publish.scm
index 870dfc11e9..e9fad5845a 100644
--- a/guix/scripts/publish.scm
+++ b/guix/scripts/publish.scm
@@ -1002,28 +1002,25 @@ (define compression
            (begin
              (when (keep-alive? response)
                (keep-alive client))
-             (call-with-new-thread
-              (lambda ()
-                (set-thread-name "publish narinfo")
-                (let* ((narinfo
-                        (with-store store
-                          (narinfo-string store (utf8->string body)
-                                          #:nar-path nar-path
-                                          #:compressions compressions)))
-                       (narinfo-bv (string->bytevector narinfo "UTF-8"))
-                       (narinfo-length
-                        (bytevector-length narinfo-bv))
-                       (response (write-response
-                                  (with-content-length response
-                                                       narinfo-length)
-                                  client))
-                       (output (response-port response)))
-                  (configure-socket client)
-                  (put-bytevector output narinfo-bv)
-                  (force-output output)
-                  (unless (keep-alive? response)
-                    (close-port output))
-                  (values)))))
+             (let* ((narinfo
+                     (with-store store
+                       (narinfo-string store (utf8->string body)
+                                       #:nar-path nar-path
+                                       #:compressions compressions)))
+                    (narinfo-bv (string->bytevector narinfo "UTF-8"))
+                    (narinfo-length
+                     (bytevector-length narinfo-bv))
+                    (response (write-response
+                               (with-content-length response
+                                                    narinfo-length)
+                               client))
+                    (output (response-port response)))
+               (configure-socket client)
+               (put-bytevector output narinfo-bv)
+               (force-output output)
+               (unless (keep-alive? response)
+                 (close-port output))
+               (values)))
            (%http-write server client response body))))
     (_
      (match (assoc-ref (response-headers response) 'x-raw-file)
--8<---------------cut here---------------end--------------->8---
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Mon, 18 Apr 2022 19:40:01 GMT) Full text and rfc822 format available.

Message #32 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: Simon Streit <simon <at> netpanic.org>, 54723 <at> debbugs.gnu.org,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Mon, 18 Apr 2022 21:39:14 +0200
Hi!

Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> I tested a few things, and it looks like there is a concurrency issue in
> the 'http-write' function of the publish server.
> After removing the 'call-with-new-thread' in the part of the function
> sending narinfos, I can't reproduce the "broken pipe" error anymore.

Ah ha!  Little has changed in this code, so I think the culprit must be
f743f2046be2c5a338ab871ae8666d8f6de7440b.

Could you try running it from master with this patch reverted?

You can spawn it from the build tree with:

  sudo -E ./pre-inst-env guix publish -u nobody

Bonus points if you have a script to reproduce the issue.  I guess that
involves making pipelined keep-alive narinfo requests the way ‘guix
substitute’ and ‘guix weather’ do.

Thanks,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Wed, 20 Apr 2022 14:31:02 GMT) Full text and rfc822 format available.

Message #35 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Wed, 20 Apr 2022 14:10:43 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Hi!
>
> Guillaume Le Vaillant <glv <at> posteo.net> skribis:
>
>> I tested a few things, and it looks like there is a concurrency issue in
>> the 'http-write' function of the publish server.
>> After removing the 'call-with-new-thread' in the part of the function
>> sending narinfos, I can't reproduce the "broken pipe" error anymore.
>
> Ah ha!  Little has changed in this code, so I think the culprit must be
> f743f2046be2c5a338ab871ae8666d8f6de7440b.
>
> Could you try running it from master with this patch reverted?
>
> You can spawn it from the build tree with:
>
>   sudo -E ./pre-inst-env guix publish -u nobody
>
> Bonus points if you have a script to reproduce the issue.  I guess that
> involves making pipelined keep-alive narinfo requests the way ‘guix
> substitute’ and ‘guix weather’ do.
>
> Thanks,
> Ludo’.

Hi,

With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the
issue very often when trying to upgrade a profile with around 400
packages. The logs of the publish server show hundreds of narinfo
requests and tens of "broken pipe" errors.

With an additional commit reverting
f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue
anymore. The logs still show hundreds of narinfo requests, but no
errors.


I tried to make a script making many narinfo requests to reproduce the
issue more easily, but I didn't get it to work yet. I thought accessing
the 'fetch-narinfos' function from "guix/substitutes.scm" with:

--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (guix substitutes))

(define fetch-narinfos (@@ (guix substitutes) fetch-narinfos))
--8<---------------cut here---------------end--------------->8---

would work, but it doesn't:

--8<---------------cut here---------------start------------->8---
$ guix repl fetch-narinfos.scm
Backtrace:
           4 (primitive-load "/home/guillaume/fetch-narinfos.scm")
In ice-9/eval.scm:
   626:19  3 (_ #<directory (guix-user) 7faab8e0e8c0>)
   223:20  2 (proc #<directory (guix-user) 7faab8e0e8c0>)
In unknown file:
           1 (%resolve-variable (5 (guix substitutes) # . #f) #<dire…>)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
erreur : fetch-narinfos : variable non liée
--8<---------------cut here---------------end--------------->8---
[signature.asc (application/pgp-signature, inline)]

Information forwarded to guix-patches <at> gnu.org:
bug#54723; Package guix-patches. (Fri, 29 Apr 2022 16:21:02 GMT) Full text and rfc822 format available.

Message #38 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Fri, 29 Apr 2022 18:20:14 +0200
Hi,

(+Cc: Mathieu.)

Guillaume Le Vaillant <glv <at> posteo.net> skribis:

> With master at 101edbe63a887678722bc26cd85a7b7f5637428f, I reproduce the
> issue very often when trying to upgrade a profile with around 400
> packages. The logs of the publish server show hundreds of narinfo
> requests and tens of "broken pipe" errors.
>
> With an additional commit reverting
> f743f2046be2c5a338ab871ae8666d8f6de7440b, I can't reproduce the issue
> anymore. The logs still show hundreds of narinfo requests, but no
> errors.

For the record, a simple way to reproduce it is to start ‘guix publish’:

  sudo -E ./pre-inst-env guix publish -u nobody -p 9999

and to spawn ‘guix weather’, for example with:

  guix weather $(guix package -I. -p ~/.guix-home/profile |cut -f1) \
    --substitute-urls=http://localhost:9999

which crashes with the dreaded backtrace:

--8<---------------cut here---------------start------------->8---
Backtrace:
          11 (primitive-load "/home/ludo/.config/guix/current/bin/guix")
In guix/ui.scm:
   2230:7 10 (run-guix . _)
  2193:10  9 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10  8 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/scripts/weather.scm:
    595:9  7 (_)
In guix/build/utils.scm:
   677:23  6 (every* #<procedure 7f4af9621de0 at guix/scripts/weather.scm:595:17 (server)> _)
In guix/scripts/weather.scm:
   597:21  5 (_ "http://localhost:9999")
   120:17  4 (report-server-coverage _ ("/gnu/store/428bzp9325mfapyr4ywzwsz4ic7ssx2b-shepherd-0.9.0" "/gnu/store/sll9nfmqk8lkrraqbkyp3y…" …) …)
In unknown file:
           3 (_ #<procedure 7f4af896ca40 at guix/scripts/weather.scm:201:2 ()> #<procedure list _> . #w())
In guix/substitutes.scm:
   322:31  2 (lookup-narinfos "http://localhost:9999" _ #:open-connection _ #:make-progress-reporter _)
   245:26  1 (fetch-narinfos _ _ #:open-connection _ #:make-progress-reporter _)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Wrong type (expecting exact integer): #f
--8<---------------cut here---------------end--------------->8---

I looked more closely at f743f2046be2c5a338ab871ae8666d8f6de7440b and I
think it had a logical flaw: if you pipeline 100 GET requests, then,
with this commit, ‘guix publish’ would spawn 100 threads that would all
reply concurrently (more or less).  This is clearly wrong: replies
should be sent sequentially.

So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and
reverted it that commit.  I also added a test that reproduces the issue
above.


Now, commit f743f2046be2c5a338ab871ae8666d8f6de7440b was itself an
attempt to fix a bug whereby ‘narinfo-string’ would take too long,
thereby preventing ‘guix publish’ from accepting connections (since it’s
single-threaded).

I think the only reasonable way to fix it is by using Fibers to make
‘guix publish’ concurrent (using a fiberized server like in Cuirass).
We should do that at some point.  That’ll also allow us to remove some
of the hacks we’ve accumulated.

Now, the only way ‘narinfo-string’ can take too long these days is (I
think) if the store GC lock is held (we should check that hypothesis,
but I believe that if the GC lock is held, then the ‘query-path-info’
RPC made from ‘narinfo-string’ might block until the lock is released).
The GC lock is no longer held for hours on berlin, so there’s less
pressure to address that.

To summarize: I think ‘guix publish’ is okay as-is but we should
fiberize it sometime.

Thoughts?

Ludo’.




Changed bug title to ''guix publish' without '--cache' can send garbled narinfo responses' from '[PATCH] Check URI when verifying narinfo validity.' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 29 Apr 2022 16:22:01 GMT) Full text and rfc822 format available.

bug reassigned from package 'guix-patches' to 'guix'. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 29 Apr 2022 16:22:02 GMT) Full text and rfc822 format available.

Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 29 Apr 2022 16:22:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 54723 <at> debbugs.gnu.org and Guillaume Le Vaillant <glv <at> posteo.net> Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 29 Apr 2022 16:22:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#54723; Package guix. (Fri, 29 Apr 2022 21:17:01 GMT) Full text and rfc822 format available.

Message #49 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Guillaume Le Vaillant <glv <at> posteo.net>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Fri, 29 Apr 2022 23:16:32 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis:

> So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and
> reverted it that commit.  I also added a test that reproduces the issue
> above.

Commit 9eca13094d9bdf091f9096e0f3a8b450dac0e595 updates the ‘guix’
package, so you can now pull, reconfigure, and enjoy.  :-)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#54723; Package guix. (Sat, 30 Apr 2022 12:14:01 GMT) Full text and rfc822 format available.

Message #52 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Guillaume Le Vaillant <glv <at> posteo.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Mathieu Othacehe <othacehe <at> gnu.org>, 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Sat, 30 Apr 2022 12:12:30 +0000
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> skribis:
>
>> So in commit c1719a0adf3fa7611b56ca4d75b3ac8cf5c9c8ac I went ahead and
>> reverted it that commit.  I also added a test that reproduces the issue
>> above.
>
> Commit 9eca13094d9bdf091f9096e0f3a8b450dac0e595 updates the ‘guix’
> package, so you can now pull, reconfigure, and enjoy.  :-)
>
> Ludo’.

I just tried and everything worked without a glitch.
Thanks.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#54723; Package guix. (Sat, 30 Apr 2022 12:16:02 GMT) Full text and rfc822 format available.

Message #55 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Mathieu Othacehe <othacehe <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Guillaume Le Vaillant <glv <at> posteo.net>, 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Sat, 30 Apr 2022 14:15:31 +0200
Hey Ludo,

> Now, the only way ‘narinfo-string’ can take too long these days is (I
> think) if the store GC lock is held (we should check that hypothesis,
> but I believe that if the GC lock is held, then the ‘query-path-info’
> RPC made from ‘narinfo-string’ might block until the lock is released).
> The GC lock is no longer held for hours on berlin, so there’s less
> pressure to address that.

Thanks for having a look to that issue. On my side, I'd like to take
some time to look closely at the situation.

> To summarize: I think ‘guix publish’ is okay as-is but we should
> fiberize it sometime.

I think however that it would be nice to create a wip branch for those
changes, deploy it and monitor it closely on Berlin before pushing. From
experience, the publish code is really fragile and the I/O pressure on
Berlin can exhibit a lot of unexpected issues.

Mathieu




Information forwarded to bug-guix <at> gnu.org:
bug#54723; Package guix. (Sun, 01 May 2022 13:13:02 GMT) Full text and rfc822 format available.

Message #58 received at 54723 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Mathieu Othacehe <othacehe <at> gnu.org>
Cc: Guillaume Le Vaillant <glv <at> posteo.net>, 54723 <at> debbugs.gnu.org
Subject: Re: bug#54723: [PATCH] Check URI when verifying narinfo validity.
Date: Sun, 01 May 2022 15:12:07 +0200
Hi,

Mathieu Othacehe <othacehe <at> gnu.org> skribis:

> I think however that it would be nice to create a wip branch for those
> changes, deploy it and monitor it closely on Berlin before pushing. From
> experience, the publish code is really fragile and the I/O pressure on
> Berlin can exhibit a lot of unexpected issues.

Agreed, it looks like we have to be more cautious now.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 30 May 2022 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 326 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.