GNU bug report logs - #26201
Downloading substitutes is too slow upon nginx cache misses


Package: guix

Reported by: <dian_cecht <at> zoho.com>

Date: Tue, 21 Mar 2017 01:46:02 UTC

Severity: important

Tags: fixed

Done: ludo <at> gnu.org (Ludovic Courtès)

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 26201 in the body.
You can then email your comments to 26201 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 01:46:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to <dian_cecht <at> zoho.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 21 Mar 2017 01:46:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: GuixSD <bug-guix <at> gnu.org>
Subject: No notification of cache misses when downloading substitutes
Date: Mon, 20 Mar 2017 18:44:49 -0700
Just ran guix pull and guix package -u, and found some of the programs
download VERY slowly (<100kb/s, usually around 95). I asked on #guix
and lfam mentioned it was probably a cache miss.

It would be nice if there was some notification that a cache miss
happened and the download will likely be slow, otherwise a user might
wonder what problem there is with their connection.





Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 02:46:01 GMT) Full text and rfc822 format available.

Message #8 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: dian_cecht <at> zoho.com
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 03:46:29 +0100
Hullo,

On 21/03/17 02:44, dian_cecht <at> zoho.com wrote:
> Just ran guix pull and guix package -u, and found some of the programs
> download VERY slowly (<100kb/s, usually around 95). I asked on #guix
> and lfam mentioned it was probably a cache miss.

Do you mean that *substitutes* existed, but were not yet on
mirror.hydra.gnu.org and so were silently proxied from the much slower
hydra.gnu.org?

Or did Guix fall back to downloading *source* tarballs from some slow
upstream to build locally?

(I've no access to IRC at the mo'.)

Kind regards,

T G-R


Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 02:53:01 GMT) Full text and rfc822 format available.

Message #11 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Mon, 20 Mar 2017 19:52:47 -0700
On Tue, 21 Mar 2017 03:46:29 +0100
Tobias Geerinckx-Rice <me <at> tobias.gr> wrote:

> Hullo,
> 
> On 21/03/17 02:44, dian_cecht <at> zoho.com wrote:
> > Just ran guix pull and guix package -u, and found some of the
> > programs download VERY slowly (<100kb/s, usually around 95). I
> > asked on #guix and lfam mentioned it was probably a cache miss.  
> 
> Do you mean that *substitutes* existed, but were not yet on
> mirror.hydra.gnu.org and so were silently proxied from the much slower
> hydra.gnu.org?

The URL displayed during the download was mirror.hydra.gnu.org. 

> 
> Or did Guix fall back to downloading *source* tarballs from some slow
> upstream to build locally?

It was a binary download, not source. At least, I don't recall anything
about compiles at any point (and I'm sure it didn't take long enough to
do that; one package was icecat which I'm sure wouldn't have downloaded
at 90k/s then compiled in less than 15 minutes (fwiw, according to my
build logs firefox takes about 2 hours to build, so unless icecat is
magically orders of magnitude faster to build, then I'm sure it was
just a download + install, and not download + compile + install)).
 






Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 03:57:02 GMT) Full text and rfc822 format available.

Message #14 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: dian_cecht <at> zoho.com
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 04:57:09 +0100
Ahoy,

On 21/03/17 03:52, dian_cecht <at> zoho.com wrote:
> The URL displayed during the download was mirror.hydra.gnu.org.
> [...] It was a binary download, not source.

Oh, OK. I'm not an expert on how Hydra's set up these days, but will
assume it's not too different from my own (a fast nginx proxy_cache,
mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

Whenever you're the first to request a substitute, mirror.hydra.gnu.org
transparently forwards the request to hydra.gnu.org.

The latter has to compress the response on the fly, leading to much
slower transfer speeds. It slowly sends it back to the mirror, which
slowly sends it on to you while also saving it on disc so all subsequent
downloads will be fast — by Hydra standards – and not involve hydra.gnu.org.
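
A minimal sketch of the kind of set-up described above, as an nginx
configuration fragment for the http context. The host names, cache path,
zone name, sizes and lifetimes are all assumptions made for illustration;
this is not the actual mirror.hydra.gnu.org configuration.

```nginx
# Fragment for the http {} context. All names and values are hypothetical.

# On-disk cache; 'substitutes' names a 10 MB shared-memory zone for cache keys.
proxy_cache_path /var/cache/nginx/substitutes levels=1:2
                 keys_zone=substitutes:10m max_size=50g inactive=30d;

server {
  listen      80;
  server_name mirror.example.org;      # stands in for the fast mirror

  location / {
    # Cache miss: the request is forwarded to this backend (standing in
    # for the slow build farm) and the response is sent on to the client
    # while also being written to the cache.
    proxy_pass        http://backend.example.org;
    proxy_cache       substitutes;
    # Keep successful responses for 30 days when the upstream sends no
    # caching headers of its own.
    proxy_cache_valid 200 30d;
  }
}
```

With something like this in place, only the first request for a given
URL reaches the backend; later requests are answered from the mirror's
disc until the entry expires or is evicted.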

Maybe you knew all this, but it's also the reason that...

> On 21/03/17 02:44, dian_cecht <at> zoho.com wrote:
> It would be nice if there was some notification that a cache miss
> happened and the download will likely be slow, otherwise a user might
> wonder what problem there is with their connection.

...I'm afraid this makes no sense from guix's point of view.

The term ‘cache miss’ here is an implementation detail of our current
Hydra set-up, not something guix can or IMO should care about. There are
hundreds of reasons why your connection might be slow at any given time.
Guix should just tell you so (it does), not guess why. Or worse: know.

(But if others disagree, we'll have to extend the Hydra API to somehow
relay this information to the client, in the spirit of the modern Web.)

HTTP 200½: OK, fine, but it's Going to Suck.

T G-R


Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 04:49:02 GMT) Full text and rfc822 format available.

Message #17 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Mon, 20 Mar 2017 21:48:09 -0700
On Tue, 21 Mar 2017 04:57:09 +0100
Tobias Geerinckx-Rice <me <at> tobias.gr> wrote:

> Ahoy,
> 
> On 21/03/17 03:52, dian_cecht <at> zoho.com wrote:
> > The URL displayed during the download was mirror.hydra.gnu.org.
> > [...] It was a binary download, not source.  
> 
> Oh, OK. I'm not an expert on how Hydra's set up these days, but will
> assume it's not too different from my own (a fast nginx proxy_cache,
> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).
> 
> Whenever you're the first to request a substitute,
> mirror.hydra.gnu.org transparently forwards the request to
> hydra.gnu.org.
> 
> The latter has to compress the response on the fly, leading to much
> slower transfer speeds. It slowly sends it back to the mirror, which
> slowly sends it on to you while also saving it on disc so all
> subsequent downloads will be fast — by Hydra standards – and not
> involve hydra.gnu.org.
> 
> Maybe you knew all this, but it's also the reason that...

I'm not familiar with the implementation details, nor how hydra is
currently set up.

> > On 21/03/17 02:44, dian_cecht <at> zoho.com wrote:
> > It would be nice if there was some notification that a cache miss
> > happened and the download will likely be slow, otherwise a user
> > might wonder what problem there is with their connection.  
> 
> ...I'm afraid this makes no sense from guix's point of view.
> 
> The term ‘cache miss’ here is an implementation detail of our current
> Hydra set-up, not something guix can or IMO should care about. There
> are hundreds of reasons why your connection might be slow at any
> given time. Guix should just tell you so (it does), not guess why. Or
> worse: know.

I'm not suggesting having Guix tell me why my network is slow, only if
the download might be slow because it's having to pull from
hydra.gnu.org. Having Guix automagically troubleshoot networking
problems is well beyond the scope of a package manager, even one that
goes as far beyond simple package management as Guix does.

> 
> (But if others disagree, we'll have to extend the Hydra API to somehow
> relay this information to the client, in the spirit of the modern
> Web.)

AFAIK, Guix devs are working on a replacement for the current build
system, so the sane option wouldn't be extending the current hydra
system to handle a new API call, but to try and work this type of
feature into the next system. Unless, of course, something like this
could be done in hydra reasonably easily, in which case why not.

Another option would be to have the mirrors automatically cache the
files as soon as they are available to try. I'd hope this would be how
things are handled already, but one never knows.






Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 06:22:01 GMT) Full text and rfc822 format available.

Message #20 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: dian_cecht <at> zoho.com
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 07:21:54 +0100
Mornin',

On 21/03/17 05:48, dian_cecht <at> zoho.com wrote:
> I'm not suggesting having Guix tell me why my network is slow,

I never mentioned your network. Your proxied connection to a substitute
server, yes. And, well, this very bug report is for Guix to tell you why
that's slow...

> only if the download might be slow because it's having to pull from 
> hydra.gnu.org.

(Side note: ‘it’ here is mirror.hydra.gnu.org, never a well-configured
Guix client.)

So to implement this, the client would need to display a ‘warning’
message or flag sent by the substitute server, to notify the user that
their download might be slower... sometimes... by an unknown amount...
possibly?

But see, that wouldn't be true at all on my system (and surely others),
despite being set up nearly identically to Hydra. On the other hand, my
home download speed fluctuates wildly, even between simultaneous
connections to the same server. Whether or not a file is cached makes no
difference. To be told would be noise at best, misleading at worst.

I'd be against this only for those reasons, but I promise I'm not.

It's just all a bit vague, 's all, and my personal opinion is that once
the vagueness is resolved, not much will remain. But who knows.

> AFAIK, Guix devs are working on a replacement for the current build 
> system, so the sane option wouldn't be extending the current hydra 
> system to handle a new API call, but to try and work this type of 
> feature into the next system.

My point is that it wouldn't be sane, and would be an ugly hack in
either system. Cuirass isn't really different from Hydra in this regard.

Me shut up now :-) I'm more interested in what others have to say.

Kind regards,

T G-R


Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 06:50:02 GMT) Full text and rfc822 format available.

Message #23 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Mon, 20 Mar 2017 23:49:12 -0700
On Tue, 21 Mar 2017 07:21:54 +0100
Tobias Geerinckx-Rice <me <at> tobias.gr> wrote:
> > only if the download might be slow because [mirror.hydra] is having
> > to pull from hydra.gnu.org.  
> 
> So to implement this, the client would need to display a ‘warning’
> message or flag sent by the substitute server, to notify the user that
> their download might be slower... sometimes... by an unknown amount...
> possibly?

Simply a notification that mirror.hydra doesn't currently have a cached
version of the file and the download might be slower than normal would
be fine. As-is, looking up and seeing download speeds that amount to
less than 10% of one's normal bandwidth is a bit concerning since it
would seem like there is a problem. In this case, Guix would be giving
the user some notification that something /is/ out of the ordinary, and
possibly save the user some effort trying to determine the cause of the
slowdown.

> But see, that wouldn't be true at all on my system (and surely
> others), despite being set up nearly identically to Hydra. On the
> other hand, my home download speed fluctuates wildly, even between
> simultaneous connections to the same server.

I'm not sure how any of this matters. If you are running a local Hydra
instance or whatever, then I'd assume you'd be aware of what, if any,
problems that could arise. In this case, I'd hope hydra would allow you
to disable this feature.

> Whether or not a file is cached makes no difference. To be told would
> be noise at best, misleading at worst.

Had I been notified that mirror.hydra was currently pulling from hydra,
it would have saved me the time of jumping on IRC and asking what was
up, which only worked because someone was in #guix and had an idea of
what was going on; had that not been the case, I would have started
looking for the cause for the slowdown and wasted several minutes (at
least) trying to figure out what was wrong, and since it was on
mirror.hydra's end, I'd have no way to know the slowdown was on their
end and not mine, nor my ISP's problem.

> > AFAIK, Guix devs are working on a replacement for the current build 
> > system, so the sane option wouldn't be extending the current hydra 
> > system to handle a new API call, but to try and work this type of 
> > feature into the next system.  
> 
> My point is that it wouldn't be sane, and would be an ugly hack in
> either system.

I don't see how this would have to be "an ugly hack". It's simply a
query and response. The simplest way I can see for this to work would
be for mirror.hydra to either just send the requested file, or a
response that the file isn't cached then start to trickle the file on to
the client. 





Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 13:00:03 GMT) Full text and rfc822 format available.

Message #26 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Florian Pelz <pelzflorian <at> pelzflorian.de>
To: dian_cecht <at> zoho.com, Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 13:59:27 +0100
On Mon, 2017-03-20 at 21:48 -0700, dian_cecht <at> zoho.com wrote:
> Another option would be to have the mirrors automatically cache the
> files as soon as they are available to try. I'd hope this would be how
> things are handled already, but one never knows.
> 

If it cached everything, it wouldn’t be a cache?




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 14:55:02 GMT) Full text and rfc822 format available.

Message #29 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: dian_cecht <at> zoho.com
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 15:55:05 +0100
Hullo!

On 21/03/17 07:49, dian_cecht <at> zoho.com wrote:
> I'm not sure how any of this matters. If you are running a local 
> Hydra instance or whatever, then I'd assume you'd be aware of what, 
> if any, problems that could arise.

It matters for the reasons mentioned. It's not a ‘local Hydra’ & I have
no idea what problems you're talking about.

My problem is that every invocation of Guix already fills several
screens with Guile cache misses. Adding another warning (‘warning! the
system is working exactly as designed!’) will only serve to make those
other warnings look less silly, and I think that would be a shame.

To clarify:

- Warnings should be scary because warnings should be actionable.
  There's nothing the user can or needs to do about a cache miss.
- It would be randomly shown to everyone, since this happens constantly.
- The behaviour warned about is not incorrect or abnormal.
- As already noted, it's how caching works.

> I don't see how this would have to be "an ugly hack". It's simply a 
> query and response. The simplest way I can see for this to work would
> be for mirror.hydra to either just send the requested file, or a
> response that the file isn't cached then start to trickle the file on
> to the client.

Well, yeah... That's the ugly hack. :-)

It's not that your suggestion's hard to implement. In fact, it's
just one line for nginx (which it turns out I already had):

  add_header X-Cache-Status $upstream_cache_status;

and 6 lines of lightly-tested Guile (attached)¹. And presto. This thing.

Doesn't mean we should.

Kind regards,

T G-R

¹: Why? Practice. Irony. Light masochism.
[0001-http-client-Warn-on-proxy-cache-misses.patch (text/x-patch, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 15:33:02 GMT) Full text and rfc822 format available.

Message #32 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 08:32:39 -0700
On Tue, 21 Mar 2017 15:55:05 +0100
Tobias Geerinckx-Rice <me <at> tobias.gr> wrote:
> To clarify:
> 
> - Warnings should be scary because warnings should be actionable.

There are warnings and there are errors. Warnings don't have to be
scary; I get them every time I update emacs because of duplicate icons
stored in two different directories in the store. Is that actionable?
Not as far as I am concerned, unless I want to hand delete something
from the store, which, as far as I understand it, shouldn't be done.

>   There's nothing the user can or needs to do about a cache miss.

Please reread the 2nd part of my response in Message #23 in this
bugreport for why this is needed.

> - It would be randomly shown to everyone, since this happens
> constantly.

Unless mirror.hydra randomly loses data in its cache from hydra, it
won't be random in the least.

> - The behaviour warned about is not incorrect or abnormal.

No, but the behavior would inform the user that the unusual and random
slowdown isn't another problem and is because mirror.hydra is having to
update its cache, which, as I explained before, is useful information.

> [...]

Quite frankly I'd like someone else to take a look at this bug, if
for no other reason than I'm not sure if we're communicating clearly
with each other here. Most of what you are saying makes no sense
whatsoever and seems to miss the point I have attempted to make.

While I will thank you for actually writing a patch, saying "the
caching proxy is working properly! and there's nothing you can do about
it." seems rather cynical and clearly misses the point of what I'm
requesting here.





Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 15:36:01 GMT) Full text and rfc822 format available.

Message #35 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: <dian_cecht <at> zoho.com>
To: Florian Pelz <pelzflorian <at> pelzflorian.de>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 08:35:36 -0700
On Tue, 21 Mar 2017 13:59:27 +0100
Florian Pelz <pelzflorian <at> pelzflorian.de> wrote:

> On Mon, 2017-03-20 at 21:48 -0700, dian_cecht <at> zoho.com wrote:
> > Another option would be to have the mirrors automatically cache the
> > files as soon as they are available to try. I'd hope this would be
> > how things are handled already, but one never knows.
> >   
> 
> If it cached everything, it wouldn’t be a cache?

If the point is to reduce the load on hydra, then at some point it
could have everything. If it doesn't, then why have a mirror when it's
just pulling right from the source all the time anyways?





Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 16:09:02 GMT) Full text and rfc822 format available.

Message #38 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: dian_cecht <at> zoho.com
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 17:07:56 +0100
On 21/03/17 16:32, dian_cecht <at> zoho.com wrote:
> Unless mirror.hydra randomly loses data in its cache from hydra, it
> won't be random in the least.

It will. Whether one is first to download from the cache after the
substitute is built is essentially random.

> Quite frankly I'd like someone else to take a look at this bug,

Glad you agree.

> if for no other reason than I'm not sure if we're communicating clearly
> with each other here. Most of what you are saying makes no sense
> whatsoever and seems to miss the point I have attempted to make.

I assure you it does not.

Kind regards,

T G-R


Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 16:44:02 GMT) Full text and rfc822 format available.

Message #41 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org, dian_cecht <at> zoho.com
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 17:43:29 +0100
Hello!

Tobias Geerinckx-Rice <me <at> tobias.gr> skribis:

> Oh, OK. I'm not an expert on how Hydra's set up these days, but will
> assume it's not too different from my own (a fast nginx proxy_cache,
> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

I think there’s room for improvement in our nginx config at
<https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.

For instance, I just discovered ‘proxy_cache_lock’ while looking at
<http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
in reducing load on hydra.gnu.org.  Surely there are other ways to tweak
caching.

Besides, I’d like to use ‘guix publish’ on hydra.gnu.org.  I suspect
it’s going to be faster than Starman (the HTTP server behind Hydra), and
also it uses an in-process gzip by default, as opposed to bzip2 which is
what Hydra uses (better compression ratio, but super CPU-intensive).

At any rate, clients should not paper over server-side performance
issues IMO.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 21 Mar 2017 17:08:01 GMT) Full text and rfc822 format available.

Message #44 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: ludo <at> gnu.org
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Tue, 21 Mar 2017 18:08:02 +0100
Ludo',

On 21/03/17 17:43, Ludovic Courtès wrote:
> I think there’s room for improvement in our nginx config at
> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.
> 
> For instance, I just discovered ‘proxy_cache_lock’ while looking at
> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
> in reducing load on hydra.gnu.org.  Surely there are other ways to tweak
> caching.

Indeed! For reference, here's my cache configuration.

That's right. Now you can all¹ steal some criminally overpriced Belgian
bandwidth!

  server {
    server_name                 substitutes.tobias.gr;
    listen                      [::]:443 ssl http2;
    listen                           443 ssl http2;

    # FIXME move to main LE cert
    ssl_certificate             substitutes.pem;
    ssl_certificate_key         substitutes.key;

    # "" means ‘inherit from upstream’ here.
    add_header                  Cache-Control "";
    # So does ‘off’. This is all a bit hacky.
    expires                     off;
    proxy_hide_header           Set-Cookie;
    proxy_ignore_headers        Set-Cookie;

    # Almost all traffic is already compressed.
    gzip                        off;

    ...

    location / {
      limit_except GET {        deny all; }
      proxy_pass                SUPER_SEKRIT_BACKEND;

      # https://www.nginx.com/blog/nginx-caching-guide
      add_header                X-Cache-Status $upstream_cache_status;

      proxy_cache               default;
      # We allow only GET requests, so don't waste key space:
      proxy_cache_key           "$request_uri";
      proxy_cache_lock          on;
      proxy_cache_lock_timeout  3h; #yolo
      proxy_cache_use_stale     error timeout
                                http_500 http_502 http_503 http_504;
    }
  ...
  }

I'm sure it's hardly optimal (or, erm, ‘good’) either but it works.

> Besides, I’d like to use ‘guix publish’ on hydra.gnu.org.  I suspect
> it’s going to be faster than Starman (the HTTP server behind Hydra), and
> also it uses an in-process gzip by default, as opposed to bzip2 which is
> what Hydra uses (better compression ratio, but super CPU-intensive).

Back when I used Hydra-the-software I did so briefly and I think it
worked. But no hard tests.

> At any rate, clients should not paper over server-side performance
> issues IMO.

Entirely off-topic, but this 'tude is a part of what drew me to Guix in
the first place. So, like, thanks, in general :-)

Kind regards,

T G-R

¹: Just put it *after* mirror.hydra.gnu.org, OK?


Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Wed, 22 Mar 2017 22:07:02 GMT) Full text and rfc822 format available.

Message #47 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Wed, 22 Mar 2017 23:06:11 +0100
Hey Tobias,

Tobias Geerinckx-Rice <me <at> tobias.gr> skribis:

> On 21/03/17 17:43, Ludovic Courtès wrote:
>> I think there’s room for improvement in our nginx config at
>> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.
>> 
>> For instance, I just discovered ‘proxy_cache_lock’ while looking at
>> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
>> in reducing load on hydra.gnu.org.  Surely there are other ways to tweak
>> caching.
>
> Indeed! For reference, here's my cache configuration.
>
> That's right. Now you can all¹ steal some criminally overpriced Belgian
> bandwidth!

Heheh.  :-)

>       limit_except GET {        deny all; }
>       proxy_pass                SUPER_SEKRIT_BACKEND;
>
>       # https://www.nginx.com/blog/nginx-caching-guide
>       add_header                X-Cache-Status $upstream_cache_status;
>
>       proxy_cache               default;
>       # We allow only GET requests, so don't waste key space:
>       proxy_cache_key           "$request_uri";
>       proxy_cache_lock          on;
>       proxy_cache_lock_timeout  3h; #yolo
>       proxy_cache_use_stale     error timeout
>                                 http_500 http_502 http_503 http_504;

I didn’t fully understand the docs for the last 3 directives here.  For
instance, what happens when 10 clients do GET /nar/xyz-texlive?  Do the
9 unlucky clients wait for 3 hours and then get 404?
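
For reference, the behaviour that the nginx proxy module documentation
gives for these three directives can be summarised as comments on the
same lines. This is a reading of the documentation, not something tested
against this set-up; proxy_cache_lock_age is an additional directive
that does not appear in the configuration quoted above, and is shown
with its documented default.

```nginx
# Fragment for a location {} block that already has proxy_pass and proxy_cache.

# Only one request at a time may populate a given new cache element;
# other requests for the same key wait for the response to appear in the
# cache or for the lock to be released.
proxy_cache_lock          on;

# Upper bound on that wait. When it expires, the waiting request is passed
# to the proxied server itself, and its response is not cached. Waiting
# clients therefore do not receive an error merely because the timeout
# expired. (Default: 5s.)
proxy_cache_lock_timeout  3h;

# If the request populating the cache has not completed within this time,
# one more request may be passed to the proxied server. (Default: 5s;
# not part of the quoted configuration.)
proxy_cache_lock_age      5s;

# Serve a stale cached copy, if one exists, when the proxied server
# errors out, times out, or answers 500/502/503/504.
proxy_cache_use_stale     error timeout
                          http_500 http_502 http_503 http_504;
```

Under that reading, the 9 other clients in the example would wait for the
first download to fill the cache, or be forwarded to the backend once
their wait exceeds the timeout, rather than getting a 404.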

Anyway, thanks for sharing your tips.  :-)

> Entirely off-topic, but this 'tude is a part of what drew me to Guix in
> the first place. So, like, thanks, in general :-)

:-)

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Wed, 22 Mar 2017 22:23:01 GMT) Full text and rfc822 format available.

Message #50 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: hydra.gnu.org uses ‘guix publish’ for
 nars and narinfos
Date: Wed, 22 Mar 2017 23:22:37 +0100
Hi again!

Until now hydra.gnu.org was using Hydra (the software) to serve not only
the Web interface but also all the .narinfo and /nar URLs (substitute
meta-data and substitutes).

Starting from now, hydra.gnu.org directs all .narinfo and corresponding
nar requests to ‘guix publish’ instead of Hydra.

‘guix publish’ should be faster and less resource-hungry than Hydra.  It
uses in-process gzip for nar compression instead of bzip2 (I chose level
7, which seems to provide compression ratios close to what bzip2
provides with its default compression level, while being 3 times
faster).  Unlike Hydra it never forks so for instance, 404 responses for
.narinfo URLs should be quicker.  Hopefully, that will improve the
worst-case (cache miss) throughput.

I configured nginx in such a way that the former Hydra-provided /nar
URLs (which are cached in nginx instances, in our
/var/guix/substitute/cache directories, etc.) are still available.
‘guix publish’ uses the /guix/nar URLs while Hydra uses /nar, so the
nginx config redirects to either Hydra or ‘guix publish’ depending on
the URL:

  https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/hydra.gnu.org-locations.conf#n29
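
The URL-based split described above might look roughly like the
following. The ports and the exact location patterns are assumptions for
illustration, and the sketch assumes ‘guix publish’ is configured (for
example with its --nar-path option) to serve nars under /guix/nar; the
linked hydra.gnu.org-locations.conf remains the authoritative version.

```nginx
# Fragment for a server {} block. Ports are hypothetical.

# Legacy nar URLs, still referenced by cached narinfos: served by Hydra.
location /nar/ {
  proxy_pass http://127.0.0.1:3000;
}

# New nar URLs: served by 'guix publish'.
location /guix/nar/ {
  proxy_pass http://127.0.0.1:3001;
}

# Substitute meta-data: served by 'guix publish'.
location ~ \.narinfo$ {
  proxy_pass http://127.0.0.1:3001;
}
```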

Hydra-provided .narinfos are still cached here and there; they’ll
progressively expire and be replaced by ‘guix publish’-provided
.narinfos.

Let me know if you notice anything fishy!

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Thu, 23 Mar 2017 10:30:02 GMT) Full text and rfc822 format available.

Message #53 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Re: hydra.gnu.org uses ‘guix publish’
 for nars and narinfos
Date: Thu, 23 Mar 2017 11:29:29 +0100
Ludovic Courtès <ludo <at> gnu.org> writes:

> Until now hydra.gnu.org was using Hydra (the software) to serve not only
> the Web interface but also all the .narinfo and /nar URLs (substitute
> meta-data and substitutes).
>
> Starting from now, hydra.gnu.org directs all .narinfo and corresponding
> nar requests to ‘guix publish’ instead of Hydra.

That’s very cool!  I’m happy to see more of Hydra replaced.

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net





Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Thu, 23 Mar 2017 18:37:02 GMT) Full text and rfc822 format available.

Message #56 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: ludo <at> gnu.org (Ludovic Courtès)
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Thu, 23 Mar 2017 14:36:20 -0400
ludo <at> gnu.org (Ludovic Courtès) writes:

> Hi again!
>
> Until now hydra.gnu.org was using Hydra (the software) to serve not only
> the Web interface but also all the .narinfo and /nar URLs (substitute
> meta-data and substitutes).
>
> Starting from now, hydra.gnu.org directs all .narinfo and corresponding
> nar requests to ‘guix publish’ instead of Hydra.
>
> ‘guix publish’ should be faster and less resource-hungry than Hydra.  It
> uses in-process gzip for nar compression instead of bzip2 (I chose level
> 7, which seems to provide compression ratios close to what bzip2
> provides with its default compression level, while being 3 times
> faster).  Unlike Hydra it never forks so for instance, 404 responses for
> .narinfo URLs should be quicker.  Hopefully, that will improve the
> worst-case (cache miss) throughput.

Excellent!  Any improvement in 404 response time will be very helpful.
I've noticed that spikes of narinfo requests resulting in 404 have been a
major source of overloading on Hydra, because these requests cannot be
cached for very long.  The reason: if we cache those failures for N
minutes, this effectively delays the appearance of new nars by up to N
minutes (if they were requested before that).  This forces us to choose a small N
for negative cache entries, which means the cache is not much help here.
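
The trade-off can be illustrated with a toy model.  This is only a
sketch: `NegativeCache`, the 300-second TTL and the URL are hypothetical,
and time is passed in explicitly so the example stays deterministic.

```python
class NegativeCache:
    """Remember 404s for `ttl` seconds (toy model of a proxy's negative cache)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.misses = {}  # URL -> time at which the cached 404 expires

    def fetch(self, url: str, origin: set, now: float) -> int:
        expires = self.misses.get(url)
        if expires is not None and now < expires:
            return 404  # served from the negative cache; origin not consulted
        if url in origin:
            self.misses.pop(url, None)
            return 200
        self.misses[url] = now + self.ttl
        return 404


origin = set()                                        # narinfos available upstream
cache = NegativeCache(ttl=300)                        # hypothetical N = 5 minutes
first = cache.fetch("/abc.narinfo", origin, now=0)    # miss: the 404 is cached
origin.add("/abc.narinfo")                            # the build finishes later
stale = cache.fetch("/abc.narinfo", origin, now=60)   # still answered from cache
fresh = cache.fetch("/abc.narinfo", origin, now=301)  # entry expired
print(first, stale, fresh)  # prints: 404 404 200
```

A larger TTL absorbs more of a request spike, but it lengthens the window
during which an item that now exists upstream is still reported missing.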

One question: what will happen in the case of multiple concurrent
requests for the same nar?  Will multiple nar-pack-and-bzip2 processes
be run on-demand?  Recall that the nginx proxy will pass all of those
requests through, and not create the cache entry until it has received a
complete response.  This has caused us severe problems with huge nars
such as texinfo-texmf, to the point that we had to crudely block those
nar requests.  Unfortunately, it is not obvious how to block the
associated narinfo requests due to the lack of job name in the URL, so
this results in failures on the client side that must be manually worked
around.

     Thanks,
       Mark




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Thu, 23 Mar 2017 18:52:02 GMT) Full text and rfc822 format available.

Message #59 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: mhw <at> netris.org
Cc: ludo <at> gnu.org, 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Thu, 23 Mar 2017 19:52:30 +0100
[Message part 1 (text/plain, inline)]
Mark,

On 23/03/17 19:36, Mark H Weaver wrote:
> One question: what will happen in the case of multiple concurrent
> requests for the same nar?  Will multiple nar-pack-and-bzip2 processes
> be run on-demand?

I think this used to be the case with the previous nginx configuration,
but the recent changes pushed by Ludo' were aimed in part at preventing
that.

> Recall that the nginx proxy will pass all of those requests through,

Are you sure? I was under the impression¹ that this is exactly what
‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
— anyone! — correct me if I'm misguided.

Kind regards,

T G-R

¹:
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Thu, 23 Mar 2017 19:25:01 GMT) Full text and rfc822 format available.

Message #62 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: ludo <at> gnu.org
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Thu, 23 Mar 2017 20:25:44 +0100
[Message part 1 (text/plain, inline)]
Ludo',

On 22/03/17 23:06, Ludovic Courtès wrote:
> Tobias Geerinckx-Rice <me <at> tobias.gr> skribis:
>>       proxy_cache_lock          on;
>>       proxy_cache_lock_timeout  3h; #yolo
>>       proxy_cache_use_stale     error timeout
>>                                 http_500 http_502 http_503 http_504;
> I didn’t fully understand the docs for the last 3 directives here.  For
> instance, what happens when 10 clients do GET /nar/xyz-texlive?  Do the
> 9 unlucky clients wait for 3 hours and then get 404?

From ‘proxy_cache_lock’ [1]:

  “When enabled, only one request at a time will be allowed to populate
   a new cache element identified according to the proxy_cache_key
   directive by passing a request to a proxied server. Other requests
   of the same cache element will either wait for a response to appear
   in the cache or the cache lock for this element to be released, up
   to the time set by the proxy_cache_lock_timeout directive.”

Hmm. Good point: ‘to appear in the cache’, when we don't cache 404s or
even 410s.

I don't actually know.

Kind regards,

T G-R

[1]:
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Fri, 24 Mar 2017 02:17:01 GMT) Full text and rfc822 format available.

Message #65 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org, dian_cecht <at> zoho.com
Subject: Re: bug#26201: No notification of cache misses when downloading
 substitutes
Date: Thu, 23 Mar 2017 19:15:48 -0700
[Message part 1 (text/plain, inline)]
Hi!

Tobias Geerinckx-Rice <me <at> tobias.gr> writes:

> On 21/03/17 16:32, dian_cecht <at> zoho.com wrote:
>> Unless mirror.hydra randomly loses data in its cache from hydra, it
>> won't be random in the least.
>
> It will. Whether one is first to download from the cache after the
> substitute is built is essentially random.
>
>> Quite frankly I'd like someone else to take a look at this bug,
>
> Glad you agree.
>
>> if for no other reason than I'm not sure if we're communicating clearly
>> with each other here. Most of what you are saying makes no sense
>> whatsoever and seems to miss the point I have attempted to make.
>
> I assure you it does not.
>
> Kind regards,
>
> T G-R

Please allow me to jump in and voice my opinion here. To me it doesn't
make sense to concern the Guix client with implementation details of how
the caching of substitutes happens and its impacts.

This situation is bound to change in the future or become irrelevant
(say, if a new build farm were able to sustain higher transfer
speeds to the cache mirror, or if the caching implementation changed).

If the current cache building implementation is slow to the point of
being a problem it should be fixed (or documented).

Cheers,

Maxim
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Fri, 24 Mar 2017 08:14:02 GMT) Full text and rfc822 format available.

Message #68 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: ludo <at> gnu.org, 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 04:12:50 -0400
Hi,

Tobias Geerinckx-Rice <me <at> tobias.gr> writes:

> On 23/03/17 19:36, Mark H Weaver wrote:
>> One question: what will happen in the case of multiple concurrent
>> requests for the same nar?  Will multiple nar-pack-and-bzip2 processes
>> be run on-demand?
>
> I think this used to be the case with the previous nginx configuration,
> but the recent changes pushed by Ludo' were aimed in part at preventing
> that.
>
>> Recall that the nginx proxy will pass all of those requests through,
>
> Are you sure? I was under the impression¹ that this is exactly what
> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
> — anyone! — correct me if I'm misguided.

I agree that "proxy_cache_lock on" should prevent multiple concurrent
requests for the same URL, but unfortunately its behavior is quite
undesirable, and arguably worse than leaving it off in our case.  See:

  https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Specifically:

  Other requests of the same cache element will either wait for a
  response to appear in the cache or the cache lock for this element to
  be released, up to the time set by the proxy_cache_lock_timeout
  directive.

In our problem case, it takes more than an hour for Hydra to finish
sending a response for the 'texlive-texmf' nar.  During that time, the
nar will be slowly sent to the first client while it's being packed and
bzipped on-demand.

IIUC, with "proxy_cache_lock on", we have two choices of how other
client requests will be treated:

(1) If we increase "proxy_cache_lock_timeout" to a huge value, then
    there will be *no* data sent to the other clients until the first
    client has received the entire nar, which means they wait over an
    hour before receiving the first byte.  I guess this will result in
    timeouts on the client side.

(2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
    will get failure responses until the first client has received the
    entire nar.

Either way, this would cause users to see the same download failures
(requiring user work-arounds like --fallback) that this fix is intended
to prevent for 'texlive-texmf', but instead of happening only for that
one nar, it will now happen for *all* large nars.

Or at least that's what I'd expect based on my reading of the nginx docs
linked above.  I haven't tried it.

IMO, the best solution is to *never* generate nars on Hydra in response
to client requests, but rather to have the build slaves pack and
compress the nars, copy them to Hydra, and then serve them as static
files using nginx.

A far inferior solution, but possibly acceptable and closer to the
current approach, would be to arrange for all concurrent responses for
the same nar to be sent incrementally from a single nar-packing process.
More concretely, while packing and sending a nar response to the first
client, the data would also be written to a file.  Subsequent requests
for the same nar would be serviced using the equivalent of:

  tail --bytes=+0 --follow FILENAME

This way, no one would have to wait an hour to receive the first byte.
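
A minimal sketch of that idea, assuming a single writer that appends to a
regular file and signals completion through an event.  The `pack` and
`follow` names, the chunk contents and the delays are all hypothetical;
a real server would also need error handling and cleanup of partial files.

```python
import os
import tempfile
import threading
import time


def pack(path, chunks, done, delay=0.01):
    """Writer: append chunks to `path` as they are produced, then signal completion."""
    with open(path, "ab") as out:
        for chunk in chunks:
            out.write(chunk)
            out.flush()
            time.sleep(delay)  # stands in for slow packing and compression
    done.set()


def follow(path, done, poll=0.005):
    """Reader: yield data from offset 0 and keep following until the writer is done."""
    with open(path, "rb") as src:
        while True:
            finished = done.is_set()  # sample before reading to avoid losing the tail
            data = src.read(65536)
            if data:
                yield data
            elif finished:
                return
            else:
                time.sleep(poll)


def demo():
    chunks = [b"chunk-%d;" % i for i in range(20)]
    done = threading.Event()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        writer = threading.Thread(target=pack, args=(path, chunks, done))
        writer.start()
        received = b"".join(follow(path, done))  # starts before the writer finishes
        writer.join()
    finally:
        os.unlink(path)
    return received == b"".join(chunks)


print(demo())
```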

What do you think?

      Mark




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Fri, 24 Mar 2017 09:26:01 GMT) Full text and rfc822 format available.

Message #71 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 10:25:35 +0100
Hi!

Mark H Weaver <mhw <at> netris.org> skribis:

> Tobias Geerinckx-Rice <me <at> tobias.gr> writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
>> — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case.  See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element to
>   be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar.  During that time, the
> nar will be slowly sent to the first client while it's being packed and
> bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte.  I guess this will result in
>     timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
>     will get failure responses until the first client has received the
>     entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is intended
> to prevent for 'texlive-texmf', but instead of happening only for that
> one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads of the same item at the same time, while
also avoiding starvation (proxy_cache_lock_timeout should ensure that
nobody ends up waiting until the nar-compression process is done.)

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded.)
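
To make that reading concrete, here is a toy model of a per-key cache
lock with a timeout.  It is not nginx's implementation: `LockedCache` and
`simulate` are hypothetical names, and the fallback on timeout (go
upstream without caching the result) follows my reading of the
proxy_cache_lock_timeout documentation.

```python
import threading
import time


class LockedCache:
    """Toy model of a cache lock with a timeout; not nginx's implementation."""

    def __init__(self, lock_timeout: float):
        self.lock_timeout = lock_timeout
        self.entries = {}
        self.locks = {}
        self.guard = threading.Lock()
        self.upstream_calls = 0

    def _upstream(self, key, cost):
        with self.guard:
            self.upstream_calls += 1
        time.sleep(cost)  # stands in for packing and compressing a nar
        return f"body of {key}"

    def get(self, key, cost):
        with self.guard:
            if key in self.entries:
                return self.entries[key]
            lock = self.locks.setdefault(key, threading.Lock())
        if lock.acquire(timeout=self.lock_timeout):
            try:
                with self.guard:
                    if key in self.entries:  # populated while we waited
                        return self.entries[key]
                body = self._upstream(key, cost)
                with self.guard:
                    self.entries[key] = body
                return body
            finally:
                lock.release()
        # Timed out: go upstream ourselves, without caching the response.
        return self._upstream(key, cost)


def simulate(lock_timeout, cost, clients=4):
    """Run `clients` concurrent requests for one key; return the upstream call count."""
    cache = LockedCache(lock_timeout)
    threads = [threading.Thread(target=cache.get, args=("texlive", cost))
               for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.upstream_calls


print("generous timeout:", simulate(lock_timeout=2.0, cost=0.1))
print("short timeout:", simulate(lock_timeout=0.05, cost=0.5))
```

With a timeout longer than the upstream response time, the waiting
clients are served from the cache; with a shorter one, they fall through
to the upstream, so the load reduction is lost for slow items.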

> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.

The problem is that we want nars to be signed by the master node.  Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file.  Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes.  I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.

Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the HTTP
           response (you may get 404 even if the thing is actually in
           store), but maybe that’s not a problem

Thoughts?

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Sun, 26 Mar 2017 17:36:01 GMT) Full text and rfc822 format available.

Message #74 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: mhw <at> netris.org
Cc: ludo <at> gnu.org, 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Sun, 26 Mar 2017 19:35:25 +0200
[Message part 1 (text/plain, inline)]
Mark,

On 24/03/17 09:12, Mark H Weaver wrote:
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
> 
>   [badly, ed.]

Eh. You're probably (and disappointingly) right.

When configuring my little cache, I had a clear idea of how such a cache
should work (basically, your last scenario below), then looked at the
nginx documentation to find what I had in mind. ‘proxy_cache_lock’ matched.

I should have been more pessimistic and done more testing.
Shame on me, &c. Too many other things on my mind. :-/

> Or at least that's what I'd expect based on my reading of the nginx docs
> linked above.  I haven't tried it.

I can try to do some simple tests tomorrow.

> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.

A true mirror at last! Do we have the disc space for that?

And could Hydra actually handle compressing *everything*, without an
infinitely growing back-log? I don't have access to any statistics, but
I'm guessing that a fair number of package+versions are never actually
requested, and hence never compressed. This would change that.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file.  Subsequent requests
> for the same nar would be serviced using the equivalent of:
> 
>   tail --bytes=+0 --follow FILENAME
> 
> This way, no one would have to wait an hour to receive the first byte.

^ This is so obviously the right solution, that it would be
disappointing if nginx really couldn't be made to do it. It already
buffers proxy responses to a temporary file anyway...

Kind regards,

T G-R

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Mon, 27 Mar 2017 11:21:01 GMT) Full text and rfc822 format available.

Message #77 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Bandwidth when retrieving substitutes
Date: Mon, 27 Mar 2017 13:20:42 +0200
Hi there!

ludo <at> gnu.org (Ludovic Courtès) skribis:

> ‘guix publish’ should be faster and less resource-hungry than Hydra.  It
> uses in-process gzip for nar compression instead of bzip2 (I chose level
> 7, which seems to provide compression ratios close to what bzip2
> provides with its default compression level, while being 3 times
> faster).  Unlike Hydra it never forks so for instance, 404 responses for
> .narinfo URLs should be quicker.  Hopefully, that will improve the
> worst-case (cache miss) throughput.

Another interesting data point on the client side this time:

--8<---------------cut here---------------start------------->8---
$ wget -O- https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |bunzip2 >/dev/null
--2017-03-27 13:12:50--  https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

-                                [                            <=>                ]  53.01M  9.29MB/s    in 5.5s    

2017-03-27 13:12:55 (9.57 MB/s) - written to stdout [55582050]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |gunzip >/dev/null
--2017-03-27 13:13:00--  https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

-                                [        <=>                                    ]  59.19M  40.8MB/s    in 1.4s    

2017-03-27 13:13:02 (40.8 MB/s) - written to stdout [62068901]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 >/dev/null
--2017-03-27 13:15:58--  https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

-                                [       <=>                                     ]  59.19M  42.5MB/s    in 1.4s    

2017-03-27 13:16:00 (42.5 MB/s) - written to stdout [62068901]
--8<---------------cut here---------------end--------------->8---

40 MB/s vs. 10 MB/s!  (Both items were cached on mirror.hydra.gnu.org.)

IOW, bunzip2 was the bottleneck when retrieving substitutes (and that’s
on an i7.)  With ‘perf timechart’ we see that bunzip2 is indeed busy
all the time right from the start.
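
For reference, the arithmetic on the figures reported by wget above can
be spelled out as follows (the variable names are mine; the numbers are
copied from the first two transcripts):

```python
bz2_bytes, bz2_rate = 55_582_050, 9.57   # /nar URL, piped through bunzip2
gz_bytes, gz_rate = 62_068_901, 40.8     # /guix/nar/gzip URL, piped through gunzip

size_overhead = gz_bytes / bz2_bytes - 1  # how much bigger the gzip nar is
speedup = gz_rate / bz2_rate              # ratio of the rates reported by wget

print(f"gzip nar is {size_overhead:.1%} larger, pipeline is {speedup:.1f}x faster")
# prints: gzip nar is 11.7% larger, pipeline is 4.3x faster
```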

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Mon, 27 Mar 2017 18:47:01 GMT) Full text and rfc822 format available.

Message #80 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: Tobias Geerinckx-Rice <me <at> tobias.gr>
To: 26201 <at> debbugs.gnu.org
Cc: ludo <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Mon, 27 Mar 2017 20:47:17 +0200
[Message part 1 (text/plain, inline)]
Guix,

On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:
> I can try to do some simple tests tomorrow.

Two observations:

- ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
  ‘proxy_cache_lock_age’ must also be set to an equally ridiculously
  long span. Otherwise, multiple requests will still be sent to ‘guix
  publish’ if they are more than 5s apart. Bleh.

  (The problem then becomes that clients will stall while the file is
   being cached, as explained by Mark. curl patiently waited.)

- Say client A requests a nar from ‘guix publish’ (no nginx involved).
  If another client requests the same nar while A's still downloading,
  ‘guix publish’ will... silently drop A's connection?
  I was not expecting this.

Kind regards,

T G-R

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 28 Mar 2017 14:48:02 GMT) Full text and rfc822 format available.

Message #83 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Tue, 28 Mar 2017 16:47:14 +0200
Hey!

Tobias Geerinckx-Rice <me <at> tobias.gr> skribis:

> On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:
>> I can try to do some simple tests tomorrow.
>
> Two observations:
>
> - ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
>   ‘proxy_cache_lock_age’ must also be set to an equally ridiculously
>   long span. Otherwise, multiple requests will still be sent to ‘guix
>   publish’ if they are more than 5s apart. Bleh.
>
>   (The problem then becomes that clients will stall while the file is
>    being cached, as explained by Mark. curl patiently waited.)

Setting ‘proxy_cache_lock_timeout’ to 5s is reasonable I think: if
you’re unlucky, you wait for 5 seconds, and then we get ‘guix publish’
threads serving the same request in parallel; in the most common case,
there’s only ever one instance of a given request being served at a
given time.

> - Say client A requests a nar from ‘guix publish’ (no nginx involved).
>   If another client requests the same nar while A's still downloading,
>   ‘guix publish’ will... silently drop A's connection?
>   I was not expecting this.

That would be a bug.  Do you have an easy way to reproduce?

Thanks,
Ludo’.




Changed bug title to 'Downloading substitutes is too slow upon nginx cache misses' from 'No notification of cache misses when downloading substitutes' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Sat, 08 Apr 2017 21:19:01 GMT) Full text and rfc822 format available.

Severity set to 'important' from 'normal' Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Sat, 08 Apr 2017 21:19:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Mon, 17 Apr 2017 21:37:02 GMT) Full text and rfc822 format available.

Message #90 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Mon, 17 Apr 2017 23:36:06 +0200
Hello,

ludo <at> gnu.org (Ludovic Courtès) skribis:

> Other solutions I’ve thought about:
>
>   1. Produce narinfos and nars periodically rather than on-demand and
>      serve them as static files.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: introduces arbitrary delays in delivering nars
>      cons: difficult/expensive to know what new store items are available
>
>   2. Produce a narinfo and corresponding nar the first time they are
>      requested.  So, the first time we receive “GET foo.narinfo”, return
>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>      200 only when both are ready.
>
>      The precomputed nar{,info}s would be kept in a cache and we could
>      make sure a narinfo and its nar have the same lifetime, which
>      addresses one of the problems we have.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      pros: helps keep narinfo/nar lifetime in sync
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: exposes inconsistency between the store contents and the HTTP
>            response (you may get 404 even if the thing is actually in
>            store), but maybe that’s not a problem

The ‘wip-publish-baking’ branch implements #2 as a new option to ‘guix
publish’.  It gives some control on the upper bound on CPU usage since
we can specify how many worker threads are used.
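
The shape of that scheme can be sketched roughly as follows.  This is an
illustration only, not the code on the branch: `BakingServer` and its
methods are hypothetical, and the "baking" step is a placeholder for the
real narinfo and nar computation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BakingServer:
    """Toy model: answer 404 until a background worker has produced the item."""

    def __init__(self, workers: int):
        self.pool = ThreadPoolExecutor(max_workers=workers)  # bounds CPU usage
        self.cache = {}    # name -> (narinfo, nar), stored together
        self.pending = {}  # name -> Future for bakes in progress
        self.lock = threading.Lock()

    def _bake(self, name):
        # Placeholder for computing the narinfo and the compressed nar.
        narinfo, nar = f"narinfo for {name}", f"nar for {name}".encode()
        with self.lock:
            self.cache[name] = (narinfo, nar)
            self.pending.pop(name, None)

    def get_narinfo(self, name):
        with self.lock:
            if name in self.cache:
                return 200, self.cache[name][0]
            if name not in self.pending:  # bake each item at most once
                self.pending[name] = self.pool.submit(self._bake, name)
            return 404, None


server = BakingServer(workers=2)
first = server.get_narinfo("vim")[0]   # nothing cached yet: 404, bake scheduled
server.pool.shutdown(wait=True)        # demo only: wait for the bake to finish
second = server.get_narinfo("vim")[0]  # narinfo and nar are now both cached
print(first, second)  # prints: 404 200
```

Because the narinfo and the nar enter the cache in one step, a client
never sees a 200 for the narinfo without the nar being available too.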

I’ll finish it soon so we can experiment with it.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Tue, 18 Apr 2017 21:29:01 GMT) Full text and rfc822 format available.

Message #93 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Tue, 18 Apr 2017 23:27:44 +0200
ludo <at> gnu.org (Ludovic Courtès) skribis:

>   2. Produce a narinfo and corresponding nar the first time they are
>      requested.  So, the first time we receive “GET foo.narinfo”, return
>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>      200 only when both are ready.
>
>      The precomputed nar{,info}s would be kept in a cache and we could
>      make sure a narinfo and its nar have the same lifetime, which
>      addresses one of the problems we have.
>
>      pros: better HTTP latency and bandwidth
>      pros: allows us to add a Content-Length for nars
>      pros: helps keep narinfo/nar lifetime in sync
>      cons: doesn’t reduce load on hydra.gnu.org
>      cons: exposes inconsistency between the store contents and the HTTP
>            response (you may get 404 even if the thing is actually in
>            store), but maybe that’s not a problem

Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.

I’ll set up a test instance shortly.

Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Wed, 19 Apr 2017 14:26:02 GMT) Full text and rfc822 format available.

Message #96 received at 26201 <at> debbugs.gnu.org (full text, mbox):

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org,
 guix-sysadmin <at> gnu.org
Subject: Heads-up: hydra.gnu.org uses ‘guix publish
 --cache’
Date: Wed, 19 Apr 2017 16:24:46 +0200
ludo <at> gnu.org (Ludovic Courtès) skribis:

> ludo <at> gnu.org (Ludovic Courtès) skribis:
>
>>   2. Produce a narinfo and corresponding nar the first time they are
>>      requested.  So, the first time we receive “GET foo.narinfo”, return
>>      404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
>>      200 only when both are ready.
>>
>>      The precomputed nar{,info}s would be kept in a cache and we could
>>      make sure a narinfo and its nar have the same lifetime, which
>>      addresses one of the problems we have.
>>
>>      pros: better HTTP latency and bandwidth
>>      pros: allows us to add a Content-Length for nars
>>      pros: helps keep narinfo/nar lifetime in sync
>>      cons: doesn’t reduce load on hydra.gnu.org
>>      cons: exposes inconsistency between the store contents and the HTTP
>>            response (you may get 404 even if the thing is actually in
>>            store), but maybe that’s not a problem
>
> Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.
>
> I’ll set up a test instance shortly.

I ended up deploying it on hydra.gnu.org directly.  :-)

Progressively the cached nar/narinfo at {,mirror.}hydra.gnu.org will be
replaced with the new ones.  Now that the /guix/nar URLs have a
‘Content-Length’ header, you should see a progress bar when downloading
one of these:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build vim
The following file will be downloaded:
   /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
@ substituter-started /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 /gnu/store/rnpz1svz4aw75kibb5qb02hhccy2m4y0-guix-0.12.0-7.aabe/libexec/guix/substitute
Downloading https://mirror.hydra.gnu.org/guix/nar/gzip/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 (23.4MiB installed)...
 vim-8.0.0566  7.8MiB                                      385KiB/s 00:21 [####################] 100.0%

@ substituter-succeeded /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
--8<---------------cut here---------------end--------------->8---

This new caching scheme should put an end to caching of truncated nars
in nginx, which has been too frequent lately.

It should also mostly avoid the problem where we have a narinfo for
something but not the corresponding nar, which leads to user frustration
(‘guix’ reports that the thing will be downloaded and eventually fails
with 410 “Gone” while trying to download it), because ‘guix publish’
caches narinfo/nar pairs together.  I say “mostly” because nginx caching
in front of ‘guix publish’ makes things more complicated.
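
As a rough illustration of the scheme described above (404 on the first request, a background thread that prepares the narinfo and nar, 200 once both are ready), here is a minimal Python sketch.  It is not the ‘guix publish’ implementation, which is written in Scheme; the class name `BakeOnDemandCache`, the `bake` callback and the `wait` helper are all hypothetical, and cache expiry is left out.

```python
import threading


class BakeOnDemandCache:
    """Toy model of a cache that 'bakes' narinfo/nar pairs on first request.

    `bake` is a hypothetical callback mapping an item name to a
    (narinfo, nar) tuple; it stands in for the real serialization work.
    """

    def __init__(self, bake):
        self._bake = bake
        self._ready = {}      # name -> (narinfo, nar), always stored as a pair
        self._pending = {}    # name -> worker thread currently baking it
        self._lock = threading.Lock()

    def get_narinfo(self, name):
        """Return (200, narinfo) if baked, else (404, None) and start baking."""
        with self._lock:
            if name in self._ready:
                return 200, self._ready[name][0]
            if name not in self._pending:
                worker = threading.Thread(target=self._worker, args=(name,))
                self._pending[name] = worker
                worker.start()
            return 404, None

    def get_nar(self, name):
        """Return (200, nar) only if the pair has been baked."""
        with self._lock:
            if name in self._ready:
                return 200, self._ready[name][1]
            return 404, None

    def _worker(self, name):
        pair = self._bake(name)  # potentially slow; done outside the lock
        with self._lock:
            # Both halves become visible at the same time, so a narinfo
            # never exists in the cache without its nar.
            self._ready[name] = pair
            del self._pending[name]

    def wait(self, name):
        """Block until a pending bake for `name` finishes (for demos/tests)."""
        with self._lock:
            worker = self._pending.get(name)
        if worker is not None:
            worker.join()
```

Because the pair is stored under a single key, a later eviction would remove the narinfo and the nar together, which is the "same lifetime" property mentioned in the proposal.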

The bandwidth issue reported at the beginning of this thread should be
mostly fixed: serving a narinfo or nar URL is now just sendfile(2),
which is the best we can do; 404s on narinfo should be immediate.
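
To make the sendfile(2) point concrete, here is a small Python sketch of serving an already-cached file with a known ‘Content-Length’.  It only illustrates the idea and is not how ‘guix publish’ is written; the function name `serve_cached_file` is hypothetical, and the response is deliberately minimal.  Python's `socket.sendfile` uses `os.sendfile` where the platform provides it and falls back to plain `send` otherwise.

```python
import os
import socket
import tempfile


def serve_cached_file(conn, path):
    """Send a minimal HTTP response whose body is the file at `path`.

    The size is known up front because the file already exists on disk,
    so the Content-Length header can be emitted before the body.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        header = f"HTTP/1.1 200 OK\r\nContent-Length: {size}\r\n\r\n"
        conn.sendall(header.encode("ascii"))
        # Hand the file to the kernel instead of copying it through
        # user-space buffers, where the platform supports that.
        conn.sendfile(f)
    return size
```

A client that receives the ‘Content-Length’ header can compute a percentage and an estimated time remaining, which is what makes the progress bar shown earlier possible.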

Of course, when the machine is overloaded, we’ll still experience
increased latency and lower bandwidth, but that should be less acute
than with the previous setting.

Please report any problems you may have!

Ludo’.




Added tag(s) fixed. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Tue, 25 Apr 2017 10:12:02 GMT)

bug closed, send any further explanations to 26201 <at> debbugs.gnu.org and <dian_cecht <at> zoho.com> Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Tue, 25 Apr 2017 10:12:04 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Wed, 03 May 2017 08:12:01 GMT)

Message #103 received at 26201 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: Tobias Geerinckx-Rice <me <at> tobias.gr>
Cc: 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Wed, 03 May 2017 04:11:31 -0400
Reviving an old thread...

Tobias Geerinckx-Rice <me <at> tobias.gr> writes:

>> IMO, the best solution is to *never* generate nars on Hydra in response
>> to client requests, but rather to have the build slaves pack and
>> compress the nars, copy them to Hydra, and then serve them as static
>> files using nginx.
>
> A true mirror at last! Do we have the disc space for that?
>
> And could Hydra actually handle compressing *everything*, without an
> infinitely growing back-log? I don't have access to any statistics, but
> I'm guessing that a fair number of package+versions are never actually
> requested, and hence never compressed. This would change that.

Actually, IIUC, the build slaves are _already_ compressing everything,
and they always have.  They compress the build outputs for transmission
back to the master machine.  In the current framework, the master
machine immediately decompresses them upon receipt, and this compression
and decompression is considered an internal detail of the network
transport.

Currently, the master machine stores all build outputs uncompressed in
/gnu/store, and then later recompresses them for transmission to users
and other build slaves.  The needless decompression and recompression is
a tremendous amount of wasted work on our master machine.  That it's all
stored uncompressed is also a significant waste of disk space, which
leads to significant additional costs during garbage collection.

Essentially, my proposal is for the build slaves to be modified to
prepare the compressed NARs in a form suitable for delivery to end users
(and other build slaves) with minimal processing by our master node.
The master node would be significantly modified to receive, store, and
forward NARs explicitly, without ever decompressing them.  As far as I
can tell, this would mean strictly less work to do and less data to
store for every machine and in every case.

Ludovic has pointed out that we cannot do this because Hydra must add
its digital signature, and that this digital signature is stored within
the compressed NAR.  Therefore, we cannot avoid having the master
machine decompress and recompress every NAR that is delivered to users.

In my opinion, we should change the way we sign NARs.  Signatures should
be external to the NARs, not internal.  Not only would this allow us to
decentralize production of our NARs, but more importantly, it would
enable a community of independent builders to add their signatures to a
common pool of NARs.  Having a common pool of NARs enables us to store
these NARs in a shared distribution network without duplication.  We
cannot even have a common pool of NARs if they contain
build-farm-specific data such as signatures.
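
One way to picture the "common pool of NARs" idea is a content-addressed store in which each nar is kept once and signatures from any number of builders are attached separately.  The Python sketch below is purely illustrative: the `NarPool` class, the builder names and the signature strings are hypothetical, and it assumes that independent builders produce bit-identical nars.

```python
import hashlib
from collections import defaultdict


class NarPool:
    """Toy content-addressed pool: nars stored once, signatures kept apart."""

    def __init__(self):
        self.nars = {}                       # sha256 hex digest -> nar bytes
        self.signatures = defaultdict(dict)  # digest -> {builder: signature}

    def add(self, nar, builder, signature):
        """Record `builder`'s signature for `nar`, storing the nar only once."""
        digest = hashlib.sha256(nar).hexdigest()
        self.nars.setdefault(digest, nar)
        self.signatures[digest][builder] = signature
        return digest
```

If signatures were embedded in the nar itself, the two builders' archives would hash differently and the pool would have to store both copies.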

Thoughts?

      Mark




Information forwarded to bug-guix <at> gnu.org:
bug#26201; Package guix. (Wed, 03 May 2017 09:26:01 GMT)

Message #106 received at 26201 <at> debbugs.gnu.org:

From: ludo <at> gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw <at> netris.org>
Cc: Tobias Geerinckx-Rice <me <at> tobias.gr>, 26201 <at> debbugs.gnu.org, guix-sysadmin <at> gnu.org
Subject: Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Wed, 03 May 2017 11:25:38 +0200
Hello,

Mark H Weaver <mhw <at> netris.org> writes:

> Actually, IIUC, the build slaves are _already_ compressing everything,
> and they always have.  They compress the build outputs for transmission
> back to the master machine.  In the current framework, the master
> machine immediately decompresses them upon receipt, and this compression
> and decompression is considered an internal detail of the network
> transport.
>
> Currently, the master machine stores all build outputs uncompressed in
> /gnu/store, and then later recompresses them for transmission to users
> and other build slaves.  The needless decompression and recompression is
> a tremendous amount of wasted work on our master machine.  That it's all
> stored uncompressed is also a significant waste of disk space, which
> leads to significant additional costs during garbage collection.
>
> Essentially, my proposal is for the build slaves to be modified to
> prepare the compressed NARs in a form suitable for delivery to end users
> (and other build slaves) with minimal processing by our master node.
> The master node would be significantly modified to receive, store, and
> forward NARs explicitly, without ever decompressing them.  As far as I
> can tell, this would mean strictly less work to do and less data to
> store for every machine and in every case.

I agree that the redundant compression/decompression is terrible.  Yet
I’m not sure how to architect a solution where compression is performed
by build machines.  The main issue is that offloading and publication
are two independent mechanisms, as things are.

Maybe for the build-farm use case we could have a “semi-offloading”
mechanism whereby the master spawns a remote build without retrieving
its result, something akin to:

  GUIX_DAEMON_SOCKET=ssh://build-machine.example.org \
  guix build /gnu/store/…-foo.drv

In addition, the build machine would publish its result via ‘guix
publish’, which the master could then simply mirror and cache with
nginx.

There’s the issue of signatures, but perhaps we could have a more
sophisticated PKI and have the master delegate to build machines…

Then there are other issues such as that of synchronizing the TTL of a
narinfo and its corresponding nar, which --cache addresses.

Tricky!

> Ludovic has pointed out that we cannot do this because Hydra must add
> its digital signature, and that this digital signature is stored within
> the compressed NAR.  Therefore, we cannot avoid having the master
> machine decompress and recompress every NAR that is delivered to users.
>
> In my opinion, we should change the way we sign NARs.  Signatures should
> be external to the NARs, not internal.  Not only would this allow us to
> decentralize production of our NARs, but more importantly, it would
> enable a community of independent builders to add their signatures to a
> common pool of NARs.  Having a common pool of NARs enables us to store
> these NARs in a shared distribution network without duplication.  We
> cannot even have a common pool of NARs if they contain
> build-farm-specific data such as signatures.

Currently the signature is in the narinfos, not in nars proper¹.  So we
can already add signatures on an externally provided nar, for instance.

There’s a silly limitation currently, which is that the signature is
computed over all the fields of the narinfo.  That’s silly because it
means that if you change, say, the compression format or the URL of the
nar, then the signature becomes invalid.  We should fix that at some
point.
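
The limitation can be shown with a toy example.  The Python sketch below uses an HMAC as a stand-in for the real public-key signature, purely so that it runs with the standard library; the key, the field values and the `CONTENT_FIELDS` selection are all hypothetical and do not describe Guix's actual signing code or any agreed-upon fix.

```python
import hashlib
import hmac

KEY = b"hypothetical-demo-key"  # stand-in for a real signing key

# Hypothetical choice of fields that describe the content itself,
# as opposed to how or where the nar happens to be served.
CONTENT_FIELDS = ("StorePath", "NarHash", "NarSize", "References")


def sign(fields, covered=None):
    """Return a hex 'signature' over the selected narinfo fields.

    With covered=None every field is signed, mimicking the current
    behaviour described above.
    """
    names = sorted(fields) if covered is None else covered
    payload = "\n".join(f"{name}: {fields[name]}" for name in names)
    return hmac.new(KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up narinfo values, for illustration only.
narinfo = {
    "StorePath": "/gnu/store/example-item",
    "URL": "nar/gzip/example-item",
    "Compression": "gzip",
    "NarHash": "sha256:example",
    "NarSize": "12345",
    "References": "",
}

# Same content, served uncompressed from a different URL.
moved = dict(narinfo, URL="nar/example-item", Compression="none")
```

Here `sign(narinfo)` and `sign(moved)` differ even though the underlying nar is the same, whereas `sign(narinfo, CONTENT_FIELDS)` and `sign(moved, CONTENT_FIELDS)` match.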

Ludo’.

¹ For ‘guix publish’.  ‘guix archive --export’ appends a signature to
  the nar set.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 31 May 2017 11:24:03 GMT)

This bug report was last modified 6 years and 325 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.