GNU bug report logs - #48468
substitute server connection timeout

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Severity: important; Reported by: Mathieu Othacehe <othacehe@HIDDEN>; dated Sun, 16 May 2021 17:58:02 UTC; Maintainer for guix is bug-guix@HIDDEN.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 10 Dec 2022 10:56:04 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Dec 10 05:56:04 2022
Received: from localhost ([127.0.0.1]:42078 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p3xWV-00067e-Us
	for submit <at> debbugs.gnu.org; Sat, 10 Dec 2022 05:56:04 -0500
Received: from eggs.gnu.org ([209.51.188.92]:48908)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1p3xWU-00067H-QX
 for 48468 <at> debbugs.gnu.org; Sat, 10 Dec 2022 05:56:03 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1p3xWP-0002VM-5W
 for 48468 <at> debbugs.gnu.org; Sat, 10 Dec 2022 05:55:57 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=YD/ILx6kPDmpu3G7BGvmOwwpa1gjGAnaH2rwXlq5sdM=; b=LYilsh4IhxJCfcKZreWs
 0KQ1aIAC6wv3DPn/lzIDFns7L03mwsN3Utgn341YHPfKpl/Mj9IxP3o1zsVlFiru4L52usD/dRpzC
 qoPKcKY8Wn+S4LzjUdKqfuLwevKkmXgn9f0bygC8n/QxrQGq5M32jE2lEaDVaz4KXthUapUjfXEyL
 oqTbW00IhxTGIsHI44dDtGmkaoAmpFdfc4Hi07oe0dq+SL98UNBt7zexZImcMLIvs0qyFKCT0rtEx
 GThPh85vw9IVR0rja8uwKqCSWLDtEo+/XRRL3jVCyNmkcKzZFE4siS0MF9IGjH2imeSBf98SwO8O4
 ERGw6somlMXViQ==;
Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1p3xWM-0004Fc-S8; Sat, 10 Dec 2022 05:55:56 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN> <87lenmrnos.fsf@HIDDEN>
 <87cz8v8s66.fsf@HIDDEN> <87cz8vjpv2.fsf@HIDDEN>
 <878rjj8eun.fsf@HIDDEN> <87wn72fay3.fsf@HIDDEN>
Date: Sat, 10 Dec 2022 11:55:52 +0100
In-Reply-To: <87wn72fay3.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Thu, 08 Dec 2022 11:26:28 +0100")
Message-ID: <87fsdn5xzb.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Ludovic Court=C3=A8s <ludo@HIDDEN> skribis:

> Mathieu Othacehe <othacehe@HIDDEN> skribis:

[...]

>> curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
>> We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
>
> Yes, but derivations (.drv) are depend only on =E2=80=9Csources=E2=80=9D =
(like
> *-guile-builder, *.patch, *.scm) and on other derivations, all of which
> are typically less than 1=C2=A0MiB.
>
> Derivations don=E2=80=99t depend on derivation outputs like that of qemu-=
minimal
> above.
>
>> So any build that requires a heavy substitute, heavier than the cache
>> bypass threshold at least, will fail on the workers, as it would fail
>> locally.
>
> The main =E2=80=98guix publish=E2=80=99 instance on berlin has cache-bypa=
ss-threshold
> set to 100=C2=A0MiB; I don=E2=80=99t think we have any source or .drv tha=
t is this
> big, or even a tenth of it.  :-)

On closer inspection, we do get bigger sources for those system test
derivations:

--8<---------------cut here---------------start------------->8---
$ du  $(guix gc -R /gnu/store/2laznragkfjf0jn6n7b8c53vgmb8maqp-installed-ex=
tlinux-os.drv ) |sort -k1 -n |tail
6304	/gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026/gnu/packages
6956	/gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source/=
gnu/packages/patches
7000	/gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026/gnu
7716	/gnu/store/z0gc56x1w83wn0pi4mz8nhyjdr0d10fs-guix
12352	/gnu/store/mvvf60w8p0lx8vlixryb1a1gfib401rf-guix-b129026
22548	/gnu/store/ky96vygv1hpa8iz6677qz96binbz201h-packages
22684	/gnu/store/rrg9ki4bdjffvmgapjcqwjwjwkhb0qhl-doc
39516	/gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source=
/gnu/packages
39524	/gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source=
/gnu
39528	/gnu/store/9cs4i3hygldsan8ahf0p7x5kgzmqz9gr-guix-packages-base-source
--8<---------------cut here---------------end--------------->8---

The items at the bottom weigh around 40=C2=A0MiB.

That=E2=80=99s still below the 100=C2=A0MiB cache bypass threshold of the m=
ain =E2=80=98guix
publish=E2=80=99 instance though.

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 8 Dec 2022 10:26:39 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Dec 08 05:26:39 2022
Received: from localhost ([127.0.0.1]:55883 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p3E6x-0008Nw-3E
	for submit <at> debbugs.gnu.org; Thu, 08 Dec 2022 05:26:39 -0500
Received: from eggs.gnu.org ([209.51.188.92]:58260)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1p3E6v-0008Np-Cb
 for 48468 <at> debbugs.gnu.org; Thu, 08 Dec 2022 05:26:38 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1p3E6q-0008Pw-3l
 for 48468 <at> debbugs.gnu.org; Thu, 08 Dec 2022 05:26:32 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=1OwmOglqgnVtHkHRwkKwAXDWfPcjCygSAB5v1BC1iP4=; b=V9Q4lME8a3zVPYYVrWX/
 1qs32wxPu+w+D0j8tXPsS00VxuHKpF4Qt60HhOnckfpnH8p6zDiXjM55pS5CP7Oj/3i/gORGlVlpg
 dIAoo3BOFxGbhHW9o5E9q0wclFR6o5lZ1YIi8nYdoAhronnrdY5kXXxpsGoiG7vZSm1j5LrzAppEl
 lc1t8rMLXzqRnbW9MNP7KPDbJDTWxqzGHzLeHr+8IkaOQikBJLT79P1wO8kML12Np0lAXV2t4qMQl
 n7nvKVAtNLVFKgQbLInM6Qtm958LFKKPOauNC1kuiW7WYnvNY+j00GZv1rcpukgg8I4ZCt3hqx4Eu
 5SRu729RBVB0Lw==;
Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1p3E6p-0003x0-Gz; Thu, 08 Dec 2022 05:26:31 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN> <87lenmrnos.fsf@HIDDEN>
 <87cz8v8s66.fsf@HIDDEN> <87cz8vjpv2.fsf@HIDDEN>
 <878rjj8eun.fsf@HIDDEN>
X-URL: http://www.fdn.fr/~lcourtes/
X-Revolutionary-Date: Octidi 18 Frimaire an 231 de la =?utf-8?Q?R=C3=A9vol?=
 =?utf-8?Q?ution=2C?= jour du Lierre
X-PGP-Key-ID: 0x090B11993D9AEBB5
X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc
X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4  0CFB 090B 1199 3D9A EBB5
X-OS: x86_64-pc-linux-gnu
Date: Thu, 08 Dec 2022 11:26:28 +0100
In-Reply-To: <878rjj8eun.fsf@HIDDEN> (Mathieu Othacehe's message of "Wed, 07
 Dec 2022 15:31:44 +0100")
Message-ID: <87wn72fay3.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Mathieu Othacehe <othacehe@HIDDEN> skribis:

>> /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv,
>> which was marked as failed earlier today due to missing .drv.  It=E2=80=
=99s a
>> 4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80=
=99 is typically
>> set to something much higher than that.  So =E2=80=98guix publish=E2=80=
=99 won=E2=80=99t return
>> 404 in that case.
>
> Yes but that derivation also depends on other derivations, for instance
> qemu-minimal and if I try:
>
> mathieu@berlin ~$ guix build qemu-minimal
> ...
> /gnu/store/lwv2pl0m6dkf6bkzip755w5p71g5akq4-qemu-minimal-7.1.0
>
>
> and then, from my machine.
>
> curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
> We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo

Yes, but derivations (.drv) are depend only on =E2=80=9Csources=E2=80=9D (l=
ike
*-guile-builder, *.patch, *.scm) and on other derivations, all of which
are typically less than 1=C2=A0MiB.

Derivations don=E2=80=99t depend on derivation outputs like that of qemu-mi=
nimal
above.

> So any build that requires a heavy substitute, heavier than the cache
> bypass threshold at least, will fail on the workers, as it would fail
> locally.

The main =E2=80=98guix publish=E2=80=99 instance on berlin has cache-bypass=
-threshold
set to 100=C2=A0MiB; I don=E2=80=99t think we have any source or .drv that =
is this
big, or even a tenth of it.  :-)

The =E2=80=98guix publish=E2=80=99 instance spawned in (cuirass remote) doe=
sn=E2=80=99t use
caching at all, so there=E2=80=99s no bypass threshold.

The only clue we have so far is a long chain of successful .drv
substitutions followed by a failing one:

--8<---------------cut here---------------start------------->8---
Downloading http://141.80.167.131/nar/zstd/9nm35401i8j42559iadi3iz3kmhmj7pr=
-guix-system-tests.drv...

[K guix-system-tests.drv  1KiB             0B/s 00:00 [                  ] =
  0.0%
[K guix-system-tests.drv  1KiB         1.3MiB/s 00:00 [##################] =
100.0%
[K guix-system-tests.drv  1KiB         408KiB/s 00:00 [##################] =
100.0%

@ substituter-succeeded /gnu/store/9nm35401i8j42559iadi3iz3kmhmj7pr-guix-sy=
stem-tests.drv
fetching path `/gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-system-test=
s-modules.drv'...
@ substituter-started /gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-syst=
em-tests-modules.drv substitute
Downloading http://141.80.167.131/nar/zstd/9zf6inizcb09m136c44dj35fmzf2g3hs=
-guix-system-tests-modules.drv...

[K guix-system-tests-modules.drv  543B     0B/s 00:00 [                  ] =
  0.0%
[K guix-system-tests-modules.drv  543B 851KiB/s 00:00 [##################] =
100.0%
[K guix-system-tests-modules.drv  543B 484KiB/s 00:00 [##################] =
100.0%

@ substituter-succeeded /gnu/store/9zf6inizcb09m136c44dj35fmzf2g3hs-guix-sy=
stem-tests-modules.drv
cannot build missing derivation ?/gnu/store/zq7idl0j51fdzqhhqm9ql90d0f2326k=
7-btrfs-root-on-subvolume-os.drv?
--8<---------------cut here---------------end--------------->8---

(From <https://ci.guix.gnu.org/build/224849/log/raw>.)

Plausible explanations that come to mind:

  1. =E2=80=98guix publish=E2=80=99 returning 404, but not due to baking.  =
Instead the
     .drv is simply not in store, hence 404.

  2. Client timeout (=E2=80=98guix publish=E2=80=99 fails to reply on time).

  3. Cached 404 in nginx (though we=E2=80=99re not supposed to cache those I
     think?), or timeout in nginx (again due to =E2=80=98guix publish=E2=80=
=99 being too
     slow and =E2=80=98proxy_read_timeout=E2=80=99 is reached, currently 10=
s).

We should see if we can get useful info from nginx/publish logs.

Thanks,
Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 14:32:14 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Dec 07 09:32:14 2022
Received: from localhost ([127.0.0.1]:50524 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p2vT3-0001UP-Ki
	for submit <at> debbugs.gnu.org; Wed, 07 Dec 2022 09:32:14 -0500
Received: from eggs.gnu.org ([209.51.188.92]:59208)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1p2vT1-0001UJ-S1
 for 48468 <at> debbugs.gnu.org; Wed, 07 Dec 2022 09:32:12 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1p2vSv-0006eU-Fe; Wed, 07 Dec 2022 09:32:05 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=+QIVLCTr0wnscpqQ49rkieXQI15PIBR56NfsJYUJpWc=; b=CM3BAH5dr+8eOgJD7zhX
 vS5iPZtcvhuXiW6rJ4xeoeE32LixDkl6KPlcha2IbJv/hVOCGtN405dJeiWeUk8RQQw5JUN7oNEsB
 wuLEjM0o9UYiafKwIhMB1YZeT0Lc/nw6qPxfA30MQEe6YkjeovMmb3jxnbR+TpL8wWIcZb/sjqYWP
 my/3fwmI8bnyc+eAeNX5IozFXoP3Ukqw7lQYExEMpc+j3pfvceSSmzyNVARj7PfA9P5Lc1vX+X52c
 6zipxQWcIgGbkXd4jtWhvd/OwGtzH9PopfxcM+Qo6Bh3iglTLrRuPS9907ixBt+wzajrdvFNJUqfV
 aEmcFFPliIUB6w==;
Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2774.rev.sfr.net
 ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2774] helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1p2vSf-0005IU-4F; Wed, 07 Dec 2022 09:32:03 -0500
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN> <87lenmrnos.fsf@HIDDEN>
 <87cz8v8s66.fsf@HIDDEN> <87cz8vjpv2.fsf@HIDDEN>
Date: Wed, 07 Dec 2022 15:31:44 +0100
In-Reply-To: <87cz8vjpv2.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Wed, 07 Dec 2022 14:38:25 +0100")
Message-ID: <878rjj8eun.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hello,

> /gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv,
> which was marked as failed earlier today due to missing .drv.  It=E2=80=
=99s a
> 4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80=
=99 is typically
> set to something much higher than that.  So =E2=80=98guix publish=E2=80=
=99 won=E2=80=99t return
> 404 in that case.

Yes but that derivation also depends on other derivations, for instance
qemu-minimal and if I try:

--8<---------------cut here---------------start------------->8---
mathieu@berlin ~$ guix build qemu-minimal
...
/gnu/store/lwv2pl0m6dkf6bkzip755w5p71g5akq4-qemu-minimal-7.1.0
--8<---------------cut here---------------end--------------->8---

and then, from my machine.

--8<---------------cut here---------------start------------->8---
curl https://ci.guix.gnu.org/lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
We're baking it: /lwv2pl0m6dkf6bkzip755w5p71g5akq4.narinfo
--8<---------------cut here---------------end--------------->8---

wget exhibits the same behaviour and returns 404.

So any build that requires a heavy substitute, heavier than the cache
bypass threshold at least, will fail on the workers, as it would fail
locally.

That's not really a surprise as baking substitutes takes time and there
is a time window between the moment Cuirass triggers NAR baking and the
moment the NAR is baked, where every user will get a 404.

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 13:38:38 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Dec 07 08:38:38 2022
Received: from localhost ([127.0.0.1]:50421 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p2udC-0000vj-9k
	for submit <at> debbugs.gnu.org; Wed, 07 Dec 2022 08:38:38 -0500
Received: from eggs.gnu.org ([209.51.188.92]:45806)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1p2udA-0000vd-Bd
 for 48468 <at> debbugs.gnu.org; Wed, 07 Dec 2022 08:38:37 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1p2ud5-0006eW-1B
 for 48468 <at> debbugs.gnu.org; Wed, 07 Dec 2022 08:38:31 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=VMpl2g1kW8uavHNeZi6SkTR2Agv4PQwDjU2wki/AnD8=; b=XwMPTTe/8xc9QlNfP/b9
 e+awrw7HWTl7bp9yYqWy6gtXg+4D7v0TBkxRIHfZAW7LwnUk0D933R79koxRm3pmPlbhg4RJ/BPL/
 x0+x9Fb8dmYUOZqNHx1n0ez2u3PdZ1sg5ax7Fqi9BLpDsx1/3Rzq4XOyhYjx7TMh3IIZWAliQx7vr
 iIjUCfMddm0WgzJ9BtcU1/f0mLZnVdxDUojmdXCahyntERo05nV3nJO0NvBQnI6zvnvmKzfnplQ4S
 SFISVBIdVte8laVhNu+IbpOgikPxUpIFTpF911in6naeHirjeG4d8QjRLwB4KG0m1BYyP/8DXi5/a
 PGl1MRmPJtbAQg==;
Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1p2ud1-0000l0-Na; Wed, 07 Dec 2022 08:38:27 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN> <87lenmrnos.fsf@HIDDEN>
 <87cz8v8s66.fsf@HIDDEN>
Date: Wed, 07 Dec 2022 14:38:25 +0100
In-Reply-To: <87cz8v8s66.fsf@HIDDEN> (Mathieu Othacehe's message of "Wed, 07
 Dec 2022 10:44:01 +0100")
Message-ID: <87cz8vjpv2.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Mathieu Othacehe <othacehe@HIDDEN> skribis:

>> You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2=
=80=99 times out.
>> Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by =
peer=E2=80=9D
>> and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client =
closed the
>> connection (which was open) prematurely, maybe due to an internal
>> timeout.
>
> Could it be that the client is receiving 404 because the baking of some
> NAR was deferred to a worker, and then it closes the connection?

Unlikely.  Take
/gnu/store/qmzr030rzgikdxv3g9msqv0l8qp5j6y2-btrfs-raid-root-os.drv,
which was marked as failed earlier today due to missing .drv.  It=E2=80=99s=
 a
4KiB file, and the cache-bypass-threshold is =E2=80=98guix publish=E2=80=99=
 is typically
set to something much higher than that.  So =E2=80=98guix publish=E2=80=99 =
won=E2=80=99t return
404 in that case.

If you find a =E2=80=9Cmissing derivation=E2=80=9D error in Cuirass, how wo=
uld you
search logs to find what happened?  I=E2=80=99m not sure where to look for
useful debugging info.

Thanks,
Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 7 Dec 2022 09:44:19 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Wed Dec 07 04:44:19 2022
Received: from localhost ([127.0.0.1]:48909 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p2qyQ-0001cx-SL
	for submit <at> debbugs.gnu.org; Wed, 07 Dec 2022 04:44:19 -0500
Received: from eggs.gnu.org ([209.51.188.92]:60034)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1p2qyN-0001cr-Av
 for 48468 <at> debbugs.gnu.org; Wed, 07 Dec 2022 04:44:17 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1p2qyH-00045e-Do; Wed, 07 Dec 2022 04:44:09 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=9xYEt4HDXobSZZuvXN7OeOueQs4B+Q050cNQ+t23BMU=; b=J5g3m6XVh69+J0o9HdVS
 XyMdTuApKqnAkH2l6dVuJZNt2dxK8ACt99ixsNi02All0LRjr0o7ExMth6hPfTL0a470gmF8opUC6
 ge3ukhNWtnkFQ01kOWb1jLDAIV6uF4bvSOEzKUJm2bbpuh70FtFzR9F9CcoQFZl+V0q7SQR83Ev7E
 b9rFP/dzaSw7M5+f9TAhtSgifKlQDqJsZhhXkPHLhI3eUwYBpnn2J6HVk5Ze9y8YBKcjUDSK9RDal
 7V0uLyqyRNSM6YuiGxZyWT0Ttr17EG9nq47IHUfCjCetWAQgBVOEX162KMIKQ6b1Gpl7dDUK50xvL
 lP52ox9k7f2DBA==;
Received: from 2a02-8429-81d2-3d01-94c9-8097-ea5c-2774.rev.sfr.net
 ([2a02:8429:81d2:3d01:94c9:8097:ea5c:2774] helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1p2qyC-0005rX-DF; Wed, 07 Dec 2022 04:44:08 -0500
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN> <87lenmrnos.fsf@HIDDEN>
Date: Wed, 07 Dec 2022 10:44:01 +0100
In-Reply-To: <87lenmrnos.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Mon, 05 Dec 2022 14:21:23 +0100")
Message-ID: <87cz8v8s66.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hello,

> You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2=
=80=99 times out.
> Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by p=
eer=E2=80=9D
> and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client c=
losed the
> connection (which was open) prematurely, maybe due to an internal
> timeout.

Could it be that the client is receiving 404 because the baking of some
NAR was deferred to a worker, and then it closes the connection?

I think that's what I had in mind with the patch 2/2 of this patchset:
https://issues.guix.gnu.org/50040.

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 5 Dec 2022 13:21:36 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Dec 05 08:21:36 2022
Received: from localhost ([127.0.0.1]:35994 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1p2BPc-0000p9-4z
	for submit <at> debbugs.gnu.org; Mon, 05 Dec 2022 08:21:36 -0500
Received: from eggs.gnu.org ([209.51.188.92]:42810)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1p2BPa-0000p3-M6
 for 48468 <at> debbugs.gnu.org; Mon, 05 Dec 2022 08:21:35 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>) id 1p2BPS-0007KC-PN
 for 48468 <at> debbugs.gnu.org; Mon, 05 Dec 2022 08:21:29 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org;
 s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To:
 From; bh=QdgzazPdBaky6oWPFshw7JxEXLHN3x9Z1jEgar2OWp4=; b=Tq3PRLGnbvwXUXkn+Lk9
 B5HPonNF5S2wASbMfDEfvPQUEYjRYcGs/Vb2UiTKVQVErsa4yeKa947UwMyAdKHYTaU6aUFRUjv8q
 3lc1SYUgsZhpII9RmacW81iqvxh/3CQB0XrS1mipklFkd2P8v0hizfjcnhVjq7J/VzQ5cx4ID5Suq
 Pl+Ot0mHbAlFJ+4M/gHEWdW5TJpQtA4um4Gdow3HWzUzaiarW+PWfQ5ZH78ews2btAbkmYlIAhH1U
 mGgOT5B/0W2wFCaWAdlhv3XEq8uT/Ap1CARcP/lUF6LgrC939OUpe5z7+qWhAouyjr8CK3YnwJC6k
 HZWJ+mNW5gO9sw==;
Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <ludo@HIDDEN>)
 id 1p2BPR-0006qF-UV; Mon, 05 Dec 2022 08:21:26 -0500
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
 <87sg108vl5.fsf@HIDDEN>
Date: Mon, 05 Dec 2022 14:21:23 +0100
In-Reply-To: <87sg108vl5.fsf@HIDDEN> (Mathieu Othacehe's message of "Tue, 29
 Jun 2021 18:49:26 +0200")
Message-ID: <87lenmrnos.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Mathieu Othacehe <othacehe@HIDDEN> skribis:

> I hoped that it would decrease the load of the Cuirass publish server
> that was flooded by worker connections. While the situation is better,
> there are still a lot of substitute timeout errors on Cuirass.
>
> I have also observed those timeouts using the publish server behind
> ci.guix.gnu.org. This makes me think that there's something else that is
> preventing the server from honoring every connection request in less
> than 5 seconds (the timeout duration).

I don=E2=80=99t see this when substituting from https://ci.guix.gnu.org the=
se
days.

You mentioned on IRC that nginx logs show that =E2=80=98guix publish=E2=80=
=99 times out.
Looking at /var/log/nginx/error.log, I see =E2=80=9CConnection reset by pee=
r=E2=80=9D
and =E2=80=9CBroken pipe=E2=80=9D, which could indicate that the client clo=
sed the
connection (which was open) prematurely, maybe due to an internal
timeout.

What I=E2=80=99d like to know is whether those timeouts you mention appear
during connection establishment (connect(2) on the client side doesn=E2=80=
=99t
complete within 5 seconds) or after connection establishment.

Connection establishment is handled by the kernel and =E2=80=98guix publish=
=E2=80=99 is
not involved.  However, the listen(2) call made in (web server http)
sets a maximum backlog of 128 connections; if at some point 128
connections are already queued, then the 129th client will have
connect(2) take some time.  Seems unlikely, but who knows.

Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 29 Jun 2021 16:49:37 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jun 29 12:49:37 2021
Received: from localhost ([127.0.0.1]:55196 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lyGvY-00024d-RT
	for submit <at> debbugs.gnu.org; Tue, 29 Jun 2021 12:49:37 -0400
Received: from eggs.gnu.org ([209.51.188.92]:34882)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lyGvY-00024S-3x
 for 48468 <at> debbugs.gnu.org; Tue, 29 Jun 2021 12:49:36 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:34964)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lyGvQ-0001sO-LC; Tue, 29 Jun 2021 12:49:29 -0400
Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=51100 helo=meije)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lyGvQ-0002tg-DC; Tue, 29 Jun 2021 12:49:28 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Ludovic =?utf-8?Q?Court=C3=A8s?= <ludo@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87im2bfj3i.fsf@HIDDEN>
Date: Tue, 29 Jun 2021 18:49:26 +0200
In-Reply-To: <87im2bfj3i.fsf@HIDDEN> ("Ludovic =?utf-8?Q?Court=C3=A8s=22'?=
 =?utf-8?Q?s?= message of "Fri, 18 Jun 2021 14:33:21 +0200")
Message-ID: <87sg108vl5.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> But maybe I=E2=80=99m not looking at the right thing.
>
> Do you have evidence or a reproducer?

Yes, adding the following debug message here:

--8<---------------cut here---------------start------------->8---
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -416,6 +416,7 @@ server certificates."
         ;; and retry.  We might also get 'bad-response or a similar
         ;; exception from (web response) later on, once we've sent the
         ;; request, or a ERROR/INVALID-SESSION from GnuTLS.
+        (pk key args)
         (if (or (and (eq? key 'system-error)
                      (=3D EPIPE (system-error-errno `(,key ,@args))))
                 (and (eq? key 'gnutls-error)
--8<---------------cut here---------------end--------------->8---

and using substitutes from a publish server without keep alive support
(before 0b8fa24), prints something like:

--8<---------------cut here---------------start------------->8---
downloading from http://192.168.1.51:8080/nar/gzip/lkmiyfzi1bq571yn4gypbcs5=
vn4fpma7-texlive-cm-51265 ...
 texlive-cm-51265                                                          =
                                                                           =
                                                                           =
                      7.8MiB/s 00:00 | 1.7MiB transferred

substituting /gnu/store/l788x07ska5vffayz0gayv4hsx5flxal-module-import-comp=
iled...

;;; (bad-header (read-header-line #<eof>))
downloading from http://192.168.1.51:8080/nar/gzip/l788x07ska5vffayz0gayv4h=
sx5flxal-module-import-compiled ...
 module-import-compiled                                                    =
                                                                           =
                                                                           =
                       7.5MiB/s 00:00 | 85KiB transferred

substituting /gnu/store/1s1lrnxlkjwxshk5q2w97ig3clc6n06f-ruby-2.6.5...

;;; (bad-header (read-header-line #<eof>))
--8<---------------cut here---------------end--------------->8---

That's because the connection is closed by the publish server each time
a NAR is sent. This particular behaviour is fixed by:
https://issues.guix.gnu.org/48556.

I hoped that it would decrease the load of the Cuirass publish server
that was flooded by worker connections. While the situation is better,
there are still a lot of substitute timeout errors on Cuirass.

I have also observed those timeouts using the publish server behind
ci.guix.gnu.org. This makes me think that there's something else that is
preventing the server from honoring every connection request in less
than 5 seconds (the timeout duration).

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 18 Jun 2021 12:33:31 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri Jun 18 08:33:31 2021
Received: from localhost ([127.0.0.1]:56007 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1luDgh-000360-EP
	for submit <at> debbugs.gnu.org; Fri, 18 Jun 2021 08:33:31 -0400
Received: from eggs.gnu.org ([209.51.188.92]:58296)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1luDgf-00035o-Ah
 for 48468 <at> debbugs.gnu.org; Fri, 18 Jun 2021 08:33:29 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:45758)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>) id 1luDga-0002no-4Q
 for 48468 <at> debbugs.gnu.org; Fri, 18 Jun 2021 08:33:24 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=54544 helo=ribbon)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>)
 id 1luDgZ-0004v0-0B; Fri, 18 Jun 2021 08:33:23 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN>
Date: Fri, 18 Jun 2021 14:33:21 +0200
In-Reply-To: <87lf8e4l42.fsf@HIDDEN> (Mathieu Othacehe's message of "Sun, 16
 May 2021 19:57:49 +0200")
Message-ID: <87im2bfj3i.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Mathieu Othacehe <othacehe@HIDDEN> skribis:

> Investigating it, I found that the connection is closed and opened
> multiple times in the call-with-cached-connection procedure of the (guix
> script substitute) module.
>
> It looks like its because a 'bad-headers exception is raised when trying
> to parse an eof object:
>
> ;;; (error bad-header (read-header-line #<eof>))
>
>
> I'm not sure where this eof comes from. There is this comment in the
> http-multiple-get procedure in (guix http-client):
>
> ;; Swallow networking errors that could occur due to connection reuse
> ;; and the like; they will be handled down the road when trying to
> ;; read responses.
> (false-if-networking-error
>  (begin
>    (for-each (cut write-request <> buffer) batch)
>    (put-bytevector p (get))
>    (force-output p))))
>
> which would suggest that connection reuse could cause networking errors?

Trying to see exactly which bit is at fault here.  First,
=E2=80=98http-multiple-get=E2=80=99 seems to be working as expected when pa=
ssed a fresh
connection:

--8<---------------cut here---------------start------------->8---
scheme@(guix http-client)> (http-multiple-get (string->uri "https://ci.guix=
.gnu.org")
					      (lambda (request response port result)
						(get-bytevector-n port (response-content-length response))
						(cons #t result))
					      '()
					      (make-list
					       5000
					       (build-request (build-uri 'https
										   #:host "ci.guix.gnu.org"
										   #:path "/d7gpr41qpsifri54vp2lzjs99zkylscq.narinfo"))))
connecting (5000 requests left)...connecting (4400 requests left)...connect=
ing (3800 requests left)...connecting (3200 requests left)...connecting (26=
00 requests left)...connecting (2000 requests left)...connecting (1400 requ=
ests left)...connecting (800 requests left)...connecting (200 requests left=
)...$143 =3D (#t #t =E2=80=A6)
scheme@(guix http-client)> (length $143)
$144 =3D 5000
--8<---------------cut here---------------end--------------->8---

But maybe I=E2=80=99m not looking at the right thing.

Do you have evidence or a reproducer?

Thanks,
Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 29 May 2021 21:44:43 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 29 17:44:43 2021
Received: from localhost ([127.0.0.1]:57852 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1ln6l8-0004KU-TX
	for submit <at> debbugs.gnu.org; Sat, 29 May 2021 17:44:43 -0400
Received: from eggs.gnu.org ([209.51.188.92]:40360)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <ludo@HIDDEN>) id 1ln6l8-0004KI-BF
 for 48468 <at> debbugs.gnu.org; Sat, 29 May 2021 17:44:42 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:55892)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>)
 id 1ln6l3-0001I6-1J; Sat, 29 May 2021 17:44:37 -0400
Received: from [2a01:e0a:1d:7270:af76:b9b:ca24:c465] (port=41666 helo=ribbon)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <ludo@HIDDEN>)
 id 1ln6l2-0003Ey-Nx; Sat, 29 May 2021 17:44:36 -0400
From: =?utf-8?Q?Ludovic_Court=C3=A8s?= <ludo@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
Date: Sat, 29 May 2021 23:44:35 +0200
In-Reply-To: <87eee6cz75.fsf@HIDDEN> (Christopher Baines's message of
 "Sun, 16 May 2021 19:26:22 +0100")
Message-ID: <87v97143lo.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: Mathieu Othacehe <othacehe@HIDDEN>, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

Hi,

Christopher Baines <mail@HIDDEN> skribis:

> Mathieu Othacehe <othacehe@HIDDEN> writes:

[...]

>> Stracing this publish server shows that connection reuse seems to be
>> broken:
>>
>> --8<---------------cut here---------------start------------->8---
>> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41742), sin_addr=3Dine=
t_addr("141.80.167.185")}, [112->16], 0) =3D 21
>> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41744), sin_addr=3Dine=
t_addr("141.80.167.185")}, [112->16], 0) =3D 21
>> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41746), sin_addr=3Dine=
t_addr("141.80.167.185")}, [112->16], 0) =3D 25
>> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41748), sin_addr=3Dine=
t_addr("141.80.167.185")}, [112->16], 0) =3D 24
>> accept4(9, {sa_family=3DAF_INET, sin_port=3Dhtons(41750), sin_addr=3Dine=
t_addr("141.80.167.185")}, [112->16], 0) =3D 21

Ouch.

>> Investigating it, I found that the connection is closed and opened
>> multiple times in the call-with-cached-connection procedure of the (guix
>> script substitute) module.
>>
>> It looks like its because a 'bad-headers exception is raised when trying
>> to parse an eof object:
>>
>> --8<---------------cut here---------------start------------->8---
>> ;;; (error bad-header (read-header-line #<eof>))
>> --8<---------------cut here---------------end--------------->8---
>>
>> I'm not sure where this eof comes from. There is this comment in the
>> http-multiple-get procedure in (guix http-client):
>>
>> --8<---------------cut here---------------start------------->8---
>> ;; Swallow networking errors that could occur due to connection reuse
>> ;; and the like; they will be handled down the road when trying to
>> ;; read responses.
>> (false-if-networking-error
>>  (begin
>>    (for-each (cut write-request <> buffer) batch)
>>    (put-bytevector p (get))
>>    (force-output p))))
>> --8<---------------cut here---------------end--------------->8---
>>
>> which would suggest that connection reuse could cause networking errors?

[...]

> That's on the server side, the actual problem is probably on the client
> side, as I guess there are possibly places where closed connections
> aren't handled properly.

D=E2=80=99oh.  So we should be able to use it by talking to a bare =E2=80=
=98guix
publish=E2=80=99?  I tried and failed to reproduce it, but that=E2=80=99s p=
robably not
deterministic.

> This reminds me I sent some patches relating to closing connections,
> this could well be related [1].
>
> 1: https://issues.guix.gnu.org/47174

This one will properly close connections when the servers asks for it,
but would that make a difference?

Thanks,
Ludo=E2=80=99.




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.
Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 21 May 2021 13:30:31 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Fri May 21 09:30:30 2021
Received: from localhost ([127.0.0.1]:36065 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lk5EU-0001jq-Jk
	for submit <at> debbugs.gnu.org; Fri, 21 May 2021 09:30:30 -0400
Received: from eggs.gnu.org ([209.51.188.92]:38876)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lk5EM-0001jV-EN
 for 48468 <at> debbugs.gnu.org; Fri, 21 May 2021 09:30:28 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:43390)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lk5EH-00057D-04; Fri, 21 May 2021 09:30:17 -0400
Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=50402 helo=meije)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lk5EG-0003ny-MQ; Fri, 21 May 2021 09:30:16 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
 <87eee59zzm.fsf@HIDDEN> <878s4c85zu.fsf@HIDDEN>
Date: Fri, 21 May 2021 15:30:15 +0200
In-Reply-To: <878s4c85zu.fsf@HIDDEN> (Mathieu Othacehe's message of "Tue, 18
 May 2021 16:35:17 +0200")
Message-ID: <87cztkqkns.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: ludo@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

I posted a patchset adding keep-alive support to guix publish earlier:
https://issues.guix.gnu.org/48556.

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 18 May 2021 14:35:34 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue May 18 10:35:34 2021
Received: from localhost ([127.0.0.1]:56847 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lj0oo-0007lD-C0
	for submit <at> debbugs.gnu.org; Tue, 18 May 2021 10:35:34 -0400
Received: from eggs.gnu.org ([209.51.188.92]:57368)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lj0ol-0007kl-Bf
 for 48468 <at> debbugs.gnu.org; Tue, 18 May 2021 10:35:31 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:37814)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lj0of-0005tX-Go; Tue, 18 May 2021 10:35:25 -0400
Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=34756 helo=meije)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lj0oe-0002Rn-CL; Tue, 18 May 2021 10:35:24 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
 <87eee59zzm.fsf@HIDDEN>
Date: Tue, 18 May 2021 16:35:17 +0200
In-Reply-To: <87eee59zzm.fsf@HIDDEN> (Mathieu Othacehe's message of "Mon, 17
 May 2021 16:49:49 +0200")
Message-ID: <878s4c85zu.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: Ludovic =?utf-8?Q?Court?= =?utf-8?Q?=C3=A8s?= <ludo@HIDDEN>,
 bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> I'll have a closer look, thanks for your help.

So this snippet in the http-write procedure of the (guix scripts
publish) module:

--8<---------------cut here---------------start------------->8---
          (swallow-zlib-error
           (close-port port))
--8<---------------cut here---------------end--------------->8---

is closing the client port unconditionally, which means that guix
publish cannot keep connections alive, unless sitting behind an Nginx
proxy.

I'm trying to turn the close-port call into a maybe-close-port with the
following procedure:

--8<---------------cut here---------------start------------->8---
(define (maybe-close-port port)
    (cond
     ((keep-alive? response)
      (poll-set-add! (http-poll-set server) port *events*))
     (else
      (close-port port))))
--8<---------------cut here---------------end--------------->8---

however this is terribly hacky, as I need to access the private poll-set
from (web server http).

Ludo, do you have a better idea?

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 18 May 2021 14:35:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue May 18 10:35:30 2021
Received: from localhost ([127.0.0.1]:56844 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lj0ok-0007kx-5f
	for submit <at> debbugs.gnu.org; Tue, 18 May 2021 10:35:30 -0400
Received: from lists.gnu.org ([209.51.188.17]:35134)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lj0og-0007km-Ml
 for submit <at> debbugs.gnu.org; Tue, 18 May 2021 10:35:28 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:51018)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1lj0og-0003Ay-F8
 for bug-guix@HIDDEN; Tue, 18 May 2021 10:35:26 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:37814)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lj0of-0005tX-Go; Tue, 18 May 2021 10:35:25 -0400
Received: from [2a01:e0a:19b:d9a0:f2f7:a404:c3d3:f8b4] (port=34756 helo=meije)
 by fencepost.gnu.org with esmtpsa
 (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lj0oe-0002Rn-CL; Tue, 18 May 2021 10:35:24 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
 <87eee59zzm.fsf@HIDDEN>
Date: Tue, 18 May 2021 16:35:17 +0200
In-Reply-To: <87eee59zzm.fsf@HIDDEN> (Mathieu Othacehe's message of "Mon, 17
 May 2021 16:49:49 +0200")
Message-ID: <878s4c85zu.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: Ludovic =?utf-8?Q?Court?= =?utf-8?Q?=C3=A8s?= <ludo@HIDDEN>,
 bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> I'll have a closer look, thanks for your help.

So this snippet in the http-write procedure of the (guix scripts
publish) module:

--8<---------------cut here---------------start------------->8---
          (swallow-zlib-error
           (close-port port))
--8<---------------cut here---------------end--------------->8---

is closing the client port unconditionally, which means that guix
publish cannot keep connections alive, unless sitting behind an Nginx
proxy.

I'm trying to turn the close-port call into a maybe-close-port with the
following procedure:

--8<---------------cut here---------------start------------->8---
(define (maybe-close-port port)
    (cond
     ((keep-alive? response)
      (poll-set-add! (http-poll-set server) port *events*))
     (else
      (close-port port))))
--8<---------------cut here---------------end--------------->8---

however this is terribly hacky, as I need to access the private poll-set
from (web server http).

Ludo, do you have a better idea?

Thanks,

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 17 May 2021 14:50:10 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon May 17 10:50:10 2021
Received: from localhost ([127.0.0.1]:53399 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lieZK-00085F-9k
	for submit <at> debbugs.gnu.org; Mon, 17 May 2021 10:50:10 -0400
Received: from eggs.gnu.org ([209.51.188.92]:40592)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lieZE-00084J-FE
 for 48468 <at> debbugs.gnu.org; Mon, 17 May 2021 10:50:04 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:45226)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lieZ8-0003sN-6g; Mon, 17 May 2021 10:49:55 -0400
Received: from [109.190.253.11] (port=55430 helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1lieZ7-0002KA-Mq; Mon, 17 May 2021 10:49:54 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
Date: Mon, 17 May 2021 16:49:49 +0200
In-Reply-To: <87eee6cz75.fsf@HIDDEN> (Christopher Baines's message of
 "Sun, 16 May 2021 19:26:22 +0100")
Message-ID: <87eee59zzm.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 48468
Cc: bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> That's on the server side, the actual problem is probably on the client
> side, as I guess there are possibly places where closed connections
> aren't handled properly. This reminds me I sent some patches relating to
> closing connections, this could well be related [1].

Oh, you're right, the Nginx server probably makes the difference
here. The http-write procedure of the Guile (web server http) module
seems to handle keep-alive connections. However, the wrapping http-write
procedure of (guix scripts publish) may not.

I'll have a closer look, thanks for your help.

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 17 May 2021 14:49:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon May 17 10:49:59 2021
Received: from localhost ([127.0.0.1]:53395 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1lieZD-00084T-38
	for submit <at> debbugs.gnu.org; Mon, 17 May 2021 10:49:59 -0400
Received: from lists.gnu.org ([209.51.188.17]:36806)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1lieZ9-00084K-Q9
 for submit <at> debbugs.gnu.org; Mon, 17 May 2021 10:49:57 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:34242)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1lieZ9-0007kn-Dt
 for bug-guix@HIDDEN; Mon, 17 May 2021 10:49:55 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:45226)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>)
 id 1lieZ8-0003sN-6g; Mon, 17 May 2021 10:49:55 -0400
Received: from [109.190.253.11] (port=55430 helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>)
 id 1lieZ7-0002KA-Mq; Mon, 17 May 2021 10:49:54 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: Christopher Baines <mail@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
References: <87lf8e4l42.fsf@HIDDEN> <87eee6cz75.fsf@HIDDEN>
Date: Mon, 17 May 2021 16:49:49 +0200
In-Reply-To: <87eee6cz75.fsf@HIDDEN> (Christopher Baines's message of
 "Sun, 16 May 2021 19:26:22 +0100")
Message-ID: <87eee59zzm.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
Cc: bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hey,

> That's on the server side, the actual problem is probably on the client
> side, as I guess there are possibly places where closed connections
> aren't handled properly. This reminds me I sent some patches relating to
> closing connections, this could well be related [1].

Oh, you're right, the Nginx server probably makes the difference
here. The http-write procedure of the Guile (web server http) module
seems to handle keep-alive connections. However, the wrapping http-write
procedure of (guix scripts publish) may not.

I'll have a closer look, thanks for your help.

Mathieu




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at 48468 <at> debbugs.gnu.org:


Received: (at 48468) by debbugs.gnu.org; 16 May 2021 18:26:30 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun May 16 14:26:30 2021
Received: from localhost ([127.0.0.1]:51455 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1liLTB-0002mu-Ma
	for submit <at> debbugs.gnu.org; Sun, 16 May 2021 14:26:30 -0400
Received: from mira.cbaines.net ([212.71.252.8]:38702)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mail@HIDDEN>) id 1liLT9-0002ml-B0
 for 48468 <at> debbugs.gnu.org; Sun, 16 May 2021 14:26:27 -0400
Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa])
 by mira.cbaines.net (Postfix) with ESMTPSA id 64FE527BC78;
 Sun, 16 May 2021 19:26:26 +0100 (BST)
Received: from capella (localhost [127.0.0.1])
 by localhost (OpenSMTPD) with ESMTP id 835715d7;
 Sun, 16 May 2021 18:26:25 +0000 (UTC)
References: <87lf8e4l42.fsf@HIDDEN>
User-agent: mu4e 1.4.15; emacs 27.1
From: Christopher Baines <mail@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
In-reply-to: <87lf8e4l42.fsf@HIDDEN>
Date: Sun, 16 May 2021 19:26:22 +0100
Message-ID: <87eee6cz75.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 48468
Cc: bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -1.0 (-)

--=-=-=
Content-Type: text/plain


Mathieu Othacehe <othacehe@HIDDEN> writes:

> Hello,
>
> We recently have a lot of those errors on Cuirass:
>
> --8<---------------cut here---------------start------------->8---
> guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow
> guix substitute: warning: try `--no-substitutes' if the problem persists
> guix substitute: error: connect*: Connection timed out
> --8<---------------cut here---------------end--------------->8---
>
> which means that the workers are failing to connect to the Cuirass
> remote-server publish server on berlin at 141.80.167.131:5557.
>
> Stracing this publish server shows that connection reuse seems to be
> broken:
>
> --8<---------------cut here---------------start------------->8---
> accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
> accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
> accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
> accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26
> accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
> accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22
> accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> --8<---------------cut here---------------end--------------->8---
>
> Investigating it, I found that the connection is closed and opened
> multiple times in the call-with-cached-connection procedure of the (guix
> script substitute) module.
>
> It looks like its because a 'bad-headers exception is raised when trying
> to parse an eof object:
>
> --8<---------------cut here---------------start------------->8---
> ;;; (error bad-header (read-header-line #<eof>))
> --8<---------------cut here---------------end--------------->8---
>
> I'm not sure where this eof comes from. There is this comment in the
> http-multiple-get procedure in (guix http-client):
>
> --8<---------------cut here---------------start------------->8---
> ;; Swallow networking errors that could occur due to connection reuse
> ;; and the like; they will be handled down the road when trying to
> ;; read responses.
> (false-if-networking-error
>  (begin
>    (for-each (cut write-request <> buffer) batch)
>    (put-bytevector p (get))
>    (force-output p))))
> --8<---------------cut here---------------end--------------->8---
>
> which would suggest that connection reuse could cause networking errors?
>
> What also puzzles me it that the main guix publish server on berlin does
> not seem to present this issue. That would indicate that this error is
> caused by how the Cuirass remote-server publish server is started or
> configured.
>
> Ludo, Chris, any idea?

While I've been working in this area, I've actually been trying to pick
apart the connection caching, since the single thread assumption doesn't
hold in the Guix Build Coordinator.

Anyway, I do have a theory. Assuming I'm correct in saying that there's
no nginx between the client and publish server here, I think that's your
configuration difference.

For ci.guix.gnu.org, as well as data.guix.gnu.org, it's NGinx which is
keeping connections alive. I'm not sure the Guile code for the publish
server does similarly, so talking to it directly might be different.

That's on the server side, the actual problem is probably on the client
side, as I guess there are possibly places where closed connections
aren't handled properly. This reminds me I sent some patches relating to
closing connections, this could well be related [1].

1: https://issues.guix.gnu.org/47174

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmChY89fFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9XfG9w/+KvzCHwfnB+eeEd6i0xTvTfc8+oLxb+GS
j1TSbXSGR02ByGMGIbJeAMmNPV20V9Lzi24a49451fskUZbCGeg6aS2rO3c9QpUX
jbF7//NW8zwgmhO/Hh9dvB4v+q01+VwrzE3c6ssAVO6772ZrTJqGA7oCU9UqD1w6
dQYUto8iSxa95UhFcNFMtGcBAevdoN4Ku9OpkqIt8uViCED+YM6a9Ej4d9f9oC8h
vxw6EU0YsCrobJQJ8KW56N9uR4oeI4IqWevou2LbSAmcHEWmGHAF6XDgxu0594Ip
YOFuqso4nuRxU3ffzgEupMbJA07yrwf0EB4lreqsSfU8gtZupFMLgyc8PviyXjw3
XPpo3PAukKhsVZyQlnTc6ldatWWVHIeTPy8vxdr6GvSQnzliaQmzpHj7xVqkfS3f
R1gz5hEXoS0o/CXpVkpnaYMC8m+Y2ujM5AbEUBVo6itR6mmUCCF5D9q/BtosjQvu
8wrLes42iAqwlnNeg8UGKra/cwJ0YGVrOInsOyXX97jz3UVjkxV7jq53+vkw3fGZ
eyHDugAJ8DKw9js8BX88AmXaV/Art4M0qG6D6Zp+MvIUbU9z+oNeTjAXgGewxntw
sfIdxHju+DNs5x5je3pTzyRSDPEfle8Xvd58DbGyAIoP8v7C7LYPmxVID7R0D/cs
ccIQKCj/kqg=
=yVQQ
-----END PGP SIGNATURE-----
--=-=-=--




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 16 May 2021 18:26:32 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun May 16 14:26:32 2021
Received: from localhost ([127.0.0.1]:51458 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1liLTE-0002n9-5G
	for submit <at> debbugs.gnu.org; Sun, 16 May 2021 14:26:32 -0400
Received: from lists.gnu.org ([209.51.188.17]:40602)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <mail@HIDDEN>) id 1liLTD-0002n2-2G
 for submit <at> debbugs.gnu.org; Sun, 16 May 2021 14:26:31 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:51284)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <mail@HIDDEN>) id 1liLTC-00028q-TL
 for bug-guix@HIDDEN; Sun, 16 May 2021 14:26:30 -0400
Received: from mira.cbaines.net ([212.71.252.8]:56692)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <mail@HIDDEN>)
 id 1liLTA-0005Ms-Op; Sun, 16 May 2021 14:26:30 -0400
Received: from localhost (unknown [IPv6:2a02:8010:68c1:0:8ac0:b4c7:f5c8:7caa])
 by mira.cbaines.net (Postfix) with ESMTPSA id 64FE527BC78;
 Sun, 16 May 2021 19:26:26 +0100 (BST)
Received: from capella (localhost [127.0.0.1])
 by localhost (OpenSMTPD) with ESMTP id 835715d7;
 Sun, 16 May 2021 18:26:25 +0000 (UTC)
References: <87lf8e4l42.fsf@HIDDEN>
User-agent: mu4e 1.4.15; emacs 27.1
From: Christopher Baines <mail@HIDDEN>
To: Mathieu Othacehe <othacehe@HIDDEN>
Subject: Re: bug#48468: substitute server connection timeout
In-reply-to: <87lf8e4l42.fsf@HIDDEN>
Date: Sun, 16 May 2021 19:26:22 +0100
Message-ID: <87eee6cz75.fsf@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha512; protocol="application/pgp-signature"
Received-SPF: pass client-ip=212.71.252.8; envelope-from=mail@HIDDEN;
 helo=mira.cbaines.net
X-Spam_score_int: -6
X-Spam_score: -0.7
X-Spam_bar: /
X-Spam_report: (-0.7 / 5.0 requ) BAYES_00=-1.9, NORMAL_HTTP_TO_IP=0.001,
 NUMERIC_HTTP_ADDR=1.242, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001,
 WEIRD_PORT=0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: bug-guix@HIDDEN, 48468 <at> debbugs.gnu.org
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

--=-=-=
Content-Type: text/plain


Mathieu Othacehe <othacehe@HIDDEN> writes:

> Hello,
>
> We recently have a lot of those errors on Cuirass:
>
> --8<---------------cut here---------------start------------->8---
> guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow
> guix substitute: warning: try `--no-substitutes' if the problem persists
> guix substitute: error: connect*: Connection timed out
> --8<---------------cut here---------------end--------------->8---
>
> which means that the workers are failing to connect to the Cuirass
> remote-server publish server on berlin at 141.80.167.131:5557.
>
> Stracing this publish server shows that connection reuse seems to be
> broken:
>
> --8<---------------cut here---------------start------------->8---
> accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
> accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
> accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
> accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26
> accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
> accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22
> accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
> --8<---------------cut here---------------end--------------->8---
>
> Investigating it, I found that the connection is closed and opened
> multiple times in the call-with-cached-connection procedure of the (guix
> script substitute) module.
>
> It looks like its because a 'bad-headers exception is raised when trying
> to parse an eof object:
>
> --8<---------------cut here---------------start------------->8---
> ;;; (error bad-header (read-header-line #<eof>))
> --8<---------------cut here---------------end--------------->8---
>
> I'm not sure where this eof comes from. There is this comment in the
> http-multiple-get procedure in (guix http-client):
>
> --8<---------------cut here---------------start------------->8---
> ;; Swallow networking errors that could occur due to connection reuse
> ;; and the like; they will be handled down the road when trying to
> ;; read responses.
> (false-if-networking-error
>  (begin
>    (for-each (cut write-request <> buffer) batch)
>    (put-bytevector p (get))
>    (force-output p))))
> --8<---------------cut here---------------end--------------->8---
>
> which would suggest that connection reuse could cause networking errors?
>
> What also puzzles me it that the main guix publish server on berlin does
> not seem to present this issue. That would indicate that this error is
> caused by how the Cuirass remote-server publish server is started or
> configured.
>
> Ludo, Chris, any idea?

While I've been working in this area, I've actually been trying to pick
apart the connection caching, since the single thread assumption doesn't
hold in the Guix Build Coordinator.

Anyway, I do have a theory. Assuming I'm correct in saying that there's
no nginx between the client and publish server here, I think that's your
configuration difference.

For ci.guix.gnu.org, as well as data.guix.gnu.org, it's NGinx which is
keeping connections alive. I'm not sure the Guile code for the publish
server does similarly, so talking to it directly might be different.

That's on the server side, the actual problem is probably on the client
side, as I guess there are possibly places where closed connections
aren't handled properly. This reminds me I sent some patches relating to
closing connections, this could well be related [1].

1: https://issues.guix.gnu.org/47174

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQKlBAEBCgCPFiEEPonu50WOcg2XVOCyXiijOwuE9XcFAmChY89fFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDNF
ODlFRUU3NDU4RTcyMEQ5NzU0RTBCMjVFMjhBMzNCMEI4NEY1NzcRHG1haWxAY2Jh
aW5lcy5uZXQACgkQXiijOwuE9XfG9w/+KvzCHwfnB+eeEd6i0xTvTfc8+oLxb+GS
j1TSbXSGR02ByGMGIbJeAMmNPV20V9Lzi24a49451fskUZbCGeg6aS2rO3c9QpUX
jbF7//NW8zwgmhO/Hh9dvB4v+q01+VwrzE3c6ssAVO6772ZrTJqGA7oCU9UqD1w6
dQYUto8iSxa95UhFcNFMtGcBAevdoN4Ku9OpkqIt8uViCED+YM6a9Ej4d9f9oC8h
vxw6EU0YsCrobJQJ8KW56N9uR4oeI4IqWevou2LbSAmcHEWmGHAF6XDgxu0594Ip
YOFuqso4nuRxU3ffzgEupMbJA07yrwf0EB4lreqsSfU8gtZupFMLgyc8PviyXjw3
XPpo3PAukKhsVZyQlnTc6ldatWWVHIeTPy8vxdr6GvSQnzliaQmzpHj7xVqkfS3f
R1gz5hEXoS0o/CXpVkpnaYMC8m+Y2ujM5AbEUBVo6itR6mmUCCF5D9q/BtosjQvu
8wrLes42iAqwlnNeg8UGKra/cwJ0YGVrOInsOyXX97jz3UVjkxV7jq53+vkw3fGZ
eyHDugAJ8DKw9js8BX88AmXaV/Art4M0qG6D6Zp+MvIUbU9z+oNeTjAXgGewxntw
sfIdxHju+DNs5x5je3pTzyRSDPEfle8Xvd58DbGyAIoP8v7C7LYPmxVID7R0D/cs
ccIQKCj/kqg=
=yVQQ
-----END PGP SIGNATURE-----
--=-=-=--




Information forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 16 May 2021 17:57:56 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun May 16 13:57:56 2021
Received: from localhost ([127.0.0.1]:51426 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1liL1Y-00024J-5i
	for submit <at> debbugs.gnu.org; Sun, 16 May 2021 13:57:56 -0400
Received: from lists.gnu.org ([209.51.188.17]:56062)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <othacehe@HIDDEN>) id 1liL1W-00024A-Al
 for submit <at> debbugs.gnu.org; Sun, 16 May 2021 13:57:54 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:44674)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1liL1V-0000JG-NQ
 for bug-guix@HIDDEN; Sun, 16 May 2021 13:57:53 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]:57356)
 by eggs.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <othacehe@HIDDEN>) id 1liL1V-0004Qj-Ga
 for bug-guix@HIDDEN; Sun, 16 May 2021 13:57:53 -0400
Received: from [2a01:e34:ed27:e500:a880:d241:574d:84e1] (port=51494 helo=meije)
 by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <othacehe@HIDDEN>) id 1liL1T-0007vp-Vn
 for bug-guix@HIDDEN; Sun, 16 May 2021 13:57:52 -0400
From: Mathieu Othacehe <othacehe@HIDDEN>
To: bug-guix@HIDDEN
Subject: substitute server connection timeout
Date: Sun, 16 May 2021 19:57:49 +0200
Message-ID: <87lf8e4l42.fsf@HIDDEN>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)


Hello,

We recently have a lot of those errors on Cuirass:

--8<---------------cut here---------------start------------->8---
guix substitute: warning: while fetching http://141.80.167.131:5557/nar/g7ka09613k5v1vlznh87yg35905ggw51-python2-scipy-1.2.2-guile-builder: server is somewhat slow
guix substitute: warning: try `--no-substitutes' if the problem persists
guix substitute: error: connect*: Connection timed out
--8<---------------cut here---------------end--------------->8---

which means that the workers are failing to connect to the Cuirass
remote-server publish server on berlin at 141.80.167.131:5557.

Stracing this publish server shows that connection reuse seems to be
broken:

--8<---------------cut here---------------start------------->8---
accept4(9, {sa_family=AF_INET, sin_port=htons(41742), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41744), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41746), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
accept4(9, {sa_family=AF_INET, sin_port=htons(41748), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
accept4(9, {sa_family=AF_INET, sin_port=htons(41750), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41752), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41754), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 25
accept4(9, {sa_family=AF_INET, sin_port=htons(41756), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41758), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 26
accept4(9, {sa_family=AF_INET, sin_port=htons(41760), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 24
accept4(9, {sa_family=AF_INET, sin_port=htons(41762), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41764), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41766), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41768), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 22
accept4(9, {sa_family=AF_INET, sin_port=htons(41770), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41772), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41774), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41776), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41778), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41780), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
accept4(9, {sa_family=AF_INET, sin_port=htons(41782), sin_addr=inet_addr("141.80.167.185")}, [112->16], 0) = 21
--8<---------------cut here---------------end--------------->8---

Investigating it, I found that the connection is closed and opened
multiple times in the call-with-cached-connection procedure of the (guix
script substitute) module.

It looks like its because a 'bad-headers exception is raised when trying
to parse an eof object:

--8<---------------cut here---------------start------------->8---
;;; (error bad-header (read-header-line #<eof>))
--8<---------------cut here---------------end--------------->8---

I'm not sure where this eof comes from. There is this comment in the
http-multiple-get procedure in (guix http-client):

--8<---------------cut here---------------start------------->8---
;; Swallow networking errors that could occur due to connection reuse
;; and the like; they will be handled down the road when trying to
;; read responses.
(false-if-networking-error
 (begin
   (for-each (cut write-request <> buffer) batch)
   (put-bytevector p (get))
   (force-output p))))
--8<---------------cut here---------------end--------------->8---

which would suggest that connection reuse could cause networking errors?

What also puzzles me it that the main guix publish server on berlin does
not seem to present this issue. That would indicate that this error is
caused by how the Cuirass remote-server publish server is started or
configured.

Ludo, Chris, any idea?

I will keep searching anyway :)

Thanks,

Mathieu




Acknowledgement sent to Mathieu Othacehe <othacehe@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-guix@HIDDEN. Full text available.
Report forwarded to bug-guix@HIDDEN:
bug#48468; Package guix. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Sat, 10 Dec 2022 11:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.