GNU bug report logs - #37309
'ssh-daemon' fails to start

Package: guix;

Reported by: Giovanni Biscuolo <g <at> xelera.eu>

Date: Thu, 5 Sep 2019 13:19:03 UTC

Severity: important

Tags: fixed, unreproducible

Merged with 30993, 33299, 34580

Done: maxim.cournoyer <at> gmail.com

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 37309 in the body.
You can then email your comments to 37309 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Thu, 05 Sep 2019 13:19:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Giovanni Biscuolo <g <at> xelera.eu>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 05 Sep 2019 13:19:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Giovanni Biscuolo <g <at> xelera.eu>
To: bug-guix <at> gnu.org
Subject: ‘ssh-daemon’ service fails to start at boot
Date: Thu, 05 Sep 2019 15:18:24 +0200
[Message part 1 (text/plain, inline)]
Hi,

following a recent discussion on guix-sysadmin I have to confirm the
ssh-daemon issue since it is still happening on some of the machines I
administer

Previous possibly related bug reports are
https://issues.guix.gnu.org/issue/30993 and
https://issues.guix.gnu.org/issue/32197

Unfortunately this issue is *not* well reproducible, it depends on some
mysterious (to me) timing factor; AFAIU it does *not* depend on the
shepherd version, probably it depends on "something" related to IPv6
(read below the details)

Andreas Enge <andreas <at> enge.fr> writes:

[...]

> My impression is that the problem is still there. I am quite certain it
> happened when I rebooted dover, since I had to connect on the serial console
> to manually restart the ssh service.

I'm sure it happened when milano-guix-1 was rebooted due to data centre
maintenance and happened yesterday to one of my personal Guix machines at
office

[...]

My situation is similar to the one observed by Andreas

> Well, it is in /var/log/messages:
> Aug  3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
> Aug  3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.

--8<---------------cut here---------------start------------->8---
[...]
Sep  4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
[...]
Sep  4 21:46:03 localhost shepherd[1]: Service loopback has been started.
[...]
Sep  4 21:46:22 localhost vmunix: [    0.226337] PCI: Using configuration type 1 for base access
Sep  4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep  4 21:46:24 localhost shepherd[1]: Service networking has been started.
[...]
Sep  4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
[...]
Sep  4 21:46:30 localhost vmunix: [    0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
Sep  4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
[...]
Sep  4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
[...]
Sep  4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
[...]
Sep  4 21:46:47 localhost vmunix: [    0.731142] Segment Routing with IPv6
--8<---------------cut here---------------end--------------->8---

Please note the timing of the dhclient and the sshd processes: I
inserted them as printed in /var/log/messages but they are not
time-sequential: does it mean something or is it irrelevant?

So the sshd process started (as far as I can see there is no trace it
was stopped) and pretty soon shepherd noticed ssh-daemon was not
started.

Logging in from the console I see the ssh-daemon is stopped but enabled:

--8<---------------cut here---------------start------------->8---
Status of ssh-daemon:
  It is stopped.
  It is enabled.
  Provides (ssh-daemon).
  Requires (syslogd loopback).
  Conflicts with ().
  Will be respawned.
--8<---------------cut here---------------end--------------->8---

[...]

If I start it via `sudo herd start ssh-daemon` it immediately starts,
like in Andreas' experience:

> Aug  3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
> Aug  3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
> Aug  3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.

--8<---------------cut here---------------start------------->8---
Sep  5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
Sep  5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
Sep  5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Please notice the difference from above: this time the sshd server is
also listening on the IPv6 address :: while in the above log it was only
listening on the 0.0.0.0 IPv4 address

Does the failure have something to do with IPv6 not available when sshd
starts for the first time after a reboot?

Please have a look at the following /var/log/messages excerpt from my
system after a successful ssh-daemon start soon after a reboot (no
"manual" intervention):

--8<---------------cut here---------------start------------->8---
Sep  5 14:45:00 localhost vmunix: [    0.247544] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
Sep  5 14:44:45 localhost sshd[574]: Server listening on 0.0.0.0 port 22.
[...]
Sep  5 14:44:47 localhost sshd[574]: Server listening on :: port 22.
[...]
Sep  5 14:45:05 localhost shepherd[1]: Service ssh-daemon has been started.
--8<---------------cut here---------------end--------------->8---

Bingo? This time ssh was started also on :: and it works right after a reboot.

It really seems it has something to do with IPv6 but I cannot understand
exactly what :-S (do I have to disable IPv6 in my configs?)

For completeness, I have to say that the issue happened yesterday after
a `guix system reconfigure`, this is my current system generation:

--8<---------------cut here---------------start------------->8---
Generation 8    Sep 04 2019 17:19:08    (current)
  file name: /var/guix/profiles/system-8-link
  canonical file name: /gnu/store/iw2ayn696f8ipmd5gzw9fxljf9h8w4pr-system
  label: GNU with Linux-Libre 5.2.11
  bootloader: grub-efi
  root device: UUID: 26bd54ec-4e74-4b3a-96ff-58f2f34e4a1a
  kernel: /gnu/store/xgl60ivx8p5p79zjbf08p4x09881wf4s-linux-libre-5.2.11/bzImage
--8<---------------cut here---------------end--------------->8---

Reconfigured with this guix version:

--8<---------------cut here---------------start------------->8---
g <at> batondor ~$ sudo -i guix describe 
Generation 6    Sep 04 2019 17:17:02    (current)
  guix 5ee1c04
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 5ee1c0459eebdd3b7771abaeab0f0b52ff86fdd5
--8<---------------cut here---------------end--------------->8---

This is the shepherd version:

--8<---------------cut here---------------start------------->8---
g <at> batondor ~$ shepherd --version
shepherd (GNU Shepherd) 0.6.1
--8<---------------cut here---------------end--------------->8---

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Sun, 08 Sep 2019 04:20:01 GMT) Full text and rfc822 format available.

Message #8 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: iyzsong <at> member.fsf.org (宋文武)
To: Giovanni Biscuolo <g <at> xelera.eu>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Sun, 08 Sep 2019 12:19:14 +0800
Giovanni Biscuolo <g <at> xelera.eu> writes:

> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)

Hello, thank you for this report, it's reproducible with my box that has
an old hard disk, and disabling IPv6 for sshd does fix the issue for me...

>
> Andreas Enge <andreas <at> enge.fr> writes:
>
> [...]
>
>> My impression is that the problem is still there. I am quite certain it
>> happened when I rebooted dover, since I had to connect on the serial console
>> to manually restart the ssh service.
>
> I'm sure it happened when milano-guix-1 was rebooted due to data centre
> maintenance and happened yesterday to one of my personal Guix machines at
> office
>
> [...]
>
> My situation is similar to the one observed by Andreas
>
>> Well, it is in /var/log/messages:
>> Aug  3 21:11:38 localhost sshd[360]: Server listening on 0.0.0.0 port 22.
>> Aug  3 21:11:55 localhost shepherd[1]: Service ssh-daemon could not be started.
>
> [...]
> Sep  4 21:46:02 localhost shepherd[1]: Service syslogd has been started.
> [...]
> Sep  4 21:46:03 localhost shepherd[1]: Service loopback has been started.
> [...]
> Sep  4 21:46:22 localhost vmunix: [    0.226337] PCI: Using configuration type 1 for base access
> Sep  4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep  4 21:46:24 localhost shepherd[1]: Service networking has been started.
> [...]
> Sep  4 21:46:12 localhost sshd[577]: Server listening on 0.0.0.0 port 22.
> [...]
> Sep  4 21:46:30 localhost vmunix: [    0.250107] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
> Sep  4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to 255.255.255.255 port 67
> [...]
> Sep  4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
> [...]
> Sep  4 21:46:33 localhost shepherd[1]: Service ssh-daemon could not be started.
> [...]
> Sep  4 21:46:47 localhost vmunix: [    0.731142] Segment Routing with IPv6
>
>
> Please note the timing of the dhclient and the sshd processes: I
> inserted them as printed in /var/log/messages but they are not
> time-sequential: does it mean something or is it irrelevant?
>
> So the sshd process started (as far as I can see there is no trace it
> was stopped) and pretty soon shepherd noticed ssh-daemon was not
> started.
>
> Logging in from the console I see the ssh-daemon is stopped but enabled:
>
> Status of ssh-daemon:
>   It is stopped.
>   It is enabled.
>   Provides (ssh-daemon).
>   Requires (syslogd loopback).
>   Conflicts with ().
>   Will be respawned.
>
>
> [...]

Yes, I think when 'ssh-daemon' fails to start, shepherd should respawn
it until it succeeds or disable it, but looking at the code of
'make-forkexec-constructor', when 'pid-file' is used (as 'ssh-daemon'
does) and the timeout (%pid-file-timeout, 5s by default) is reached, the
process gets a 'SIGTERM' and '#f' is returned as its running state, which
won't be respawned (it's not a PID), I guess...

To ludo: Is my analysis correct?  It's not clear to me how to fix it so
that 'ssh-daemon' can be respawned, though...

>
> If I start it via `sudo herd start ssh-daemon` it immediately starts,
> like in Andreas' experience:
>
>> Aug  3 21:13:10 localhost sshd[385]: Server listening on 0.0.0.0 port 22.
>> Aug  3 21:13:10 localhost sshd[385]: Server listening on :: port 22.
>> Aug  3 21:13:11 localhost shepherd[1]: Service ssh-daemon has been started.
>
> Sep  5 13:38:55 localhost sshd[745]: Server listening on 0.0.0.0 port 22.
> Sep  5 13:38:55 localhost sshd[745]: Server listening on :: port 22.
> Sep  5 13:38:55 localhost shepherd[1]: Service ssh-daemon has been started.
>
>
> Please notice the difference from above: this time the sshd server is
> also listening on the IPv6 address :: while in the above log it was only
> listening on the 0.0.0.0 IPv4 address
>
> Does the failure have something to do with IPv6 not available when sshd
> starts for the first time after a reboot?

I agree, as adding '(extra-content "ListenAddress 0.0.0.0")' to my
'openssh-configuration' to skip the IPv6 listener fixes this issue for me.

A proper fix would be to respawn 'ssh-daemon' and to start it only after
'IPv6 is available' (I don't know what this means yet...).




Severity set to 'important' from 'normal' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:24:02 GMT) Full text and rfc822 format available.

Merged 30993 33299 34580 37309. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:29:04 GMT) Full text and rfc822 format available.

Changed bug title to ''ssh-daemon' fails to start' from '‘ssh-daemon’ service fails to start at boot' Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Thu, 26 Sep 2019 20:29:04 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Tue, 26 Nov 2019 18:35:02 GMT) Full text and rfc822 format available.

Message #17 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Jelle Licht <jlicht <at> fsfe.org>
To: 宋文武 <iyzsong <at> member.fsf.org>, Giovanni Biscuolo
 <g <at> xelera.eu>
Cc: 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Tue, 26 Nov 2019 19:34:52 +0100
Hey 宋文武, Giovanni,

iyzsong <at> member.fsf.org (宋文武) writes:

> [...]
> Yes, I think when 'ssh-daemon' fails to start, shepherd should respawn
> it until it succeeds or disable it, but looking at the code of
> 'make-forkexec-constructor', when 'pid-file' is used (as 'ssh-daemon'
> does) and the timeout (%pid-file-timeout, 5s by default) is reached, the
> process gets a 'SIGTERM' and '#f' is returned as its running state, which
> won't be respawned (it's not a PID), I guess...
>
> To ludo: Is my analysis correct?  It's not clear to me how to fix it so
> that 'ssh-daemon' can be respawned, though...

I think I am also running into a similar issue on my spinning rust based
T400. Is there a workaround available that does the above, or is that
analysis of the situation not correct either?

Thanks,

Jelle




Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Fri, 29 Nov 2019 08:42:02 GMT) Full text and rfc822 format available.

Message #20 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Giovanni Biscuolo <g <at> xelera.eu>
To: Jelle Licht <jlicht <at> fsfe.org>,
 宋文武 <iyzsong <at> member.fsf.org>
Cc: 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Fri, 29 Nov 2019 09:40:37 +0100
[Message part 1 (text/plain, inline)]
Hi Jelle,

Jelle Licht <jlicht <at> fsfe.org> writes:

[...]

> I think I am also running into a similar issue on my spinning rust based
> T400. Is there a workaround available that does the above,

I added `(extra-content "ListenAddress 0.0.0.0")` to my
openssh-configuration, to only listen on IPv4 addresses:

--8<---------------cut here---------------start------------->8---
(service openssh-service-type
         (openssh-configuration
          (port-number 22)
          (extra-content "ListenAddress 0.0.0.0")
          (authorized-keys
           `(("g" ,(local-file "keys/ssh/g.pub"))
             ("hydra" ,(local-file "keys/ssh/hydra.pub"))))))
--8<---------------cut here---------------end--------------->8---

I tried to reboot several times one machine I can use for testing and it
works for me: please can you try and report if this also works for you?

[...]

Thanks! Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Fri, 29 Nov 2019 09:52:01 GMT) Full text and rfc822 format available.

Message #23 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Jelle Licht <jlicht <at> fsfe.org>
To: Giovanni Biscuolo <g <at> xelera.eu>, 宋文武
 <iyzsong <at> member.fsf.org>
Cc: 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Fri, 29 Nov 2019 10:51:46 +0100
Hi Giovanni,


Giovanni Biscuolo <g <at> xelera.eu> writes:

> Hi Jelle,
>
> Jelle Licht <jlicht <at> fsfe.org> writes:
>
> [...]
>
>> I think I am also running into a similar issue on my spinning rust based
>> T400. Is there a workaround available that does the above,
>
> I added `(extra-content "ListenAddress 0.0.0.0")` to my
> openssh-configuration, to only listen on IPv4 addresses:
>
> --8<---------------cut here---------------start------------->8---
> (service openssh-service-type
> 		  (openssh-configuration
> 		   (port-number 22)
> 		   (extra-content "ListenAddress 0.0.0.0")
> 		   (authorized-keys
> 		    `(("g" ,(local-file "keys/ssh/g.pub"))
> 		      ("hydra",(local-file "keys/ssh/hydra.pub"))))))
> --8<---------------cut here---------------end--------------->8---
>
> I tried to reboot several times one machine I can use for testing and it
> works for me: please can you try and report if this also works for you?

This, in combination with setting the pid-file-timeout to 30 seconds,
made everything work! I guess it is a combination of fun IPv6
interactions with extremely slow and busy spinning rust.

Thank you!

This does still feel like a workaround instead of a proper fix though; is
there something we can do to mitigate these issues in the first place?

- Jelle
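
The message above does not say how the PID-file timeout was raised. As a minimal sketch, assuming the Shepherd in use accepts the `#:pid-file-timeout` keyword of `make-forkexec-constructor`, a service definition with a longer timeout might look like the following. The procedure name `example-sshd-shepherd-service` and its two parameters are hypothetical and only serve the illustration; this is not the definition found in gnu/services/ssh.scm.

```scheme
;; Illustrative sketch only; not the actual definition in gnu/services/ssh.scm.
(use-modules (gnu services shepherd)   ;shepherd-service
             (guix gexp))              ;#~ and #$

;; OPENSSH-COMMAND is assumed to be a gexp that evaluates to the sshd
;; argument list, and PID-FILE a string naming sshd's PID file.
(define (example-sshd-shepherd-service openssh-command pid-file)
  (shepherd-service
   (documentation "OpenSSH server (illustrative).")
   (requirement '(syslogd loopback))
   (provision '(ssh-daemon))
   ;; Wait up to 30 seconds for the PID file instead of the default
   ;; 5 seconds before giving up on the process.
   (start #~(make-forkexec-constructor #$openssh-command
                                       #:pid-file #$pid-file
                                       #:pid-file-timeout 30))
   (stop #~(make-kill-destructor))))
```

A longer timeout only gives a slow machine more time to write the PID file; it does not address why sshd is slow to come up in the first place.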




Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Tue, 03 Dec 2019 20:16:02 GMT) Full text and rfc822 format available.

Message #26 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: 37309 <at> debbugs.gnu.org
Subject: [PATCH] services: openssh: Restrict to IPv4.
Date: Tue,  3 Dec 2019 15:12:51 -0500
This works around <https://issues.guix.info/issue/30993>.

* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New field.
(openssh-config-file): Use it.
* doc/guix.texi: Document it.
---
 doc/guix.texi        | 10 ++++++++++
 gnu/services/ssh.scm | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 39eb25385c..cf0e141baf 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level: @code{quiet}, @code{fatal},
 @code{error}, @code{info}, @code{verbose}, @code{debug}, etc.  See the man
 page for @file{sshd_config} for the full list of level names.
 
+@item @code{address-family} (default: @code{'inet})
+This is a symbol specifying which type of internet addresses should be
+handled by @command{sshd}.  The options are @code{inet} (IPv4),
+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
+@code{inet6}.  The upstream default in @code{any}.  However, we
+currently default to @code{inet} due to a nondeterministic
+@command{sshd} startup failure when using IPv6 on Guix.  See
+@uref{https://issues.guix.info/issue/30993, the bug report} for more
+information on this temporary limitation.
+
 @item @code{extra-content} (default: @code{""})
 This field can be used to append arbitrary text to the configuration file.  It
 is especially useful for elaborate configurations that cannot be expressed
diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index d2dbb8f80d..7e25810eff 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -4,6 +4,7 @@
 ;;; Copyright © 2016 Julien Lepiller <julien <at> lepiller.eu>
 ;;; Copyright © 2017 Clément Lassieur <clement <at> lassieur.org>
 ;;; Copyright © 2019 Ricardo Wurmus <rekado <at> elephly.net>
+;;; Copyright © 2019 Leo Famulari <leo <at> famulari.name>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -340,7 +341,16 @@ The other options should be self-descriptive."
   ;; proposed in <https://bugs.gnu.org/27155>.  Keep it internal/undocumented
   ;; for now.
   (%auto-start?          openssh-auto-start?
-                         (default #t)))
+                         (default #t))
+
+  ;; Symbol
+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
+  ;; on Guix, sshd often fails to start when it attempts to bind to both
+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
+  ;; <https://issues.guix.info/issue/30993>
+  (address-family        openssh-configuration-address-family
+                         (default 'inet)))
 
 (define %openssh-accounts
   (list (user-group (name "sshd") (system? #t))
@@ -468,6 +478,10 @@ of user-name/file-like tuples."
                       (symbol->string
                        (openssh-configuration-log-level config))))
 
+           (format port "AddressFamily ~a\n"
+                   #$(symbol->string
+                      (openssh-configuration-address-family config)))
+
            ;; Add '/etc/authorized_keys.d/%u', which we populate.
            (format port "AuthorizedKeysFile \
  .ssh/authorized_keys .ssh/authorized_keys2 /etc/ssh/authorized_keys.d/%u\n")
-- 
2.24.0
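
For illustration, and only under the assumption that the patch above is applied, a system configuration could opt back into the upstream behaviour through the proposed `address-family` field. The variable name `%example-openssh-service` is hypothetical; the field itself exists only where the patch has been applied.

```scheme
;; Assumes the patch above is applied; 'address-family' is the field it
;; introduces and may not exist in a given Guix revision.
(use-modules (gnu) (gnu services ssh))

(define %example-openssh-service
  (service openssh-service-type
           (openssh-configuration
            ;; 'inet = IPv4 only (the patch's default), 'inet6 = IPv6 only,
            ;; 'any = both, which is the upstream OpenSSH default.
            (address-family 'any))))
```

With the patch, this value would end up in the generated sshd configuration as the line `AddressFamily any`.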





Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Tue, 03 Dec 2019 21:54:02 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Julien Lepiller <julien <at> lepiller.eu>
To: bug-guix <at> gnu.org,Leo Famulari <leo <at> famulari.name>,37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: [PATCH] services: openssh: Restrict to IPv4.
Date: Tue, 03 Dec 2019 22:53:11 +0100
On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo <at> famulari.name> wrote:
>This works around <https://issues.guix.info/issue/30993>.
>
>* gnu/services/ssh.scm (<openssh-configuration>)[address-family]: New
>field.
>(openssh-config-file): Use it.
>* doc/guix.texi: Document it.
>---
> doc/guix.texi        | 10 ++++++++++
> gnu/services/ssh.scm | 16 +++++++++++++++-
> 2 files changed, 25 insertions(+), 1 deletion(-)
>
>diff --git a/doc/guix.texi b/doc/guix.texi
>index 39eb25385c..cf0e141baf 100644
>--- a/doc/guix.texi
>+++ b/doc/guix.texi
>@@ -13913,6 +13913,16 @@ This is a symbol specifying the logging level:
>@code{quiet}, @code{fatal},
>@code{error}, @code{info}, @code{verbose}, @code{debug}, etc.  See the
>man
> page for @file{sshd_config} for the full list of level names.
> 
>+@item @code{address-family} (default: @code{'inet})
>+This is a symbol specifying which type of internet addresses should be
>+handled by @command{sshd}.  The options are @code{inet} (IPv4),
>+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>+@code{inet6}.  The upstream default in @code{any}.  However, we
default *is*
>+currently default to @code{inet} due to a nondeterministic
>+@command{sshd} startup failure when using IPv6 on Guix.  See
>+@uref{https://issues.guix.info/issue/30993, the bug report} for more
>+information on this temporary limitation.
>+
> @item @code{extra-content} (default: @code{""})
>This field can be used to append arbitrary text to the configuration
>file.  It
>is especially useful for elaborate configurations that cannot be
>expressed
>diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
>index d2dbb8f80d..7e25810eff 100644
>--- a/gnu/services/ssh.scm
>+++ b/gnu/services/ssh.scm
>@@ -4,6 +4,7 @@
> ;;; Copyright © 2016 Julien Lepiller <julien <at> lepiller.eu>
> ;;; Copyright © 2017 Clément Lassieur <clement <at> lassieur.org>
> ;;; Copyright © 2019 Ricardo Wurmus <rekado <at> elephly.net>
>+;;; Copyright © 2019 Leo Famulari <leo <at> famulari.name>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
>@@ -340,7 +341,16 @@ The other options should be self-descriptive."
>;; proposed in <https://bugs.gnu.org/27155>.  Keep it
>internal/undocumented
>   ;; for now.
>   (%auto-start?          openssh-auto-start?
>-                         (default #t)))
>+                         (default #t))
>+
>+  ;; Symbol
>+  ;; XXX: This shouldn't be required, but due to limitations with IPv6
>+  ;; on Guix, sshd often fails to start when it attempts to bind to
>both
>+  ;; 0.0.0.0 and ::, because the IPv6 interface is not ready in time.
>+  ;; Accepted options are inet (IPv4), inet6 (IPv6), or any (both).
>+  ;; <https://issues.guix.info/issue/30993>
>+  (address-family        openssh-configuration-address-family
>+                         (default 'inet)))
> 
> (define %openssh-accounts
>   (list (user-group (name "sshd") (system? #t))
>@@ -468,6 +478,10 @@ of user-name/file-like tuples."
>                       (symbol->string
>                        (openssh-configuration-log-level config))))
> 
>+           (format port "AddressFamily ~a\n"
>+                   #$(symbol->string
>+                      (openssh-configuration-address-family config)))
>+
>            ;; Add '/etc/authorized_keys.d/%u', which we populate.
>            (format port "AuthorizedKeysFile \
>.ssh/authorized_keys .ssh/authorized_keys2
>/etc/ssh/authorized_keys.d/%u\n")





Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Wed, 04 Dec 2019 13:42:02 GMT) Full text and rfc822 format available.

Message #35 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Julien Lepiller <julien <at> lepiller.eu>
Cc: 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: [PATCH] services: openssh: Restrict to IPv4.
Date: Wed, 4 Dec 2019 08:41:35 -0500
On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo <at> famulari.name> wrote:
> >+@item @code{address-family} (default: @code{'inet})
> >+This is a symbol specifying which type of internet addresses should be
> >+handled by @command{sshd}.  The options are @code{inet} (IPv4),
> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
> >+@code{inet6}.  The upstream default in @code{any}.  However, we
> default *is*

Thanks!

This patch did make sshd work for me again.

However, as part of trying to debug this issue, I changed my system
configuration so that it uses dhcp-client-service and
wpa-supplicant-service instead of using Wicd. And now I can't reproduce
the bug anymore.

I guess that either 1) wpa_supplicant brings the network interfaces up
faster or 2) the state of the network interfaces is more accurately
captured with these services (in the sense of, is the network up?).

Tricky...

Does the patch help anybody else?
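
As a rough sketch of the configuration change described above (replacing Wicd with the DHCP client and wpa_supplicant services), the relevant service list might look like this. The variable name `%example-network-services` is hypothetical, and both services are assumed to work with their default values; a real setup would likely need a wpa_supplicant configuration and would also have to drop the Wicd service from the list it extends.

```scheme
;; Illustrative sketch; default service values are an assumption.
(use-modules (gnu) (gnu services networking))

(define %example-network-services
  (list (service dhcp-client-service-type)       ;obtain addresses via DHCP
        (service wpa-supplicant-service-type)))  ;Wi-Fi authentication
```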




Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Tue, 10 Dec 2019 16:48:01 GMT) Full text and rfc822 format available.

Message #38 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: Julien Lepiller <julien <at> lepiller.eu>, 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: [PATCH] services: openssh: Restrict to IPv4.
Date: Tue, 10 Dec 2019 17:47:25 +0100
Hi Leo,

Leo Famulari <leo <at> famulari.name> writes:

> On Tue, Dec 03, 2019 at 10:53:11PM +0100, Julien Lepiller wrote:
>> On 3 December 2019 21:12:51 GMT+01:00, Leo Famulari <leo <at> famulari.name> wrote:
>> >+@item @code{address-family} (default: @code{'inet})
>> >+This is a symbol specifying which type of internet addresses should be
>> >+handled by @command{sshd}.  The options are @code{inet} (IPv4),
>> >+@code{inet6} (IPv6), or @code{any}, which selects both @code{inet} and
>> >+@code{inet6}.  The upstream default in @code{any}.  However, we
>> default *is*
>
> Thanks!
>
> This patch did make sshd work for me again.
>
> However, as part of trying to debug this issue, I changed my system
> configuration so that it uses dhcp-client-service and
> wpa-supplicant-service instead of using Wicd. And now I can't reproduce
> the bug anymore.
>
> I guess that either 1) wpa_supplicant brings the network interfaces up
> faster or 2) the state of the network interfaces is more accurately
> captured with these services (in the sense of, is the network up?).

Did anyone manage to get an strace log as was discussed in
<https://issues.guix.gnu.org/issue/30993>?

That would allow us to know where this is hanging exactly (probably
bind(2) on an IPv6 address.)

Thanks,
Ludo’.




Added tag(s) fixed. Request was from maxim.cournoyer <at> gmail.com to control <at> debbugs.gnu.org. (Tue, 18 Aug 2020 04:09:01 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 30993 <at> debbugs.gnu.org and Leo Famulari <leo <at> famulari.name> Request was from maxim.cournoyer <at> gmail.com to control <at> debbugs.gnu.org. (Tue, 18 Aug 2020 04:09:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 15 Sep 2020 11:24:04 GMT) Full text and rfc822 format available.

bug unarchived. Request was from Christopher Lemmer Webber <cwebber <at> dustycloud.org> to control <at> debbugs.gnu.org. (Fri, 27 Nov 2020 22:59:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Fri, 27 Nov 2020 23:02:01 GMT) Full text and rfc822 format available.

Message #49 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Christopher Lemmer Webber <cwebber <at> dustycloud.org>
To: Giovanni Biscuolo <g <at> xelera.eu>
Cc: bug-guix <at> gnu.org, 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Fri, 27 Nov 2020 18:00:48 -0500
Giovanni Biscuolo writes:

> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)

This issue continues to plague me, and has ever since I started to use
GuixSD.  However it is much worse now that I am running Guix on
servers... I frequently have to log in via Linode's (nonfree!) web
console on every server that is rebooted and kick herd to restart
openssh.  Once I do that it's fine.

I don't think my linode machine is on "spinning rust" so I don't think
this is the cause.  IPv6, maybe?  Dunno what.

However I think that it's probably really a dependency issue somewhere;
herd is starting opensshd before some other dependent service is
spawned.  But what?  Maybe something authentication related like
networking, or something.  But hm, networking is required...

I'm assuming others must be experiencing this still too... right?

Would really like to see it fixed.  It's one of the few things holding
me back from recommending Guix on servers to others.

Do others have any idea?

I noticed the lsh daemon requires networking.  Why doesn't openssh?

What about the following "fix"?

diff --git a/gnu/services/ssh.scm b/gnu/services/ssh.scm
index 1891db0487..c9bd62bab7 100644
--- a/gnu/services/ssh.scm
+++ b/gnu/services/ssh.scm
@@ -508,7 +508,7 @@ of user-name/file-like tuples."
 
   (list (shepherd-service
          (documentation "OpenSSH server.")
-         (requirement '(syslogd loopback))
+         (requirement '(syslogd networking loopback))
          (provision '(ssh-daemon ssh sshd))
          (start #~(make-forkexec-constructor #$openssh-command
                                              #:pid-file #$pid-file))




Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Fri, 27 Nov 2020 23:02:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Sat, 28 Nov 2020 01:09:02 GMT) Full text and rfc822 format available.

Message #55 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Marius Bakke <marius <at> gnu.org>
To: Christopher Lemmer Webber <cwebber <at> dustycloud.org>, Giovanni Biscuolo
 <g <at> xelera.eu>
Cc: 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Sat, 28 Nov 2020 02:08:34 +0100
Christopher Lemmer Webber <cwebber <at> dustycloud.org> skriver:

> Giovanni Biscuolo writes:
>
>> Hi,
>>
>> following a recent discussion on guix-sysadmin I have to confirm the
>> ssh-daemon issue since it is still happening on some of the machines I
>> administer
>>
>> Previous possibly related bug reports are
>> https://issues.guix.gnu.org/issue/30993 and
>> https://issues.guix.gnu.org/issue/32197
>>
>> Unfortunately this issue is *not* well reproducible, it depends on some
>> mysterious (to me) timing factor; AFAIU it does *not* depend on the
>> shepherd version, probably it depends on "something" related to IPv6
>> (read below the details)
>
> This issue continues to plague me, and has ever since I started to use
> GuixSD.  However it is much worse now that I am running Guix on
> servers... I frequently have to log in via Linode's (nonfree!) web
> console on every server that is rebooted and kick herd to restart
> openssh.  Once I do that it's fine.

Can you share an excerpt of /var/log/messages (ideally the whole boot
sequence) from when SSH failed to start?

> I don't think my linode machine is on "spinning rust" so I don't think
> this is the cause.  IPv6, maybe?  Dunno what.
>
> However I think that it's probably really a dependency issue somewhere;
> herd is starting opensshd before some other dependent service is
> spawned.  But what?  Maybe something authentication related like
> networking, or something.  But hm, networking is required...
>
> I'm assuming others must be experiencing this still too... right?

FWIW I have never encountered this.  :-/

> Would really like to see it fixed.  It's one of the few things holding
> me back from recommending Guix on servers to others.
>
> Do others have any idea?
>
> I noticed the lsh daemon requires networking.  Why doesn't openssh?

It's really for legacy reasons, from before we had the Guix System
installer.  Then a common way to install was to run dhclient and
"herd start ssh-daemon" manually on the live image, so people could
do the installation over SSH:

  https://issues.guix.gnu.org/26548#5

Nowadays, the installer gives a nice and quick way to deploy a minimal
system, and I suspect the SSH method has fallen out of favor.

> What about the following "fix"?

[...]

>    (list (shepherd-service
>           (documentation "OpenSSH server.")
> -         (requirement '(syslogd loopback))
> +         (requirement '(syslogd networking loopback))

If it works for you, let's do this.  It would be good to find the
underlying cause though...

Not sure what to do about the installer however: perhaps create
yet-another undocumented field of openssh-service-type that makes the
networking requirement optional?
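
As a rough illustration of the optional networking requirement floated
above, here is a minimal Guile sketch.  The procedure name
`openssh-requirement` and the flag `wait-for-network?` are hypothetical
and do not exist in gnu/services/ssh.scm; only the two requirement lists
come from the patch quoted in this thread.

```scheme
;; Hypothetical helper: not part of Guix.  It only shows how a boolean
;; field could select between the two requirement lists from the patch.
(define (openssh-requirement wait-for-network?)
  (if wait-for-network?
      '(syslogd networking loopback)   ;proposed: wait for networking
      '(syslogd loopback)))            ;current: what the installer relies on

;; The result would be spliced into the service definition, e.g.
;; (requirement (openssh-requirement #t)).
```

Under this assumption the installer image could pass #f to keep the
manual "herd start ssh-daemon" workflow, while installed systems would
default to waiting for networking.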

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Thu, 03 Dec 2020 20:40:02 GMT) Full text and rfc822 format available.

Message #58 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Leo Famulari <leo <at> famulari.name>
To: Marius Bakke <marius <at> gnu.org>
Cc: Christopher Lemmer Webber <cwebber <at> dustycloud.org>,
 Giovanni Biscuolo <g <at> xelera.eu>, 37309 <at> debbugs.gnu.org
Subject: Re: bug#37309: ‘ssh-daemon’
 service fails to start at boot
Date: Thu, 3 Dec 2020 15:38:59 -0500
On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
> Christopher Lemmer Webber <cwebber <at> dustycloud.org> skriver:
> > I'm assuming others must be experiencing this still too... right?
> 
> FWIW I have never encountered this.  :-/

I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
are working for now. The problem has always been intermittent for me in
the past.

Chris, are you using an old Thinkpad too?

Information forwarded to bug-guix <at> gnu.org:
bug#37309; Package guix. (Thu, 03 Dec 2020 21:58:01 GMT) Full text and rfc822 format available.

Message #61 received at 37309 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Lemmer Webber <cwebber <at> dustycloud.org>
To: Leo Famulari <leo <at> famulari.name>
Cc: Giovanni Biscuolo <g <at> xelera.eu>, 37309 <at> debbugs.gnu.org,
 Marius Bakke <marius <at> gnu.org>
Subject: Re: bug#37309: ‘ssh-daemon’ service fails
 to start at boot
Date: Thu, 03 Dec 2020 16:56:40 -0500
Leo Famulari writes:

> On Sat, Nov 28, 2020 at 02:08:34AM +0100, Marius Bakke wrote:
>> Christopher Lemmer Webber <cwebber <at> dustycloud.org> skriver:
>> > I'm assuming others must be experiencing this still too... right?
>> 
>> FWIW I have never encountered this.  :-/
>
> I reenabled IPv6 listening for sshd after updating to 1.2.0 and things
> are working for now. The problem has always been intermittent for me in
> the past.
>
> Chris, are you using an old Thinkpad too?

I did experience it on an old thinkpad, though in this case it's
happening on the Linode server I'm running.  Not particularly old, but
probably shared by many users and thus slower in some way.

That's part of what makes me think this is some kind of race
condition...




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 01 Jan 2021 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 116 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.