GNU bug report logs - #70761
Guix deploy cannot reboot remote machines without an error

Package: guix

Reported by: Richard Sent <richard <at> freakingpenguin.com>

Date: Fri, 3 May 2024 21:59:02 UTC

Severity: normal

To reply to this bug, email your comments to 70761 AT debbugs.gnu.org.


Report forwarded to bug-guix <at> gnu.org:
bug#70761; Package guix. (Fri, 03 May 2024 21:59:02 GMT)

Acknowledgement sent to Richard Sent <richard <at> freakingpenguin.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Fri, 03 May 2024 21:59:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Richard Sent <richard <at> freakingpenguin.com>
To: bug-guix <at> gnu.org
Subject: Guix deploy cannot reboot remote machines without an error
Date: Fri, 03 May 2024 17:57:38 -0400

Hi Guix,

One neat feature of guix deploy is the ability to run a command on a
list of remote machines. One command that would commonly be run is
reboot, so that upgrades to, say, the Linux kernel take effect.

While the command itself /does/ run and the system /does/ restart, guix
deploy doesn't gracefully handle the connection loss. The first machine
rebooted will throw an error, halting the reboot of the rest of the
machines in the list.

tmux and screen aren't really compatible with '-x -- <command>', since
you can't start a session and then detach from it, and the & in
'-x -- nohup <command> &' gets swallowed by the host shell before guix
deploy ever sees it. I haven't found a workaround. Even if they did
work, I suspect there would be a race condition between "host closes
session" and "remote restarts".
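
To illustrate the swallowed & with a minimal sketch: show_args below is a
hypothetical stand-in for guix deploy that only prints the arguments it
receives, and some-command is a placeholder. It says nothing about how
guix deploy treats its arguments on the remote side.

```shell
#!/bin/sh
# show_args is a hypothetical stand-in for `guix deploy`: it prints every
# argument it receives, each wrapped in angle brackets.
show_args() {
    for arg in "$@"; do
        printf '<%s>' "$arg"
    done
    printf '\n'
}

# Unquoted: the local shell consumes `&` as its own control operator, runs
# show_args in the background, and never passes `&` along as an argument.
unquoted=$(show_args -x -- nohup some-command & wait)

# Quoted: `&` now arrives as a literal argument. Whether anything on the
# remote side would then interpret it is a separate, unanswered question.
quoted=$(show_args -x -- nohup some-command '&')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

In the unquoted case the & backgrounds the local process instead of
reaching the remote command, which matches the behavior described above.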

By default, we shouldn't assume that any command might close the SSH
session and silently catch the resulting errors.

One solution could be adding an alternative to -x that nohup's the
command and attempts to cleanly close the SSH session. If the session
errors out after the command is nohup'd (e.g. reboot race condition),
catch the SSH error and exit.
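
As a rough sketch of the control flow I have in mind (plain shell, not
Guix code): run_on_machine is a hypothetical placeholder that simulates a
connection dropping once the command has started, and the machine names
are made up. A real implementation would also need to tell "dropped after
the command was launched" apart from other failures, which this sketch
does not attempt.

```shell
#!/bin/sh
# Hypothetical stand-in for "run the command on one machine over SSH".
# It simulates the reported failure: the command starts, then the
# connection drops, so the call returns a non-zero status.
run_on_machine() {
    echo "rebooting $1"
    return 255   # ssh(1) uses 255 to signal that an error occurred
}

# Run the command on every machine, treating a lost connection as a
# warning so that the remaining machines are still processed.
run_on_all() {
    lost=0
    for machine in "$@"; do
        run_on_machine "$machine" || {
            echo "warning: connection to $machine closed; continuing" >&2
            lost=$((lost + 1))
        }
    done
    echo "done: $# machines, $lost connections lost"
}

run_on_all alpha.example beta.example
```

The point is only that a failure on the first machine is reported and the
loop moves on, rather than aborting the rest of the list.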

Alternatively we could add a --reboot flag, although I prefer the more
general solution.

Perhaps this can be the impetus for implementing the "deploy-hook"
functionality described at https://issues.guix.gnu.org/53486. In a
particularly fancy world, we could combine rebooting with pre and post
reboot command execution, but now I'm thinking of pies 🥧 in skies ⛅.

Or maybe I'm completely wrong and this is possible (sorry!), in which
case we probably could add a quick mention of it in the manual.

--8<---------------cut here---------------start------------->8---
gibraltar :( rsent$ guix deploy rsent/machines/lan.scm --no-grafts -x -- reboot
guix deploy: warning: <machine-ssh-configuration> without a 'host-key' is deprecated
guix deploy: sending 1 store item (0 MiB) to 'horizon.local'...
;;; [2024/05/03 17:16:45.032361, 0] [GSSH ERROR] Parent session is not connected: #<unknown channel (freed) 7fbe66aa71a0>
Backtrace:
          16 (primitive-load "/home/richard/.config/guix/current/bin/guix")
In guix/ui.scm:
   2312:7 15 (run-guix . _)
  2275:10 14 (run-guix-command _ . _)
In ice-9/boot-9.scm:
  1752:10 13 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/status.scm:
    839:4 12 (call-with-status-report _ _)
In ice-9/boot-9.scm:
  1752:10 11 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In guix/store.scm:
   666:37 10 (thunk)
   1302:8  9 (call-with-build-handler _ _)
   1302:8  8 (call-with-build-handler #<procedure 7fbe69be4690 at guix/ui.scm:1222:2 (continue store things mode)> _)
In guix/scripts/deploy.scm:
   274:23  7 (_)
In srfi/srfi-1.scm:
   460:18  6 (fold #<procedure 7fbe78c25540 at guix/scripts/deploy.scm:274:28 (machine result)> #t (#<<machine> operating-system: #<<operating-system> ke…>))
In guix/scripts/deploy.scm:
    214:2  5 (_ #<<machine> operating-system: #<<operating-system> kernel: #<package linux@6.8.8 nongnu/packages/linux.scm:118 7fbe6343c0b0> kernel-loada…> …)
In guix/store.scm:
  2182:25  4 (run-with-store #<store-connection 256.100 7fbe78cf8960> #<procedure 7fbe66b423c0 at guix/remote.scm:119:2 (state)> #:guile-for-build _ # _ # _)
In guix/remote.scm:
    72:20  3 (_ _)
In unknown file:
           2 (channel-get-exit-status #<unknown channel (freed) 7fbe66aa71a0>)
In ice-9/boot-9.scm:
  1685:16  1 (raise-exception _ #:continuable? _)
  1685:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Throw to key `guile-ssh-error' with args `("channel-get-exit-status" "Parent session is not connected" #<unknown channel (freed) 7fbe66aa71a0> #f)'.
--8<---------------cut here---------------end--------------->8---


-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




This bug report was last modified 14 days ago.

