GNU bug report logs - #44387
SLURM client version must match daemon version


Package: guix

Reported by: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Date: Mon, 2 Nov 2020 09:12:02 UTC

Severity: normal

To reply to this bug, email your comments to 44387 AT debbugs.gnu.org.


Report forwarded to rekado <at> elephly.net, bug-guix <at> gnu.org:
bug#44387; Package guix. (Mon, 02 Nov 2020 09:12:02 GMT)

Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
New bug report received and forwarded. Copy sent to rekado <at> elephly.net, bug-guix <at> gnu.org. (Mon, 02 Nov 2020 09:12:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: <bug-guix <at> gnu.org>
Subject: SLURM client version must match daemon version
Date: Mon, 02 Nov 2020 10:10:55 +0100

Hello,

We’ve noticed the problem below on clusters running a foreign distro
when slurmd is version 19.x and our clients are version 20.x:

--8<---------------cut here---------------start------------->8---
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- package -A slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
python-slurm-magic      0.0-0.73dd1a2   out     gnu/packages/parallel.scm:225:4
slurm   20.02.5 out     gnu/packages/parallel.scm:109:2
slurm-drmaa     1.1.1   out     gnu/packages/parallel.scm:194:2
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- environment --ad-hoc slurm -- squeue
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
slurm_load_jobs error: Zero Bytes were transmitted or received
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- package -A slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
python-slurm-magic      0.0-0.73dd1a2   out     gnu/packages/parallel.scm:225:4
slurm   19.05.3-2       out     gnu/packages/parallel.scm:109:2
slurm-drmaa     1.1.1   out     gnu/packages/parallel.scm:194:2
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- environment --ad-hoc slurm -- squeue
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
[courtes@devel01 ~]$ /usr/bin/squeue --version
slurm 19.05.2
--8<---------------cut here---------------end--------------->8---

This means that we generally cannot use the Guix-provided SLURM on
clusters running foreign distros.

<https://slurm.schedmd.com/troubleshoot.html#network> reads:

  Slurm daemons will support RPCs and state files from the two previous
  major releases (e.g. a version 17.11.x SlurmDBD will support slurmctld
  daemons and commands with a version of 17.11.x, 17.02.x or 16.05.x).

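Looking at <https://download.schedmd.com/slurm/>, there have been quite a
few releases between 19.05.3-2 and 20.02.5, including a new major release
(20.02).  The guarantee quoted above covers daemons talking to older
clients, not a 20.x client talking to a 19.x daemon, which may explain
the problem I described.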
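
The compatibility rule quoted above can be sketched as a small check.
This is only an illustration, not anything SLURM itself ships: the
function names are made up, and the list of major releases is an
assumption filled in from the public release history (only 16.05, 17.02,
17.11, 19.05 and 20.02 are named in this thread).

```python
# Slurm major releases, oldest first.  Assumption: this list is complete
# for the range shown; extend it for newer releases.
MAJOR_RELEASES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02"]


def major(version: str) -> str:
    """Return the 'YY.MM' major release of a version such as '19.05.3-2'."""
    return ".".join(version.split(".")[:2])


def daemon_accepts(daemon_version: str, client_version: str) -> bool:
    """True if the client is at the daemon's major release or one of the
    two major releases before it (the rule quoted from the SLURM docs)."""
    d = MAJOR_RELEASES.index(major(daemon_version))
    c = MAJOR_RELEASES.index(major(client_version))
    return 0 <= d - c <= 2


# The situation reported here: a 19.05 daemon and a 20.02 client.
print(daemon_accepts("19.05.2", "20.02.5"))    # False
print(daemon_accepts("19.05.2", "19.05.3-2"))  # True
```

Under this reading the 20.02.5 client falls outside what a 19.05 daemon
promises to accept, while the 19.05.3-2 client is within it.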


Apparently the only .so in Open MPI linked against SLURM is
‘lib/openmpi/mca_pmix_s1.so’.  The abidiff output below suggests that the
two SLURM versions are not ABI-compatible, so one wouldn’t be able to use
‘--with-graft’ to graft one version in lieu of the other:

--8<---------------cut here---------------start------------->8---
[courtes@devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- build slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
/gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2
[courtes@devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- build slurm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
/gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5
[courtes@devel01 ~]$ abidiff --stat /gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2/lib/slurm/libslurmfull.so /gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5/lib/slurm/libslurmfull.so
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
--8<---------------cut here---------------end--------------->8---
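
For anyone who wants to run this comparison over several versions, here
is a rough sketch that reads the ‘abidiff --stat’ summary above and
flags removed or changed entries.  The function names are hypothetical,
the parsing only assumes the line shape shown in the transcript, and
treating any removal or change as a break is a deliberately conservative
choice, not a statement about what abidiff itself considers harmful.

```python
import re

# Summary text copied from the transcript above.
REPORT = """\
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
"""

SUMMARY = re.compile(r"^(?P<kind>.+?) changes summary: (?P<counts>.+)$")
COUNT = re.compile(r"(\d+) (Removed|Changed|Added)")


def parse_abidiff_stat(text: str) -> dict:
    """Map each summary kind to its counts, e.g. {'Removed': 80, 'Added': 162}."""
    result = {}
    for line in text.splitlines():
        m = SUMMARY.match(line.strip())
        if m:
            result[m.group("kind")] = {
                label: int(n) for n, label in COUNT.findall(m.group("counts"))
            }
    return result


def looks_abi_compatible(summary: dict) -> bool:
    """Conservative check: any removed or changed entry counts as a break."""
    return all(
        counts.get("Removed", 0) == 0 and counts.get("Changed", 0) == 0
        for counts in summary.values()
    )


print(looks_abi_compatible(parse_abidiff_stat(REPORT)))  # False
```

With 80 function symbols and 3 variable symbols removed, the check
reports the two libraries as not interchangeable, which matches the
conclusion about grafting drawn above.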

What can we do about it?

At least, we should package several known-useful versions, so that
people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
refer to the version they want in their profile.  I’ll work on that.
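
As an illustration of "explicitly refer to the version they want", the
sketch below derives a Guix package spec from the output of the host's
‘squeue --version’ and builds the matching ‘guix environment’ command.
The helper names are hypothetical, and it assumes that the Guix revision
in use actually provides a slurm package at that major version; nothing
here checks that.

```python
import re


def guix_slurm_spec(version_output: str) -> str:
    """Turn output such as 'slurm 19.05.2' into the spec 'slurm@19.05'."""
    m = re.search(r"\bslurm\s+(\d+)\.(\d+)", version_output)
    if m is None:
        raise ValueError(f"unrecognized version output: {version_output!r}")
    return f"slurm@{m.group(1)}.{m.group(2)}"


def squeue_command(version_output: str) -> list:
    """Build a 'guix environment' invocation that runs a matching squeue."""
    spec = guix_slurm_spec(version_output)
    return ["guix", "environment", "--ad-hoc", spec, "--", "squeue"]


# Using the host version shown earlier in this report.
print(" ".join(squeue_command("slurm 19.05.2")))
# guix environment --ad-hoc slurm@19.05 -- squeue
```

On a real cluster the input string could come from something like
subprocess.run(["/usr/bin/squeue", "--version"], capture_output=True,
text=True).stdout, and the resulting list could be passed to
subprocess.run in turn.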

Anything else?

I heard that PMIx, a scheduler-independent API, will eventually
supersede SLURM in Open MPI.  Let’s see if that loosens version
requirements.

Thanks,
Ludo’.




Information forwarded to bug-guix <at> gnu.org:
bug#44387; Package guix. (Mon, 02 Nov 2020 14:37:02 GMT)

Message #8 received at 44387 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: 44387 <at> debbugs.gnu.org
Cc: Ricardo Wurmus <rekado <at> elephly.net>
Subject: Re: bug#44387: SLURM client version must match daemon version
Date: Mon, 02 Nov 2020 15:36:50 +0100

Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:

> At least, we should package several known-useful versions, so that
> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
> refer to the version they want in their profile.  I’ll work on that.

I’ve reintroduced version 19.05:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0




Information forwarded to bug-guix <at> gnu.org:
bug#44387; Package guix. (Mon, 02 Nov 2020 16:27:02 GMT)

Message #11 received at 44387 <at> debbugs.gnu.org:

From: Ricardo Wurmus <rekado <at> elephly.net>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 44387 <at> debbugs.gnu.org
Subject: Re: bug#44387: SLURM client version must match daemon version
Date: Mon, 02 Nov 2020 17:27:54 +0100

Ludovic Courtès <ludo <at> gnu.org> writes:

> Ludovic Courtès <ludovic.courtes <at> inria.fr> wrote:
>
>> At least, we should package several known-useful versions, so that
>> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
>> refer to the version they want in their profile.  I’ll work on that.
>
> I’ve reintroduced version 19.05:
>
>   https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0

Good call.  It seems like a good idea to keep older major versions
around.

There’s a similar problem with postgres, which needs (or used to need)
more than one version to upgrade existing data from an older version.

-- 
Ricardo




This bug report was last modified 3 years and 169 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.