GNU bug report logs - #44387
SLURM client version must match daemon version
To reply to this bug, email your comments to 44387 AT debbugs.gnu.org.
Report forwarded to rekado <at> elephly.net, bug-guix <at> gnu.org: bug#44387; Package guix. (Mon, 02 Nov 2020 09:12:02 GMT)
Acknowledgement sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>: New bug report received and forwarded. Copy sent to rekado <at> elephly.net, bug-guix <at> gnu.org. (Mon, 02 Nov 2020 09:12:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello,
We’ve noticed the problem below on clusters running a foreign distro
when slurmd is version 19.x and our clients are version 20.x:
--8<---------------cut here---------------start------------->8---
[courtes <at> devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- package -A slurm
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
python-slurm-magic 0.0-0.73dd1a2 out gnu/packages/parallel.scm:225:4
slurm 20.02.5 out gnu/packages/parallel.scm:109:2
slurm-drmaa 1.1.1 out gnu/packages/parallel.scm:194:2
[courtes <at> devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- environment --ad-hoc slurm -- squeue
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
slurm_load_jobs error: Zero Bytes were transmitted or received
[courtes <at> devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- package -A slurm
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
python-slurm-magic 0.0-0.73dd1a2 out gnu/packages/parallel.scm:225:4
slurm 19.05.3-2 out gnu/packages/parallel.scm:109:2
slurm-drmaa 1.1.1 out gnu/packages/parallel.scm:194:2
[courtes <at> devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- environment --ad-hoc slurm -- squeue
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
[courtes <at> devel01 ~]$ /usr/bin/squeue --version
slurm 19.05.2
--8<---------------cut here---------------end--------------->8---
It means that we cannot generally use the Guix-provided SLURM on
clusters running foreign distros.
<https://slurm.schedmd.com/troubleshoot.html#network> reads:
Slurm daemons will support RPCs and state files from the two previous
major releases (e.g. a version 17.11.x SlurmDBD will support slurmctld
daemons and commands with a version of 17.11.x, 17.02.x or 16.05.x).
Looking at <https://download.schedmd.com/slurm/>, there have been quite a
few releases between 19.05.3-2 and 20.02.5, which may explain the
problem I described.
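As a rough illustration of the compatibility window quoted above, here is a minimal Python sketch. The list of release series and the function names are assumptions made for this example; they are not part of Slurm or Guix. The quoted rule only mentions clients from the same or the two previous series, so the sketch treats a client newer than the daemon as unsupported, which is consistent with the squeue failure shown earlier.

```python
# Slurm major release series in chronological order.
# ASSUMPTION: hand-written list for illustration; extend it as needed.
SLURM_SERIES = ["16.05", "17.02", "17.11", "18.08", "19.05", "20.02"]


def series(version):
    """Return the major series of a version, e.g. '19.05.3-2' -> '19.05'."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def daemon_supports(daemon_version, client_version, window=2):
    """True if the client's series equals the daemon's series or is one of
    the `window` series released immediately before it.

    Raises ValueError for a series missing from SLURM_SERIES.
    """
    d = SLURM_SERIES.index(series(daemon_version))
    c = SLURM_SERIES.index(series(client_version))
    # A negative difference means the client is newer than the daemon.
    return 0 <= d - c <= window


if __name__ == "__main__":
    # The example from the Slurm documentation: 17.11 accepts 16.05.
    print(daemon_supports("17.11.0", "16.05.1"))
    # The case in this report: 19.05 daemon, 20.02 client.
    print(daemon_supports("19.05.2", "20.02.5"))
```

Under these assumptions the 20.02.5 client falls outside the window not because of the number of point releases in between, but because its series is newer than the daemon's.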
Apparently the only .so in Open MPI linked against SLURM is
‘lib/openmpi/mca_pmix_s1.so’. The abidiff output below suggests that the
two versions are not ABI-compatible, so one wouldn’t be able to use
‘--with-graft’ to graft one version in lieu of the other:
--8<---------------cut here---------------start------------->8---
[courtes <at> devel01 ~]$ guix time-machine --commit=09b00a62b297edb92ac4dde6f4838261ac0cad16 -- build slurm
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
/gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2
[courtes <at> devel01 ~]$ guix time-machine --commit=2f107f273de3db1d01bdec66b13334edef7ad036 -- build slurm
Updating channel ‘guix’ from Git repository at ‘https://git.savannah.gnu.org/git/guix.git’...
/gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5
[courtes <at> devel01 ~]$ abidiff --stat /gnu/store/37b7qnwck4pg51qia4w002i62g156xgw-slurm-19.05.3-2/lib/slurm/libslurmfull.so /gnu/store/7n6aks2wcmn2pxv03q8ij38hsj9zfzk9-slurm-20.02.5/lib/slurm/libslurmfull.so
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
--8<---------------cut here---------------end--------------->8---
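The reasoning applied to the abidiff summary above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and treating "any removed symbol" as the sign of ABI incompatibility is a simplifying assumption (a fuller check would also look at changed functions and types).

```python
import re

# The summary lines printed by `abidiff --stat` in the transcript above.
SAMPLE = """\
Functions changes summary: 0 Removed, 0 Changed, 0 Added function
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
Function symbols changes summary: 80 Removed, 162 Added function symbols not referenced by debug info
Variable symbols changes summary: 3 Removed, 0 Added variable symbols not referenced by debug info
"""


def removed_counts(abidiff_stat):
    """Map each summary label (e.g. 'Function symbols') to its Removed count."""
    counts = {}
    for line in abidiff_stat.splitlines():
        m = re.match(r"\s*(.+?) changes summary: (\d+) Removed", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


def looks_graftable(abidiff_stat):
    """ASSUMPTION: a graft is only plausible if nothing was removed.

    Returns False when no summary line could be parsed at all.
    """
    counts = removed_counts(abidiff_stat)
    return bool(counts) and all(n == 0 for n in counts.values())


if __name__ == "__main__":
    print(removed_counts(SAMPLE))
    print(looks_graftable(SAMPLE))
```

With 80 function symbols and 3 variable symbols removed between the two builds, the sketch reports the pair as not graftable, in line with the conclusion drawn in the message.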
What can we do about it?
At least, we should package several known-useful versions, so that
people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
refer to the version they want in their profile. I’ll work on that.
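To illustrate what explicitly referring to a version could look like, here is a small Python sketch that turns the output of the host's ‘squeue --version’ into a Guix package specification for the same series. The helper names are hypothetical, and it assumes that the matching series is actually packaged in the Guix revision in use.

```python
from typing import List


def guix_spec_for(version_output: str) -> str:
    """Turn e.g. 'slurm 19.05.2' into the package spec 'slurm@19.05'.

    Only the series (first two components) is kept, so that Guix can pick
    whichever point release of that series it provides.
    """
    name, version = version_output.split()
    major, minor = version.split(".")[:2]
    return f"{name}@{major}.{minor}"


def guix_command(version_output: str, tool: str = "squeue") -> List[str]:
    """Build (but do not run) a 'guix environment' command line that uses
    the Slurm series matching the host installation."""
    return ["guix", "environment", "--ad-hoc",
            guix_spec_for(version_output), "--", tool]


if __name__ == "__main__":
    # In practice the string would come from running /usr/bin/squeue --version.
    print(" ".join(guix_command("slurm 19.05.2")))
```

The command is returned as a list rather than executed, so it can be inspected first or handed to ‘subprocess.run’ on a machine where Guix is installed.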
Anything else?
I heard that PMIx, a scheduler-independent API, will eventually
supersede SLURM in Open MPI. Let’s see if that loosens version
requirements.
Thanks,
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#44387; Package guix. (Mon, 02 Nov 2020 14:37:02 GMT)
Message #8 received at 44387 <at> debbugs.gnu.org:
Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:
> At least, we should package several known-useful versions, so that
> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
> refer to the version they want in their profile. I’ll work on that.
I’ve reintroduced version 19.05:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0
Information forwarded to bug-guix <at> gnu.org: bug#44387; Package guix. (Mon, 02 Nov 2020 16:27:02 GMT)
Message #11 received at 44387 <at> debbugs.gnu.org:
Ludovic Courtès <ludo <at> gnu.org> writes:
> Ludovic Courtès <ludovic.courtes <at> inria.fr> writes:
>
>> At least, we should package several known-useful versions, so that
>> people can use ‘--with-input=slurm@X=slurm@Y’ (if needed) or explicitly
>> refer to the version they want in their profile. I’ll work on that.
>
> I’ve reintroduced version 19.05:
>
> https://git.savannah.gnu.org/cgit/guix.git/commit/?id=e1bd62eb5ce0f2410b2607f157989588791b43e0
Good call. It seems like a good idea to keep older major versions
around.
There’s a similar problem with postgres, which needs (or used to need)
more than one version to upgrade existing data from an older version.
--
Ricardo
This bug report was last modified 4 years and 109 days ago.