GNU bug report logs - #39588
gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich

Package: guix-patches;

Reported by: Maurice Brémond <Maurice.Bremond <at> inria.fr>

Date: Thu, 13 Feb 2020 14:36:02 UTC

Severity: normal

Done: Ludovic Courtès <ludovic.courtes <at> inria.fr>

Bug is archived. No further changes may be made.

Report forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Thu, 13 Feb 2020 14:36:02 GMT)

Acknowledgement sent to Maurice Brémond <Maurice.Bremond <at> inria.fr>:
New bug report received and forwarded. Copy sent to guix-patches <at> gnu.org. (Thu, 13 Feb 2020 14:36:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: guix-patches <at> gnu.org
Subject: gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich,
 python-mpi4py-mpich
Date: Thu, 13 Feb 2020 11:44:07 +0100
[0001-gnu-Add-mpich-scalapack-mpich-mumps-mpich-pt-scotch-.patch (text/x-diff, inline)]
From 4ba3d43d97389c76a06c342d4c4ee91f02d8aaf1 Mon Sep 17 00:00:00 2001
From: Maurice Bremond <Maurice.Bremond <at> inria.fr>
Date: Thu, 23 Jan 2020 15:48:41 +0100
Subject: [PATCH] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich.

---
 gnu/packages/maths.scm |  34 +++++++++++++
 gnu/packages/mpi.scm   | 108 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+)

diff --git a/gnu/packages/maths.scm b/gnu/packages/maths.scm
index 8f4478b6bb..feba11663f 100644
--- a/gnu/packages/maths.scm
+++ b/gnu/packages/maths.scm
@@ -643,6 +643,14 @@ singular value problems.")
     (license (license:non-copyleft "file://LICENSE"
                                    "See LICENSE in the distribution."))))
 
+(define-public scalapack-mpich
+  (package
+    (inherit scalapack)
+    (name "scalapack-mpich")
+    (inputs
+     `(("mpi" ,mpich)
+       ,@(alist-delete "mpi" (package-inputs scalapack))))))
+
 (define-public gnuplot
   (package
     (name "gnuplot")
@@ -2371,6 +2379,24 @@ sparse system of linear equations A x = b using Gaussian elimination.")
     (inputs
      (alist-delete "pt-scotch" (package-inputs mumps-openmpi)))))
 
+(define-public mumps-mpich
+  (package (inherit mumps)
+    (name "mumps-mpich")
+    (inputs
+     `(("mpi" ,mpich)
+       ("scalapack" ,scalapack-mpich)
+       ("pt-scotch" ,pt-scotch-mpich)
+       ,@(alist-delete "scotch" (package-inputs mumps))))
+    (arguments
+     (substitute-keyword-arguments (package-arguments mumps)
+       ((#:phases phases)
+        `(modify-phases ,phases
+           (replace 'check
+             (lambda _
+               ((assoc-ref ,phases 'check)
+                #:exec-prefix '("mpirun" "-genv" "LD_LIBRARY_PATH" "../lib" "-n" "2" ))))))))
+    (synopsis "Multifrontal sparse direct solver (with MPI)")))
+
 (define-public ruby-asciimath
   (package
     (name "ruby-asciimath")
@@ -2729,6 +2755,14 @@ YACC = bison -pscotchyy -y -b y
                (invoke "make" "ptcheck")))))))
     (synopsis "Programs and libraries for graph algorithms (with MPI)")))
 
+(define-public pt-scotch-mpich
+  (package
+    (inherit pt-scotch)
+    (name "pt-scotch-mpich")
+    (inputs
+     `(("mpi" ,mpich)
+       ,@(alist-delete "mpi" (package-inputs pt-scotch))))))
+
 (define-public pt-scotch32
   (package (inherit scotch32)
     (name "pt-scotch32")
diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 00e0d12eab..3f644e7dbb 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -29,16 +29,23 @@
   #:use-module ((guix licenses)
                 #:hide (expat))
   #:use-module (guix download)
+  #:use-module (guix git-download)
   #:use-module (guix utils)
   #:use-module (guix deprecation)
   #:use-module (guix build-system gnu)
   #:use-module (guix build-system python)
   #:use-module (gnu packages)
+  #:use-module (gnu packages autogen)
+  #:use-module (gnu packages autotools)
+  #:use-module (gnu packages base)
+  #:use-module (gnu packages bash)
   #:use-module (gnu packages fabric-management)
+  #:use-module (gnu packages gawk)
   #:use-module (gnu packages gcc)
   #:use-module (gnu packages java)
   #:use-module (gnu packages libevent)
   #:use-module (gnu packages linux)
+  #:use-module (gnu packages m4)
   #:use-module (gnu packages pciutils)
   #:use-module (gnu packages xorg)
   #:use-module (gnu packages gtk)
@@ -47,6 +54,7 @@
   #:use-module (gnu packages ncurses)
   #:use-module (gnu packages parallel)
   #:use-module (gnu packages pkg-config)
+  #:use-module (gnu packages version-control)
   #:use-module (gnu packages valgrind)
   #:use-module (srfi srfi-1)
   #:use-module (ice-9 match))
@@ -393,3 +401,103 @@ supports point-to-point and collective communications of any picklable Python
 object as well as optimized communications of Python objects (such as NumPy
 arrays) that expose a buffer interface.")
     (license bsd-3)))
+
+(define-public mpich
+   (package
+     (name "mpich")
+     (version "3.4a2")
+     (source (origin
+               (method git-fetch)
+               (uri (git-reference
+                     (url "https://github.com/pmodels/mpich")
+                     (commit "644051d13dc20aecd460ba3db088756659c3dad3") ; tag v3.4a2
+                     (recursive? #t)))
+               (sha256
+                (base32
+                 "02ildr7wh40q1qaq5k8npb6vw6kv9szmrm3lspr6skqa5csmrrxw"))))
+     (build-system gnu-build-system)
+     (inputs
+      `(("libnuma" ,numactl))) ; for ch4:ucx
+     (native-inputs
+      `(("gawk" ,gawk)
+        ("bash" ,bash)
+        ("diffutils" ,diffutils)
+        ("git" ,git)
+        ("sed" ,sed)
+        ("perl" ,perl)
+        ("patch" ,patch)
+        ("findutils" ,findutils)
+        ("m4" ,m4)
+        ("grep" ,grep)
+        ("which" ,which)
+        ("gfortran" ,gfortran)
+        ("gcc" ,gcc)
+        ("gnu-make" ,gnu-make)
+        ("autoconf" ,autoconf)
+        ("automake" ,automake)
+        ("libtool" ,libtool)
+        ("autogen" ,autogen)
+        ("zlib" ,(@ (gnu packages compression) zlib))))
+     (outputs '("out" "debug"))
+     (arguments
+      `(#:modules ((ice-9 match)
+                   (ice-9 popen)
+                   (srfi srfi-1)
+                   ,@%gnu-build-system-modules)
+        #:configure-flags
+        '("--disable-dependency-tracking"
+          "--enable-debuginfo"
+          "--with-device=ch4:ucx") ; --with-device=ch4:ofi segfault in tests
+        #:phases
+        (modify-phases %standard-phases
+          (add-after 'unpack 'patch-sources
+            (lambda _
+              (substitute* "./maint/gen_subcfg_m4"
+                (("/usr/bin/env") (which "env")))
+              (substitute* "src/glue/romio/all_romio_symbols"
+                (("/usr/bin/env") (which "env")))
+              (substitute* (find-files "." "buildiface")
+                (("/usr/bin/env") (which "env")))
+              (substitute* "maint/extracterrmsgs"
+                (("/usr/bin/env") (which "env")))
+              (substitute* (find-files "." "f77tof90")
+                (("/usr/bin/env") (which "env")))
+              (substitute* (find-files "." "\\.sh$")
+                (("/bin/sh") (which "sh")))))
+          (add-after 'bootstrap 'patch-after-bootstrap
+                     (lambda _
+                       (use-modules (ice-9 popen)
+                                    (ice-9 rdelim))
+                       (substitute* "maint/configure"
+                                    (("/bin/sh") (which "sh")))
+                       (with-directory-excursion
+                        "maint"
+                        (invoke "sh" "./configure")
+                        (let ((cvardirs
+                               (let* ((p (open-pipe* OPEN_READ
+                                                     "cat" "cvardirs"))
+                                      (l (read-line p)))
+                                 (and (zero? (close-pipe p)) l))))
+                          (invoke "perl" "extractcvars" "--dirs" cvardirs))))))))
+     (home-page "https://www.mpich.org/")
+     (synopsis "MPICH is a high performance and widely portable
+implementation of the Message Passing Interface (MPI) standard.")
+     (description "MPICH is a high-performance and widely portable
+implementation of the Message Passing Interface (MPI) standard (MPI-1,
+MPI-2 and MPI-3). The goals of MPICH are: (1) to provide an MPI
+implementation that efficiently supports different computation and
+communication platforms including commodity clusters (desktop systems,
+shared-memory systems, multicore architectures), high-speed
+networks (10 Gigabit Ethernet, InfiniBand, Myrinet, Quadrics) and
+proprietary high-end computing systems (Blue Gene, Cray) and (2) to
+enable cutting-edge research in MPI through an easy-to-extend modular
+framework for other derived implementations.")
+     (license bsd-2)))
+
+(define-public python-mpi4py-mpich
+  (package
+    (inherit python-mpi4py)
+    (name "python-mpi4py-mpich")
+    (inputs
+     `(("mpi" ,mpich)
+       ,@(alist-delete "mpi" (package-inputs python-mpi4py))))))
-- 
2.17.1





Message #8 received at 39588 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 17 Feb 2020 18:26:05 +0100
Hi Maurice,

Thanks for the patches!  We like to have one patch per package, so I
started with MPICH:

Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

> +(define-public mpich
> +   (package
> +     (name "mpich")
> +     (version "3.4a2")
> +     (source (origin
> +               (method git-fetch)
> +               (uri (git-reference
> +                     (url "https://github.com/pmodels/mpich")
> +                     (commit "644051d13dc20aecd460ba3db088756659c3dad3") ; tag v3.4a2
> +                     (recursive? #t)))
> +               (sha256
> +                (base32
> +                 "02ildr7wh40q1qaq5k8npb6vw6kv9szmrm3lspr6skqa5csmrrxw"))))

I ended up modifying the package:

  • To use 3.3 instead of 3.4a, the latter being listed as “alpha”;

  • To build from an official tarball rather than from a Git checkout,
    so that the GNU build system is already bootstrapped;

  • To ensure that the bundled copies of hwloc and ucx are not used.

I pushed the result here:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c70261bfb993cebc23cd80042de3f52a8b7932a4

Let me know if I broke something.

As for the “-mpich” packages: they look good to me, though I’m not
entirely sure whether we should create “-mpich” variants for each of
them.  Ideally ‘--with-input’ would be enough, but I don’t know.  At
the same time, those variants don’t cost us much, so if they’re useful,
why not.

Thoughts, HPC folks?

Ludo’.




Message #11 received at 39588 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 17 Feb 2020 19:20:44 +0100
Hi,

Thank you Maurice for the packages! :-)


On Mon, 17 Feb 2020 at 18:27, Ludovic Courtès <ludo <at> gnu.org> wrote:

> As for the “-mpich” packages: they look good to me, though I’m not
> entirely sure whether we should create “-mpich” variants for each of
> them.  Ideally ‘--with-input’ would be enough, but I don’t know.  At
> the same time, those variants don’t cost us much, so if they’re useful,
> why not.

Is it not related to "package parameters" or the discussion we had
about rebuilding everything with another compiler?
In other words, '--with-input' will do the job for explicit packages but
not the implicit ones.

One easy move would be to generalize -- if possible -- what is done in
'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
because it clearly depends on the build system.

On the other hand, I had a look at Spack (after the discussion at
FOSDEM) and how they do it. The WIP branch [1] about the solver is
interesting: it could catch incompatibilities earlier using a solver
(SAT or other) and specifications. But I am not convinced it is the
way to go either, because it adds a lot of complexity for a debatable
gain. ;-)


[1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver


Well, for these particular patches, the variants are ok.
But we should think about how to ease the variant generation of all the chain.


Cheers,
simon




Message #14 received at 39588 <at> debbugs.gnu.org:

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Tue, 18 Feb 2020 18:58:11 +0100
Hi Ludovic & Simon!

I agree, the *-mpich packages are not necessary and this raises the
question of why those variants and not others. In the end, I could keep
them in my own channel for convenience.

If I understand, in this case, the usage of --with-input is
possible because implicit packages are very likely to not use mpi ?

In guix packages, mpi input is usually declared as
(("openmpi" . ,openmpi))
or
(("mpi" . ,openmpi))

So two flags are necessary for the transformation.

Doing this, I ran into problems with your patch...

You can try with my original patch just a transformation of
mumps-openmpi into mumps-mpich:

guix time-machine --url=https://gitlab.inria.fr/bremond/guix.git \
  --branch=add-mpich -- \
  environment -K --pure --ad-hoc mumps-openmpi \
  --with-input=mpi=mpich --with-input=openmpi=mpich --

This works for me, I can use a similar command to compile and execute a
program which uses mumps and I can see with ldd that mpich is used.

Then with the current mpich patch on savannah master:

guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
  environment -K --pure --ad-hoc mumps-openmpi \
  --with-input=mpi=mpich --with-input=openmpi=mpich --

This fails on my machine for the pt-scotch check (there is the same
error with scalapack check)

Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............: 
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................: 
MPID_nem_init(324).................: 
MPID_nem_tcp_init(175).............: 
MPID_nem_tcp_get_business_card(401): 
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)

If I go into the build directory and launch the check manually after
sourcing the environment-variables file, it works...

So it seems that this is related to guix and the guixbuild environment in
the definition of the package.

Maurice




Message #17 received at 39588 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Tue, 18 Feb 2020 19:22:17 +0100
Hi Maurice,

On Tue, 18 Feb 2020 at 18:58, Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

> If I understand, in this case, the usage of --with-input is
> possible because implicit packages are very likely to not use mpi ?

Maybe I am missing the issue. I have not looked at mumps and related
packages in... years. :-)
(Nor at your patches. :-D)

If mumps depends explicitly on openmpi, then '--with-input' can
rewrite the direct dependencies, by providing say mpich instead of
openmpi.
If petsc* depends explicitly on openmpi and on mumps (which depends
explicitly on openmpi too), then '--with-input=openmpi=mpich' will
*only* rewrite the dependency of petsc but not of mumps. So it ends
up with petsc compiled with mpich and mumps with openmpi.

Still considering this (fictitious) example, where:
 - petsc depends on openmpi(1) and mumps
 - mumps depends on openmpi(2)
The openmpi(2) is an implicit dependency for petsc and '--with-input'
does not work.

*because I know PETSc better than Scotch. ;-)



> You can try with my original patch just a transformation of
> mumps-openmpi into mumps-mpich:
>
> guix time-machine --url=https://gitlab.inria.fr/bremond/guix.git \
>   --branch=add-mpich -- \
>   environment -K --pure --ad-hoc mumps-openmpi \
>   --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This works for me, I can use a similar command to compile and execute a
> program which uses mumps and I can see with ldd that mpich is used.
>
> Then with the current mpich patch on savannah master:
>
> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
>   environment -K --pure --ad-hoc mumps-openmpi \
>   --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)

Are 'pt-scotch' and 'scalapack' compiled with 'mpich' or 'openmpi'?

Because maybe "mumps-openmpi --with-input=openmpi=mpich" compiles
'mumps' using 'mpich' as MPI but compiles 'pt-scotch' or 'scalapack'
with the default implementation, which seems to be 'openmpi'.


Thank you for your work.

All the best,
simon




Message #20 received at 39588 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Wed, 19 Feb 2020 13:11:34 +0100
Hi Maurice,

On Wed, 19 Feb 2020 at 12:46, Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

> >If mumps depends explicitly on openmpi, then '--with-input' can
> >rewrite the direct dependencies, by providing say mpich instead of
> >openmpi.
> >If petsc* depends explicitly on openmpi and on mumps (which depends
> >explicitly on openmpi too), then '--with-input=openmpi=mpich' will
> >*only* rewrite the dependency of petsc but not of mumps. So it ends
> >up with petsc compiled with mpich and mumps with openmpi.
> >
> >Still considering this (fictitious) example, where:
> > - petsc depends on openmpi(1) and mumps
> > - mumps depends on openmpi(2)
> >The openmpi(2) is an implicit dependency for petsc and '--with-input'
> >does not work.

Sorry for the confusion, because what I said is *wrong*.
That is not the definition of implicit inputs. The definition is:

--8<---------------cut here---------------start------------->8---
In addition, this build system ensures that the “standard” environment
for GNU packages is available. This includes tools such as GCC, libc,
Coreutils, Bash, Make, Diffutils, grep, and sed (see the (guix
build-system gnu) module for a complete list). We call these the
implicit inputs of a package, because package definitions do not have to
mention them.
--8<---------------cut here---------------end--------------->8---


> Ok thank you for the clarification, I understand better now.
>
> I misunderstood the documentation:
>
> https://guix.gnu.org/manual/en/html_node/Package-Transformation-Options.html
>
>    --with-input=package=replacement
>    [...]
>    This is a recursive, deep replacement. [...]

Well, you understood correctly. It is me who mixed things up and added confusion, sorry.


> In the scalapack input I can see:
>      `(("mpi" ,openmpi)
>        ("fortran" ,gfortran)
>        ("lapack" ,lapack)))             ;for testing only
>
> So my assumption is that the --with-input transformation should work
> here, as neither gfortran nor lapack depends on mpi. To build just
> scalapack with mpich I tried:
>
> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- build scalapack --with-input=openmpi=mpich

Hum, my MUA trims the long message.


Well, my point was: maybe it does not work because of the implicit inputs.
Now, mpi has bitten me so I will try this afternoon. :-)

Cheers,
simon




Message #23 received at 39588 <at> debbugs.gnu.org:

From: zimoun <zimon.toutoune <at> gmail.com>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Wed, 19 Feb 2020 14:34:56 +0100
Hi Maurice,

On Tue, 18 Feb 2020 at 18:58, Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

> guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- \
>   environment -K --pure --ad-hoc mumps-openmpi \
>   --with-input=mpi=mpich --with-input=openmpi=mpich --
>
> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)
>
> Invalid error code (-2) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
> Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)..............:
> MPID_Init(224).....................: channel initialization failed
> MPIDI_CH3_Init(105)................:
> MPID_nem_init(324).................:
> MPID_nem_tcp_init(175).............:
> MPID_nem_tcp_get_business_card(401):
> MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
>
> If I go into the build directory and launch the check manually after
> sourcing the environment-variables file, it works...
>
> So it seems that this is related to guix and the guixbuild environment in
> the definition of the package.

Considering mumps-openmpi (or scalapack), the package definition contains:

--8<---------------cut here---------------start------------->8---
(arguments
 `(#:configure-flags `("-DBUILD_SHARED_LIBS:BOOL=YES")
   #:phases (modify-phases %standard-phases
              (add-before 'check 'mpi-setup
                ,%openmpi-setup))))
--8<---------------cut here---------------end--------------->8---

so you are right. It seems to be an environment issue.

The flag '--with-input=openmpi=mpich' changes the MPI implementation
but then at the checking phase, the environment variables (see
%openmpi-setup in gnu/packages/mpi.scm) are not necessarily set for
mpich.

The digression about implicit inputs was not relevant, sorry. :-)

Well, to get back to the discussion about the '*-mpich' variants instead
of '*-openmpi', we could use a 'with-mpich' similar to 'with-python2'
or 'with-ocaml4.07', rewriting the definition correctly.


Hope that helps,
simon
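
A minimal sketch of the idea above for a single package, assuming the
label-style inputs and the scalapack definition quoted in this thread.
The variable name is hypothetical, and nothing in this thread establishes
that dropping the Open MPI-specific phase is enough to make the check
phase pass with MPICH:

```scheme
(use-modules (guix packages)
             (guix utils)          ; substitute-keyword-arguments
             (srfi srfi-1)         ; alist-delete
             (gnu packages maths)  ; scalapack
             (gnu packages mpi))   ; mpich

;; Hypothetical variant: same input swap as in the submitted patch,
;; plus removal of the 'mpi-setup phase that runs %openmpi-setup.
(define scalapack-mpich/no-openmpi-setup
  (package
    (inherit scalapack)
    (name "scalapack-mpich")
    (inputs
     `(("mpi" ,mpich)
       ,@(alist-delete "mpi" (package-inputs scalapack))))
    (arguments
     (substitute-keyword-arguments (package-arguments scalapack)
       ((#:phases phases)
        `(modify-phases ,phases
           ;; Assumes the inherited phases contain 'mpi-setup.
           (delete 'mpi-setup)))))))
```

A file ending with this definition could be built with 'guix build -f',
which is one way to check whether the phase matters for the failure
reported earlier.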




Message #26 received at 39588 <at> debbugs.gnu.org:

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Wed, 19 Feb 2020 12:45:55 +0100
Hi Simon,

>If mumps depends explicitly on openmpi, then '--with-input' can
>rewrite the direct dependencies, by providing say mpich instead of
>openmpi.
>If petsc* depends explicitly on openmpi and on mumps (which depends
>explicitly on openmpi too), then '--with-input=openmpi=mpich' will
>*only* rewrite the dependency of petsc but not of mumps. So it ends
>up with petsc compiled with mpich and mumps with openmpi.
>
>Still considering this (fictitious) example, where:
> - petsc depends on openmpi(1) and mumps
> - mumps depends on openmpi(2)
>The openmpi(2) is an implicit dependency for petsc and '--with-input'
>does not work.
>
>*because I know PETSc better than Scotch. ;-)
>

Ok thank you for the clarification, I understand better now.

I misunderstood the documentation:

https://guix.gnu.org/manual/en/html_node/Package-Transformation-Options.html

   --with-input=package=replacement
   [...]
   This is a recursive, deep replacement. [...]

So I thought that implicit packages had something to do with other
inputs like native-inputs, but it was not clear to me.
(I must admit that I still do not understand the term "recursive" very
well here.)

I was also confused about the replacement: --with-input must be given
the name of the package and not the label of the input, so
--with-input=openmpi=mpich is sufficient.

In the scalapack input I can see:

     `(("mpi" ,openmpi)
       ("fortran" ,gfortran)
       ("lapack" ,lapack)))             ;for testing only

So my assumption is that the --with-input transformation should work
here, as neither gfortran nor lapack depends on mpi. To build just
scalapack with mpich I tried:

guix time-machine --commit=c70261bfb993cebc23cd80042de3f52a8b7932a4 -- build scalapack --with-input=openmpi=mpich

This fails with the same error:

[Message part 2 (text/plain, attachment)]

I also tried to compile my *-mpich packages in my own branch with the new
mpich package and got the same results:

guix time-machine --url=https://gitlab.inria.fr/bremond/guix.git \
 --branch=check-master-mpich -- \
 environment --pure --ad-hoc mumps-mpich 


Maurice


Message #29 received at 39588 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Thu, 20 Feb 2020 10:08:10 +0100
Hello!

zimoun <zimon.toutoune <at> gmail.com> wrote:

> On Mon, 17 Feb 2020 at 18:27, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>> As for the “-mpich” packages: they look good to me, though I’m not
>> entirely sure whether we should create “-mpich” variants for each of
>> them.  Ideally ‘--with-input’ would be enough, but I don’t know.  At
>> the same time, those variants don’t cost us much, so if they’re useful,
>> why not.
>
> Is it not related to "package parameters" or the discussion we had
> about rebuilding everything with another compiler?

There’s definitely a connection.

> In other words, '--with-input' will do the job for explicit packages but
> not the implicit ones.

Right, ‘--with-input’ could be “good enough”.

> One easy move would be to generalize -- if possible -- what is done in
> 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> because it clearly depends on the build system.
>
> On the other hand, I had a look at Spack (after the discussion at
> FOSDEM) and how they do it. The WIP branch [1] about the solver is
> interesting: it could catch incompatibilities earlier using a solver
> (SAT or other) and specifications. But I am not convinced it is the
> way to go either, because it adds a lot of complexity for a debatable
> gain. ;-)
>
>
> [1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver

I have yet to look more closely into this.  However, overall, while I
agree that some flexibility is welcome and actually needed, I’m
skeptical about the goal of potentially allowing for any combination, at
the expense of QA (the solver can check for incompatible options,
provided option compatibility is well specified, but it cannot check
whether something will run or even build at all.)

> Well, for these particular patches, the variants are ok.
> But we should think about how to ease the variant generation of all the chain.

Well again there are things like ‘package-input-rewriting’ that could
help: we could define a ‘package-with-mpich’ procedure.

Thanks,
Ludo’.
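
A minimal sketch of what such a procedure could look like, assuming the
'openmpi', 'mpich' and 'mumps-openmpi' package variables discussed in
this thread. 'package-input-rewriting' comes from (guix packages); the
names 'package-with-mpich' and 'mumps-with-mpich' are illustrative only:

```scheme
(use-modules (guix packages)
             (gnu packages maths)  ; mumps-openmpi
             (gnu packages mpi))   ; openmpi, mpich

(define package-with-mpich
  ;; Procedure that takes a package and returns a variant in which
  ;; OPENMPI is replaced by MPICH in its direct and indirect inputs
  ;; (implicit inputs are left untouched).
  (package-input-rewriting `((,openmpi . ,mpich))))

;; Hypothetical use: an MPICH-based variant of 'mumps-openmpi'.
(define mumps-with-mpich
  (package-with-mpich mumps-openmpi))
```

Note that this only rewrites inputs. Build phases written for Open MPI,
such as the 'mpi-setup phase mentioned earlier in the thread, are left
as they are, so a variant obtained this way may still need per-package
adjustments.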




Message #32 received at 39588 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Thu, 20 Feb 2020 10:38:19 +0100
Hi Maurice,

Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

> This fails on my machine for the pt-scotch check (there is the same
> error with scalapack check)
>
> Invalid error code (-2) (error ring index 127 invalid)
> INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
> Fatal error in PMPI_Init: Other MPI error, error stack:
> MPIR_Init_thread(586)..............: 
> MPID_Init(224).....................: channel initialization failed
> MPIDI_CH3_Init(105)................: 
> MPID_nem_init(324).................: 
> MPID_nem_tcp_init(175).............: 
> MPID_nem_tcp_get_business_card(401): 
> MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
>
> If I go into the build directory and launch the check manually after
> sourcing the environment-variables file, it works...

My version of the patch must have changed the default driver or
something along these lines.

tcp_init.c:373 in MPICH reads this:

--8<---------------cut here---------------start------------->8---
    /* If we don't have an IP address, try to get it from the name */
    if (!ifaddrFound) {
        mpi_errno = MPL_get_sockaddr(ifname_string, p_addr);
        MPIR_ERR_CHKANDJUMP2(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**gethostbyname", "**gethostbyname %s %d", ifname_string, h_errno);
    }
--8<---------------cut here---------------end--------------->8---

‘MPL_get_sockaddr’ uses ‘getifaddrs’ to get the list of local
interfaces, which in turn is implemented in terms of netlink requests in
libc.

I tried to reproduce it without success, but I guess my example MPI
program is too simple to trigger the issue:

--8<---------------cut here---------------start------------->8---
$ guix build -f mpich.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
   /gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv
building /gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
successfully built /gnu/store/mgcwnmicw696i3g98rljdg92ra6ilq4n-mpi-init.drv
/gnu/store/cmdh27sg2hqh2p0jhxz33xgfmsxd5hmz-mpi-init
$ /gnu/store/cmdh27sg2hqh2p0jhxz33xgfmsxd5hmz-mpi-init
$ cat mpich.scm
(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <mpi.h>

int main (int argc, char *argv[]) { return MPI_Init (&argc, &argv);} "))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))

                     (setenv "PATH"
                             (string-append #$(file-append toolchain "/bin") ":"
                                            #$(file-append mpich "/bin")))
                     (setenv "CPATH" #$(file-append mpich "/include"))
                     (setenv "LIBRARY_PATH"
                             (string-append #$(file-append mpich "/lib") ":"
                                            #$(file-append toolchain "/lib")))
                     (invoke "mpicc" "-o" #$output "-Wall" "-g"
                             #$code)

                     ;; Run the MPI code in the build environment.
                     (invoke #$output))))
--8<---------------cut here---------------end--------------->8---

Ideas?

Could you perhaps strace the pt-scotch test that fails so we can see if
there’s anything obvious, such as code that browses /sys or similar?

TIA,
Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Thu, 20 Feb 2020 10:24:02 GMT) Full text and rfc822 format available.

Message #35 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Thu, 20 Feb 2020 11:23:00 +0100
Hi Ludo,

On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:

> > In other words, '--with-input' will do the job for explicit inputs but
> > not the implicit ones.
>
> Right, ‘--with-input’ could be “good enough”.

About openmpi->mpich, I am not sure it will work because of:

--8<---------------cut here---------------start------------->8---
#:phases (modify-phases %standard-phases
                  (add-before 'check 'mpi-setup
            ,%openmpi-setup))
--8<---------------cut here---------------end--------------->8---




> > On the other hand, I had a look at Spack (after the discussion at
> > FOSDEM) and how they do it. The WIP branch [1] about the solver is
> > interesting: it could catch incompatibilities earlier using a solver
> > (SAT or other) and specifications. But I am not convinced either that
> > it is the way to go, because it adds a lot of complexity for a gain
> > that is debatable. ;-)
> >
> >
> > [1] https://github.com/spack/spack/tree/features/solver/lib/spack/spack/solver
>
> I have yet to look more closely into this.  However, overall, while I
> agree that some flexibility is welcome and actually needed, I’m
> skeptical about the goal of potentially allowing for any combination, at
> the expense of QA (the solver can check for incompatible options,
> provided option compatibility is well specified, but it cannot check
> whether something will run or even build at all.)

I agree. Need more thoughts. :-)


> > One easy move should be to generalize -- if possible -- what is done in
> > 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> > because it clearly depends on the build system.

> > Well, for these particular patches, the variants are ok.
> > But we should think about how to ease generating variants of the whole chain.
>
> Well again there are things like ‘package-input-rewriting’ that could
> help: we could define a ‘package-with-mpich’ procedure.

Yes. 'with-python2' and 'with-ocaml4.07' rewrite the build system
(implicit inputs), while 'package-with-mpich' would rewrite packages
('package-input-rewriting', so explicit ones) and also tweak some
variables (environment and/or flags).
Sounds good. :-)


All the best,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 21 Feb 2020 08:05:01 GMT) Full text and rfc822 format available.

Message #38 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 21 Feb 2020 09:03:54 +0100
Hi,

zimoun <zimon.toutoune <at> gmail.com> writes:

> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>> > In other words, '--with-input' will do the job for explicit inputs but
>> > not the implicit ones.
>>
>> Right, ‘--with-input’ could be “good enough”.
>
> About openmpi->mpich, I am not sure it will work because of:
>
> #:phases (modify-phases %standard-phases
>                   (add-before 'check 'mpi-setup
>             ,%openmpi-setup))

That phase just sets environment variables that MPICH will happily
ignore.

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 21 Feb 2020 08:41:02 GMT) Full text and rfc822 format available.

Message #41 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 21 Feb 2020 09:40:00 +0100
On Fri, 21 Feb 2020 at 09:03, Ludovic Courtès <ludo <at> gnu.org> wrote:
> zimoun <zimon.toutoune <at> gmail.com> writes:
> > On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:
> >
> >> > In other words, '--with-input' will do the job for explicit inputs but
> >> > not the implicit ones.
> >>
> >> Right, ‘--with-input’ could be “good enough”.
> >
> > About openmpi->mpich, I am not sure it will work because of:
> >
> > #:phases (modify-phases %standard-phases
> >                   (add-before 'check 'mpi-setup
> >             ,%openmpi-setup))
>
> That phase just sets environment variables that MPICH will happily
> ignore.

Yes, "qui peut le plus peut le moins" (whoever can do more can do less). ;-)
But what if the mpich package requires environment variables too?

(I do not have a clean MPI installation at hand so it is difficult to
really test.)


Cheers,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 21 Feb 2020 08:47:02 GMT) Full text and rfc822 format available.

Message #44 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 21 Feb 2020 09:46:38 +0100
[Message part 1 (text/plain, inline)]
Hi,

I ran strace on the scalapack tests (the blacs tests):

[XL.gz (application/gzip, attachment)]
[Message part 3 (text/plain, inline)]

I cannot see where it goes wrong, but it should be in the trace.

I also compiled another package I use with mpich, adjoinable-mpi, and it
is ok (as there are no checks in it...). I can use it to run an ocean
model and everything works. So I think it is the same situation as with
your example: the end user can use mpich, but the guix daemon cannot.

Maurice


Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 21 Feb 2020 09:02:02 GMT) Full text and rfc822 format available.

Message #47 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 21 Feb 2020 10:01:28 +0100
Hi Simon,

>The digression about implicit inputs was not relevant, sorry. :-)
No worries, it taught me some Guix internals that were obscure to me!

>(arguments
> `(#:configure-flags `("-DBUILD_SHARED_LIBS:BOOL=YES")
>   #:phases (modify-phases %standard-phases
>              (add-before 'check 'mpi-setup
>        ,%openmpi-setup))))

Even if it only sets variables and should be harmless here, I wonder:
would it be easy to turn this openmpi-setup into an mpi-setup with
ad-hoc initialisations?

Maurice




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 21 Feb 2020 11:33:02 GMT) Full text and rfc822 format available.

Message #50 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 21 Feb 2020 12:32:44 +0100
[Message part 1 (text/plain, inline)]
Hi,

I actually managed to reproduce it with a minimal test case (attached):

--8<---------------cut here---------------start------------->8---
$ guix build -f mpich-test.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
   /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215: expr: command not found
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............: 
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................: 
MPID_nem_init(324).................: 
MPID_nem_tcp_init(175).............: 
MPID_nem_tcp_get_business_card(401): 
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
Backtrace:
           1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
In guix/build/utils.scm:
    652:6  0 (invoke _ . _)

guix/build/utils.scm:652:6: In procedure invoke:
Throw to key `srfi-34' with args `(#<condition &invoke-error [program: "mpiexec" arguments: ("-np" "2" "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init") exit-status: 15 term-signal: #f stop-signal: #f] 7ffff6022f40>)'.
builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed with exit code 1
build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
View build log at '/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
guix build: error: build of `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed
--8<---------------cut here---------------end--------------->8---

The same program outside the container works just fine:

--8<---------------cut here---------------start------------->8---
$ guix environment --ad-hoc mpich -- mpiexec -np 2 "/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
np = 2, rank = 0
np = 2, rank = 1
--8<---------------cut here---------------end--------------->8---

‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:

--8<---------------cut here---------------start------------->8---
(computed-file "getaddrinfo"
               #~(pk #$output
                     (getaddrinfo "localhost" #f
                                  (logior AI_ADDRCONFIG AI_V4MAPPED)
                                  AF_INET
                                  SOCK_STREAM
                                  IPPROTO_TCP)))
--8<---------------cut here---------------end--------------->8---

However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.

Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…

Ludo’.

PS: I’ll be mostly away from keyboard in the coming days.

[mpich.scm (text/plain, inline)]
(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <assert.h>
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[]) {
  int err, np, rank;
  err = MPI_Init (&argc, &argv);
  assert (err == 0);
  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);
  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
} "))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))

                     (setenv "PATH"
                             (string-append #$(file-append toolchain "/bin") ":"
                                            #$(file-append mpich "/bin")))
                     (setenv "CPATH" #$(file-append mpich "/include"))
                     (setenv "LIBRARY_PATH"
                             (string-append #$(file-append mpich "/lib") ":"
                                            #$(file-append toolchain "/lib")))
                     (invoke "mpicc" "-o" #$output "-Wall" "-g"
                             #$code)

                     ;; Run the MPI code in the build environment.
                     (invoke "mpiexec" "-np" "2" #$output))))

Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Tue, 25 Feb 2020 16:43:01 GMT) Full text and rfc822 format available.

Message #53 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Tue, 25 Feb 2020 17:41:56 +0100
Hi Ludo,

On Thu, 20 Feb 2020 at 11:23, zimoun <zimon.toutoune <at> gmail.com> wrote:
> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:

> > > One easy move should be to generalize -- if possible -- what is done in
> > > 'with-python2' or 'with-ocaml4.07'. But I am not convinced it is easy
> > > because it clearly depends on the build system.
>
> > > Well, for these particular patches, the variants are ok.
> > > But we should think about how to ease generating variants of the whole chain.
> >
> > Well again there are things like ‘package-input-rewriting’ that could
> > help: we could define a ‘package-with-mpich’ procedure.
>
> Yes. 'with-python2' and 'with-ocaml4.07' rewrite the build system
> (implicit inputs), while 'package-with-mpich' would rewrite packages
> ('package-input-rewriting', so explicit ones) and also tweak some
> variables (environment and/or flags).
> Sounds good. :-)

I do not know why I removed the "package-" in "package-with-python2".
Whatever! :-)
My remark was about maybe distinguishing between rewriting an input and
rewriting the build system. But after some thought, I do not know whether
that is useful or whether it just adds complexity.

However, I do not know whether the right candidate is
'package-input-rewriting' or 'package-mapping', as used in
'package-with-python2'. Well, I will try to experiment in the
meantime.


All the best,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Thu, 15 Oct 2020 19:51:02 GMT) Full text and rfc822 format available.

Message #56 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Thu, 15 Oct 2020 21:50:00 +0200
Hi,

Reviving the openmpi->mpich topic of #39588 [1].

1: <http://issues.guix.gnu.org/issue/39588>


On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:

>> Well, for these particular patches, the variants are ok.
>> But we should think about how to ease generating variants of the whole chain.
>
> Well again there are things like ‘package-input-rewriting’ that could
> help: we could define a ‘package-with-mpich’ procedure.

Now that “#:deep?” exists, does it make sense to implement this
“package-with-mpich” procedure?  It could be helpful for HPC people,
couldn’t it?


All the best,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 16 Oct 2020 09:33:02 GMT) Full text and rfc822 format available.

Message #59 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 16 Oct 2020 11:32:18 +0200
Hi,

zimoun <zimon.toutoune <at> gmail.com> writes:

> Reviving the openmpi->mpich topic of #39588 [1].
>
> 1: <http://issues.guix.gnu.org/issue/39588>

Thanks for staying on top of things!  :-)

> On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:
>
>>> Well, for these particular patches, the variants are ok.
>>> But we should think about how to ease generating variants of the whole chain.
>>
>> Well again there are things like ‘package-input-rewriting’ that could
>> help: we could define a ‘package-with-mpich’ procedure.
>
> Now that “#:deep?” exists, does it make sense to implement this
> “package-with-mpich” procedure?  It could be helpful for HPC people,
> couldn’t it?

Or does ‘--with-input=openmpi=mpich’ fit the bill now?

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 16 Oct 2020 11:47:01 GMT) Full text and rfc822 format available.

Message #62 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: Maurice Brémond <Maurice.Bremond <at> inria.fr>,
 39588 <at> debbugs.gnu.org
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 16 Oct 2020 13:46:16 +0200
Dear,

On Fri, 16 Oct 2020 at 11:32, Ludovic Courtès <ludo <at> gnu.org> wrote:

> Thanks for staying on top of things!  :-)

I have been a bit upset by recent discussions on a French HPC mailing
list claiming that "modulefiles are awesome"; well, I am sure they are
applying a variant of "prêcher le faux pour savoir le vrai" (stating
something false to find out the truth) [1]. :-)

1: <https://meta.wikimedia.org/wiki/Cunningham%27s_Law>


> > On Thu, 20 Feb 2020 at 10:08, Ludovic Courtès <ludo <at> gnu.org> wrote:

> > Now that “#:deep?” exists, does it make sense to implement this
> > “package-with-mpich” procedure?  It could be helpful for HPC people,
> > couldn’t it?
>
> Or does ‘--with-input=openmpi=mpich’ fit the bill now?

I do not have an MPI infrastructure at hand, or enough CPU power to
rebuild a lot, to fully test it.  If one of you could try it on the
right infrastructure and report back, that would be awesome.

BTW, it should work only if MPICH does not require extra phases or
environment variables.


All the best,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Mon, 19 Oct 2020 13:47:01 GMT) Full text and rfc822 format available.

Message #65 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 39588 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 19 Oct 2020 15:46:20 +0200
[Message part 1 (text/plain, inline)]
Hello,

A build of mumps-openmpi with mpich fails:

guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich

[...]
  mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Invalid error code (-2) (error ring index 127 invalid)
  INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
  Fatal error in PMPI_Init: Other MPI error, error stack:
  MPIR_Init_thread(586)..............: 
  MPID_Init(224).....................: channel initialization failed
  MPIDI_CH3_Init(105)................: 
  MPID_nem_init(324).................: 
  MPID_nem_tcp_init(175).............: 
  MPID_nem_tcp_get_business_card(401): 
  MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)


This is what Ludo reproduced:
[Message part 2 (text/plain, inline)]
From: Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org,  zimoun <zimon.toutoune <at> gmail.com>
Date: Fri, 21 Feb 2020 12:32:44 +0100 (34 weeks, 3 days, 2 hours ago)


[Message part 3 (text/plain, inline)]

Note that it is ok with the raw mpich patch:
guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich -- build mumps-openmpi --with-input=openmpi=mpich    

I tried a build with the same hwloc as the embedded commit f7b08df258c2e7d04ca2035ddd55a1de91f806d4
(the HEAD used for hwloc in mpich) but the result is the same:

guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich -- build mumps-openmpi --with-input=openmpi=mpich

(why two time-machine steps are needed is another question...)


Maurice

Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Tue, 20 Oct 2020 20:56:02 GMT) Full text and rfc822 format available.

Message #68 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Tue, 20 Oct 2020 22:55:13 +0200
Hi Maurice,

Maurice Brémond <Maurice.Bremond <at> inria.fr> writes:

> A build of mumps-openmpi with mpich fails:
>
> guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich

[...]

>   MPID_nem_tcp_get_business_card(401): 
>   MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)

[...]

> ‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
> Interestingly, ‘getaddrinfo’ fails in the build environment when passed
> the flags that ‘MPL_get_sockaddr’ uses:
>
> (computed-file "getaddrinfo"
>                #~(pk #$output
>                      (getaddrinfo "localhost" #f
>                                   (logior AI_ADDRCONFIG AI_V4MAPPED)
>                                   AF_INET
>                                   SOCK_STREAM
>                                   IPPROTO_TCP)))
>
> However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.
>
> Now we need to see why the ‘ai_family’ hint is causing trouble in
> glibc, and perhaps in parallel try to work around it in MPICH…

Oh thanks for the reminder, I have yet to take a closer look… hopefully
soon.

Ludo’.




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Wed, 21 Oct 2020 14:44:01 GMT) Full text and rfc822 format available.

Message #71 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: zimoun <zimon.toutoune <at> gmail.com>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: (off-topic) double time-machine explanations
Date: Wed, 21 Oct 2020 16:43:41 +0200
Dear Maurice,

Thank you for the tests.  Ouch!  I will try to take a closer look next
week...  even if it may be beyond my skills.  Well, v1.2 is
coming and it would be nice to have both MPIs. :-)

On Mon, 19 Oct 2020 at 15:46, Maurice Brémond <Maurice.Bremond <at> inria.fr> wrote:

[...]

> Note that it is ok with the raw mpich patch
> guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich -- build mumps-openmpi --with-input=openmpi=mpich
>
> I tried a build with the same hwloc as the embedded commit f7b08df258c2e7d04ca2035ddd55a1de91f806d4
> (the HEAD used for hwloc in mpich) but the result is the same:
>
> guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 -- time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich -- build mumps-openmpi --with-input=openmpi=mpich
>
> (why two time-machine steps are needed is another question...)

The two "time-machine" steps are needed because the repo
https://gitlab.inria.fr/bremond/guix.git lags far behind master, I
guess.  You can reduce them to a single one by using a channel file,
something like:

--8<---------------cut here---------------start------------->8---
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "398ec3c1e265a3f89ed07987f33b264db82e4080"))
      (channel
        (name 'yours)
        (url "https://gitlab.inria.fr/bremond/guix.git")))
--8<---------------cut here---------------end--------------->8---

and then "guix time-machine -C channels.scm -- build ..."

<http://guix.gnu.org/manual/devel/en/html_node/Replicating-Guix.html>
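
For what it's worth, the single-step variant could be scripted roughly
as follows.  This is only a sketch: the file name channels.scm and the
CHANNELS_FILE variable are assumptions of mine, and the script prints
the time-machine command instead of running it, since the build is long.

```shell
#!/bin/sh
# Sketch only: write the channel file from this message and show the
# single time-machine invocation that would replace the two nested ones.
set -eu

# Hypothetical file name; override with CHANNELS_FILE if needed.
channels_file="${CHANNELS_FILE:-channels.scm}"

# Same two channels as in the snippet above, with a plain ASCII quote.
cat > "$channels_file" <<'EOF'
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "398ec3c1e265a3f89ed07987f33b264db82e4080"))
      (channel
        (name 'yours)
        (url "https://gitlab.inria.fr/bremond/guix.git")))
EOF

# Print the command rather than running it; drop the 'echo' to run it.
echo guix time-machine -C "$channels_file" -- \
     build mumps-openmpi --with-input=openmpi=mpich
```

If the work lives on a non-default branch such as add-mpich, the second
channel would presumably also need a (branch "add-mpich") field; I have
not tested that here.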

But yeah, that's another story. :-)


All the best,
simon




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 23 Oct 2020 08:42:02 GMT) Full text and rfc822 format available.

Message #74 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: zimoun <zimon.toutoune <at> gmail.com>
Cc: 39588 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: (off-topic) double time-machine explanations
Date: Fri, 23 Oct 2020 10:41:48 +0200
Hello Simon,

thank you for the explanation, and sorry for the digression.  I'm going
to read the manual more carefully...

Maurice




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 23 Oct 2020 09:34:01 GMT) Full text and rfc822 format available.

Message #77 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 23 Oct 2020 11:33:10 +0200
Hello Ludovic,

Apparently at the mpich configuration level, using the experimental
device ch4 instead of ch3 solves the problem: just uncomment
"--with-device=ch4:ucx".  Conversely, with mpich 3.4a2 (for which ch4 is
the default) setting --with-device=ch3 leads to the same failure as with
3.3.2.

I also checked the sock channel for ch3: --with-device=ch3:sock, but then on
my laptop, the scotch tests hang at

mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf

For the moment, there isn't a stable 3.4 version yet for mpich. I had a
try with the latest 3.4b1 but a test failed...


Maurice




Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 23 Oct 2020 15:27:02 GMT) Full text and rfc822 format available.

Message #80 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 23 Oct 2020 17:26:36 +0200
[Message part 1 (text/plain, inline)]
Hi Maurice,

Maurice Brémond <Maurice.Bremond <at> inria.fr> skribis:

> Apparently at the mpich configuration level, using the experimental
> device ch4 instead of ch3 solves the problem: just uncomment
> "--with-device=ch4:ucx".  Conversely, with mpich 3.4a2 (for which ch4 is
> the default) setting --with-device=ch3 leads to the same failure as with
> 3.3.2.

Nice, we have a way forward.

With the patch below, I have successfully built:

  guix build mumps-openmpi --with-input=openmpi=mpich

and I confirm that despite the name it depends exclusively on MPICH.
:-)
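
One way to double-check that claim, sketched here with a hypothetical
helper named mpi_in_closure, is to list the closure of the build result
with "guix gc --requisites" and keep only the MPI implementations found
in it.  The helper only filters store file names; the Guix invocation
itself is shown in a comment and assumes the package has a single
output.

```shell
#!/bin/sh
# Hypothetical helper: read store paths on stdin and print the names of
# the MPI implementations (openmpi or mpich) found among them.
mpi_in_closure() {
    # Store items look like /gnu/store/<hash>-<name>-<version>; only
    # items whose name itself is openmpi or mpich are reported, so
    # "mumps-openmpi" does not count.
    sed -E -n 's#^/gnu/store/[^-]*-(openmpi|mpich)-[0-9].*#\1#p' | sort -u
}

# Intended use (needs Guix, and builds the package first):
#   guix gc --requisites \
#     $(guix build mumps-openmpi --with-input=openmpi=mpich) | mpi_in_closure
# If the substitution worked, only "mpich" should be printed.
```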

If that’s fine with you I’ll go ahead and commit it; let me know!

> I also checked the sock channel for ch3: --with-device=ch3:sock, but then on
> my laptop, the scotch tests hang at
>
> mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
>
> For the moment, there isn't a stable 3.4 version yet for mpich. I had a
> try with the latest 3.4b1 but a test failed...

We’ll see, but having a solution that works with 3.3 and is likely to
work with 3.4 is good.

I guess we should also check whether we’re obtaining the expected
performance.  This builds fine too:

  guix build intel-mpi-benchmarks --with-input=openmpi=mpich

Thank you!

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 06a82cce95..9035147441 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -436,7 +436,12 @@ arrays) that expose a buffer interface.")
      `(#:configure-flags
        (list "--disable-silent-rules"             ;let's see what's happening
              "--enable-debuginfo"
-             ;; "--with-device=ch4:ucx" ; --with-device=ch4:ofi segfaults in tests
+
+             ;; Default to "ch4", as will be the case in 3.4.  It also works
+             ;; around issues when running test suites of packages that use
+             ;; MPICH: <https://issues.guix.gnu.org/39588#15>.
+             "--with-device=ch4:ucx" ; --with-device=ch4:ofi segfaults in tests
+
              (string-append "--with-hwloc-prefix="
                             (assoc-ref %build-inputs "hwloc"))

Information forwarded to guix-patches <at> gnu.org:
bug#39588; Package guix-patches. (Fri, 23 Oct 2020 17:06:01 GMT) Full text and rfc822 format available.

Message #83 received at 39588 <at> debbugs.gnu.org (full text, mbox):

From: Maurice Brémond <Maurice.Bremond <at> inria.fr>
To: Ludovic Courtès <ludovic.courtes <at> inria.fr>
Cc: 39588 <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Fri, 23 Oct 2020 19:04:38 +0200
> If that’s fine with you I’ll go ahead and commit it; let me know!
It's fine for me and for what I do with it.

Bon week-end!





Reply sent to Ludovic Courtès <ludovic.courtes <at> inria.fr>:
You have taken responsibility. (Mon, 02 Nov 2020 14:03:02 GMT) Full text and rfc822 format available.

Notification sent to Maurice Brémond <Maurice.Bremond <at> inria.fr>:
bug acknowledged by developer. (Mon, 02 Nov 2020 14:03:02 GMT) Full text and rfc822 format available.

Message #88 received at 39588-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludovic.courtes <at> inria.fr>
To: Maurice Brémond <Maurice.Bremond <at> inria.fr>
Cc: 39588-done <at> debbugs.gnu.org, zimoun <zimon.toutoune <at> gmail.com>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
 pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 02 Nov 2020 15:02:08 +0100
Salut,

Maurice Brémond <Maurice.Bremond <at> inria.fr> skribis:

>> If that’s fine with you I’ll go ahead and commit it; let me know!
> It's fine for me and for what I do with it.
>
> Bon week-end!

Finally pushed as c73496f433044a76003b33c3855bb35ecd0df87f, thanks!

I’m closing this bug, let’s open a new one if we need to further discuss
MPI support in Guix.

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 01 Dec 2020 12:24:11 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 140 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.