[bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
From: Maurice Brémond
Subject: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich, pt-scotch-mpich, python-mpi4py-mpich
Date: Mon, 19 Oct 2020 15:46:20 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Hello,
A build of mumps-openmpi with mpich fails:
guix time-machine -- build mumps-openmpi --with-input=openmpi=mpich
[...]
mpirun -n 3 ./test_scotch_dgraph_check data/bump.grf
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in
MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in
MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in
MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
This is what Ludo reproduced:
From: Ludovic Courtès <ludo@gnu.org>
Subject: Re: [bug#39588] gnu: Add mpich, scalapack-mpich, mumps-mpich,
pt-scotch-mpich, python-mpi4py-mpich
To: Maurice Brémond <Maurice.Bremond@inria.fr>
Cc: 39588@debbugs.gnu.org, zimoun <zimon.toutoune@gmail.com>
Date: Fri, 21 Feb 2020 12:32:44 +0100 (34 weeks, 3 days, 2 hours ago)
Hi,
I actually managed to reproduce it with a minimal test case (attached):
$ guix build -f mpich-test.scm
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivation will be built:
/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv
building /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv...
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215:
expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215:
expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215:
expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215:
expr: command not found
/gnu/store/pkbg6kllx5xb8vb6kwrwm7qm4rnpmhia-mpich-3.3.2/bin/mpicc: line 215:
expr: command not found
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in
MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in
MPID_nem_tcp_init:373
Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, localhost (errno 0)
Backtrace:
1 (primitive-load "/gnu/store/iykxzg1n018sigd4c23kx1c4ngz?")
In guix/build/utils.scm:
652:6 0 (invoke _ . _)
guix/build/utils.scm:652:6: In procedure invoke:
Throw to key `srfi-34' with args `(#<condition &invoke-error [program:
"mpiexec" arguments: ("-np" "2"
"/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init") exit-status: 15
term-signal: #f stop-signal: #f] 7ffff6022f40>)'.
builder for `/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed
with exit code 1
build of /gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv failed
View build log at
'/var/log/guix/drvs/rg/r7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv.bz2'.
guix build: error: build of
`/gnu/store/rgr7wnxbgxnp6s96zcnb4ryn3rqfcl7b-mpi-init.drv' failed
The same program outside the container works just fine:
$ guix environment --ad-hoc mpich -- mpiexec -np 2
"/gnu/store/8i1dci1wxd6c0q6a2cz4kgb8adfk8rrz-mpi-init"
np = 2, rank = 0
np = 2, rank = 1
‘MPL_get_sockaddr’ uses ‘getaddrinfo’ for host name lookup.
Interestingly, ‘getaddrinfo’ fails in the build environment when passed
the flags that ‘MPL_get_sockaddr’ uses:
(computed-file "getaddrinfo"
               #~(pk #$output
                     (getaddrinfo "localhost" #f
                                  (logior AI_ADDRCONFIG AI_V4MAPPED)
                                  AF_INET
                                  SOCK_STREAM
                                  IPPROTO_TCP)))
However, if you comment out AF_INET, SOCK_STREAM, and IPPROTO_TCP, it works.
Now we need to see why the ‘ai_family’ hint is causing trouble in
glibc, and perhaps in parallel try to work around it in MPICH…
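
For comparison, the same lookup can be sketched directly in C. This is only a
minimal sketch, not MPICH's actual code: `try_lookup` and `compare_lookups` are
hypothetical helper names, and the hints simply mirror the Guile call above
(the flags-only variant assumes that omitting the trailing arguments in Guile
amounts to AF_UNSPEC with socket type and protocol 0). One thing that may be
worth checking: the Linux getaddrinfo(3) man page says that with AI_ADDRCONFIG,
IPv4 addresses are returned only if the system has an IPv4 address configured,
and that the loopback address does not count. Whether that is what happens in
the build environment is an assumption, not something established here.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical helper (not MPICH code): resolve HOST with the given hints
   and return getaddrinfo's status code (0 on success).  */
int try_lookup(const char *host, int flags, int family,
               int socktype, int protocol)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_flags = flags;
    hints.ai_family = family;
    hints.ai_socktype = socktype;
    hints.ai_protocol = protocol;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);   /* only the status matters here */
    return rc;
}

/* Hypothetical helper: print the outcome of the two variants discussed
   above, i.e. flags plus family/socktype/protocol hints versus flags only.  */
void compare_lookups(const char *host)
{
    int flags = AI_ADDRCONFIG | AI_V4MAPPED;
    int with_hints = try_lookup(host, flags, AF_INET, SOCK_STREAM, IPPROTO_TCP);
    int flags_only = try_lookup(host, flags, AF_UNSPEC, 0, 0);

    printf("%s, flags + AF_INET/SOCK_STREAM/IPPROTO_TCP: %s\n", host,
           with_hints == 0 ? "ok" : gai_strerror(with_hints));
    printf("%s, flags only: %s\n", host,
           flags_only == 0 ? "ok" : gai_strerror(flags_only));
}
```

Calling `compare_lookups("localhost")` from a small `main`, once inside the
build environment and once outside, should show whether the two variants
diverge at the libc level as they do from Guile.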
Ludo’.
PS: I’ll be mostly away from keyboard in the coming days.
(use-modules (guix) (gnu))

(define code
  (plain-file "mpi.c" "
#include <assert.h>
#include <stdio.h>
#include <mpi.h>
int main (int argc, char *argv[]) {
  int err, np, rank;
  err = MPI_Init (&argc, &argv);
  assert (err == 0);
  err = MPI_Comm_size(MPI_COMM_WORLD, &np);
  assert (err == 0);
  err = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  assert (err == 0);
  printf (\"np = %i, rank = %i\\n\", np, rank);
  return 0;
} "))

(define toolchain (specification->package "gcc-toolchain"))
(define mpich (specification->package "mpich"))

(computed-file "mpi-init"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))
                     (setenv "PATH"
                             (string-append #$(file-append toolchain "/bin") ":"
                                            #$(file-append mpich "/bin")))
                     (setenv "CPATH" #$(file-append mpich "/include"))
                     (setenv "LIBRARY_PATH"
                             (string-append #$(file-append mpich "/lib") ":"
                                            #$(file-append toolchain "/lib")))
                     (invoke "mpicc" "-o" #$output "-Wall" "-g"
                             #$code)
                     ;; Run the MPI code in the build environment.
                     (invoke "mpiexec" "-np" "2" #$output))))
Note that the build is OK with the raw mpich patch:
guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 --
time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=add-mpich
-- build mumps-openmpi --with-input=openmpi=mpich
I also tried a build with the same hwloc as the embedded one, at commit
f7b08df258c2e7d04ca2035ddd55a1de91f806d4
(the HEAD used for hwloc in mpich), but the result is the same:
guix time-machine --commit=398ec3c1e265a3f89ed07987f33b264db82e4080 --
time-machine --url=https://gitlab.inria.fr/bremond/guix.git --branch=test-mpich
-- build mumps-openmpi --with-input=openmpi=mpich
(Why a two-step time-machine is needed is another question...)
Maurice