Re: Guix and openmpi in a container environment
From: Todor Kondić
Subject: Re: Guix and openmpi in a container environment
Date: Mon, 27 Jan 2020 12:48:04 +0000
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 27 January 2020 11:54, Todor Kondić <address@hidden> wrote:
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Sunday, 19 January 2020 11:25, Todor Kondić <address@hidden> wrote:
>
> > I am getting mpirun errors when trying to execute a simple
> > mpirun -np 1 program
> > (where program is e.g. 'ls') command in a container environment.
> > The error is usually:
> > All nodes which are allocated for this job are already filled.
> > which makes no sense, as I am trying this on my workstation (a single-socket,
> > four-core machine -- an off-the-shelf i5 CPU) with no scheduling system enabled.
> > I set up the container with this command:
> > guix environment -C -N --ad-hoc -m default.scm
> > where default.scm:
> > (use-modules (guix packages))
> > (specifications->manifest
> >  `(;; Utilities
> >    "less"
> >    "bash"
> >    "make"
> >    "openssh"
> >    "guile"
> >    "nano"
> >    "glibc-locales"
> >    "gcc-toolchain@7.4.0"
> >    "gfortran-toolchain@7.4.0"
> >    "python"
> >    "openmpi"
> >    "fftw"
> >    "fftw-openmpi"
> >    ,@(map package-name %base-packages)))
> > Simply installing openmpi (guix package -i openmpi) in my usual Guix
> > profile works out of the box. So there must be some quirk: the
> > containerized openmpi installation is blind to settings that are present
> > in the usual environment.
>
> For the environment above,
>
> if the mpirun invocation is changed to provide the hostname
>
> mpirun --host $HOSTNAME:4 -np 4 ls
>
> ls is executed in four processes and the output is four times the contents of
> the current directory as expected.
>
> Of course, ls is not an MPI program. However, this elementary Fortran MPI
> program,
>
> --------------------------------------------------------------------------
>
> program testrun2
> use mpi
> implicit none
> integer :: ierr
>
> call mpi_init(ierr)
> call mpi_finalize(ierr)
>
> end program testrun2
>
> --------------------------------------------------------------------------
>
> fails with runtime errors on any number of processes.
>
> The compilation line was:
> mpif90 test2.f90 -o testrun2
>
> The mpirun command:
> mpirun --host $HOSTNAME:4 -np 4 ./testrun2
>
> To reiterate: in the normal user environment there is no need to declare the
> host and its maximum number of slots, and the runtime errors do not occur.
>
> Could it be that the openmpi package needs a few basic dependencies, missing
> from its package declaration, for the particular case of a single-node
> (ordinary PC) machine?
>
> Also, I noticed that gfortran/mpif90 ignores the "CPATH" and "LIBRARY_PATH"
> environment variables. I had to specify the include and library paths
> explicitly via the -I and -L compiler flags.
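
As a concrete illustration of the -I/-L point above, the compile line becomes roughly the following. This is a sketch: guix environment does set $GUIX_ENVIRONMENT to the profile root, but the exact include/lib layout below is my assumption.

```shell
# Inside the guix environment shell, $GUIX_ENVIRONMENT points at the profile.
# Pass its header and library directories explicitly, since mpif90 ignores
# CPATH and LIBRARY_PATH here.
mpif90 -I"$GUIX_ENVIRONMENT/include" -L"$GUIX_ENVIRONMENT/lib" \
    test2.f90 -o testrun2
```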
After playing around a bit more, I can confirm that a pure Guix environment does
work. Therefore, my solution is to drop the -C flag and use --pure when
developing and testing MPI code on my workstation. Of course, it would be
interesting to find out why Open MPI stops working inside the "-C" environment.
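
Concretely, the working (non-container) setup looks something like this, reusing the manifest from earlier in the thread:

```shell
# Pure environment from the same manifest; the -C (container) flag is dropped.
guix environment --pure -m default.scm

# Then, inside that shell:
mpif90 test2.f90 -o testrun2
mpirun -np 4 ./testrun2   # no --host declaration needed here
```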
The closest match I could find online concerns friction between Open MPI's new
vader shared-memory transport and Docker containers
(https://github.com/open-mpi/ompi/issues/4948). The recommended workaround did
not help in my case, but the problem feels related.
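
For completeness, the workaround discussed in that issue is to disable vader's single-copy (CMA) mechanism, which relies on ptrace permissions that containers often withhold. This is the variant I tried; the exact MCA parameter spelling is taken from that issue, so treat it as an assumption:

```shell
# Disable vader's CMA single-copy path so it falls back to plain
# copy-in/copy-out shared memory, which does not need ptrace.
mpirun --mca btl_vader_single_copy_mechanism none -np 4 ./testrun2
```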