Guix on clusters and in HPC
From: Ludovic Courtès
Subject: Guix on clusters and in HPC
Date: Tue, 18 Oct 2016 16:20:43 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

Hello,

I’m trying to gather a “wish list” of things to be done to facilitate
the use of Guix on clusters and for high-performance computing (HPC).

Ricardo and I wrote about the advantages, shortcomings, and perspectives
before:

  http://elephly.net/posts/2015-04-17-gnu-guix.html
  https://hal.inria.fr/hal-01161771/en

I know that Pjotr, Roel, Ben, Eric and maybe others also have experience
and ideas on what should be done (and maybe even code? :-)).

So I’ve come up with an initial list of work items going from the
immediate needs to crazy ideas (batch scheduler integration!) that
hopefully make sense to cluster/HPC people. I’d be happy to get
feedback, suggestions, etc. from whoever is interested!

(The reason I’m asking is that I’m considering submitting a proposal at
Inria to work on some of these things.)

TIA! :-)

Ludo’.

- non-root usage
  + file system virtualization needed
    * map ~/.local/gnu/store to /gnu/store
    * user name spaces?
    * [[https://github.com/proot-me/PRoot/][PRoot]]? but performance problems?
    * common interface, like “guix enter” spawns a shell where
      /gnu/store is available
  + daemon functionality as a library
    * client no longer connects to the daemon, does everything
      locally, including direct store accesses
    * can use substitutes
  + or plain ‘guix-daemon --disable-root’?
  + see [[http://lists.gnu.org/archive/html/help-guix/2016-06/msg00079.html][discussion with Ben Woodcroft and Roel]]
- central daemon usage (like at MDC, but improved)
  + describe/define appropriate setup, like:
    * daemon runs on front-end node
    * clients can connect to daemon from compute nodes, and perform
      any operation
    * use of distributed file systems: anything to pay attention to?
    * how should the front-end offload to compute nodes?
  + technical issues
    * daemon needs to be able to listen for connections elsewhere
    * client needs to be able to [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20381][connect remotely]]
      instead of using [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20381#5][‘socat’ hack]]
    * how do we share localstatedir? how do we share /gnu/store?
    * how do we share the profile directory?
  + admin/social issues
    * daemon runs as root
    * daemon needs Internet access
    * Ricardo mentions lack of nscd and problems caused by the use of
      NSS plugins like [[https://fedoraproject.org/wiki/Features/SSSD][SSSD]]
      in this context
  + batch scheduler integration?
    * allow users to offload right from their machine to the cluster?
- package variants, experimentation
  + for experiments, as in Section 4.2 of [[https://hal.inria.fr/hal-01161771/en][the RepPar paper]]
    * in the meantime we added [[https://www.gnu.org/software/guix/manual/html_node/Package-Transformation-Options.html][--with-input et al.]];
      need more?
  + for [[https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00005.html][CPU-specific optimizations]]
  + somehow support -mtune=native (and even profile-guided
    optimizations?)
  + simplify the API to switch compilers, libcs, etc.
- workflow, reproducible science
  + implement [[http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22629][channels]]
  + provide a way to see which Guix commit is used, like “guix channel
    describe”
  + simple ways to [[https://lists.gnu.org/archive/html/guix-devel/2016-10/msg00701.html][test the dependents of a package]]
    (see also discussion between E. Agullo & A. Enge)
    * new transformation options: --with-graft, --with-source
      recursive
  + support [[https://lists.gnu.org/archive/html/guix-devel/2016-05/msg00380.html][workflows and pipelines]]?
  + add [[https://github.com/galaxyproject/galaxy/issues/2778][Guix support in Galaxy]]?
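
To make the “map ~/.local/gnu/store to /gnu/store” item under non-root usage a bit more concrete, here is a rough sketch of what a “guix enter”-style wrapper might assemble if it relied on PRoot. Nothing like this exists in Guix; the function name `enter_command` and its parameters are made up for illustration. The only external assumption is that PRoot is installed and accepts its `-b host_path:guest_path` bind option.

```python
import posixpath


def enter_command(home, shell="/bin/sh", user_store=None):
    """Build the argv for a PRoot call that exposes a per-user store.

    Hypothetical helper, not part of Guix. It only assembles the
    command line; it does not run anything.
    """
    # Per-user store location taken from the wish list:
    # ~/.local/gnu/store, unless the caller overrides it.
    store = user_store or posixpath.join(home, ".local", "gnu", "store")
    # PRoot's -b option binds a host path to a path inside the guest
    # view, so the shell sees the user's store at /gnu/store.
    return ["proot", "-b", f"{store}:/gnu/store", shell]


# Show the command a wrapper would execute for a sample home directory.
print(" ".join(enter_command("/home/alice")))
```

A real wrapper would hand this argv to something like `os.execvp`. A variant based on user name spaces would replace the PRoot call with an unprivileged mount namespace and a bind mount, which should avoid the ptrace overhead behind the performance concern noted in the list.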