[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[bug#60573] [PATCH] doc: cookbook: Add "Installing Guix on a Cluster" ch
From: |
Ludovic Courtès |
Subject: |
[bug#60573] [PATCH] doc: cookbook: Add "Installing Guix on a Cluster" chapter. |
Date: |
Thu, 5 Jan 2023 12:47:23 +0100 |
From: Ludovic Courtès <ludovic.courtes@inria.fr>
This is derived from the article at
<https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/>, with
clarifications and updates.
* doc/guix-cookbook.texi (Installing Guix on a Cluster): New chapter.
---
doc/guix-cookbook.texi | 433 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 414 insertions(+), 19 deletions(-)
Hello!
This turns <https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/>
into a living document that we can keep up-to-date.
Comments? Suggestions?
Ludo'.
diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index 795e7d3b25..5e7209c9c5 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -21,7 +21,8 @@ Copyright @copyright{} 2020 Brice Waegeneire@*
Copyright @copyright{} 2020 André Batista@*
Copyright @copyright{} 2020 Christine Lemmer-Webber@*
Copyright @copyright{} 2021 Joshua Branson@*
-Copyright @copyright{} 2022 Maxim Cournoyer*
+Copyright @copyright{} 2022 Maxim Cournoyer@*
+Copyright @copyright{} 2023 Ludovic Courtès
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
@@ -73,8 +74,9 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix reference
manual}).
* Packaging:: Packaging tutorials
* System Configuration:: Customizing the GNU System
* Containers:: Isolated environments and nested systems
-* Advanced package management:: Power to the users!
+* Advanced package management:: Power to the users!
* Environment management:: Control environment
+* Installing Guix on a Cluster:: High-performance computing.
* Acknowledgments:: Thanks!
* GNU Free Documentation License:: The license of this document.
@@ -83,28 +85,45 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix
reference manual}).
@detailmenu
--- The Detailed Node Listing ---
-Scheme tutorials
-
-* A Scheme Crash Course:: Learn the basics of Scheme
-
Packaging
-* Packaging Tutorial:: Let's add a package to Guix!
+* Packaging Tutorial:: A tutorial on how to add packages to Guix.
System Configuration
-* Auto-Login to a Specific TTY:: Automatically Login a User to a Specific
TTY
-* Customizing the Kernel:: Creating and using a custom Linux kernel
on Guix System.
-* Guix System Image API:: Customizing images to target specific
platforms.
-* Using security keys:: How to use security keys with Guix System.
-* Connecting to Wireguard VPN:: Connecting to a Wireguard VPN.
-* Customizing a Window Manager:: Handle customization of a Window manager
on Guix System.
-* Running Guix on a Linode Server:: Running Guix on a Linode Server. Running
Guix on a Linode Server
-* Setting up a bind mount:: Setting up a bind mount in the
file-systems definition.
-* Getting substitutes from Tor:: Configuring Guix daemon to get substitutes
through Tor.
-* Setting up NGINX with Lua:: Configuring NGINX web-server to load Lua
modules.
+* Auto-Login to a Specific TTY:: Automatically Login a User to a Specific TTY
+* Customizing the Kernel:: Creating and using a custom Linux kernel on
Guix System.
+* Guix System Image API:: Customizing images to target specific
platforms.
+* Using security keys:: How to use security keys with Guix System.
+* Connecting to Wireguard VPN:: Connecting to a Wireguard VPN.
+* Customizing a Window Manager:: Handle customization of a Window manager on
Guix System.
+* Running Guix on a Linode Server:: Running Guix on a Linode Server
+* Setting up a bind mount:: Setting up a bind mount in the file-systems
definition.
+* Getting substitutes from Tor:: Configuring Guix daemon to get substitutes
through Tor.
+* Setting up NGINX with Lua:: Configuring NGINX web-server to load Lua modules.
* Music Server with Bluetooth Audio:: Headless music player with Bluetooth
output.
+Containers
+
+* Guix Containers:: Perfectly isolated environments
+* Guix System Containers:: A system inside your system
+
+Advanced package management
+
+* Guix Profiles in Practice:: Strategies for multiple profiles and
manifests.
+
+Environment management
+
+* Guix environment via direnv:: Setup Guix environment with direnv
+
+Installing Guix on a Cluster
+
+* Setting Up a Head Node:: The node that runs the daemon.
+* Setting Up Compute Nodes:: Client nodes.
+* Cluster Network Access:: Dealing with network access restrictions.
+* Cluster Disk Usage:: Disk usage considerations.
+* Cluster Security Considerations:: Keeping the cluster secure.
+
@end detailmenu
@end menu
@@ -3637,6 +3656,380 @@ will have predefined environment variables and
procedures.
Run @command{direnv allow} to setup the environment for the first time.
+
+@c *********************************************************************
+@node Installing Guix on a Cluster
+@chapter Installing Guix on a Cluster
+
+@cindex cluster installation
+@cindex high-performance computing, HPC
+@cindex HPC, high-performance computing
+Guix is appealing to scientists and @acronym{HPC, high-performance
+computing} practitioners: it makes it easy to deploy potentially complex
+software stacks, and it lets you do so in a reproducible fashion---you
+can redeploy the exact same software on different machines and at
+different points in time.
+
+In this chapter we look at how a cluster sysadmin can install Guix for
+system-wide use, such that it can be used on all the cluster nodes, and
+discuss the various tradeoffs@footnote{This chapter is adapted from a
+@uref{https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/,
+blog post published on the Guix-HPC web site in 2017}.}.
+
+@quotation Note
+Here we assume that the cluster is running a GNU/Linux distro other than
+Guix System and that we are going to install Guix on top of it.
+@end quotation
+
+@menu
+* Setting Up a Head Node:: The node that runs the daemon.
+* Setting Up Compute Nodes:: Client nodes.
+* Cluster Network Access:: Dealing with network access restrictions.
+* Cluster Disk Usage:: Disk usage considerations.
+* Cluster Security Considerations:: Keeping the cluster secure.
+@end menu
+
+@node Setting Up a Head Node
+@section Setting Up a Head Node
+
+The recommended approach is to set up one @emph{head node} running
+@command{guix-daemon} and exporting @file{/gnu/store} over NFS to
+compute nodes.
+
+Remember that @command{guix-daemon} is responsible for spawning build
+processes and downloads on behalf of clients (@pxref{Invoking
+guix-daemon,,, guix, GNU Guix Reference Manual}), and more generally
+accessing @file{/gnu/store}, which contains all the package binaries
+built by all the users (@pxref{The Store,,, guix, GNU Guix Reference
+Manual}). ``Client'' here refers to all the Guix commands that users
+see, such as @code{guix install}. On a cluster, these commands may be
+running on the compute nodes and we'll want them to talk to the head
+node's @code{guix-daemon} instance.
+
+To begin with, the head node can be installed following the usual binary
+installation instructions (@pxref{Binary Installation,,, guix, GNU Guix
+Reference Manual}). Thanks to the installation script, this should be
+quick. Once installation is complete, we need to make some adjustments.
+
+Since we want @code{guix-daemon} to be reachable not just from the head
+node but also from the compute nodes, we need to arrange so that it
+listens for connections over TCP/IP. To do that, we'll edit the systemd
+startup file for @command{guix-daemon},
+@file{/etc/systemd/system/guix-daemon.service}, and add a
+@code{--listen} argument to the @code{ExecStart} line so that it looks
+something like this:
+
+@example
+ExecStart=/var/guix/profiles/per-user/root/current-guix/bin/guix-daemon
--build-users-group=guixbuild --listen=/var/guix/daemon-socket/socket
--listen=0.0.0.0
+@end example
+
+For these changes to take effect, the service needs to be restarted:
+
+@example
+systemctl daemon-reload
+systemctl restart guix-daemon
+@end example
+
+@quotation Note
+The @code{--listen=0.0.0.0} bit means that @code{guix-daemon} will
+process @emph{all} incoming TCP connections on port 44146
+(@pxref{Invoking guix-daemon,,, guix, GNU Guix Reference Manual}). This
+is usually fine in a cluster setup where the head node is reachable
+exclusively from the cluster's local area network---you don't want that
+to be exposed to the Internet!
+@end quotation
+
+The next step is to define our NFS exports in
+@uref{https://linux.die.net/man/5/exports,@file{/etc/exports}} by adding
+something along these lines:
+
+@example
+/gnu/store *(ro)
+/var/guix *(rw, async)
+/var/log/guix *(ro)
+@end example
+
+The @file{/gnu/store} directory can be exported read-only since only
+@command{guix-daemon} on the master node will ever modify it.
+@file{/var/guix} contains @emph{user profiles} as managed by @code{guix
+package}; thus, to allow users to install packages with @code{guix
+package}, this must be read-write.
+
+Users can create as many profiles as they like in addition to the
+default profile, @file{~/.guix-profile}. For instance, @code{guix
+package -p ~/dev/python-dev -i python} installs Python in a profile
+reachable from the @code{~/dev/python-dev} symlink. To make sure that
+this profile is protected from garbage collection---i.e., that Python
+will not be removed from @file{/gnu/store} while this profile exists---,
+@emph{home directories should be mounted on the head node} as well so
+that @code{guix-daemon} knows about these non-standard profiles and
+avoids collecting software they refer to.
+
+It may be a good idea to periodically remove unused bits from
+@file{/gnu/store} by running @command{guix gc} (@pxref{Invoking guix
+gc,,, guix, GNU Guix Reference Manual}). This can be done by adding a
+crontab entry on the head node:
+
+@example
+root@@master# crontab -e
+@end example
+
+@noindent
+... with something like this:
+
+@example
+# Every day at 5AM, run the garbage collector to make sure
+# at least 10 GB are free on /gnu/store.
+0 5 * * 1 /usr/local/bin/guix gc -F10G
+@end example
+
+We're done with the head node! Let's look at compute nodes now.
+
+@node Setting Up Compute Nodes
+@section Setting Up Compute Nodes
+
+First of all, we need compute nodes to mount those NFS directories that
+the head node exports. This can be done by adding the following lines
+to @uref{https://linux.die.net/man/5/fstab,@file{/etc/fstab}}:
+
+@example
+@var{head-node}:/gnu/store /gnu/store nfs defaults,_netdev,vers=3 0 0
+@var{head-node}:/var/guix /var/guix nfs defaults,_netdev,vers=3 0 0
+@var{head-node}:/var/log/guix /var/log/guix nfs defaults,_netdev,vers=3 0 0
+@end example
+
+@noindent
+... where @var{head-node} is the name or IP address of your head node.
+From there on, assuming the mount points exist, you should be able to
+mount each of these on the compute nodes.
+
+Next, we need to provide a default @command{guix} command that users can
+run when they first connect to the cluster (eventually they will invoke
+@command{guix pull}, which will provide them with their ``own''
+@command{guix} command). Similar to what the binary installation script
+did on the head node, we'll store that in @file{/usr/local/bin}:
+
+@example
+mkdir -p /usr/local/bin
+ln -s /var/guix/profiles/per-user/root/current-guix/bin/guix \
+ /usr/local/bin/guix
+@end example
+
+We then need to tell @code{guix} to talk to the daemon running on our
+master node, by adding these lines to @code{/etc/profile}:
+
+@example
+GUIX_DAEMON_SOCKET="guix://@var{head-node}"
+export GUIX_DAEMON_SOCKET
+@end example
+
+To avoid warnings and make sure @code{guix} uses the right locale, we
+need to tell it to use locale data provided by Guix (@pxref{Application
+Setup,,, guix, GNU Guix Reference Manual}):
+
+@example
+GUIX_LOCPATH=/var/guix/profiles/per-user/root/guix-profile/lib/locale
+export GUIX_LOCPATH
+
+# Here we must use a valid locale name. Try "ls $GUIX_LOCPATH/*"
+# to see what names can be used.
+LC_ALL=fr_FR.utf8
+export LC_ALL
+@end example
+
+For convenience, @code{guix package} automatically generates
+@file{~/.guix-profile/etc/profile}, which defines all the environment
+variables necessary to use the packages---@code{PATH},
+@code{C_INCLUDE_PATH}, @code{PYTHONPATH}, etc. Thus it's a good idea to
+source it from @code{/etc/profile}:
+
+@example
+GUIX_PROFILE="$HOME/.guix-profile"
+if [ -f "$GUIX_PROFILE/etc/profile" ]; then
+ . "$GUIX_PROFILE/etc/profile"
+fi
+@end example
+
+Last but not least, Guix provides command-line completion notably for
+Bash and zsh. In @code{/etc/bashrc}, consider adding this line:
+
+@verbatim
+. /var/guix/profiles/per-user/root/current-guix/etc/bash_completion.d/guix
+@end verbatim
+
+Voilà!
+
+You can check that everything's in place by logging in on a compute node
+and running:
+
+@example
+guix install hello
+@end example
+
+The daemon on the head node should download pre-built binaries on your
+behalf and unpack them in @file{/gnu/store}, and @command{guix install}
+should create @file{~/.guix-profile} containing the
+@file{~/.guix-profile/bin/hello} command.
+
+@node Cluster Network Access
+@section Network Access
+
+Guix requires network access to download source code and pre-built
+binaries. The good news is that only the head node needs that since
+compute nodes simply delegate to it.
+
+It is customary for cluster nodes to have access at best to a
+@emph{white list} of hosts. Our head node needs at least
+@code{ci.guix.gnu.org} in this white list since this is where it gets
+pre-built binaries from by default, for all the packages that are in
+Guix proper.
+
+Incidentally, @code{ci.guix.gnu.org} also serves as a
+@emph{content-addressed mirror} of the source code of those packages.
+Consequently, it is sufficient to have @emph{only}
+@code{ci.guix.gnu.org} in that white list.
+
+Software packages maintained in a separate repository such as one of the
+various @uref{https://hpc.guix.info/channels, HPC channels} are of
+course unavailable from @code{ci.guix.gnu.org}. For these packages, you
+may want to extend the white list such that source and pre-built
+binaries (assuming this-party servers provide binaries for these
+packages) can be downloaded. As a last resort, users can always
+download source on their workstation and add it to the cluster's
+@file{/gnu/store}, like this:
+
+@verbatim
+GUIX_DAEMON_SOCKET=ssh://compute-node.example.org \
+ guix download
http://starpu.gforge.inria.fr/files/starpu-1.2.3/starpu-1.2.3.tar.gz
+@end verbatim
+
+The above command downloads @code{starpu-1.2.3.tar.gz} @emph{and} sends
+it to the cluster's @code{guix-daemon} instance over SSH.
+
+Air-gapped clusters require more work. At the moment, our suggestion
+would be to download all the necessary source code on a workstation
+running Guix. For instance, using the @option{--sources} option of
+@command{guix build} (@pxref{Invoking guix build,,, guix, GNU Guix
+Reference Manual}), the example below downloads all the source code the
+@code{openmpi} package depends on:
+
+@example
+$ guix build --sources=transitive openmpi
+
+@dots{}
+
+/gnu/store/xc17sm60fb8nxadc4qy0c7rqph499z8s-openmpi-1.10.7.tar.bz2
+/gnu/store/s67jx92lpipy2nfj5cz818xv430n4b7w-gcc-5.4.0.tar.xz
+/gnu/store/npw9qh8a46lrxiwh9xwk0wpi3jlzmjnh-gmp-6.0.0a.tar.xz
+/gnu/store/hcz0f4wkdbsvsdky3c0vdvcawhdkyldb-mpfr-3.1.5.tar.xz
+/gnu/store/y9akh452n3p4w2v631nj0injx7y0d68x-mpc-1.0.3.tar.gz
+/gnu/store/6g5c35q8avfnzs3v14dzl54cmrvddjm2-glibc-2.25.tar.xz
+/gnu/store/p9k48dk3dvvk7gads7fk30xc2pxsd66z-hwloc-1.11.8.tar.bz2
+/gnu/store/cry9lqidwfrfmgl0x389cs3syr15p13q-gcc-5.4.0.tar.xz
+/gnu/store/7ak0v3rzpqm2c5q1mp3v7cj0rxz0qakf-libfabric-1.4.1.tar.bz2
+/gnu/store/vh8syjrsilnbfcf582qhmvpg1v3rampf-rdma-core-14.tar.gz
+…
+@end example
+
+(In case you're wondering, that's more than 320@ MiB of
+@emph{compressed} source code.)
+
+We can then make a big archive containing all of this (@pxref{Invoking
+guix archive,,, guix, GNU Guix Reference Manual}):
+
+@verbatim
+$ guix archive --export \
+ `guix build --sources=transitive openmpi` \
+ > openmpi-source-code.nar
+@end verbatim
+
+@dots{} and we can eventually transfer that archive to the cluster on
+removable storage and unpack it there:
+
+@verbatim
+$ guix archive --import < openmpi-source-code.nar
+@end verbatim
+
+This process has to be repeated every time new source code needs to be
+brought to the cluster.
+
+As we write this, the research institutes involved in Guix-HPC do not
+have air-gapped clusters though. If you have experience with such
+setups, we would like to hear feedback and suggestions.
+
+@node Cluster Disk Usage
+@section Disk Usage
+
+@cindex disk usage, on a cluster
+A common concern of sysadmins' is whether this is all going to eat a lot
+of disk space. If anything, if something is going to exhaust disk
+space, it's going to be scientific data sets rather than compiled
+software---that's our experience with almost ten years of Guix usage on
+HPC clusters. Nevertheless, it's worth taking a look at how Guix
+contributes to disk usage.
+
+First, having several versions or variants of a given package in
+@file{/gnu/store} does not necessarily cost much, because
+@command{guix-daemon} implements deduplication of identical files, and
+package variants are likely to have a number of common files.
+
+As mentioned above, we recommend having a cron job to run @code{guix gc}
+periodically, which removes @emph{unused} software from
+@file{/gnu/store}. However, there's always a possibility that users will
+keep lots of software in their profiles, or lots of old generations of
+their profiles, which is ``live'' and cannot be deleted from the
+viewpoint of @command{guix gc}.
+
+The solution to this is for users to regularly remove old generations of
+their profile. For instance, the following command removes generations
+that are more than two-month old:
+
+@example
+guix package --delete-generations=2m
+@end example
+
+Likewise, it's a good idea to invite users to regularly upgrade their
+profile, which can reduce the number of variants of a given piece of
+software stored in @file{/gnu/store}:
+
+@example
+guix pull
+guix upgrade
+@end example
+
+As a last resort, it is always possible for sysadmins to do some of this
+on behalf of their users. Nevertheless, one of the strengths of Guix is
+the freedom and control users get on their software environment, so we
+strongly recommend leaving users in control.
+
+@node Cluster Security Considerations
+@section Security Considerations
+
+@cindex security, on a cluster
+On an HPC cluster, Guix is typically used to manage scientific software.
+Security-critical software such as the operating system kernel and
+system services such as @code{sshd} and the batch scheduler remain under
+control of sysadmins.
+
+The Guix project has a good track record delivering security updates in
+a timely fashion (@pxref{Security Updates,,, guix, GNU Guix Reference
+Manual}). To get security updates, users have to run @code{guix pull &&
+guix upgrade}.
+
+Because Guix uniquely identifies software variants, it is easy to see if
+a vulnerable piece of software is in use. For instance, to check whether
+the glibc@ 2.25 variant without the mitigation patch against
+``@uref{https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt,Stack
+Clash}'', one can check whether user profiles refer to it at all:
+
+@example
+guix gc --referrers /gnu/store/…-glibc-2.25
+@end example
+
+This will report whether profiles exist that refer to this specific
+glibc variant.
+
+
@c *********************************************************************
@node Acknowledgments
@chapter Acknowledgments
@@ -3658,8 +4051,10 @@ information on these fine people. The @file{THANKS}
file lists people
who have helped by reporting bugs, taking care of the infrastructure,
providing artwork and themes, making suggestions, and more---thank you!
-This document includes adapted sections from articles that have previously
-been published on the Guix blog at @uref{https://guix.gnu.org/blog}.
+This document includes adapted sections from articles that have
+previously been published on the Guix blog at
+@uref{https://guix.gnu.org/blog} and on the Guix-HPC blog at
+@uref{https://hpc.guix.info/blog}.
@c *********************************************************************
base-commit: 473692b812b4ab4267d9bddad0fb27787d2112ff
--
2.38.1
- [bug#60573] [PATCH] doc: cookbook: Add "Installing Guix on a Cluster" chapter.,
Ludovic Courtès <=