From: Kyle Andrews <kyle@posteo.net>
Subject: [bug#61701] [PATCH] doc: Propose new cookbook section for reproducible research.
Date: Wed, 22 Feb 2023 05:17:29 +0000

The intent is to cover the most common cases in which researchers using
R and Python can rapidly achieve the benefits of reproducibility.
---
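A quick way to try the R-only manifest from the new section is sketched
below. It is not part of the patch. It assumes the file names
manifest.scm and channels.scm used in the text, skips the Guix steps
when guix is not on PATH, and the Rscript call is only an illustrative
smoke test.

```shell
#!/bin/sh
# Sketch: exercise the R-only manifest from the proposed cookbook section.
set -eu

# Write the manifest as given in the section.
cat > manifest.scm <<'EOF'
(use-modules
 (gnu packages cran)
 (gnu packages statistics))

(packages->manifest
 (list r r-tidyverse))
EOF

if command -v guix >/dev/null 2>&1; then
    # Pin the channels currently in use so the environment can be
    # recovered later.
    guix describe --format=channels > channels.scm
    # Recreate the environment from the pinned channels and load one
    # package inside the container (illustrative check only).
    guix time-machine --channels=channels.scm \
        -- guix shell --manifest=manifest.scm --container \
        -- Rscript -e 'library(tidyverse)'
else
    echo "guix not found; wrote manifest.scm only" >&2
fi
```

Committing manifest.scm and channels.scm next to the analysis code keeps
both halves of the environment description together.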
doc/guix-cookbook.texi | 174 +++++++++++++++++++++++++++++++++++
guix/build-system/python.scm | 1 +
2 files changed, 175 insertions(+)
diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index b9fb916f4a..8a10bcbec7 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -114,6 +114,7 @@ Top
Environment management
+* Reproducible Research in Practice:: Write manifests to create reproducible environments.
* Guix environment via direnv:: Setup Guix environment with direnv
Installing Guix on a Cluster
@@ -3538,9 +3539,182 @@ Environment management
demonstrate such utilities.
@menu
+* Reproducible Research in Practice:: Write manifests to create reproducible environments
* Guix environment via direnv:: Setup Guix environment with direnv
@end menu
+@node Reproducible Research in Practice
+@section Common scientific software environments
+
+Many researchers write applied scientific software that builds on
+general-purpose tools from the R and Python ecosystems, together with
+supporting shell utilities. Even researchers who work predominantly
+in just R or just Python often have to use both at the same time
+when collaborating with others. This tutorial covers strategies
+for creating manifests to handle such
+situations.
+
+Widely used R packages are hosted on CRAN, which employs a strict test
+suite backed by continuous integration infrastructure for the latest R
+version. A positive result of this rigid discipline is that most R
+packages from the same period of time will interoperate well together
+when used with a particular R version. This means there is a clear
+low-complexity target for achieving a reproducible environment.
+
+Writing a manifest for packaging R code alone requires only minimal
+knowledge of the Guix infrastructure. This stub should work for most
+cases involving the R packages already in Guix.
+
+@example
+(use-modules
+ (gnu packages cran)
+ (gnu packages statistics))
+
+(packages->manifest
+ (list r r-tidyverse))
+@end example
+
+R packages are defined predominantly in @file{gnu/packages/cran.scm} and
+@file{gnu/packages/statistics.scm} in the Guix source repository. Save
+the manifest as @file{manifest.scm} and run it with @command{guix shell}:
+
+@example
+guix shell --manifest=manifest.scm --container
+@end example
+
+Please remember at the end to pin your channels so that others in the
+future know how to recover your exact Guix environment.
+
+@example
+guix describe --format=channels > channels.scm
+@end example
+
+Others can then recreate it with @command{guix time-machine}:
+
+@example
+guix time-machine --channels=channels.scm \
+ -- guix shell --manifest=manifest.scm --container
+@end example
+
+In contrast, the Python scientific ecosystem is far less standardized.
+There is no concerted effort to make all Python packages work together.
+While there is a latest Python version, it is less dominantly used, for
+various reasons such as the fact that Python tends to be employed by
+much larger teams than R is. This makes packaging reproducible Python
+environments much more difficult. Mixing R with Python complicates
+things still further. However, we have to be mindful of the goals of
+reproducible research.
+
+If reproducibility becomes an end in itself and not a catalyst towards
+faster discovery, then Guix will be a non-starter for scientists. Their
+goal is to develop useful understanding about particular aspects of the
+world.
+
+Thankfully, three common scenarios cover the vast majority of
+needs. These are:
+
+@itemize
+@item
+combining standard package definitions with custom package definitions
+@item
+combining package definitions from the current revision with other revisions
+@item
+combining package variants which need a modified build-system
+@end itemize
+
+In the rest of the tutorial we develop a manifest which tackles all
+three of these common issues. The hope is that, once researchers see
+that even the hardest common situation is readily solvable without
+writing thousands of lines of code, they will see Guix as worth the
+effort and not as a significant detour from the main line of their
+research.
+
+@example
+(use-modules
+ (guix packages)
+ (guix download)
+ (guix licenses)
+ (guix profiles)
+ (gnu packages statistics)
+ (gnu packages cran)
+ (guix inferior)
+ (guix channels)
+ (guix build-system python))
+
+;; guix import pypi APTED
+(define python-apted
+ (package
+ (name "python-apted")
+ (version "1.0.3")
+ (source (origin
+ (method url-fetch)
+ (uri (pypi-uri "apted" version))
+ (sha256
+ (base32
+ "1sawf6s5c64fgnliwy5w5yxliq2fc215m6alisl7yiflwa0m3ymy"))))
+ (build-system python-build-system)
+ (home-page "https://github.com/JoaoFelipe/apted")
+ (synopsis "APTED algorithm for the Tree Edit Distance")
+ (description "APTED algorithm for the Tree Edit Distance")
+ (license expat)))
+
+(define last-guix-with-python-3.6
+ (list
+ (channel
+ (name 'guix)
+ (url "https://git.savannah.gnu.org/git/guix.git")
+ (commit
+ "d66146073def03d1a3d61607bc6b77997284904b"))))
+
+(define connection-to-last-guix-with-python-3.6
+ (inferior-for-channels last-guix-with-python-3.6))
+
+(define first car)
+
+(define python-3.6
+ (first
+ (lookup-inferior-packages
+ connection-to-last-guix-with-python-3.6 "python")))
+
+(define python3.6-numpy
+ (first
+ (lookup-inferior-packages
+ connection-to-last-guix-with-python-3.6 "python-numpy")))
+
+(define included-packages
+ (list r r-reticulate))
+
+(define inferior-packages
+ (list python-3.6 python3.6-numpy))
+
+(define package-with-python-3.6
+ (package-with-explicit-python python-3.6
+ "python-" "python3.6-" #:variant-property 'python3-variant))
+
+(define custom-variant-packages
+ (list (package-with-python-3.6 python-apted)))
+
+(concatenate-manifests
+ (map packages->manifest
+ (list
+ included-packages
+ inferior-packages
+ custom-variant-packages)))
+@end example
+
+This should produce a profile with the latest R and the older Python
+3.6. The two should be able to interoperate with code like:
+
+@example
+library(reticulate)
+use_python("python")
+apted = import("apted")
+tree = import("apted.helpers")$Tree
+t1 = tree$from_text('{a{b}{c}}')
+t2 = tree$from_text('{a{b{d}}}')
+distance = apted$APTED(t1, t2)$compute_edit_distance()
+@end example
+
@node Guix environment via direnv
@section Guix environment via direnv
diff --git a/guix/build-system/python.scm b/guix/build-system/python.scm
index c8f04b2298..d4aaab906d 100644
--- a/guix/build-system/python.scm
+++ b/guix/build-system/python.scm
@@ -36,6 +36,7 @@ (define-module (guix build-system python)
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-26)
#:export (%python-build-system-modules
+ package-with-explicit-python
package-with-python2
strip-python2-variant
default-python
--
2.37.2