guix-commits

30/66: programming-2022: Expand "Background" section.


From: Ludovic Courtès
Subject: 30/66: programming-2022: Expand "Background" section.
Date: Wed, 29 Jun 2022 11:32:01 -0400 (EDT)

civodul pushed a commit to branch master
in repository maintenance.

commit 494f203a571bf787ef9b2ed7f4e3bde274caae65
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Fri Jan 14 17:49:29 2022 +0100

    programming-2022: Expand "Background" section.
    
    * doc/programming-2022/supply-chain.skb (Background): Add subsections
    and expound.
    * doc/programming-2022/security.sbib: Add references.
---
 doc/programming-2022/security.sbib    |  31 +++++++
 doc/programming-2022/supply-chain.skb | 161 ++++++++++++++++++++++++++--------
 2 files changed, 153 insertions(+), 39 deletions(-)

diff --git a/doc/programming-2022/security.sbib b/doc/programming-2022/security.sbib
index 8a07e60..1425fd4 100644
--- a/doc/programming-2022/security.sbib
+++ b/doc/programming-2022/security.sbib
@@ -250,6 +250,37 @@ Thayer")
   (title "Signing commits with GPG")
   (url "https://docs.gitlab.com/ce/user/project/repository/gpg_signed_commits/"))
 
+(article courant2022:ocamlboot
+  (author "Nathanaëlle Courant, Julien Lepiller, Gabriel Scherer")
+  (year "2022")
+  (title "Debootstrapping Without Archeology: Stacked Implementations in Camlboot")
+  (booktitle "Programming Journal")
+  (issue "3")
+  (volume "6")
+  (notes "to appear"))
+
+(misc wurmus2017:jdk-bootstrap
+  (author "Ricardo Wurmus")
+  (year "2017")
+  (month "June")
+  (title "Building the JDK without Java")
+  (url "https://www.freelists.org/post/bootstrappable/Building-the-JDK-without-Java"))
+
+(misc milosavljevic2018:rust-bootstrap
+  (title "Bootstrapping Rust")
+  (author "Danny Milosavljevic")
+  (year "2018")
+  (month "December")
+  (url "https://guix.gnu.org/en/blog/2018/bootstrapping-rust/"))
+
+(misc wurmus2022:bootstrappable-web
+  (title "Bootstrappable Builds")
+  (author "Ricardo Wurmus, Ludovic Courtès, Paul Wise, Gábor Boskovits,
+rain1, Matthew Kraai, Julien Lepiller, Jeremiah Orians, Jelle Licht,
+Jan Nieuwenhuizen")
+  (year "2022")
+  (url "https://bootstrappable.org/"))
+
 #|
 (defun skr-from-bibtex ()
   "Vaguely convert the BibTeX snippets after POINT to SBibTeX."
diff --git a/doc/programming-2022/supply-chain.skb b/doc/programming-2022/supply-chain.skb
index 0c6db4a..69e3a4a 100644
--- a/doc/programming-2022/supply-chain.skb
+++ b/doc/programming-2022/supply-chain.skb
@@ -231,35 +231,85 @@ and report on our experience.  Last, ,(numref :text [Section]
 ,(emph [package managers]) like Debian's ,(tt [apt]), which allow them
 to install, upgrade, and remove software from a large collection of free
 software packages.  GNU Guix,(footnote (url "https://guix.gnu.org")) is
-primarily a ,(emph [functional]) package manager that builds upon the
-ideas developed for Nix by Dolstra ,(it [et al.]) ,(ref :bib
-'(dolstra2004:nix courtes2013:functional)).  The term “functional” means
-that software build processes are considered as ,(emph [pure functions]): given a
-set of inputs (compiler, libraries, build scripts, and so on), a
-package’s build function is assumed to always produce the same result.
-Build results are stored in an immutable persistent data structure, the
-,(emph [store]), implemented as a single directory, ,(tt [/gnu/store]).
-Each entry in ,(tt [/gnu/store]) has a file name composed of the hash of
-all the build inputs used to produce it, followed by a symbolic name.
-For example, ,(tt [/gnu/store/yr9rk90jf…-gcc-10.3.0]) identifies a
-specific build of GCC 10.3.  A variant of GCC 10.3, for instance one
-using different build options or different dependencies, would get a
-different hash.  Thus, each store file name uniquely identifies build
-results.  This model is the foundation of ,(emph [end-to-end provenance
-tracking]): Guix records and uniquely identifies the inputs leading to
-build results available in ,(tt [/gnu/store]).])
-      (p [Providing more than 18,000 software packages today, Guix is
+such a tool, though it can be thought of more broadly as a toolbox for
+software deployment with salient features and processes that improve
+security: a foundation for ,(emph [reproducible builds]), and what we
+call ,(emph [bootstrappable builds]).])
+      
+      (section :title [A Deployment Toolbox]
+
+        (p [Guix provides a command-line interface similar to that of
+other package managers: ,(tt [guix install python]), for instance,
+installs the Python interpreter, ,(tt [guix pull]) updates Guix itself
+and the set of available packages, and ,(tt [guix upgrade]) upgrades
+previously-installed packages to their latest available version.
+Package management is per-user rather than system-wide; it does not
+require system administrator privileges, nor does it require mutual
+trust among users.])
+
+        (p [Providing more than 20,000 software packages today, Guix is
 used as a general purpose day-to-day GNU/Linux distribution that
 provides the additional safety net of ,(emph [transactional upgrades and
-rollbacks]): because build results are kept in the store by default, any
-new deployment, of individual packages or whole systems, can be rolled
-back ,(ref :bib '(dolstra2004:nix courtes2013:functional)).  Its ability
-to reproduce software environments, bit for bit, at different points in
-time and on different machines, makes it a tool of choice in support of
-reproducible computational experiments and software engineering ,(ref
-:bib 'hinsen2020:staged-computation).])
-
-      (p [Guix, like Nix and unlike Debian or Fedora, is essentially a
+rollbacks]) for all software deployment operations.  For example, if an
+upgrade has undesired effects, users can run ,(tt [guix package
+--roll-back]) to immediately restore packages as they were before the
+upgrade.  Its ability to reproduce software environments, bit for bit,
+at different points in time and on different machines, makes it a tool
+of choice in support of reproducible computational experiments and
+software engineering ,(ref :bib 'hinsen2020:staged-computation).])
+        
+        (p [Guix can be used on top of another system; the only
+requirement is that the system runs the Linux kernel—be it Android or a
+GNU/Linux distribution.  Guix packages stand alone: they provide all the
+user-land software they need, down to the C library; this guarantees
+they behave the same on any system.])
+
+        (p [There are other tools beyond the “package manager”
+interface.  The ,(tt [guix pack]) command, for example, creates
+standalone ,(emph [application bundles]) or ,(emph [container images])
+providing one or more software packages and all the packages they depend
+on at run time.  The container images can be loaded by Docker, podman,
+and similar “container tools” to run the software on any other
+machine.])
+                
+        (p [Last, Guix can be used as a standalone GNU/Linux
+distribution called Guix System.  Its salient feature is that it lets
+users declare the ,(emph [whole system configuration])—from user
+accounts, to services and installed packages—using a domain-specific
+language (DSL) embedded in Scheme, a functional programming language of
+the Lisp family ,(ref :bib 'sperber09:r6rs).  The ,(tt [guix system
+reconfigure]) command changes the running system to match the
+user-provided configuration.  This is an atomic operation and users can
+always roll back to an older “generation” of the system, should anything
+go wrong.  The ,(tt [guix system image]) command can create system
+images in a variety of formats, including the QCOW2 format commonly used
+for virtual machines (VMs) and emulators such as QEMU.  ,(tt [guix
+deploy]) goes a step further and can deploy Guix System ,(emph [on a set
+of machines]), be it over secure shell (SSH) connections or using the
+interfaces of a virtual private server (VPS) provider.]))
+      
+      (section :title [Reproducible Builds]
+
+        (p [At its core, Guix is a ,(emph [functional]) deployment tool
+that builds upon the ideas developed for the Nix package manager by
+Dolstra ,(it [et al.]) ,(ref :bib '(dolstra2004:nix
+courtes2013:functional)).  The term “functional” means that software
+build processes are considered as ,(emph [pure functions]): given a set
+of inputs (compiler, libraries, build scripts, and so on), a package’s
+build function is assumed to always produce the same result.  Build
+results are stored in an immutable persistent data structure, the ,(emph
+[store]), implemented as a single directory, ,(tt [/gnu/store]).  Each
+entry in ,(tt [/gnu/store]) has a file name composed of the hash of all
+the build inputs used to produce it, followed by a symbolic name.  For
+example, ,(tt [/gnu/store/yr9rk90jf…-gcc-10.3.0]) identifies a specific
+build of GCC 10.3.  A variant of GCC 10.3, for instance one using
+different build options or different dependencies, would get a different
+hash.  Thus, each store file name uniquely identifies build results.
+This model is the foundation of ,(emph [end-to-end provenance
+tracking]): Guix records and uniquely identifies the inputs leading to
+build results available in ,(tt [/gnu/store]).])
+
+        (p [Guix, like Nix and unlike Debian or Fedora, is essentially a
 ,(emph [source-based distribution]): Guix package definitions describe
 how to build packages from source.  When running a command such as ,(tt
 [guix install gcc]), Guix proceeds as if it were to build GCC from
@@ -272,7 +322,7 @@ specifically for substitutes for ,(tt
 desired build output.  Substitutes are cryptographically signed by the
 server and Guix rejects substitutes not signed by one of the keys the
 user authorized.])
-      (p [To maximize chances that build processes actually look like
+        (p [To maximize chances that build processes actually look like
 pure functions, they are spawned in isolated build environments—Linux
 ,(emph [containers])—ensuring that only explicitly declared inputs are
 visible to the build process.  This, in turn, helps achieve bit-for-bit
@@ -287,8 +337,11 @@ provides makes verification clear and easy.  For example, the command
 locally and prints an error if the build result differs from that
 already available.  Likewise, ,(tt [guix challenge hello]) compares
 binaries of the ,(tt [hello]) package available locally with those
-provided by one or several substitute servers.])
-      (p [Are reproducible builds enough to guarantee that one can
+provided by one or several substitute servers.]))
+        
+      (section :title [Bootstrappable Builds]
+
+        (p [Are reproducible builds enough to guarantee that one can
 verify source-to-binary mappings?  In his Turing Award acceptance
 speech, Ken Thompson described a scenario whereby a legitimate-looking
 build process would produce a malicious binary ,(ref :bib
@@ -300,21 +353,51 @@ such that it emits malicious code when it recognizes specific patterns
 of source code.  This attack can be undetectable.  What makes such
 attacks possible is that users and distributions rely on opaque binaries
 at some level to “bootstrap” the entire package dependency graph.])
-      (p [In 2017, Nieuwenhuizen ,(it [et al.]) sought to address
+        
+        (p [GNU/Linux systems are built around the C language.  At the
+root of the package dependency graph, we have the GNU C Library (glibc),
+the GNU Compiler Collection (GCC), the GNU Binary Utilities (Binutils),
+and the GNU command-line utilities (Coreutils, grep, sed, Findutils,
+etc.)—all this written in C and C++.  How does one build the first GCC
+though?  Historically, distributions such as Debian would rely on
+previously-built binaries to build the new one: when GCC is upgraded, it
+is built using GCC as available in the previous version of the
+distribution.])
+        
+        (p [The functional build model does not allow us to “cheat”:
+the whole dependency graph has to be described and be self-contained.
+Thus, it must describe how the first GCC and C library are obtained.
+Initially, Guix would rely on pre-built statically-linked binaries of
+GCC, Binutils, libc, and the other packages mentioned above to get
+started ,(ref :bib 'courtes2013:functional).  Even though these ,(emph
+[binary seeds]) were eventually built with Guix and thus reproducible
+and verifiable using the same Guix revision, they were just that: around
+250 MiB of opaque, non-auditable binaries.])
+
+        (p [In 2017, Nieuwenhuizen ,(it [et al.]) sought to address
 this forty-year-old problem at its root: by ensuring no opaque binaries
-appear in the package dependency graph—no less ,(ref :bib
+appear at the bottom of the package dependency graph—no less ,(ref :bib
 'janneke:mes-web).  To that end, Nieuwenhuizen developed GNU Mes, a
 small interpreter of the Scheme language written in C, capable enough to
 run MesCC, a non-optimizing C compiler.  That, coupled with other heroic
 efforts, led to a drastic reduction of the size of the opaque binaries
 at the root of the Guix package graph, well below what had been achieved
-so far ,(ref :bib 'janneke2020:bootstrap).  While many considered it
-unrealistic a few years earlier, the initial goal of building ,(emph
-[everything]) from source, starting from a small core and incrementally
-building more complex pieces of software, is now within reach ,(ref :bib
-'janneke2021:full-source-bootstrap).  This has the potential to thwart
-an entire class of software supply chain attacks that has been known but
-left unaddressed for forty years.]))
+so far ,(ref :bib '(janneke2020:bootstrap courant2022:ocamlboot)).
+While many considered it unrealistic a few years earlier, the initial
+goal of building ,(emph [everything]) from source, starting from a small
+core and incrementally building more complex pieces of software, is now
+within reach ,(ref :bib 'janneke2021:full-source-bootstrap).  This has
+the potential to thwart an entire class of software supply chain attacks
+that has been known but left unaddressed for forty years.])
+        
+        (p [Bootstrapping issues like these do not exist solely at the
+level of the C language; they show up in many compilers and occasionally
+in build systems too ,(ref :bib 'wurmus2022:bootstrappable-web).  Several
+of them were addressed in Guix for the first time: the Java development
+kit (JDK) is entirely built from source ,(ref :bib
+'wurmus2017:jdk-bootstrap), and so are the Rust ,(ref :bib
+'milosavljevic2018:rust-bootstrap) and OCaml compilers ,(ref :bib
+'courant2022:ocamlboot).])))  ;TODO: mention DDC?
    
    (chapter :title [Rationale] :ident "rationale"
       

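Note for readers of this patch: the Guix System paragraph added in the "A Deployment Toolbox" subsection describes declaring the whole system configuration (user accounts, services, installed packages) in a DSL embedded in Scheme. A minimal sketch of what such a declaration might look like is given here. It is not part of the commit. The host name, time zone, device label, user name, and choice of service are placeholder assumptions, and field names may differ between Guix revisions.

```scheme
;; Hypothetical config.scm, for illustration only.
;; Every concrete value below is a placeholder assumption.
(use-modules (gnu))
(use-service-modules ssh)       ;provides openssh-service-type
(use-package-modules python)    ;provides the python package

(operating-system
  (host-name "example")
  (timezone "Europe/Paris")
  (locale "en_US.utf8")

  ;; Where the bootloader gets installed (placeholder device).
  (bootloader (bootloader-configuration
                (bootloader grub-bootloader)
                (targets '("/dev/sda"))))

  ;; Root file system, identified by a placeholder label.
  (file-systems (cons (file-system
                        (device (file-system-label "root"))
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))

  ;; User accounts are declared alongside everything else.
  (users (cons (user-account
                 (name "alice")
                 (group "users")
                 (supplementary-groups '("wheel")))
               %base-user-accounts))

  ;; Packages installed system-wide.
  (packages (cons python %base-packages))

  ;; System services, here an SSH daemon with default settings.
  (services (cons (service openssh-service-type)
                  %base-services)))
```

A file of this kind is what would be passed to the `guix system reconfigure` command mentioned in the patch, which changes the running system to match the declared configuration.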

