guix-commits
branch master updated: doc: Add comment to CISA-2023-0026-0001 on softwa


From: Ludovic Courtès
Subject: branch master updated: doc: Add comment to CISA-2023-0026-0001 on software identification.
Date: Mon, 05 Feb 2024 11:48:07 -0500

This is an automated email from the git hooks/post-receive script.

civodul pushed a commit to branch master
in repository maintenance.

The following commit(s) were added to refs/heads/master by this push:
     new 5274429  doc: Add comment to CISA-2023-0026-0001 on software identification.
5274429 is described below

commit 5274429f940408926cc71f38ba81c248f1bd1aee
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Mon Feb 5 16:21:14 2024 +0100

    doc: Add comment to CISA-2023-0026-0001 on software identification.
    
    * doc/cisa-2023-0026-0001: New directory.
---
 doc/cisa-2023-0026-0001/channels.scm            |  11 +
 doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org | 268 ++++++++++++++++++++++++
 doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf | Bin 0 -> 278782 bytes
 doc/cisa-2023-0026-0001/manifest.scm            |  17 ++
 4 files changed, 296 insertions(+)

diff --git a/doc/cisa-2023-0026-0001/channels.scm b/doc/cisa-2023-0026-0001/channels.scm
new file mode 100644
index 0000000..003e1e0
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/channels.scm
@@ -0,0 +1,11 @@
+(list (channel
+        (name 'guix)
+        (url "https://git.savannah.gnu.org/git/guix.git")
+        (branch #f)
+        (commit
+          "65dc2d40cb113382fb98796f1d04099f28cab355")
+        (introduction
+          (make-channel-introduction
+            "9edb3f66fd807b096b48283debdcddccfea34bad"
+            (openpgp-fingerprint
+              "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA")))))
diff --git a/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org
new file mode 100644
index 0000000..d6b51c9
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.org
@@ -0,0 +1,268 @@
+#+TITLE: Public Comment to CISA-2023-0026-0001
+#+AUTHOR: Maxim Cournoyer, Ludovic Courtès, Jan Nieuwenhuizen, Simon Tournier
+#+DATE: January 2024
+#+SUBTITLE: Perspective from Developers of GNU Guix
+#+STARTUP: content hidestars
+#+LANGUAGE: en
+#+LATEX_CLASS: article
+#+LATEX_CLASS_OPTIONS: [letterpaper]
+#+LATEX_HEADER: \usepackage{xcolor}
+#+LATEX_HEADER: \usepackage[T1]{fontenc}
+#+LATEX_HEADER: \definecolor{darkblue}{rgb}{0.0, 0.0, 0.55}
+#+LATEX_HEADER: \definecolor{cobalt}{rgb}{0.0, 0.28, 0.67}
+#+LATEX_HEADER: \definecolor{coolblack}{rgb}{0.0, 0.18, 0.39}
+#+LATEX_HEADER: \usepackage{libertine}
+#+LATEX_HEADER: \usepackage{inconsolata}
+#+OPTIONS: toc:nil
+
+#+begin_quote
+This is the answer to a request for information from the Cybersecurity
+and Infrastructure Security Agency (CISA) identified as [[https://www.regulations.gov/document/CISA-2023-0026-0001][CISA–2023–0026]].
+#+end_quote
+
+#+latex: \vspace{15mm}
+
+#+latex: \noindent
+Dear CISA team,
+
+#+latex: \vspace{6mm}
+#+latex: \noindent
+Please find below our contribution to the work of CISA regarding the
+merits and challenges of the software identifier ecosystems as
+discussed in CISA’s [[https://www.cisa.gov/resources-tools/resources/software-identification-ecosystem-option-analysis][October 2023 white paper]].
+
+* About the Authors
+
+This document was written by core developers of [[https://guix.gnu.org][GNU Guix]][fn:1:https://guix.gnu.org], a software
+project we believe provides useful insight for the software
+identification goals defined by CISA.
+
+Maxim Cournoyer (Canada) is currently co-maintainer of Guix, a long-time
+Guix developer, with years of experience developing free and open source
+software.
+
+Ludovic Courtès (France) is founder of Guix, Guix contributor and former
+[[https://nixos.org][Nix]] contributor, working as a research software engineer at Inria, the
+French research institute in computer science.
+
+Jan Nieuwenhuizen (The Netherlands) is founder of
+[[https://www.gnu.org/software/mes][GNU Mes]][fn:2:https://www.gnu.org/software/mes], leader of the
+full-source bootstrap effort discussed below, recognized for his
+many contributions to free software over more than twenty years.
+
+Simon Tournier (France) is a long-time contributor to Guix, leading
+integration with [[https://www.softwareheritage.org][Software
+Heritage]][fn:3:https://www.softwareheritage.org], working as a research
+software engineer at Université Paris-Cité.
+
+* About GNU Guix
+
+The authors draw their experience from the design and development of
+[[https://guix.gnu.org][GNU Guix]], a package manager, software deployment tool, and GNU/Linux
+distribution.  Guix today is the fifth largest Linux distribution
+according to [[https://repology.org][Repology]][fn:4:https://repology.org].  Since its inception in 2012, it has received
+source code contributions from almost 1,000 people.
+
+* On Software Identification
+
+The /Software Identification Ecosystem Option Analysis/ white paper
+released by CISA in October 2023 studies options towards the definition
+of /a software identification ecosystem that can be used across the
+complete, global software space for all key cybersecurity use cases/.
+
+Our experience lies in the design and development of [[https://guix.gnu.org][GNU Guix]], a package
+manager, software deployment tool, and GNU/Linux distribution, which
+emphasizes three key elements: *reproducibility, provenance tracking,
+and auditability*.  We explain in the following sections our approach
+and how it relates to the goal stated in the aforementioned white paper.
+
+Guix produces binary artifacts of varying complexity from source code:
+package binaries, application bundles (container images to be consumed
+by Docker and related tools), system installations, system bundles
+(container and virtual machine images).
+
+All these artifacts qualify as “software” and so does source code.  Some
+of this “software” comes from well-identified upstream packages,
+sometimes with modifications added downstream by packagers (patches); binary
+artifacts themselves are the byproduct of a build process where the
+package manager uses /other/ binary artifacts it previously built
+(compilers, libraries, etc.) along with more source code (the package
+definition) to build them.  How can one identify “software” in that
+sense?
+
+Software is dual: it exists in /source/ form and in /binary/,
+machine-executable form.  The latter is the outcome of a complex
+computational process taking source code and intermediary binaries as
+input.
+
+Our thesis can be summarized as follows:
+
+#+begin_quote
+*We consider that the requirements for source code identifiers differ
+ from the requirements to identify binary artifacts.*
+
+Our view, embodied in GNU Guix, is that:
+
+  1. *Source code* can be identified in an unambiguous and distributed
+     fashion through /inherent identifiers/ such as cryptographic
+     hashes.
+
+  2. *Binary artifacts*, instead, need to be the byproduct of a
+     /comprehensive and verifiable build process itself available as
+     source code/.
+#+end_quote
+
+In the next sections, to clarify the context of this statement, we show
+how Guix identifies source code, how it defines the /source-to-binary/
+path and ensures its verifiability, and how it provides provenance
+tracking.
+
+* Source Code Identification
+
+Guix includes [[https://guix.gnu.org/manual/en/html_node/Defining-Packages.html][package definitions]][fn:5:https://guix.gnu.org/manual/en/html_node/Defining-Packages.html] for almost 30,000 packages.  Each
+package definition identifies its [[https://guix.gnu.org/manual/en/html_node/origin-Reference.html][origin]][fn:6:https://guix.gnu.org/manual/en/html_node/origin-Reference.html], that is, its “main” source code as well
+as patches.  The origin is *content-addressed*: it includes a SHA256
+cryptographic hash of the code (an /inherent identifier/), along with a
+primary URL to download it.
+
+Since source is content-addressed, the URL can be thought of as a hint.
+Indeed, *we connected Guix to the [[https://www.softwareheritage.org][Software Heritage]] source code
+archive*: when source code vanishes from its original URL, Guix falls
+back to downloading it from the archive.  This is made possible thanks
+to the use of inherent (or intrinsic) identifiers both by Guix and
+Software Heritage.
+
+More information can be found in this [[https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/][2019 blog post]][fn:7:https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/] and in the documents of the
+[[https://www.swhid.org/][Software Hash Identifiers (SWHID)]][fn:8:https://www.swhid.org/] working group.
+
+* Reproducible Builds
+
+Guix provides a *verifiable path from source code to binaries* by
+ensuring [[https://reproducible-builds.org][reproducible builds]].  To achieve that, Guix builds upon the
+pioneering research work of Eelco Dolstra that led to the design of the
+[[https://nixos.org][Nix package manager]], with which it shares the same conceptual
+foundation.
+
+Namely, Guix relies on /hermetic builds/: builds are performed in
+isolated environments that contain nothing but explicitly-declared
+dependencies—where a “dependency” can be the output of another build
+process or source code, including build scripts and patches.
+
+An implication is that *builds can be verified independently*.  For
+instance, for a given version of Guix, =guix build gcc= should produce
+the exact same binary, bit-for-bit.  To facilitate independent
+verification, =guix challenge gcc= compares the binary artifacts of the
+GNU Compiler Collection (GCC) as built and published by different
+parties.  Users can also compare to a local build with =guix build gcc
+--check=.
+
+As with Nix, build processes are identified by /derivations/, which are
+low-level, content-addressed build instructions; derivations may refer
+to other derivations and to source code.  For instance,
+=/gnu/store/c9fqrmabz5nrm2arqqg4ha8jzmv0kc2f-gcc-11.3.0.drv= uniquely
+identifies the derivation to build a specific variant of version 11.3.0
+of the GNU Compiler Collection (GCC).  Changing the package
+definition—patches being applied, build flags, set of dependencies—, or
+similarly changing one of the packages it depends on, leads to a
+different derivation (more information can be found in [[https://edolstra.github.io/pubs/phd-thesis.pdf][Eelco Dolstra’s
+PhD thesis]]).
+
+Derivations form a graph that *captures the entirety of the build
+processes leading to a binary artifact*.  In contrast, mere package
+name/version pairs such as =gcc 11.3.0= fail to capture the breadth and
+depth of the elements that lead to a binary artifact.  This is a shortcoming of
+systems such as the *Common Platform Enumeration* (CPE) standard: it
+fails to express whether a vulnerability that applies to =gcc 11.3.0=
+applies to it regardless of how it was built, patched, and configured,
+or whether certain conditions are required.
+
+* Full-Source Bootstrap
+
+Reproducible builds alone cannot ensure the source-to-binary
+correspondence: the compiler could contain a backdoor, as demonstrated
+by Ken Thompson in /Reflections on Trusting Trust/.  To address that,
+Guix goes further by implementing so-called *full-source bootstrap*: for
+the first time, literally every package in the distribution is built
+from source code, [[https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/][starting from a very small binary
+seed]][fn:9:https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/].
+This gives an unprecedented level of transparency, allowing code to be
+audited at all levels, and improving robustness against the
+“trusting-trust attack” described by Ken Thompson.
+
+The European Union recognized the importance of this work through an
+[[https://nlnet.nl/project/GNUMes-fullsource/][NLnet Privacy & Trust Enhancing Technologies (NGI0 PET)
+grant]][fn:13:https://nlnet.nl/project/GNUMes-fullsource/] allocated in
+2021 to Jan Nieuwenhuizen to further work on full-source bootstrap in
+GNU Guix, GNU Mes, and related projects, followed by [[https://nlnet.nl/project/GNUMes-ARM_RISC-V/][another grant]] in
+2022 to expand support to the Arm and RISC-V CPU architectures.
+
+* Provenance Tracking
+
+We define provenance tracking as the ability *to map a binary artifact
+back to its complete corresponding source*.  Provenance tracking is
+necessary to allow the recipient of a binary artifact to access the
+corresponding source code and to verify the source/binary correspondence
+if they wish to do so.
+
+The [[https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html][=guix pack=]] command can be used to build, for instance, container
+images.  Running =guix pack -f docker python --save-provenance= produces
+a /self-describing Docker image/ containing the binaries of Python and
+its run-time dependencies.  The image is self-describing because the
+=--save-provenance= flag leads to the inclusion of a /manifest/ that
+describes which revision of Guix was used to produce this binary.  A
+third party can retrieve this revision of Guix and from there view the
+entire build dependency graph of Python, view its source code and any patches
+that were applied, and do the same recursively for its dependencies.
+
+To summarize, capturing the revision of Guix that was used is all it
+takes to /reproduce/ a specific binary artifact.  This is illustrated by
+[[https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html][the
+=time-machine=
+command]][fn:11:https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html].
+The example below deploys, /at any time on any machine/, the specific
+build artifact of the =python= package as it was defined in this Guix
+commit:
+
+#+begin_example
+guix time-machine -q --commit=d3c3922a8f5d50855165941e19a204d32469006f \
+  -- install python
+#+end_example
+
+#+latex: \noindent
+In other words, because Guix itself defines how artifacts are built,
+*the revision of the Guix source coupled with the package name
+unambiguously identifies the package's binary artifact*.  As scientists,
+we build on this property to achieve reproducible research workflows, as
+explained in this [[https://doi.org/10.1038/s41597-022-01720-9][2022 article in /Nature/]][fn:12:/Toward practical
+transparent verifiable and long-term reproducible research using Guix/,
+https://doi.org/10.1038/s41597-022-01720-9]; as engineers, we value this
+property to analyze the systems we are running and determine which known
+vulnerabilities and bugs apply.
+
+Again, a software bill of materials (SBOM) written as a mere list of
+package name/version pairs would fail to capture as much information.
+The *Artifact Dependency Graph (ADG) of OmniBOR*, while less ambiguous,
+falls short in two ways: it is too fine-grained for typical cybersecurity
+applications (at the level of individual source files), and it only
+captures the alleged source/binary correspondence of individual files
+but not the process to go from source to binary.
+
+* Conclusions
+
+Inherent identifiers lend themselves well to unambiguous source code
+identification, as demonstrated by Software Heritage, Guix, and Nix.
+
+However, we believe binary artifacts should instead be treated as the
+result of a computational process; it is that process that needs to be
+fully captured to support *independent verification of the source/binary
+correspondence*.  For cybersecurity purposes, recipients of a binary
+artifact must be able to map it back to its source code (/provenance
+tracking/), with the additional guarantee that they must be able to
+reproduce the entire build process to verify the source/binary
+correspondence (/reproducible builds and full-source bootstrap/).  As
+long as binary artifacts result from a reproducible build process,
+itself described as source code, *identifying binary artifacts boils
+down to identifying the source code of their build process*.
+
+These ideas are developed in the 2022 scientific paper [[https://doi.org/10.22152/programming-journal.org/2023/7/1][/Building a
+Secure Software Supply Chain with GNU Guix/]][fn:10:https://doi.org/10.22152/programming-journal.org/2023/7/1].
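
Note on the Org document above: the two kinds of identifier it
distinguishes can be illustrated with a short Python sketch.  This is a
toy model under stated assumptions, not Guix's implementation.  The
names sha256_hex, fetch_verified and derivation_id, the fetcher
functions and the "toy-compiler" and "hello" recipes are all
hypothetical; Guix's real hashing and serialization of derivations are
different and more involved.  The sketch only shows (1) that a source
hash lets any location serve the code, with the URL acting as a hint,
and (2) that an identifier computed over a build recipe and its inputs
changes whenever anything in the dependency graph changes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Inherent identifier of a piece of source code: its SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(expected_hash, fetchers):
    """Try each location in order; accept the first content whose hash matches."""
    for fetch in fetchers:
        try:
            data = fetch()
        except OSError:
            continue  # location vanished or unreachable: try the next one
        if sha256_hex(data) == expected_hash:
            return data
    raise LookupError("no location served content matching " + expected_hash)


def derivation_id(name, builder_script, input_ids):
    """Toy derivation identifier: hash of the recipe plus all input identifiers."""
    h = hashlib.sha256()
    h.update(name.encode())
    h.update(b"\0")
    h.update(builder_script.encode())
    for input_id in sorted(input_ids):  # sorted so input order does not matter
        h.update(b"\0")
        h.update(input_id.encode())
    return h.hexdigest()


# Hypothetical source and download locations, for illustration only.
source = b"int main(void) { return 0; }\n"
source_id = sha256_hex(source)


def vanished_upstream():
    raise OSError("original URL no longer exists")


def tampered_mirror():
    return source + b"/* unexpected change */"


def archive():
    return source


# The archive copy is accepted because its hash matches the identifier.
recovered = fetch_verified(source_id, [vanished_upstream, tampered_mirror, archive])

# Changing a build flag of the compiler changes the identifier of
# everything built with it, even though "hello" itself is unchanged.
compiler_a = derivation_id("toy-compiler", "make", [])
compiler_b = derivation_id("toy-compiler", "make CFLAGS=-O2", [])
hello_a = derivation_id("hello", "cc hello.c", [source_id, compiler_a])
hello_b = derivation_id("hello", "cc hello.c", [source_id, compiler_b])
print(recovered == source, hello_a != hello_b)
```

In this model a name/version pair such as "hello 1.0" would be the same
for hello_a and hello_b, whereas the two identifiers differ, which is
the distinction the document draws against name/version-based schemes.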
diff --git a/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf
new file mode 100644
index 0000000..2eef0da
Binary files /dev/null and b/doc/cisa-2023-0026-0001/cisa-2023-0026-0001.pdf differ
diff --git a/doc/cisa-2023-0026-0001/manifest.scm b/doc/cisa-2023-0026-0001/manifest.scm
new file mode 100644
index 0000000..b672d7d
--- /dev/null
+++ b/doc/cisa-2023-0026-0001/manifest.scm
@@ -0,0 +1,17 @@
+;; Manifest for Org-generated LaTeX.
+
+(specifications->manifest
+ '("rubber"
+
+   "texlive-scheme-basic"
+   "texlive-collection-latexrecommended"
+   "texlive-collection-fontsrecommended"
+
+   "texlive-libertine"
+   "texlive-inconsolata"
+
+   "texlive-wrapfig"
+   "texlive-ulem"
+   "texlive-capt-of"
+   "texlive-hyperref"
+   "texlive-upquote"))
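
Note on the "Reproducible Builds" section of the Org document in this
commit: the independent verification it describes, where binaries
published by different parties are compared bit-for-bit, can be
sketched in Python as a comparison of artifact hashes.  This is only an
illustration of the idea and does not reimplement any Guix command; the
function names, builder names and byte strings below are made up.

```python
import hashlib
from collections import defaultdict


def group_by_hash(published):
    """Map each SHA-256 digest to the sorted list of builders that published it."""
    groups = defaultdict(list)
    for builder, artifact in published.items():
        groups[hashlib.sha256(artifact).hexdigest()].append(builder)
    return {digest: sorted(builders) for digest, builders in groups.items()}


def is_reproducible(published):
    """True when at least two builders published and all agree bit-for-bit."""
    return len(published) >= 2 and len(group_by_hash(published)) == 1


# Hypothetical builders and artifacts, for illustration only.
matching = {"build-farm": b"\x7fELF-gcc", "local-build": b"\x7fELF-gcc"}
diverging = {"build-farm": b"\x7fELF-gcc", "third-party": b"\x7fELF-gcc+timestamp"}

print(is_reproducible(matching))
print(is_reproducible(diverging))
for digest, builders in group_by_hash(diverging).items():
    print(digest[:12], builders)
```

When the digests diverge, the grouping shows which parties disagree,
which is the starting point for investigating either a
non-deterministic build step or a compromised builder.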


