guix-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

branch master updated: website: Add draft of a post on the ld.so cache.


From: Ludovic Courtčs
Subject: branch master updated: website: Add draft of a post on the ld.so cache.
Date: Sun, 01 Aug 2021 12:23:04 -0400

This is an automated email from the git hooks/post-receive script.

civodul pushed a commit to branch master
in repository guix-artwork.

The following commit(s) were added to refs/heads/master by this push:
     new 5d3ab51  website: Add draft of a post on the ld.so cache.
5d3ab51 is described below

commit 5d3ab51e0d5bd023b7965ddb5b885bb7808a067d
Author: Ludovic Courtès <ludo@gnu.org>
AuthorDate: Sun Aug 1 18:22:16 2021 +0200

    website: Add draft of a post on the ld.so cache.
    
    * website/drafts/ld-so-cache.md: New file.
---
 website/drafts/ld-so-cache.md | 285 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 285 insertions(+)

diff --git a/website/drafts/ld-so-cache.md b/website/drafts/ld-so-cache.md
new file mode 100644
index 0000000..4b58f00
--- /dev/null
+++ b/website/drafts/ld-so-cache.md
@@ -0,0 +1,285 @@
+title: Taming the stat storm with a loader cache
+author: Ludovic Courtès
+tags: Scheme API, Performance
+date: 2021-08-02 14:00:00
+---
+
+It was one of these days where some of us on IRC were rehashing that old
+problem—that application startup in Guix causes a
+“[`stat`](https://linux.die.net/man/2/stat) storm”—and lamenting the
+lack of a solution when suddenly, Ricardo
+[proposes](https://logs.guix.gnu.org/guix/2020-11-24.log#183934) what,
+in hindsight, looks like an obvious solution: “maybe we could use a
+per-application ld cache?”.  A moment where collective thinking exceeds
+the sum of our individual thoughts.  The result is one of the many
+features that made it in the `core-updates` branch, slated to be merged
+in the coming weeks, one that reduces application startup time.
+
+# ELF files and their dependencies
+
+Before going into detail, let’s look at what those “`stat` storms” look
+like and where they come from.  Loading an
+[ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
+executable involves loading the shared libraries (the `.so` files, for
+“shared objects”) it depends on, recursively.  This is the job of the
+*loader* (or *dynamic linker*), `ld.so`, which is part of the GNU C
+Library (glibc) package.  What shared libraries an executable like that
+of Emacs depends on?  The `ldd` command answers that question:
+
+```
+$ ldd $(type -P .emacs-27.2-real)
+        linux-vdso.so.1 (0x00007fff565bb000)
+        libtiff.so.5 => 
/gnu/store/l1wwr5c34593gqxvp34qbwdkaf7xhdbd-libtiff-4.2.0/lib/libtiff.so.5 
(0x00007fd5aa2b1000)
+        libjpeg.so.62 => 
/gnu/store/5khkwz9g6vza1n4z8xlmdrwhazz7m8wp-libjpeg-turbo-2.0.5/lib/libjpeg.so.62
 (0x00007fd5aa219000)
+        libpng16.so.16 => 
/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/libpng16.so.16 
(0x00007fd5aa1e4000)
+        libz.so.1 => 
/gnu/store/rykm237xkmq7rl1p0nwass01p090p88x-zlib-1.2.11/lib/libz.so.1 
(0x00007fd5aa1c2000)
+        libgif.so.7 => 
/gnu/store/bpw826hypzlnl4gr6d0v8m63dd0k8waw-giflib-5.2.1/lib/libgif.so.7 
(0x00007fd5aa1b8000)
+        libXpm.so.4 => 
/gnu/store/jgdsl6whyimkz4hxsp2vrl77338kpl0i-libxpm-3.5.13/lib/libXpm.so.4 
(0x00007fd5aa1a4000)
+[…]
+$ ldd $(type -P .emacs-27.2-real) | wc -l
+89
+```
+
+(If you’re wondering why we’re looking at `.emacs-27.2-real` rather than
+`emacs-27.2`, it’s because in Guix the latter is a tiny shell wrapper
+around the former.)
+
+To load a graphical program like Emacs, the loader needs to load more
+than 80 shared libraries!  Each is in its own `/gnu/store` sub-directory
+in Guix, one directory per package.
+
+But how does `ld.so` know where to find these libraries in the first
+place?  In Guix, during the link phase that produces an ELF file
+(executable or shared library), we tell the
+[linker](https://en.wikipedia.org/wiki/Linker_%28computing%29) to
+populate the `RUNPATH` entry of the ELF file with the list of
+directories where its dependencies may be found.  This is done by
+passing
+[`-rpath`](https://sourceware.org/binutils/docs/ld/Options.html#index-_002drpath_003ddir)
+options to the linker, which Guix’s [“linker
+wrapper”](https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/ld-wrapper.in)
+takes care of.  The `RUNPATH` is the *run-time library search path*:
+it’s a colon-separated list of directories where `ld.so` will look for
+shared libraries when it loads an ELF file.  We can look at the
+`RUNPATH` of our Emacs executable like this:
+
+```
+$ objdump -x $(type -P .emacs-27.2-real) | grep RUNPATH
+  RUNPATH              
/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib:/gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib/lib:/gnu/store/l1wwr5c34593gqxvp34qbwdkaf7xhdbd-libtiff-4.2.0/lib:/gnu/store/5khkwz9g6vza1n4z8xlmdrwhazz7m8wp-libjpeg-turbo-2.0.5/lib:[…]
+```
+
+This `RUNPATH` has 39 entries, which roughly corresponds to the number
+of direct dependencies of the executable—dependencies are listed as
+`NEEDED` entries in the ELF file:
+
+```
+$ objdump -x $(type -P .emacs-27.2-real) | grep NEED | head
+  NEEDED               libtiff.so.5
+  NEEDED               libjpeg.so.62
+  NEEDED               libpng16.so.16
+  NEEDED               libz.so.1
+  NEEDED               libgif.so.7
+  NEEDED               libXpm.so.4
+  NEEDED               libgtk-3.so.0
+  NEEDED               libgdk-3.so.0
+  NEEDED               libpangocairo-1.0.so.0
+  NEEDED               libpango-1.0.so.0
+$ objdump -x $(type -P .emacs-27.2-real) | grep NEED | wc -l
+52
+```
+
+(Some of these `.so` files live in the same directory, which is why
+there are more `NEEDED` entries than directories in the `RUNPATH`.)
+
+A system such as Debian that follows the [file system hierarchy
+standard](https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard)
+(FHS), where all libraries are in `/lib` or `/usr/lib`, does not have to
+bother with `RUNPATH`: all `.so` files are known to be found in one of
+these two “standard” locations.  Anyway, let’s get back to our initial
+topic: the “`stat` storm”.
+
+# Walking search paths
+
+As you can guess, when we run Emacs, the loader first needs to locate
+and load the 80+ shared libraries it depends on.  That’s where things
+get pretty inefficient: the loader will search each `.so` file Emacs
+depends on in one of the 39 directories listed in its `RUNPATH`.
+Likewise, when it finally finds `libgtk-3.so`, it’ll look for its
+dependencies in each of the directories in its `RUNPATH`.  We can see
+that at play by tracing system calls with the
+[`strace`](https://strace.io/) command:
+
+```
+$ strace -c emacs --version
+GNU Emacs 27.2
+Copyright (C) 2021 Free Software Foundation, Inc.
+GNU Emacs comes with ABSOLUTELY NO WARRANTY.
+You may redistribute copies of GNU Emacs
+under the terms of the GNU General Public License.
+For more information about these matters, see the file named COPYING.
+% time     seconds  usecs/call     calls    errors syscall
+------ ----------- ----------- --------- --------- ----------------
+ 55.46    0.006629           3      1851      1742 openat
+ 16.06    0.001919           4       422           mmap
+ 11.46    0.001370           2       501       477 stat
+  4.79    0.000573           4       122           mprotect
+  3.84    0.000459           4       111           read
+  2.45    0.000293           2       109           fstat
+  2.34    0.000280           2       111           close
+[…]
+------ ----------- ----------- --------- --------- ----------------
+100.00    0.011952           3      3325      2227 total
+```
+
+For this simple `emacs --version` command, the loader and `emacs` probed
+for more than 1,300 files, with the
+[`openat`](https://linux.die.net/man/2/openat) and
+[`stat`](https://linux.die.net/man/2/stat) system calls, and more of
+these probes were unsuccessful (counted as “errors” here, meaning that
+the call returned an error).  The fraction of “erroneous” system calls
+is no less than 67% (2,227 over 3,325).  We can see the desperate search
+of `.so` files by looking at individual calls:
+
+```
+$ strace -e openat,stat emacs --version
+[…]
+openat(AT_FDCWD, 
"/gnu/store/fa6wj5bxkj5ll1d7292a70knmyl7a0cr-glibc-2.31/lib/libpng16.so.16", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/01b4w3m6mp55y531kyi1g8shh722kwqm-gcc-7.5.0-lib/lib/libpng16.so.16", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/l1wwr5c34593gqxvp34qbwdkaf7xhdbd-libtiff-4.2.0/lib/libpng16.so.16", 
O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/5khkwz9g6vza1n4z8xlmdrwhazz7m8wp-libjpeg-turbo-2.0.5/lib/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/haswell/x86_64/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/haswell/x86_64",
 0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/haswell/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/haswell",
 0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/x86_64/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/x86_64",
 0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/tls", 
0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/haswell/x86_64/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/haswell/x86_64",
 0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/haswell/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/haswell", 
0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/x86_64/libpng16.so.16",
 O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
+stat("/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/x86_64", 
0x7ffe428a1c70) = -1 ENOENT (No such file or directory)
+openat(AT_FDCWD, 
"/gnu/store/3x2kak8abb6z2klch72kfff2qxzv00pj-libpng-1.6.37/lib/libpng16.so.16", 
O_RDONLY|O_CLOEXEC) = 3
+[…]
+```
+
+Above is the sequence where we see `ld.so` look for `libpng16.so.16`,
+searching in locations where we *know* it’s not going to find it.  A bit
+ridiculous.  How does this affect performance?  The impact is small in
+the most favorable case—on a hot cache, with fast solid state device
+(SSD) storage.  But it likely has a visible effect in other cases—on a
+cold cache, with a slower spinning hard disk drive (HDD), on a network
+file system (NFS).
+
+# Enter the per-package loader cache
+
+The idea that Ricardo submitted, using a loader cache, makes a lot of
+sense: we know from the start that `libpng.so` may only be found in
+`/gnu/store/…-libpng-1.6.37`, no need to look elsewhere.  In fact, it’s
+not new: glibc has had such a cache “forever”; it’s the
+`/etc/ld.so.cache` file you can see on FHS distros and which is
+typically created by running
+[`ldconfig`](https://linux.die.net/man/8/ldconfig) when a package has
+been installed.  Roughly, the cache maps library `SONAME`s, such as
+`libpng16.so.16`, to their file name on disk, say
+`/usr/lib/libpng16.so.16`.
+
+The problem is that this cache is inherently system-wide: it assumes
+that there is only *one* `libpng16.so` on the system; any binary that
+depends on `libpng16.so` will load it from its one and only location.
+This models perfectly matches the FHS, but it’s at odds with the
+flexibility offered by Guix, where several variants or versions of the
+library can coexist on the system, used by different applications.
+That’s the reason why Guix and other non-FHS distros such as NixOS or
+GoboLinux typically [turn
+off](https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/base.scm#n716?id=a92dfbce30777de6ca05031e275410cf9f56c84c)
+that feature altogether… and pay the cost of those `stat` storms.
+
+The insight we gained on that Tuesday evening IRC conversation is that
+we could *adapt* glibc’s loader cache to our setting: instead of a
+system-wide cache, we’d have a *per-application loader cache*.  As one
+of the last package [build
+phases](https://guix.gnu.org/manual/en/html_node/Build-Phases.html),
+we’d run `ldconfig` to create `etc/ld.so.cache` within that package’s
+`/gnu/store` sub-directory.  We then need to modify the loader so it
+would look for `${ORIGIN}/../etc/ld.so.cache` instead of
+`/etc/ld.so.cache`, where `${ORIGIN}` is the location of the ELF file
+being loaded.  A discussion of these changes is [in the issue
+tracker](https://issues.guix.gnu.org/44899); you can see [the glibc
+patch](https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/patches/glibc-dl-cache.patch?h=core-updates&id=0236013cd0fc86ff4a042885c735e3f36a7f5c25)
+and the new [`make-dynamic-linker-cache` build
+phase](https://git.savannah.gnu.org/cgit/guix.git/tree/guix/build/gnu-build-system.scm?h=core-updates&id=0236013cd0fc86ff4a042885c735e3f36a7f5c25#n735).
+
+How does this play out in practice?  Let’s try an `emacs` build that
+uses this new loader cache:
+
+```
+$ strace -c /gnu/store/ijgcbf790z4x2mkjx2ha893hhmqrj29j-emacs-27.2/bin/emacs 
--version
+GNU Emacs 27.2
+Copyright (C) 2021 Free Software Foundation, Inc.
+GNU Emacs comes with ABSOLUTELY NO WARRANTY.
+You may redistribute copies of GNU Emacs
+under the terms of the GNU General Public License.
+For more information about these matters, see the file named COPYING.
+% time     seconds  usecs/call     calls    errors syscall
+------ ----------- ----------- --------- --------- ----------------
+ 28.68    0.002909          26       110        13 openat
+ 25.13    0.002549          26        96           read
+ 20.41    0.002070           4       418           mmap
+  9.34    0.000947          10        90           pread64
+  6.60    0.000669           5       123           mprotect
+  4.12    0.000418           3       107         1 newfstatat
+  2.19    0.000222           2        99           close
+[…]
+------ ----------- ----------- --------- --------- ----------------
+100.00    0.010144           8      1128        24 total
+```
+
+Compared to what we have above, the total number of system calls has
+been divided by 3, and the fraction of erroneous system calls goes from
+67% to 0.2%.  Quite a difference!  We count on you, dear users, to [let
+us know](https://guix.gnu.org/en/contact) how this impacts load time for
+you.
+
+# Flexibility without `stat` storms
+
+With [GNU Stow](https://www.gnu.org/software/stow) in the 1990s, and
+then Nix, Guix, and other distros, the benefits of flexible file layouts
+rather than the rigid Unix-inherited FHS have been demonstrated—nowadays
+I see it as an antidote to opaque and bloated application bundles Ă  la
+Docker.  Luckily, few of our system tools have FHS assumptions baked in,
+probably in large part thanks to GNU’s insistence on a [rigorous
+installation directory
+categorization](https://www.gnu.org/prep/standards/html_node/Directory-Variables.html)
+in the early days rather than hard-coded directory names.  The loader
+cache is one of the few exceptions.  Adapting it to a non-FHS context is
+fruitful for Guix and for the other distros and packaging tools in a
+similar situation; perhaps it could become an option in glibc proper?
+
+This is not the end of `stat` storms, though.  Interpreters and language
+run-time systems rely on search paths—`GUILE_LOAD_PATH` for Guile,
+`PYTHONPATH` for Python, `OCAMLPATH` for OCaml, etc.—and are equally
+prone to stormy application startups.  Unlike ELF, they do not have a
+mechanism akin to `RUNPATH`, let alone a run-time search path cache.  We
+have yet to find ways to address these.
+
+
+#### About GNU Guix
+
+[GNU Guix](https://guix.gnu.org) is a transactional package manager and
+an advanced distribution of the GNU system that [respects user
+freedom](https://www.gnu.org/distros/free-system-distribution-guidelines.html).
+Guix can be used on top of any system running the Hurd or the Linux
+kernel, or it can be used as a standalone operating system distribution
+for i686, x86_64, ARMv7, AArch64 and POWER9 machines.
+
+In addition to standard package management features, Guix supports
+transactional upgrades and roll-backs, unprivileged package management,
+per-user profiles, and garbage collection.  When used as a standalone
+GNU/Linux distribution, Guix offers a declarative, stateless approach to
+operating system configuration management.  Guix is highly customizable
+and hackable through [Guile](https://www.gnu.org/software/guile)
+programming interfaces and extensions to the
+[Scheme](http://schemers.org) language.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]