bug-standards

Re: GNU Coding Standards, automake, and the recent xz-utils backdoor


From: Jacob Bachmeyer
Subject: Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
Date: Mon, 01 Apr 2024 23:03:32 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0

Zack Weinberg wrote:
[...] but I do think there's a valid point here: the malicious xz
maintainer *might* have been caught earlier if they had committed the
build-to-host.m4 modification to xz's VCS.

That would require someone to notice that xz.git has a build-to-host.m4 that does not exist anywhere in the history of gnulib.git. That is a fairly complex scan, although it does look straightforward to implement. That said, the m4 files in Gnulib *are* Free Software, so having a modified version cannot itself raise too many concerns.
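A minimal sketch of such a scan, under stated assumptions: both repositories use SHA-1 object ids, and the set of known ids has been collected beforehand from a gnulib checkout (for example from the output of "git -C gnulib rev-list --all --objects"). The function names are hypothetical. Many .m4 files legitimately come from other packages or from the project itself, so a hit is only a prompt for manual review, not evidence of tampering.

```python
import hashlib
import os


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def unknown_m4_files(project_root, known_blob_ids):
    """List .m4 files whose content matches no blob id in known_blob_ids.

    known_blob_ids is assumed to be a set of hex object ids taken from the
    full history of the reference repository (gnulib.git in this thread).
    """
    unknown = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Skip the project's own git metadata.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if not name.endswith(".m4"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                blob_id = git_blob_id(fh.read())
            if blob_id not in known_blob_ids:
                unknown.append(os.path.relpath(path, project_root))
    return sorted(unknown)
```

Comparing blob ids rather than file names means a file that merely shares a name with a Gnulib macro file, but not its content, is still reported.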

  (Or they might not have!
Witness the three (and counting) malicious patches that they barefacedly
submitted to *other* software and got accepted because the malice was
subtle enough to pass through code review.)

Exactly.  :-/

That said, the whole thing looks to me like the attackers were trying to /not/ hit the more (what is the best word?) "advanced" users---the backdoor would only be inserted when building distribution packages, and then only under dpkg or rpm, not under other systems like Gentoo's Portage or in an unpackaged "./configure && make && sudo make install" build. This would, of course, hit the most widely used systems, including the systems most commonly used by less technically-skilled users (reports are that the sock farm tried very hard to get Ubuntu to ship the crocked version in their upcoming release, but the freeze was upheld), but pointedly exclude systems that require greater skill to use---and whose users would be more likely to notice anything amiss and start tearing the system apart with the debugger. Unfortunately for Mr. Sockmaster, it turns out that some highly-skilled users *do* use the widely-used systems, and the backdoor caused sshd to misbehave enough to draw suspicion. (Profiling reports that sshd is spending most of its time in liblzma---a library it has no reason to use---will tend to raise a few eyebrows. :-) )

[...]
Maybe the best revision to the GNU Coding Standards would be that releases should, if at all possible, contain only text? Any binary files needed for testing can be generated during "make check" if necessary

I don't think this is a good idea.  It's only a speed bump for someone
trying to smuggle malicious data into a package (think "base64 -d") and
it makes life substantially harder for honest authors of programs that
work with binary data, and authors of material whose "source code"
(as GPLv3 uses that term) *is* binary data.  Consider pngsuite, for
instance (http://www.schaik.com/pngsuite/) -- it would be a *ton* of
work to convert each of these test PNG files into GNU Poke scripts,
and probably the result would be *less* ergonomic for purposes of
improving the test suite.

That is a bad example because SNG (<URL:https://sng.sourceforge.net/>) exists precisely to provide a text representation of PNG binary structures. (Admittedly, if I recall correctly, the contents of IDAT are simply a hexdump.)

While we are on the topic, this leaves the other obvious place to hide binary data: images used as part of the manual. There is a reason that I added the "if at all possible" caveat, and I am not certain that it is always possible.

I would like to suggest that a more useful policy would be "files
written to $prefix by 'make install' should not have any data
dependency on files labeled as part of the package's testsuite".
That doesn't constrain honest authors and it seems within the
scope of what the reproducible builds people could test for.
(Build the package, install to nonce prefix 1, unpack the tarball
again, delete the test suite, build again, install to prefix 2, compare.)
Of course a sufficiently determined malicious coder could detect
the reproducible-build test environment, but unlike "no binary data"
this is a substantial difficulty increment.
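The final "compare" step of the procedure quoted above could be sketched as follows. This assumes the two install prefixes have already been populated by separate builds, and that the build is otherwise reproducible, so any difference is attributable to the deleted test suite. The function names are hypothetical.

```python
import hashlib
import os


def tree_digest(root):
    """Map each file under root (by relative path) to a content digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                # Compare symlinks by target rather than by following them.
                digests[rel] = "symlink:" + os.readlink(path)
            else:
                with open(path, "rb") as fh:
                    digests[rel] = hashlib.sha256(fh.read()).hexdigest()
    return digests


def diff_installs(prefix1, prefix2):
    """Return relative paths that differ, or exist in only one prefix."""
    a = tree_digest(prefix1)
    b = tree_digest(prefix2)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the installed files were byte-identical with and without the test suite present. A non-empty result names the files to inspect.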

This could be a good idea. Another way to check this even without reproducible builds would be to ensure that the access timestamps on testsuite files do not change while "make" is processing the main sources. Checking this is slightly more invasive, since you would need to run a hook between processing top-level directories during the main build, but for packages using recursive Automake, you could simply run "make -C src" (or wherever the main sources are) and make sure that the testsuite files still have the same atime afterwards. I admit that this is harder to automate in general, but distribution packaging processes already have other metadata that is manually maintained, so identifying the source subtrees that yield the installable artifacts should not be difficult.
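A rough sketch of the access-time check, assuming the filesystem actually records access times. Mounts using noatime, or the coalescing done by relatime, can hide reads, so a clean result is weaker evidence than a reported change. The function names are hypothetical.

```python
import os


def snapshot_atimes(root):
    """Record the access time (in nanoseconds) of every file under root.

    os.stat() reads metadata only, so taking the snapshot does not itself
    touch the files' access times.
    """
    times = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            times[os.path.relpath(path, root)] = os.stat(path).st_atime_ns
    return times


def changed_atimes(before, after):
    """Return relative paths whose atime changed, or that appeared or vanished."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))
```

The intended use is to take one snapshot of the testsuite directory, run the build of the main sources (for instance "make -C src"), take a second snapshot, and treat any path returned by changed_atimes() as a file the main build read or modified.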

Now that I think about it, I suggest tightening that policy a bit further: "files produced by make in the source subtree (typically src/) shall have no data dependency on files outside of that tree".

I doubt anyone ever thought that recursive make could end up as a security/verifiability feature. 8-|


-- Jacob


