From: Jacob Bachmeyer
Subject: Re: detecting modified m4 files (was: GNU Coding Standards, automake, and the recent xz-utils backdoor)
Date: Sun, 07 Apr 2024 19:26:38 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0
Bruno Haible wrote:
> Richard Stallman commented on Jacob Bachmeyer's idea:
> > > Another related check that /would/ have caught this attempt would be
> > > comparing the aclocal m4 files in a release against their (meta)upstream
> > > sources before building a package. This is something distribution
> > > maintainers could do without cooperation from upstream. If
> > > m4/build-to-host.m4 had been recognized as coming from gnulib and
> > > compared to the copy in gnulib, the nonempty diff would have been
> > > suspicious.
> >
> > I have a hunch that some effort is needed to do that comparison, but
> > that it is feasible to write a script to do it could make it easy.
> > Is that so?
>
> Yes, the technical side of such a comparison is relatively easy to
> implement:
>   - There are less than about 2000 or 5000 *.m4 files that are shared
>     between projects. Downloading and storing all historical versions
>     of these files will take ca. 0.1 to 1 GB.
>   - They would be stored in a content-based index, i.e. indexed by
>     sha256 hash code.
>   - A distribution could then quickly test whether a *.m4 file found
>     in a distrib tarball is "known".
>
> The recurrently time-consuming part is, whenever an "unknown" *.m4 file
> appears, to
>   - manually review it,
>   - update the list of upstream git repositories (e.g. when a project
>     has been forked) or the list of releases to consider (e.g. snapshots
>     of GNU Autoconf or GNU libtool, or distribution-specific
>     modifications).
>
> I agree with Jacob that a distro can put this in place, without needing
> to bother upstream developers.
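A minimal sketch of the content-based index described in the quoted text, in Python. It assumes (hypothetically) that every historical upstream version of the shared *.m4 files has already been unpacked under one local directory; the names sha256_of, build_index and unknown_m4_files are illustrative only and not part of any existing tool. Files are keyed by SHA-256 digest, so a lookup only asks whether these exact bytes have been seen upstream before.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(upstream_root: Path) -> dict[str, list[str]]:
    """Map digest -> upstream paths having exactly those contents.

    Assumption: upstream_root (hypothetical) holds a copy of every
    historical version of every shared *.m4 file.
    """
    index: dict[str, list[str]] = {}
    for p in sorted(upstream_root.rglob("*.m4")):
        rel = str(p.relative_to(upstream_root))
        index.setdefault(sha256_of(p), []).append(rel)
    return index


def unknown_m4_files(unpacked_tarball: Path,
                     index: dict[str, list[str]]) -> list[Path]:
    """Return the *.m4 files in an unpacked tarball not found in the index."""
    return [p for p in sorted(unpacked_tarball.rglob("*.m4"))
            if sha256_of(p) not in index]
```

Anything returned by unknown_m4_files would then go to the manual review step; the index itself only grows when a reviewed file or a newly considered release is added to the upstream tree.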
I have since thought of a simple solution that /would/ have stopped this backdoor campaign in its tracks: an "autopoint --check" command that compares each m4/ file (and possibly others?) that autopoint would copy in if m4/ were empty against the copy autopoint would install, and reports any differences. A file in the package tree with a newer serial than the system m4 library's copy produces a minor complaint; a file with the same serial and different contents produces a major complaint. A file with an older serial in the package tree should be reported, but is likely to be of no consequence if a distribution's packaging routine will copy in the known-good newer version before rebuilding configure. Any m4/ files local to the package are simply reported, but those are also in the package's Git repository.
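The proposed "autopoint --check" does not exist yet, so the following is only a hypothetical Python sketch of the comparison rules above. It assumes that the files autopoint would install can be found in a single directory (system_m4 below, a stand-in for wherever gettext actually keeps its archive) and that serial numbers appear on a "# serial N" or "# name.m4 serial N" line; both are assumptions, not a description of how autopoint works.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: serial lines look like "# serial N" or "# name.m4 serial N".
SERIAL_RE = re.compile(r"^#\s*(?:\S+\.m4\s+)?serial\s+(\d+(?:\.\d+)*)",
                       re.MULTILINE)


def parse_serial(text: str) -> Optional[tuple[int, ...]]:
    """Return the first serial number as a tuple of ints, or None."""
    m = SERIAL_RE.search(text)
    if m is None:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def classify(package_text: str, system_text: Optional[str]) -> str:
    """Classify one m4/ file per the proposed (hypothetical) rules."""
    if system_text is None:
        return "local"   # not in the system library: just report it
    if package_text == system_text:
        return "ok"
    pkg, sys_ = parse_serial(package_text), parse_serial(system_text)
    if pkg is None or sys_ is None:
        return "major"   # assumption of this sketch: be conservative
    if pkg > sys_:
        return "minor"   # newer serial in the package tree
    if pkg == sys_:
        return "major"   # same serial, different contents
    return "older"       # older serial: reported, likely harmless


def check_tree(package_m4: Path, system_m4: Path) -> dict[str, str]:
    """Classify every *.m4 file in a package's m4/ directory."""
    results = {}
    for p in sorted(package_m4.glob("*.m4")):
        sys_path = system_m4 / p.name
        # Latin-1 maps each byte to one character, so equality stays exact.
        sys_text = (sys_path.read_bytes().decode("latin-1")
                    if sys_path.exists() else None)
        results[p.name] = classify(p.read_bytes().decode("latin-1"), sys_text)
    return results
```

A modified build-to-host.m4 that kept its upstream serial would land in the "major" bucket here, which is exactly the case the proposal is meant to flag.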
Distribution package maintainers would run "autopoint --check" and pass any suspicious files to upstream maintainers for evaluation. (The distribution's own packaging system can trace an m4 file in the system library back to its upstream package.) The modified build-to-host.m4 would have been very /unlikely/ to slip past the gnulib/gettext/Automake/Autoconf maintainers, even though few distribution packagers would have had suspicions. The gnulib maintainers would know that gl_BUILD_TO_HOST should not be checking /anything/ itself, and the crackers would have been caught.
This should be effective in closing off a large swath of possible attacks: a backdoor concealed in binary test data (or documentation) requires some visible means to unpack it, which means the unpacker must appear in source somewhere. While the average package maintainer might not be able to make sense of a novel m4 file, the maintainers of GNU's version of that file /will/ be able to recognize such chicanery, and the "red herrings" the crackers added for obfuscation would become a liability. Without them, the effect of the new code would be more obvious, so the crackers lose either way.
-- Jacob