[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
From: |
Jacob Bachmeyer |
Subject: |
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor |
Date: |
Mon, 01 Apr 2024 22:51:57 -0500 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0 |
Jose E. Marchesi wrote:
Jose E. Marchesi wrote:
[...]
I agree that distcheck is good but not a cure all. Any static
system can be attacked when there is motive, and unit tests are
easily gamed.
The issue seems to be releases containing binary data for unit tests,
instead of source or scripts to generate that data. In this case,
that binary data was used to smuggle in heavily obfuscated object
code.
As a side note, GNU poke (https://jemarch.net/poke) is good for
generating arbitrarily complex binary data from clear textual
descriptions.
While it is suitable for that use, at last check poke is itself very
complex, complete with its own JIT-capable VM. This is good for
interactive use, but I get nervous about complexity in testsuites,
where simplicity can greatly aid debugging, and it /might/ be possible
to hide a backdoor similarly in a poke pickle. (This seems to be a
general problem with powerful interactive editors.)
Yes, I agree simplicity it is very desirable, in testsuites and actually
everywhere else. I also am not fond of dragging in dependencies.
Exactly---I am sure that poke is great for interactive use, but a
self-contained solution is probably better for a testsuite.
But I suppose we also agree in that it is not possible to assembly
non-trivial binary data structures in a simple way, without somehow
moving the complexity of the encoding into some sort of generator, which
will not be simple. The GDB testsuite, for example, ships with a DWARF
assembler written in around 3000 lines of Tcl. Sure, it is simpler than
poke and doesn't drag in additional dependencies. But it has to be
carefully maintained and kept up to date, and the complexity is there.
The problem for a compression tool testsuite is that compression formats
are (I believe) defined as byte-streams or bit-streams. Further, the
generator(s) must be able to produce /incorrect/ output as well, in
order to test error handling.
Further, GNU poke defines its own specialized programming language for
manipulating binary data. Supplying generator programs in C (or C++)
for binary test data in a package that itself uses C (or C++) ensures
that every developer with the skills to improve or debug the package
can also understand the testcase generators.
Here we will have to disagree.
IMO it is precisely the many and tricky details on properly marshaling
binary data in general-purpose programming languages that would have
greater odds to lead to difficult to understand, difficult to maintain
and possibly buggy or malicious encoders. The domain specific language
is here an advantage, not a liability.
This you need to do in C to encode and generate test data for a single
signed 32-bit NUMBER in an output file in a _more or less_ portable way:
void generate_testdata (off_t offset, int endian, int number)
{
int bin_flag = 0, fd;
#ifdef _WIN32
int bin_flag = O_BINARY;
#endif
fd = open ("testdata.bin", bin_flag, S_IWUSR);
if (fd == -1)
fatal ("error generating data.");
if (endian == BIG)
{
b[0] = (number >> 24) & 0xff;
b[1] = (number >> 16) & 0xff;
b[2] = (number >> 8) & 0xff;
b[3] = number & 0xff;
}
else
{
b[3] = (number >> 24) & 0xff;
b[2] = (number >> 16) & 0xff;
b[1] = (number >> 8) & 0xff;
b[0] = number & 0xff;
}
lseek (fd, offset, SEEK_SET);
for (i = 0; i < 4; ++i)
write (fd, &b[i], 1);
close (fd);
}
While that is a nice general solution, (aside from neglecting the
declaration "uint8_t b[4];"; with "int b[4];", the code would only work
on a little-endian processor; with no declaration, the compiler will
reject it) a compression format would be expected to define the
endianess of stored values, so the major branch in that function would
collapse to just one of its alternatives. Compression formats are
generally defined as streams, so a different decomposition of the
problem would likely make more sense: (example untested)
void emit_int32le (FILE * out, int value)
{
unsigned int R, i;
for (R = (unsigned int)value, i = 0; i < 4; R = R >> 8, i++)
if (fputc(R & 0xff, out) == EOF)
fatal("error writing int32le");
}
Other code handles opening OUT, or OUT is actually stdout and we are
writing down a pipe or the shell handled opening the file. (The main
function can easily check that stdout is not a terminal and bail out if
it is.) Remember that I am suggesting test generator programs, which do
not need to be as general as ordinary code, nor do they need the same
level of user-friendliness, since they are expected to be run from
scripts that encode the precise knowledge of how to call them. (That
this version is also probably more efficient by avoiding a syscall for
every byte written is irrelevant for its intended use.)
This is the Poke equivalent:
fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32>
number) void:
{
var fd = open ("testdata.bin");
set_endian (endian);
int<32> @ fd : off = number;
close (fd);
}
And thanks to the DSL, this scales nicely to more complex structures,
such as an ELF64 relocation instead of a signed 32-bit integer:
fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32>
number) void:
{
type Elf64_RelInfo =
struct Elf64_Xword
{
uint<32> r_sym;
uint<32> r_type;
};
type Elf64_Rela =
struct
{
offset<uint<64>,B> r_offset;
Elf64_RelInfo r_info;
offset<int<64>,B> r_addend;
};
var fd = open ("got32reloc.bin");
set_endian (endian);
Elf64_Rela @ 0#B
= Elf64_Rela { r_info = Elf64_RelInfo { r_sym = 0xff00, r_type = 3 } }
close (fd);
}
Indeed Poke's language is excellent for concisely describing complex
structures accurately, closely resembling C's syntax, but if I
understand correctly, compression tool test data has few complex
structures and is mostly byte-streams. Also, how did poke know to send
the structure to "got32reloc.bin"? Was it simply the
most-recently-opened file (or perhaps the top entry on a stack of
recently-opened files)? Were the OFF and NUMBER parameters
intentionally unused, accidentally unused, or somehow implicitly used in
the second example?
-- Jacob
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jose E. Marchesi, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor,
Jacob Bachmeyer <=
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Zack Weinberg, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Russ Allbery, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Zack Weinberg, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Eric Gallager, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Bruno Haible, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Russ Allbery, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Eric Gallager, 2024/04/02