[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor
From: |
Jose E. Marchesi |
Subject: |
Re: GNU Coding Standards, automake, and the recent xz-utils backdoor |
Date: |
Tue, 02 Apr 2024 10:38:26 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
> Jose E. Marchesi wrote:
>>> Jose E. Marchesi wrote:
>>>
>>>>> [...]
>>>>>
>>>>>> I agree that distcheck is good but not a cure all. Any static
>>>>>> system can be attacked when there is motive, and unit tests are
>>>>>> easily gamed.
>>>>>>
>>>>> The issue seems to be releases containing binary data for unit tests,
>>>>> instead of source or scripts to generate that data. In this case,
>>>>> that binary data was used to smuggle in heavily obfuscated object
>>>>> code.
>>>>>
>>>> As a side note, GNU poke (https://jemarch.net/poke) is good for
>>>> generating arbitrarily complex binary data from clear textual
>>>> descriptions.
>>>>
>>> While it is suitable for that use, at last check poke is itself very
>>> complex, complete with its own JIT-capable VM. This is good for
>>> interactive use, but I get nervous about complexity in testsuites,
>>> where simplicity can greatly aid debugging, and it /might/ be possible
>>> to hide a backdoor similarly in a poke pickle. (This seems to be a
>>> general problem with powerful interactive editors.)
>>>
>>
>> Yes, I agree simplicity it is very desirable, in testsuites and actually
>> everywhere else. I also am not fond of dragging in dependencies.
>>
>
> Exactly---I am sure that poke is great for interactive use, but a
> self-contained solution is probably better for a testsuite.
>
>> But I suppose we also agree in that it is not possible to assembly
>> non-trivial binary data structures in a simple way, without somehow
>> moving the complexity of the encoding into some sort of generator, which
>> will not be simple. The GDB testsuite, for example, ships with a DWARF
>> assembler written in around 3000 lines of Tcl. Sure, it is simpler than
>> poke and doesn't drag in additional dependencies. But it has to be
>> carefully maintained and kept up to date, and the complexity is there.
>>
>
> The problem for a compression tool testsuite is that compression
> formats are (I believe) defined as byte-streams or bit-streams.
> Further, the generator(s) must be able to produce /incorrect/ output
> as well, in order to test error handling.
>>> Further, GNU poke defines its own specialized programming language for
>>> manipulating binary data. Supplying generator programs in C (or C++)
>>> for binary test data in a package that itself uses C (or C++) ensures
>>> that every developer with the skills to improve or debug the package
>>> can also understand the testcase generators.
>>>
>>
>> Here we will have to disagree.
>>
>> IMO it is precisely the many and tricky details on properly marshaling
>> binary data in general-purpose programming languages that would have
>> greater odds to lead to difficult to understand, difficult to maintain
>> and possibly buggy or malicious encoders. The domain specific language
>> is here an advantage, not a liability.
>>
>> This you need to do in C to encode and generate test data for a single
>> signed 32-bit NUMBER in an output file in a _more or less_ portable way:
>>
>> void generate_testdata (off_t offset, int endian, int number)
>> {
>> int bin_flag = 0, fd;
>>
>> #ifdef _WIN32
>> int bin_flag = O_BINARY;
>> #endif
>> fd = open ("testdata.bin", bin_flag, S_IWUSR);
>> if (fd == -1)
>> fatal ("error generating data.");
>> if (endian == BIG)
>> {
>> b[0] = (number >> 24) & 0xff;
>> b[1] = (number >> 16) & 0xff;
>> b[2] = (number >> 8) & 0xff;
>> b[3] = number & 0xff;
>> }
>> else
>> {
>> b[3] = (number >> 24) & 0xff;
>> b[2] = (number >> 16) & 0xff;
>> b[1] = (number >> 8) & 0xff;
>> b[0] = number & 0xff;
>> }
>>
>> lseek (fd, offset, SEEK_SET);
>> for (i = 0; i < 4; ++i)
>> write (fd, &b[i], 1);
>> close (fd);
>> }
>>
>
> While that is a nice general solution, (aside from neglecting the
> declaration "uint8_t b[4];"; with "int b[4];", the code would only
> work on a little-endian processor; with no declaration, the compiler
> will reject it) a compression format would be expected to define the
> endianess of stored values, so the major branch in that function would
> collapse to just one of its alternatives. Compression formats are
> generally defined as streams, so a different decomposition of the
> problem would likely make more sense: (example untested)
>
> void emit_int32le (FILE * out, int value)
> {
> unsigned int R, i;
>
> for (R = (unsigned int)value, i = 0; i < 4; R = R >> 8, i++)
> if (fputc(R & 0xff, out) == EOF)
> fatal("error writing int32le");
> }
> Other code handles opening OUT, or OUT is actually stdout and we
> are writing down a pipe or the shell handled opening the file.
> (The main function can easily check that stdout is not a terminal
> and bail out if it is.) Remember that I am suggesting test
> generator programs, which do not need to be as general as
> ordinary code, nor do they need the same level of
> user-friendliness, since they are expected to be run from scripts
> that encode the precise knowledge of how to call them. (That
> this version is also probably more efficient by avoiding a
> syscall for every byte written is irrelevant for its intended
> use.)
>> This is the Poke equivalent:
>>
>> fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32>
>> number) void:
>> {
>> var fd = open ("testdata.bin");
>> set_endian (endian);
>> int<32> @ fd : off = number;
>> close (fd);
>> }
>>
>> And thanks to the DSL, this scales nicely to more complex structures,
>> such as an ELF64 relocation instead of a signed 32-bit integer:
>>
>> fun generate_testdata = (offset<uint<64>,B> off, int<32> endian, int<32>
>> number) void:
>> {
>> type Elf64_RelInfo =
>> struct Elf64_Xword
>> {
>> uint<32> r_sym;
>> uint<32> r_type;
>> };
>> type Elf64_Rela =
>> struct
>> {
>> offset<uint<64>,B> r_offset;
>> Elf64_RelInfo r_info;
>> offset<int<64>,B> r_addend;
>> };
>> var fd = open ("got32reloc.bin");
>> set_endian (endian);
>> Elf64_Rela @ 0#B
>> = Elf64_Rela { r_info = Elf64_RelInfo { r_sym = 0xff00, r_type = 3 } }
>> close (fd);
>> }
>>
>
> Indeed Poke's language is excellent for concisely describing complex
> structures accurately, closely resembling C's syntax, but if I
> understand correctly, compression tool test data has few complex
> structures and is mostly byte-streams.
> Also, how did poke know to send the structure to "got32reloc.bin"?
> Was it simply the most-recently-opened file (or perhaps the top entry
> on a stack of recently-opened files)? Were the OFF and NUMBER
> parameters intentionally unused, accidentally unused, or somehow
> implicitly used in the second example?
The map in the second example ougth to be:
Elf64_Rela @ fd : 0#B
= Elf64_Rela { r_info = Elf64_RelInfo { r_sym = 0xff00, r_type = 3 } }
>
> -- Jacob
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jose E. Marchesi, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Zack Weinberg, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Russ Allbery, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Zack Weinberg, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Eric Gallager, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Bruno Haible, 2024/04/01
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Russ Allbery, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Eric Gallager, 2024/04/02
- Re: GNU Coding Standards, automake, and the recent xz-utils backdoor, Jacob Bachmeyer, 2024/04/02