Re: [Help-tar] Reproducibility of tar archives


From: Jakob Bohm
Subject: Re: [Help-tar] Reproducibility of tar archives
Date: Mon, 1 Apr 2019 12:12:27 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 31/03/2019 14:08, Yann E. MORIN wrote:
Hello All,

Recent versions of tar have slightly changed the format of archives.
Most notably:

   - 1.27 changed gnu long link headers for path elements > 100
     characters

   - 1.30 changed --numeric-owner for filenames > 100 characters

In Buildroot, we are using hashes of archives to ensure reproducibility
of the source code we build. We also generate tarballs for licensing
compliance. In both cases, we use hashes for those archives.

The two changes above mean that we have to restrict the tar versions we
accept to a small subset. All the hashes we have so far have been made
over the years, and they all use the format that was generated by
versions 1.27 to 1.29. As distributions are updated, they all switch to
1.30 or later, so we then always have to build our own version of tar.

Currently, we envision three paths:

   - keep the status quo: this is not nice, because we would always have
     to build our own tar going forward, for every build;

   - switch to an alternate archive format: this is not nice, because
     people are used to tarballs, and the alternatives are not all
     reproducible either; those that are reproducible are much less
     well known, and less practical to use, than tarballs;

   - bite the bullet, and redo all the hashes with the newer tar format:
     in the future everyone will have a newer tar, and so we won't have
     to build our own every time.

That last point is what we would prefer, if we could be sure that there
would be no change in the output format in the foreseeable future.

So, here's my question: starting with tar-1.32 (the latest release as of
today), is the gnu tar format considered stable now, or is there no
guarantee about the gnu tar format stability?

For reference, here's how we generate the archives:

     tar cf - \
         --numeric-owner --owner=0 --group=0 --mtime="${date}" \
         --format=gnu -T "list.sorted" >"${output}.tar"

Can we expect this to be reproducible with future tar releases?

As a more general solution for others in a similar predicament, could
GNU tar add the ability to explicitly request the formats produced by
earlier versions, for example by adding options such as
--format=gnu1.27 and --format=gnu1.30 (named for the versions that
first introduced the specific format changes, with a view to adding new
ones as future changes are introduced)?

Alternatively, could the Buildroot and GNU tar teams check if one of
the historic formats already explicitly supported by the --format
option provides the required stability?
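
As a rough sketch of how such a check might be run: the function below
hashes the same file list under several of the existing --format values,
so that the output can be diffed between machines with different tar
versions. The function name hash_formats and the fixed mtime are made up
for illustration, sha256sum is assumed to be available, and the list file
is assumed to be sorted as in the command quoted above. pax is left out
on purpose, since its extended headers can carry extra metadata that
would need further options to pin down.

```shell
# Print "<format> <sha256>" for several explicitly supported --format
# values, archiving the files named in the list file given as $1.
# hash_formats is a hypothetical helper, not part of tar or Buildroot.
hash_formats() {
    list="$1"
    for fmt in v7 oldgnu gnu ustar; do
        # Same fixed metadata for every format; the date is arbitrary.
        sum="$(tar cf - --numeric-owner --owner=0 --group=0 \
                   --mtime='2019-04-01 00:00:00 UTC' \
                   --format="$fmt" -T "$list" | sha256sum | cut -d' ' -f1)"
        echo "$fmt $sum"
    done
}
```

Running "hash_formats list.sorted" with each tar version of interest and
comparing the lines would show which formats, if any, stay byte-identical
for a given file list. A list with names over 100 characters is the
interesting case, though v7 and ustar may refuse some of those names.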

Either way, the difference is between two interpretations of the
--format option:

   A. Restrict the output to headers that are understood by specific
      old/3rd party unpackers.

   B. Reproduce a very specific output, including how tar chooses
      between seemingly equivalent header types, ignored values etc.
      This includes bugward compatibility with historic tar output
      bugs that made the wrong choices.

A third option, consistent with how reproducible builds are
otherwise done, is to treat tar as part of the tool chain, thus
making the exact build or source version of tar part of the list
of exact tool versions needed to reproduce a specific build (just
like there is already a requirement to use exact versions of gcc,
autotools etc.). Doing so would also allow the historic hash values
to remain valid, as they are each tied to the tar version they were
historically built with.
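
A minimal sketch of what pinning tar might look like in a build script,
assuming the version is recorded next to the hashes. The names
REQUIRED_TAR_VERSION, tar_version and check_tar are hypothetical, and the
version shown is only an example taken from the range mentioned above:

```shell
# Hypothetical pinned version; a real project would store this
# alongside the hashes it belongs to.
REQUIRED_TAR_VERSION="1.29"

# Print the version number from the first line of "<tar> --version",
# e.g. "tar (GNU tar) 1.29" gives "1.29". Prints nothing if the line
# does not end in a version number.
tar_version() {
    "${1:-tar}" --version | sed -n '1s/^.* \([0-9][0-9.]*\)$/\1/p'
}

# Succeed only if the given tar (default: tar from PATH) reports
# exactly the pinned version.
check_tar() {
    found="$(tar_version "${1:-tar}")"
    if [ "$found" != "$REQUIRED_TAR_VERSION" ]; then
        echo "need GNU tar $REQUIRED_TAR_VERSION, found '$found'" >&2
        return 1
    fi
}
```

A build would call check_tar before generating or verifying an archive,
and fall back to building the pinned tar itself when the check fails.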


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
