help-tar

Re: [Help-tar] Reproducibility of tar archives


From: Yann E. MORIN
Subject: Re: [Help-tar] Reproducibility of tar archives
Date: Mon, 1 Apr 2019 22:15:57 +0200
User-agent: Mutt/1.5.22 (2013-10-16)

Jakob, All,

On 2019-04-01 12:12 +0200, Jakob Bohm spake thusly:
> On 31/03/2019 14:08, Yann E. MORIN wrote:
> >So, here's my question: starting with tar-1.32 (the latest release as of
> >today), is the gnu tar format considered stable now, or is there no
> >guarantee about the gnu tar format stability?
> >
> >For reference, here's how we generate the archives:
> >
> >     tar cf - \
> >         --numeric-owner --owner=0 --group=0 --mtime="${date}" \
> >         --format=gnu -T "list.sorted" >"${output}.tar"
> >
> >Can we expect this to be reproducible with future tar releases?
> >
> As a more general solution for others in a similar predicament, could
> GNU tar add the ability to explicitly request the formats produced by
> earlier versions, for example by adding options such as
> --format=gnu1.27 and --format=gnu1.30 (named for the versions that
> first introduced the specific format changes, with a view to add new
> ones as future changes are introduced)?

Since we can't predict what the future holds, I find it valuable to be
able to specify exactly which version of the format to use: as things
stand, --format=gnu means different things with different tar versions,
so they are essentially different formats.

So yes, I like this proposal.

> Alternatively, could the Buildroot and GNU tar teams check if one of
> the historic formats already explicitly supported by the --format
> option provides the required stability?

Fact is, not all of the older "stable" formats can store the necessary
information, such as file names or paths longer than 100 characters,
extended attributes, and so on...

> Either way, the difference is between two interpretations of the
> --format option: A. Restrict the output to headers that are
> understood by specific old/3rd party unpackers.  B. Reproduce a
> very specific output, including how tar chooses between seemingly
> equivalent header types, ignored values etc.  This includes
> bugward compatibility with historic tar output bugs that made the
> wrong choices.

It is not so much about older unpackers being able to understand the
format: older tar versions _are_ able to extract tarballs created with
1.30 onward.

Rather, it's that archives made with older tar versions can't be
reproduced with newer tar versions, because, as you very nicely
pointed out, they really generate another format.

> The 3rd option, consistent with how reproducible builds are
> otherwise done, is to treat tar as part of the tool chain, thus
> making the exact build or source version of tar part of the list
> of exact tool versions needed to reproduce a specific build (just
> like there is already a requirement to use exact versions of gcc,
> autotools etc.), doing so would also allow the historic hash values
> to remain valid, as they are each tied to the tar version they were
> historically built with.

The problem is that today, Buildroot uses tar-1.29, so all archives, and
thus their hashes, are generated with that "gnu-1.29" format, and the
archives eventually percolate to our source mirror (aka backup):
http://sources.buildroot.org/

Tomorrow, we update Buildroot to use, say, tar-1.32. All existing
archives then have to be regenerated, because their hashes no longer
match. The newer archives would thus eventually replace the old ones on
the mirror, and older builds could then no longer use those new
archives, because they would not match the old hashes...

That's why having a stable format is very important: we can generate
archives at various points in time, and still reuse them later, because
they all use the same scheme.
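
As a rough illustration of the property we rely on, here is a minimal
shell sketch that builds the same tree twice with the options quoted
earlier in this thread and compares the SHA-256 of both archives. It
assumes GNU tar and coreutils' sha256sum; the foo-1.0 tree, the location
of the file list and the date are all made up. Both runs use the same tar
binary, so this only shows reproducibility within one tar version;
comparing the hash against one produced by a different tar release is
where a format change would show up.

```shell
#!/bin/sh
# Sketch: check that archiving the same tree twice yields byte-identical
# tarballs. Assumes GNU tar and sha256sum; the tree and date are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical source tree standing in for an extracted package.
mkdir -p "$work/src/foo-1.0/doc"
printf 'int main(void) { return 0; }\n' > "$work/src/foo-1.0/main.c"
printf 'hello\n' > "$work/src/foo-1.0/doc/README"

date="2019-04-01 00:00:00 UTC"   # hypothetical fixed timestamp

make_archive() {
    # $1: output file. List regular files only, in a locale-independent
    # order, then archive them with the options quoted in the thread.
    (
        cd "$work/src"
        find foo-1.0 ! -type d | LC_ALL=C sort > "$work/list.sorted"
        tar cf - \
            --numeric-owner --owner=0 --group=0 --mtime="${date}" \
            --format=gnu -T "$work/list.sorted" > "$1"
    )
}

make_archive "$work/first.tar"

# Change the on-disk mtimes; --mtime should hide this from the archive.
sleep 1
touch "$work/src/foo-1.0/main.c" "$work/src/foo-1.0/doc/README"

make_archive "$work/second.tar"

hash1=$(sha256sum "$work/first.tar" | cut -d' ' -f1)
hash2=$(sha256sum "$work/second.tar" | cut -d' ' -f1)

if [ "$hash1" = "$hash2" ]; then
    echo "reproducible: $hash1"
else
    echo "NOT reproducible: $hash1 vs $hash2"
fi
```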

I see the point of making tar part of the toolchain, but that means the
source archives can no longer be shared between builds; they actually
become artifacts of the build rather than of the source...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'
