lilypond-devel


Re: Using 'libfaketime' for reproducible builds


From: Han-Wen Nienhuys
Subject: Re: Using 'libfaketime' for reproducible builds
Date: Mon, 28 Dec 2020 10:52:37 +0100

On Mon, Dec 28, 2020 at 10:26 AM Jonas Hahnfeld <hahnjo@hahnjo.de> wrote:

> On Sunday, 27.12.2020 at 22:24 +0100, Werner LEMBERG wrote:
> > > Intercepting syscalls (or whatever the library does, I didn't
> > > check) doesn't sound like the right approach outside of testing
> > > reproducibility.
> >
> > Why?  It's even less intrusive than the `SOURCE_DATE_EPOCH` solution.
>
> I definitely consider intercepting various syscalls by means of
> LD_PRELOADing more intrusive than setting a single environment variable
> that was invented for the purpose of setting timestamps. Just think of
> a new shiny syscall that might add a new source of non-reproducibility.
>

I agree with Jonas. As a further argument, LD_PRELOAD is also dependent on
the platform; I think it wouldn't work on OSX, for example.


> > > I think that's a pity, but nothing we can change as a
> > > "consumer" of library functions.
> >
> > Exactly.  As long as we don't change LilyPond to produce PDFs by
> > itself – which is a huge undertaking that I certainly won't start –
> > I think we have no other choice than using something like
> > 'libfaketime' or a patched gs version.  I definitely prefer the
> > former.
>
> What I wanted to say is that we cannot change the developers' minds to
> support the environment variable. But we can (and IMHO should) use all
> available interfaces if we care about reproducibility. I see at least
> two more options:
>

Yes, +1 from me.


>
> 1) Strip non-determinism from the generated PDF. This is even mentioned
> at https://reproducible-builds.org/docs/timestamps/ - before discussing
> libfaketime which spends more than half of the paragraph mentioning
> possible issues.
>
> 2) As we control the input PS code, we don't have to worry about the
> operators that get the current time, draw a random number, etc. (as
> long as we don't use them ourselves). Instead the bug linked above says
> we just need to tell GS which CreationDate and ModDate to use (via
> PDFmarks) and this should be straightforward to fill with values
> depending on SOURCE_DATE_EPOCH.
> This probably leaves the UUIDs (is that the issue you mention above?)
> which can be overridden using -sDocumentUUID and -sInstanceUUID.
> Setting a constant time using libfaketime will result in the same UUID
> for all generated PDFs, so it can't get worse; but I think it would be
> desirable to do better than that and compute a "unique" ID based on the
> input file, maybe as simple as the hash of the file path. It must be
> considered that different values will prevent reuse of the GS API
> instance, but I'd argue that a constant value should be fine in this
> case.
>
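
A minimal sketch of option 2 above, assuming the PostScript prologue is generated by a Python helper: it turns SOURCE_DATE_EPOCH into a /DOCINFO pdfmark that sets CreationDate and ModDate. The function names are hypothetical and not part of LilyPond. The date is written in UTC using the PDF date syntax with a trailing Z, and whether Ghostscript honours these values in every metadata location would still need checking.

```python
import os
from datetime import datetime, timezone


def pdf_date(epoch):
    """Format a Unix timestamp as a PDF date string in UTC."""
    t = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return t.strftime("D:%Y%m%d%H%M%SZ")


def docinfo_pdfmark(epoch):
    """Return a PostScript line that sets both dates via pdfmark."""
    d = pdf_date(epoch)
    return f"[ /CreationDate ({d}) /ModDate ({d}) /DOCINFO pdfmark\n"


def source_date_epoch(default=None):
    """Read SOURCE_DATE_EPOCH from the environment, if it is set."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value else default


if __name__ == "__main__":
    epoch = source_date_epoch()
    if epoch is not None:
        # This line would be prepended to the PS code handed to gs.
        print(docinfo_pdfmark(epoch), end="")
```

When SOURCE_DATE_EPOCH is unset, the sketch emits nothing, so Ghostscript would fall back to its default behaviour of using the current time.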

The manual entry for DocumentUUID says:

    Note that Ghostscript has no access to the host node ID due to a
    minimization of platform dependent modules. Therefore it uses an MD5 hash
    of the document contents for generating UUIDs.

I wonder if we'd get reproducible documents if we provide only InstanceUUID.
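
For comparison, here is a rough sketch of the other idea from the quoted mail: deriving both IDs from the input file path and passing them with -sDocumentUUID and -sInstanceUUID. Using Python's `uuid.uuid5` (a SHA-1 based name UUID) as the hash is an assumption made for illustration, as is the helper name `gs_uuid_args`. Nothing here has been tested against gs itself.

```python
import uuid


def gs_uuid_args(input_path):
    """Build deterministic UUID options for gs from the input file path."""
    # Same path -> same UUIDs, independent of clock and host.
    doc_id = uuid.uuid5(uuid.NAMESPACE_URL, input_path)
    # Derive the instance ID from the document ID so the two differ.
    inst_id = uuid.uuid5(doc_id, "instance")
    return [f"-sDocumentUUID={doc_id}", f"-sInstanceUUID={inst_id}"]


if __name__ == "__main__":
    # Hypothetical input file name, for illustration only.
    print(" ".join(gs_uuid_args("input/regression/example.ly")))
```

Because the values depend on the path, building the same file from a different directory would give different IDs unless the path is made relative first.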
-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

