lilypond-devel

Re: Using 'libfaketime' for reproducible builds


From: Jonas Hahnfeld
Subject: Re: Using 'libfaketime' for reproducible builds
Date: Mon, 28 Dec 2020 12:19:08 +0100
User-agent: Evolution 3.38.2

On Monday, 28.12.2020 at 11:40 +0100, Werner LEMBERG wrote:
> > I definitely consider intercepting various syscalls by means of
> > LD_PRELOADing more intrusive than setting a single environment
> > variable that was invented for the purpose of setting timestamps.
> > Just think of a new shiny syscall that might add a new source of
> > non-reproducibility.
> 
> What 'new shiny syscall' shall influence the creation of PDFs,
> specified by international standards?  I think this is a straw man
> argument.

For example, a new syscall to get the current time. While that is
improbable, just look at something like the new statx call: from time
to time, very fundamental interfaces are introduced and eventually
used.

> I dare to say that the ghostscript interface changes in the last few
> years are by far more numerous (look at the LilyPond commits
> Masamichi had to implement) than the number of time interface
> changes (which, AFAIK, have been zero for a long time, but I'm not
> an expert)...
> 
> > 1) Strip non-determinism from the generated PDF. This is even
> >     mentioned at https://reproducible-builds.org/docs/timestamps/ -
> >     before discussing libfaketime which spends more than half of the
> >     paragraph mentioning possible issues.  [...]
> 
> This is what I've started with, see the attached experimental stuff.
> However, I stopped working on it since it will always remain a
> partial solution, because ...

This is not what I had in mind here; that is closer to my option 2).
By "strip" I really mean removing the emitted fields from the generated
PDF. The format is very readable, and a simple search-and-replace can
easily deal with all instances discussed so far.
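
To illustrate the kind of search-and-replace meant here, a minimal
Python sketch follows. It is not LilyPond code; the function name
strip_pdf_dates and the fixed replacement date are made up for this
example. It assumes the /CreationDate and /ModDate entries are stored
uncompressed in the file, and it pads each rewritten entry with spaces
so that the byte offsets recorded in the PDF's cross-reference table
stay valid.

```python
import re

# Hypothetical fixed timestamp used in place of the real one.
FIXED_DATE = b"D:19700101000000Z"

# Matches Info-dictionary entries such as
#   /CreationDate(D:20201228121908+01'00')
DATE_ENTRY = re.compile(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)")


def strip_pdf_dates(data):
    """Replace PDF date entries with a fixed value, preserving length."""

    def fix(match):
        entry = b"/" + match.group(1) + b"(" + FIXED_DATE + b")"
        padding = len(match.group(0)) - len(entry)
        if padding < 0:
            # The original entry is shorter than the replacement; leave
            # it alone rather than shift the byte offsets in the file.
            return match.group(0)
        # Trailing spaces keep the overall file length unchanged.
        return entry + b" " * padding

    return DATE_ENTRY.sub(fix, data)
```

The function works on the raw bytes of a PDF, so it would be applied to
the result of reading the file in binary mode. Other fields, such as
dates in XMP metadata, would need their own patterns.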

> > This probably leaves the UUIDs (is that the issue you mention
> > above?)  which can be overridden using -sDocumentUUID and
> > -sInstanceUUID.
> 
> 
> ... there is one additional field called `/ID` in (some) PDF output
> files that is apparently a random-based value.  I've contacted some
> gs people to get more info on that.

If you look into the GS code, the /ID is based upon a) the current time
(addressed by libfaketime) and b) the output file name, which recent
LilyPond sets to a random string so that the generated PDF can be moved
into place atomically.

> It also seems that ghostscript's creation and insertion of subsetted
> fonts is dependent on the system time.  To me this looks like a gs
> bug.  During my tests a lot of PDFs – even with the above
> experimental changes – have exactly this problem (that is, the
> subsetted fonts were not identical in spite of completely identical
> source fonts), which means that you can't circumvent it.  Using
> 'libfaketime', this issue magically disappears.
> 
> > Setting a constant time using libfaketime will result in the same
> > UUID for all generated PDFs, so it can't get worse; but I think it
> > would be desirable to do better than that and compute a "unique" ID
> > based on the input file, maybe as simple as the hash of the file
> > path.
> 
> 
> Well, UUIDs as used by ghostscript are based on both the time and
> hash values, which means that we actually *do* get unique UUIDs, with
> the restriction that the first 12 digits of the UUID are a fixed
> value because of the frozen time.  In other words, this is not a
> reason to reject the use of 'libfaketime'.

Fair enough, but please note that my comment was in the part where I
was elaborating on possible solutions. It just means that my "better"
got demoted to "equally good" (but controlled by LilyPond).
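
As a rough sketch of the "hash of the file path" idea, the following
Python snippet derives stable UUIDs from an input path and formats them
for the -sDocumentUUID and -sInstanceUUID options mentioned above. The
helper names stable_uuid and uuid_flags are hypothetical, the choice of
uuid5 with NAMESPACE_URL is only one possible hashing scheme, and the
exact value format that ghostscript expects for these options is an
assumption that would need checking.

```python
import uuid


def stable_uuid(input_path, kind):
    """Derive a deterministic UUID from the input path (hypothetical helper)."""
    # uuid5 hashes the namespace and name with SHA-1, so the same path
    # always yields the same UUID, independent of time and randomness.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{kind}:{input_path}"))


def uuid_flags(input_path):
    """Build ghostscript options; the value format is an assumption."""
    return [
        "-sDocumentUUID=" + stable_uuid(input_path, "document"),
        "-sInstanceUUID=" + stable_uuid(input_path, "instance"),
    ]
```

Two builds of the same input path get identical options, while
different input files get different ones, without relying on a frozen
clock.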

Jonas

