[bug #59658] xgettext and msginit don't honor SOURCE_DATE_EPOCH


From: Miguel Ángel Arruga Vivas
Subject: [bug #59658] xgettext and msginit don't honor SOURCE_DATE_EPOCH
Date: Fri, 18 Dec 2020 12:55:10 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Follow-up Comment #9, bug #59658 (project gettext):

[comment #8 comment #8:]
> 3) What the "reproducible builds" initiative is about: See
> https://reproducible-builds.org/docs/definition/
> [...]
> https://reproducible-builds.org/docs/plans/

I'm very thankful for your comments and your pointers, as I hadn't read most
of the pages you point to until now.  Nonetheless, your interpretation of
these documents and their objectives isn't the same as mine.  Disagreement
is a great thing, as we can learn new things from other points of view and
understand each other a bit more.

My personal interpretation of those documents regarding the issue at hand is
that the ability to "recreate bit-by-bit identical copies of all specified
artifacts" is the final objective, and divergences from that are only accepted
as a temporary workaround until everybody is on board, since freedom only
comes with agreement, not by imposition.

Despite this, I want to clarify that my main concern is a general design
issue: I don't consider it acceptable to modify the input processing beyond
the purpose of the tool at hand, only to cope with an unrelated problem and
because "it fits there".  Sincerely, I wouldn't be happy if the attached
patch, which implements your proposal for msginit, were "the contribution"
that closed this bug report.

From my point of view, the behavior of msginit shouldn't change for this bug
report, just as the behavior of msgfmt shouldn't have changed in that sense.
The timestamp used for the output has to be acknowledged as an input, and
workarounds like dateshift are just that, not at all a sign of good design.
SOURCE_DATE_EPOCH, regardless of reproducible-builds.org, provides an already
standard way of controlling exactly that input.
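
As a minimal sketch of what "controlling that input" could look like from the
shell: the timestamp comes from SOURCE_DATE_EPOCH when it holds a non-negative
integer, and from the system clock otherwise.  The function names are
hypothetical, GNU date is assumed for the -d @N syntax, and this is only an
illustration of the idea, not the code of any patch attached to this report.

```shell
# Hypothetical helper: print the epoch a tool would stamp into its output.
# Uses SOURCE_DATE_EPOCH when it is a non-negative integer, otherwise the
# system clock.
effective_epoch() {
    case "${SOURCE_DATE_EPOCH:-}" in
        ''|*[!0-9]*) date +%s ;;                        # unset, empty or not a number
        *)           printf '%s\n' "$SOURCE_DATE_EPOCH" ;;
    esac
}

# Hypothetical helper: format an epoch (in UTC) the way the value of the
# POT-Creation-Date header looks.  Assumes GNU date for "-d @N".
pot_creation_date() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M+0000'
}
```

With these, `pot_creation_date "$(effective_epoch)"` prints the header value
for whichever time source is in effect.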

> 4)
> > they are symptoms, not the issue.  The impossibility of controlling one of
> > the implicit inputs (the time) is the issue here.
> 
> I disagree here, on two reasons:
> * In https://reproducible-builds.org/docs/timestamps/ the
> reproducible-builds.org people state that the *preferred* way to handle
> timestamps is to omit them from the output. This is what I propose (1) for the
> en@quot.po files. The SOURCE_DATE_EPOCH is a second-choice mechanism, which
> can be useful when you don't want to go into the details.
That link (at least on my computer) contains, in the second paragraph under
the heading "timestamps are best avoided":

If a date is required to give users an idea on when the software was made, it
is better to use a date that is relevant to the source code instead of the
build

But you insist, and I agree, on the usefulness of the timestamp generated by
xgettext (which was the main issue at hand in both bug reports, as the
msginit header year wasn't something to look at before rolling the hill up).
Therefore *their preferred way* here isn't omitting the timestamp but
*providing a meaningful value* based on the source date and not the build
date.

The responsibility of providing the right date is on the shoulders of the
person executing the software; the responsibility of the software is to allow
control of that date through some mechanism.  Currently the date is an
implicit and not easily controllable input taken from the system clock.  sed,
awk or even flipping bits by hand can be used, but xgettext and msginit are
the only ones that can avoid the need for any further cleanup after them when
the builder wants to do exactly that.

> * The SOURCE_DATE_EPOCH mechanism is dangerous: 
Dangerous?  Human response to danger is fear, which isn't a reason but a
feeling, therefore I'm trying to answer from that point of view, because those
feelings may be there and they should be acknowledged and treated as such.  I
try to understand them alongside the responsibility of maintaining a project
so useful and long-lived, and as a user I have to thank you for all your
dedication and contributions towards making software accessible to everybody
regardless of their language.  For that reason, I'd like to move the terms of
the discourse to the advantages/benefits/disadvantages/risks involved, and
analyze the problem with that in mind.

> It can produce output that is not meaningful. I do not want an xgettext or
> msginit program that produces "POT-Creation-Date: 1970-01-01 00:00+0000" or
> "Automatically generated, 1971.", because that would lead to confusion and
> trouble for translators.
This is obviously a risk, but that would be reported as a bug by any
translator to the creator of the POT, as it's their responsibility to provide
a useful date---xgettext can only provide a mechanism for that.  Paraphrasing
from the other thread, if any project maintainer wants to enforce the usage of
the system clock, SOURCE_DATE_EPOCH can be unset right before the desired
invocation, therefore this risk has a low final impact from my point of view.
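
A minimal sketch of that "unset right before the invocation" idea, assuming
xgettext honored the variable as I propose.  The wrapper name is hypothetical;
it removes SOURCE_DATE_EPOCH only for the wrapped command and leaves the
caller's own setting untouched.

```shell
# Hypothetical wrapper: run a command with SOURCE_DATE_EPOCH removed from its
# environment.  The subshell keeps the caller's own value intact.
with_system_clock() {
    ( unset SOURCE_DATE_EPOCH; "$@" )
}

# Possible usage in a maintainer's build script (file names are made up):
#   with_system_clock xgettext -o po/hello.pot src/hello.c
```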

Furthermore, the date retrieved from the system clock isn't meaningful either,
as it doesn't follow the code but the invocation.  A recreation of an old POT
file doesn't provide a sensible value here, so there is an advantage to the
end users being able to control it.

> Everyone can modify the current time on a per-process basis, using tools
> such as dateshift http://www.linuxcertif.com/man/1/dateshift/ or time-warp.so
> https://www.thanassis.space/tricks.html . So, while everyone can generate
> bogus POT files, it should not be that easy.
>
> I want a feature that is not so easy to fool into producing bogus output,
> even if it covers only a special case of what you consider a general issue.
A system clock without a battery or NTP produces similar output.  I guess
you're confusing here the responsibilities of the one who executes the code
with those of the one who creates it: the first has the responsibility of
providing the inputs, the second of processing the inputs and generating the
output, again for the use of the former.  Calling the desired output "bogus"
when it is based on the input provided by the user may be a bit loaded.
Environment variables, like any command line parameter, are completely
controlled by the caller: only a person who doesn't know how the execution
process works can be fooled that way, which again is out of the scope of
gettext.

> 5)
> > nobody defines SOURCE_DATE_EPOCH on their environment without a concrete
> > objective, even less by default.
> 
> They are defined (or will get defined over time) on many distros.
That isn't true for any interactive session on any current system, and I'd
say that you don't have anything to worry about in that sense in the near
future.  Be sure that I'd file as many bug reports as needed against anybody
who has the nefarious idea of defining that environment variable by default
in any kind of session used for actual user interaction.

I think the source of confusion here is that distros following
reproducible-builds.org ideas define that variable only _under their build
processes for their distributable packages_.

> See here e.g. for Fedora
> https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/57
There is a big issue with that macro: the changelog date of the spec isn't a
meaningful one, because it isn't related to the source date and a common spec
is allowed for several versions of the same software.  The variable must be
set to the maximum of the source date and the latest spec changelog date, not
only the latter.  Clearly any distro using that flag doesn't comply with the
spirit of the SOURCE_DATE_EPOCH spec.
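
To make the "maximum of both dates" rule concrete, here is a small sketch.
The function name and the two example values are hypothetical; how a packaging
tool would actually obtain the source date and the changelog date is left
out on purpose.

```shell
# Hypothetical helper: print the larger of two epoch values, e.g. the date of
# the upstream sources and the date of the latest spec changelog entry.
max_epoch() {
    if [ "$1" -ge "$2" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Possible usage, with both values assumed to be already known to the builder:
#   SOURCE_DATE_EPOCH=$(max_epoch "$source_epoch" "$changelog_epoch")
#   export SOURCE_DATE_EPOCH
```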

> and here for RedHat https://bugzilla.redhat.com/show_bug.cgi?id=1793722 .
It's a shame that this was directly closed as NOTABUG even though it clearly
is one.  The reaction wasn't very polite either.  Sad. :(

> I'm sure that if I searched for the build recipes on openSUSE, I would find
> similar things.
Sure, because they use the same rpm flag:
https://en.opensuse.org/openSUSE:Reproducible_Builds

> If these distros produce POT files in their source rpms, I don't want them
> to contain "POT-Creation-Date: 1970-01-01 00:00+0000".
That isn't the case in any of these projects.  They'd produce POT files with
the latest .spec changelog entry date, which also is quite bad but nowhere
close to your example.

I understand your worry about this point, but this isn't the forum to discuss
their choices, and neither your viewpoint nor mine can be enforced on them.
As you said, they could already use any of the currently available
workarounds to give a bogus but fixed date, but they are trying to give a
*meaningful* one, even though it's clearly wrong for many workflows using
.spec files.

> 6)
> > their suggested mechanism for git projects is:
> > SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
> 
> OK, good to see that there is even a simpler format string directive for
> producing a pretty date.
> 
> The difference between what you have in mind and what I prefer is that I
> prefer a setting in the Makevars file, that can be set by a package
> maintainer, and that a distro cannot abuse.
Distributions don't generate POT files for projects other than their own
software, so even in that case they will be well aware of the issue, as they
probably want to have their software translated and probably don't want to
receive continuous complaints from their translators.
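
For reference, the git-based derivation quoted in point 6 could be wrapped
like this in a maintainer's build script.  The function name is hypothetical;
the sketch keeps a value already provided by the caller and does nothing when
the current directory isn't a git checkout.

```shell
# Hypothetical helper: export SOURCE_DATE_EPOCH from the committer date of the
# last commit, unless the caller has already provided a value.
set_epoch_from_git() {
    [ -n "${SOURCE_DATE_EPOCH:-}" ] && return 0
    if epoch=$(git log -1 --pretty=%ct 2>/dev/null) && [ -n "$epoch" ]; then
        SOURCE_DATE_EPOCH=$epoch
        export SOURCE_DATE_EPOCH
    fi
}
```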

Wrapping up, I see two main options that would solve the issue:
1. The implementation of SOURCE_DATE_EPOCH that I propose.
2. The usage of a command line parameter or another dedicated variable, as
git does (e.g. XGETTEXT_POT_CREATION_DATE).

A third option of adding code only to Makevars without any other change
wouldn't really be backwards compatible nor practical for projects that call
xgettext by other means, which I think is enough of a disadvantage to discard
it.

1. provides a uniform interface, which I see as a benefit because it eases
the cognitive burden on the computer user.  A disadvantage of this option is
that RPM currently is misusing that variable.  The package maintainer/builder
also has the advantage of having to set only this one variable for several
processes, e.g. help2man.

2. provides a specific interface which can be extensively adapted and
documented.  I'd be happy to hear more about the benefits of this option,
but I'm only able to see the disadvantages, which aren't as severe as with
the third option: Makevars has to be changed if the option comes from the
command line (and old software builds would need a wrapper anyway), or the
cognitive burden on the caller must increase with another environment
variable which really doesn't add or require anything extra, unlike
GIT_{COMMITTER,AUTHOR}_DATE.

My position still stays with 1 because the advantages seem bigger than the
disadvantages, risks and possible issues, speaking also as a maintainer of
only 5 translations <https://translationproject.org/team/es.html>, but I'm
open to hearing your (rational) point of view, and I'd be thankful to read
about any advantage and/or disadvantage that I haven't taken into account,
because I hope that we can finally agree on a way forward to transform this
report into a nice new line in the NEWS file. :-)

(file #50513)
    _______________________________________________________

Additional Item Attachment:

File name: 0001-msginit-Do-not-use-POT-Creation-Date.patch Size:1 KB
   
<https://file.savannah.gnu.org/file/0001-msginit-Do-not-use-POT-Creation-Date.patch?file_id=50513>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59658>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



