guix-devel
Re: Preservation of Guix Report


From: Timothy Sample
Subject: Re: Preservation of Guix Report
Date: Fri, 22 Oct 2021 10:19:17 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hey,

Ludovic Courtès <ludo@gnu.org> writes:

> Timothy Sample <samplet@ngyro.com> skribis:
>
>> Early this summer I did a bunch of work trying to figure out which Guix
>> sources are preserved by the SWH archive.  I’m finally ready to share
>> some preliminary results!
>>
>>     https://ngyro.com/pog-reports/2021-10-20/
>>
>> This report is already quite outdated, though.  It only covers commits
>> up to the end of May, and sometime in June is when the sources were
>> checked against the SWH archive.  I’m sharing it now to avoid any
>> further delays.
>
> This is truly awesome!  (Did you manage to grab all that info with the
> default rate limit?!)

Yes, but I have another trick.  The “known” endpoint [1].  If you
already know the SWHIDs you want to check, you can check 1,000 per call.
With the anonymous rate limit, I can check 120,000 every hour, which is
plenty.

[1] https://docs.softwareheritage.org/devel/swh-web/uri-scheme-api.html#get--api-1-content-known-(sha1)[,(sha1),%20...,(sha1)]-
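
To make the batching concrete, here is a rough sketch (not the script behind the report).  It assumes the archive's SWHID-based "known" endpoint at /api/1/known/, which takes a JSON list of SWHIDs in a POST body and answers with a {swhid: {"known": bool}} mapping.  The helper names are made up for illustration, and the 120 calls per hour is only what the numbers above imply (120,000 / 1,000).

```python
import json
import urllib.request

# Assumed endpoint: the SWHID-based "known" check of the SWH web API.
KNOWN_URL = "https://archive.softwareheritage.org/api/1/known/"
BATCH_SIZE = 1000       # SWHIDs per call, as described above
CALLS_PER_HOUR = 120    # implied by 120,000 checks per hour / 1,000 per call


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def hourly_capacity(calls_per_hour=CALLS_PER_HOUR, batch_size=BATCH_SIZE):
    """Number of SWHIDs that can be checked per hour under the rate limit."""
    return calls_per_hour * batch_size


def check_known(swhids):
    """POST one batch of SWHIDs and return a {swhid: bool} mapping.

    Hypothetical helper: it assumes the response shape
    {swhid: {"known": true/false}} and does no retry or rate-limit handling.
    """
    request = urllib.request.Request(
        KNOWN_URL,
        data=json.dumps(list(swhids)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return {swhid: info["known"] for swhid, info in result.items()}
```

A caller would loop over batches(all_swhids), call check_known on each batch, and pause once the hourly call budget is used up.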

> I can’t wait for the updated report now that Simon and yourself have
> identified that SWHID computation bug!

I’m computing SWHIDs while writing this.  Not long now!

> Some of our <git-reference> refer to tags, not commits.  How do you
> determine whether they’re saved?

The short answer is “elbow grease”.  Basically, I’m taking a “work
harder, not smarter” approach.  :p  I go out and obtain the source,
verify it with Guix’s hash, and then compute the SWHID.  This is another
thing we could move to the CI infrastructure, but I think there might be
some hiccoughs.  For git-references, I believe we can’t just compute the
ID after the download derivation – we would have to change the download
derivation itself.  Maybe add an ‘swhid’ output?  It’s a little more
complicated than just throwing up some scripts, anyway.
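
For the "compute the SWHID" step, the simplest case is a single file.  A content SWHID is the Git blob hash of the file's bytes with an "swh:1:cnt:" prefix.  The sketch below covers only that case; directory and revision IDs (what a tarball or Git checkout actually needs) are more involved and not shown, and the function name is just for illustration.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a single file's contents.

    The identifier is the SHA-1 of a Git-style blob: the header
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest
```

As a sanity check, the empty file should give the well-known empty Git blob hash, i.e. swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.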

> ‘guix lint -c archival’ uses ‘lookup-origin-revision’, which is a good
> approximation, but it’s not 100% reliable because tags can be modified
> and that procedure only tells you that a same-named tag was found, not
> that it’s the commit you were expecting.  (And really, we should stop
> referring to tags.)

Like zimoun said elsewhere in this thread, having an explicit mapping
from Guix hash to SHWID will improve reliability quite a bit.  It’s hard
to get to 100%, though!  With the reports, we will eventually be able to
check everything.  However, there’s still a small possibility of bugs
and false positives.  Ultimately, I’m hoping the reports will help
detect small problems (some specific source is missing) and guide our
efforts on big problems (xz support in Disarchive or support for more
version control systems, etc.).


-- Tim


