guix-devel

Re: Preservation of Guix Report


From: zimoun
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 09:39:27 +0200

Hi Timothy,

On Wed, 20 Oct 2021 at 15:48, Timothy Sample <samplet@ngyro.com> wrote:

> Early this summer I did a bunch of work trying to figure out which Guix
> sources are preserved by the SWH archive.  I’m finally ready to share
> some preliminary results!
>
>     https://ngyro.com/pog-reports/2021-10-20/

Cool!  Really interesting.


> What’s cool is that the report is automated.  Next on my list is to
> update the database and generate a new report.  Then, we can compare the
> results and see if we are improving.  (My read on the results so far is
> that improving “sources.json” will yield big improvements, but we might
> not be able to get to that before the next report.)

Here are two minor comments:

 1. For a couple of days now, I have been running:

        $ GUIX_SWH_TOKEN=$TOKEN guix lint -c archival

    where $TOKEN is provided by the SWH Authentication service [1].
    With the token, the rate limit is 1200 instead of 120.  Therefore,
    more ’git-fetch’ packages are added to SWH.  I am in the process of
    automating that, but do not hold your breath. :-)

 2. For reasons that are still unknown, the bridge between SWH and Disarchive has
    some holes.  For instance,

        $ guix lint -c archival znc
        gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'

        $ wget https://guix.gnu.org/sources.json
        $ cat sources.json | jq | grep znc
             "integrity": "sha256-IwbxlQzncsWlmlf1SG1Zu5yrmEl8RfxJy8RawN7BGbs="
             "integrity": "sha256-q0jatpd+j0PW//szIo0ViGX2jd5wJtEjxpPXcznc8rs="
               "https://znc.in/releases/archive/znc-1.8.2.tar.gz"

        $ guix download https://znc.in/releases/archive/znc-1.8.2.tar.gz
        Starting download of /tmp/guix-file.hnjWTE
        From https://znc.in/releases/archive/znc-1.8.2.tar.gz...
         znc-1.8.2.tar.gz  2.0MiB                                     599KiB/s 00:03 [##################] 100.0%
        /gnu/store/58khbiwp2ghhzg00gnzdy2jlfv49vajm-znc-1.8.2.tar.gz
        03fyi0j44zcanj1rsdx93hkdskwfvhbywjiwd17f9q1a7yp8l8zz

    Therefore, something is wrong somewhere.  Thanks to #1, I am
    detecting many such examples.  I do not know whether the SWH-ID
    computed by Disarchive is incorrect or whether SWH has not ingested
    the content.  Investigation required. :-)
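
    A side note on the transcript above: ‘grep znc’ also matches “znc”
    inside two unrelated base64 strings, and ‘guix download’ prints its
    hash in nix-base32 while “sources.json” uses an “integrity” string.
    Here is a minimal sketch, in Python, for putting both on the same
    footing.  It assumes that “sources.json” has a top-level "sources"
    list whose entries carry a "urls" list (only the "integrity" key is
    visible above); the function names are made up for the example.

```python
import base64
import hashlib

# Nix/Guix base32 alphabet: digits and lowercase letters without e, o, t, u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_decode(text, size=32):
    """Decode a nix-base32 string into `size` raw bytes (32 for SHA-256)."""
    if len(text) != (size * 8 - 1) // 5 + 1:
        raise ValueError("unexpected length for a %d-byte hash" % size)
    out = bytearray(size)
    # The last character carries the five least significant bits.
    for n, char in enumerate(reversed(text)):
        digit = NIX_BASE32.index(char)  # ValueError on a foreign character
        i, j = divmod(n * 5, 8)
        out[i] |= (digit << j) & 0xFF
        carry = digit >> (8 - j)
        if i + 1 < size:
            out[i + 1] |= carry
        elif carry:
            raise ValueError("non-zero padding bits")
    return bytes(out)


def to_sri(digest):
    """Format a raw SHA-256 digest like the "integrity" field."""
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def entries_for(sources, needle):
    """Return entries with a URL containing `needle` (assumed layout)."""
    return [entry for entry in sources.get("sources", [])
            if any(needle in url for url in entry.get("urls", []))]


# Self-check against a well-known value: the SHA-256 of the empty string.
EMPTY = "0mdqa9w1p6cmli6976v4wi0sw9r4p5prkj7lzfd1877wk11c9c73"
assert nix_base32_decode(EMPTY) == hashlib.sha256(b"").digest()

# Hash printed by 'guix download' above, rewritten in "integrity" form.
ZNC_SRI = to_sri(nix_base32_decode(
    "03fyi0j44zcanj1rsdx93hkdskwfvhbywjiwd17f9q1a7yp8l8zz"))
```

    With the file loaded via ‘json.load’, ‘entries_for(data, "znc.in")’
    would select entries by URL only, and ZNC_SRI can then be compared
    with their "integrity" values.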


1: <https://archive.softwareheritage.org/api/>
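
For automating the token-based runs, a script would have to respect
the rate limit (120 vs 1200) mentioned above.  The following is only a
sketch in Python: it assumes the "Authorization: Bearer" scheme and the
X-RateLimit-Remaining / X-RateLimit-Reset response headers described in
the SWH API documentation, and both function names are hypothetical.

```python
import time


def auth_headers(token):
    """Request headers for an authenticated call; empty without a token."""
    return {"Authorization": "Bearer " + token} if token else {}


def seconds_to_wait(headers, now=None):
    """Seconds to sleep before the next request.

    `headers` is a plain dict of response headers, so the keys must be
    spelled exactly as below (an HTTP library may be case-insensitive).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # The reset value is assumed to be seconds since the Unix epoch.
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)
```

A loop over packages could call ‘time.sleep(seconds_to_wait(...))’
after each response instead of failing once the quota is exhausted.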


> It’s surprising to me that SWH is not already getting these from
> “sources.json”.  I picked an arbitrary one, “rust-quote-0.6”, and it’s
> simply not in “sources.json”.  On the other hand, I bet SWH would like a
> crates.io (and CRAN, etc.) loader, too.

From the SWH doc, there is a CRAN lister [2], but I have not checked what
they concretely ingest.  On our side, we are using ’url-fetch’, so it
seems possible to me that there is a tiny mismatch between what is
inside the release tarball (what we concretely use) and what SWH ingests
directly from CRAN.

2: <https://docs.softwareheritage.org/devel/apidoc/swh.lister.cran.html?highlight=cran#module-swh.lister.cran>


And to answer your question [3] about “sources.json”, I think the
ingestion started after commit
35bb77108fc7f2339da0b5be139043a5f3f21493 in guix-artwork.  In other
words, SWH started to ingest from “sources.json” after July 2020,
probably around September 2020.

3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>

> One other way to help would be to suggest improvements to the report.  I
> don’t want to fiddle with it too much, but if there is some simple graph
> or table or list that should be there, I’m happy to give it a go.

For the Missing and Unknown fields, could you distinguish the kind of
origin?  Is it mainly git-fetch, url-fetch, or something else?

That would help to spot which issues to work on (sources.json, SWH side,
Disarchive, etc.).
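
To make the request concrete, here is a tiny sketch in Python of the
breakdown I have in mind.  The record layout (a "status" and a "method"
key per source) is purely hypothetical; I do not know how the report's
database is actually organised.

```python
from collections import Counter


def tally_by_origin(records):
    """Count sources per (status, method) pair.

    Each record is assumed to be a dict such as
    {"status": "missing", "method": "git-fetch"}.
    """
    return Counter((r["status"], r["method"]) for r in records)
```

Sorting the result with ‘most_common()’ would show at a glance whether
the Missing and Unknown entries are dominated by one fetch method.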


Cheers,
simon


