guix-devel

Re: Preservation of Guix Report


From: Timothy Sample
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 12:26:26 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

>  2. For still unknown reasons, the bridge between SWH and Disarchive has
>     some holes.  For instance,
>
>         $ guix lint -c archive znc
>         gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'
>
> [...]
>
>     Therefore, something is wrong somewhere.  Because of #1, I detect
>     many of such examples.  I do not know if SWH-ID computed by
>     Disarchive is incorrect [...].

Bingo!

According to SWH (emphasis mine):

    SWHIDs for contents, directories, revisions, and releases are, *at
    present*, compatible with the Git way of computing identifiers for
    its objects.

This is not true anymore.  As they go on to say:

    Note that Git compatibility is incidental and is not guaranteed to
    be maintained in future versions of this scheme (or Git).

Disarchive does it the Git way, and SWH does something slightly
different.  The SWH hash is 4e58dc09b8362caf1265102130a593b070562a68,
but the Git hash is 33a3b509b5ff8e9039626d11b7a800281884cf2a.  The
difference is that Disarchive, like Git, ignores empty directories.  It
makes sense that an archival project like SWH would not do that, and
they indeed don’t.
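
To make the discrepancy concrete, here is a minimal Python sketch of
Git-style directory hashing with a switch for empty directories.  It is
not Disarchive's code (Disarchive is written in Guile Scheme), and the
names `git_object_id`, `tree_id`, and `keep_empty` are made up for this
illustration.  It assumes that an SWH directory identifier is computed
like a Git tree identifier, except that empty directories stay in the
listing, and it treats every file as a regular non-executable file.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> bytes:
    """SHA-1 over a Git object: '<kind> <size>' + NUL + payload."""
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).digest()


def is_empty(node) -> bool:
    """True for a directory that holds no files at any depth."""
    return isinstance(node, dict) and all(is_empty(c) for c in node.values())


def tree_id(tree: dict, keep_empty: bool) -> bytes:
    """Hash a directory given as {name: bytes (file) or dict (subdirectory)}.

    keep_empty=False mimics Git, which cannot record empty directories.
    keep_empty=True keeps them (assumed here to match SWH's behaviour).
    """
    entries = []
    for name, node in tree.items():
        raw_name = name.encode()
        if isinstance(node, dict):
            if not keep_empty and is_empty(node):
                continue  # the Git way: the directory silently disappears
            mode = b"40000"
            oid = tree_id(node, keep_empty)
            sort_key = raw_name + b"/"  # Git sorts directories as "name/"
        else:
            mode = b"100644"  # simplification: regular, non-executable file
            oid = git_object_id("blob", node)
            sort_key = raw_name
        entries.append((sort_key, mode + b" " + raw_name + b"\0" + oid))
    payload = b"".join(entry for _, entry in sorted(entries))
    return git_object_id("tree", payload)


# Hypothetical source tree containing one empty directory.
example = {"README": b"hello\n", "src": {"main.c": b"int main;\n"}, "empty": {}}
git_style = tree_id(example, keep_empty=False).hex()
swh_style = tree_id(example, keep_empty=True).hex()
```

For a tree like `example`, `git_style` and `swh_style` differ, which is
the kind of mismatch that makes a lookup by the Git-style identifier
fail even though the directory has been archived.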

Fixing this in Disarchive is going to make a *huge* difference, so that
is now high priority for me (it’s a one-line change, but I want to fix
it, release it, update Guix, and recompute the report).

> And answering to your question [3] about “sources.json”, I think the
> ingestion started after this commit
> 35bb77108fc7f2339da0b5be139043a5f3f21493 from guix-artwork.  Other said,
> SWH started to ingest from “sources.json” after July 2020; probably
> around September 2020.
>
> 3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>

Thanks!  While investigating the above problem, I found a page that
lists what SWH is getting from us [1] and another showing when they are
scanning “sources.json” [2].  I don’t know if you’ve seen them before,
but they will be invaluable for figuring this stuff out.

[1] https://archive.softwareheritage.org/browse/origin/branches/?origin_url=https://guix.gnu.org/sources.json
[2] https://archive.softwareheritage.org/browse/origin/visits/?origin_url=https://guix.gnu.org/sources.json

> For the Missing and Unknown fields, could you distinguish the kind of
> origin?  Is it mainly git-fetch or url-fetch or others?

Good idea.  I think I can do this easily enough.  I might shelve it for
a bit, because I’m too excited to update the report with the Disarchive
hash fix.  :)


-- Tim


