guix-devel

Preservation of Guix report for 2024-01-26


From: Timothy Sample
Subject: Preservation of Guix report for 2024-01-26
Date: Sat, 27 Jan 2024 18:47:27 -0600

Hello all,

For a while now, I’ve been tracking coverage of Guix sources in the
Software Heritage (SWH) archive.  I maintain a dataset of sources that
goes back (almost five years) to Guix 1.0.0.  Every once in a while, I
update this dataset and check it against SWH to see how much is missing.
I just put together a new report.

The permalink is https://ngyro.com/pog-reports/2024-01-26, but you can
link to the latest report, too: https://ngyro.com/pog-reports/latest/.

New in this edition is checking for Subversion sources and
bzip2-compressed tarballs.  Subversion is well covered (98.5%), since it
is basically asking, “is TeX Live in SWH?”.  The bzip2 sources are
similar to other compressed tarballs.

One of the benefits of this report is that it catches issues with our
integration with SWH.  This is the second time that preparing this
report has revealed that SWH had stopped loading sources from us.  When
that happens, the number of missing sources starts climbing steeply for
recent commits.  Before publishing this, I reached out to SWH and they
restarted the loader.  It was able to bring in most of the sources, but
you can see a slight increase in missing sources about halfway between
September (when it stopped) and now.  That’s likely due to sources that
came and went from our “sources.json” listing while they weren’t
looking.

Speaking of which, another benefit of this dataset is that we have a
list of ~6K historical sources that we would like to see added to SWH.
We are currently coordinating with them to load these sources.  I plan
to update the report when we get results from that.

However, there remain a handful of missing sources that are current and
should be getting loaded.  This suggests areas where we could improve.
Here’s a not-quite-random sample of some of the current missing sources
(from commit 25bcf4e), and my thoughts as to why they are missing.

mirror://gnupg/gpgme/gpgme-1.18.0.tar.bz2
https://download.enlightenment.org/rel/apps/econnman/econnman-1.1.tar.gz
https://ftp.heanet.ie/mirrors/ftp.xemacs.org/aux/compface-1.5.2.tar.gz
mirror://cpan/authors/id/E/ET/ETHER/MooseX-Types-0.45.tar.gz
mirror://apache/commons/daemon/source/commons-daemon-1.1.0-src.tar.gz

  Some of these (I didn’t check them all) are in SWH as content rather
  than directories.  That’s kinda good, because Guix knows how to get
  them, but also kinda mysterious.  I’ve asked swh-devel about it.
  Depending on the answer, I might have to adapt the checks to deal with
  the possibility of SWH having the tarball rather than its contents.
  In fact, that might be an improvement either way, but it muddies the
  data model quite a bit.
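
  To illustrate the distinction, here is a minimal Python sketch (not
  the code behind the report) of how a tarball stored as plain content
  would be identified.  It assumes the standard SWHID scheme, where a
  content object is named by its Git blob hash; the function names are
  made up for this example, and querying the archive for the resulting
  identifier is left out.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the Git blob hash of `data` (SHA-1 over "blob <len>\\0" + data)."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Return the SWHID a file would have if archived as a content object.

    Hypothetical helper for illustration: for a tarball, `data` is the
    compressed archive itself, not the tree obtained by unpacking it.
    """
    return "swh:1:cnt:" + git_blob_sha1(data)


if __name__ == "__main__":
    # In practice `data` would be the bytes of the downloaded tarball.
    print(content_swhid(b"hello\n"))
```

  A check that only looks for the directory identifier of the unpacked
  tree would miss a tarball archived this way, which is why the checks
  might need a second lookup keyed on the tarball's own hash.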

https://rubygems.org/downloads/rjb-1.6.7.gem
https://rubygems.org/downloads/mspec-1.9.1.gem
https://rubygems.org/downloads/cztop-0.12.2.gem
https://rubygems.org/downloads/morecane-0.2.0.gem

  This is an error on my side.  I’ve been treating gems as regular
  files, but they are (and SWH treats them as) tarballs.
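
  As a rough sketch of what the fix could look like (an assumption, not
  the report's actual code), a gem can be detected and opened with
  Python's standard tarfile module.  The helper names and the fake gem
  layout below are invented for the example; a real gem is a plain tar
  archive that typically holds metadata.gz, data.tar.gz, and
  checksums.yaml.gz.

```python
import io
import tarfile


def list_gem_members(path):
    """Return the member names if `path` is a tar archive, else None."""
    if not tarfile.is_tarfile(path):
        return None
    with tarfile.open(path) as archive:
        return archive.getnames()


def make_fake_gem(path):
    """Write a tiny tar archive laid out like a gem (illustration only)."""
    with tarfile.open(path, "w") as archive:
        for name in ("metadata.gz", "data.tar.gz", "checksums.yaml.gz"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
```

  With a check like this, a ".gem" source would be routed to the same
  code path as other tarballs instead of being hashed as a single
  regular file.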

https://git.sr.ht/~abcdw/guile-ares-rs

  This one was in SWH, but not up-to-date enough to have the tag we use.
  I don’t think they regularly crawl git.sr.ht yet.  Also, it looks like
  they tried to visit this origin while SourceHut was down (around a
  week ago).  I used “Save code now” to fix this and now this source is
  in SWH.  This kind of thing should be improved soon, as they are
  working on new code that will pick up Git repositories from our
  “sources.json” file.

Given that some of those tarballs and Ruby gems are in fact in SWH and
I’m just missing them, we are probably doing better than the report
suggests!

The short-term road map for this is to send the historical sources to
SWH and fix the Ruby gems, and then make a new report.  So expect a
minor update with much better numbers soon-ish.

The long-term road map is to make it work like an archive.  It will run
continuously and store *all* Guix sources.  To make this easy data-wise,
it will only store what’s not covered by SWH.  I avoided this earlier
out of fear of creating another point of failure.  I’m still afraid of
this, but as it stands every source that is just out there on the
Internet and not in SWH is a point of failure.  Surely having them all
in one place would be better, right?


-- Tim


