guix-devel

Re: Git-LFS or Git Annex?


From: Simon Tournier
Subject: Re: Git-LFS or Git Annex?
Date: Thu, 25 Jan 2024 17:55:11 +0100

Hi Ludo, all,

On mer., 24 janv. 2024 at 16:22, Ludovic Courtès <ludo@gnu.org> wrote:

> The question boils down to: Git-LFS or Git Annex?

Some months ago, I looked into this for managing some datasets.  My
conclusion was Git-Annex.  The main drawback of Git-LFS is that the
server needs to support the protocol.  On the Git-Annex side, the main
drawback is Haskell.

Haskell might seem a detail, but it is not when considering
architectures other than x86_64.  Have a look at the CI, filtering with
’ghc-’:

    http://ci.guix.gnu.org/eval/1074397/dashboard?system=i686-linux

Here I pick i686 as an example to make the point about Haskell support
on non-x86_64 architectures.  As an aside, I am not even speaking about
the resources that Haskell requires to be compiled.

Do not get me wrong: it does not mean that’s a roadblock, but let’s keep
that in mind: Git-Annex comes with limitations because of Haskell.

That said, Git-Annex seems suited to the workflow you describe: backing
up large files between various servers.  And it would be a bridge
between content and address.  However, the content still needs to be
stored on some servers, IMHO.  Git-Annex supports “special remotes” [1]
but it is not clear to me whether the aim is to distribute the workload
between the two main servers or just to ease the maintenance of
backups.

Last, you speak about content-addressed storage and this part is not
clear to me.  In Git-Annex, you have on one hand the Git
content-addressed system and on the other hand the “key-value
backends” [2].  Somehow, Git-Annex stores the key in a file that is
stored in Git itself, while the value is stored outside Git itself.

Recently, support for Git-LFS was added to git-download with
a4db19d8e07eeb26931edfde0f0e6bca4e0448d3.  In that context, with
content-addressing in mind, are you speaking about adding Git-Annex
support and thus distributing the videos as substitutes, probably also
easing the maintenance of backups?  Or is the question unrelated?

On a side note, depending on the size of the videos, it is only possible
to use non-cryptographic backends such as URL.

All that said, let’s fix ideas with a simple example: sync content
between machine-A and machine-B where the original content is also kept
elsewhere.

Let’s create a Git repository with an annexed file.

--8<---------------cut here---------------start------------->8---
machine-A$ mkdir example && cd example
machine-A$ git init && git annex init

machine-A$ git annex addurl -b MD5 --file sources.json \
                 https://guix.gnu.org/sources.json
addurl https://guix.gnu.org/sources.json 
(to sources.json) ok
(recording state in git...)

machine-A$ file sources.json
sources.json: symbolic link to 
.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a

machine-A$ git annex add .
machine-A$ git commit -am 'Add sources.json'
[master (root-commit) bdf6bca] Add sources.json
 1 file changed, 1 insertion(+)
 create mode 120000 sources.json
--8<---------------cut here---------------end--------------->8---

Let’s backup.

--8<---------------cut here---------------start------------->8---
machine-B$ git clone file:///tmp/example backup && cd backup/

machine-B$ file sources.json 
sources.json: broken symbolic link to 
.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As you see, nothing is really copied here.  It is only a symbolic link
pointing to some content outside what Git tracks.
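
Since a missing value shows up as a broken symbolic link, a script can
tell whether the content is locally present without asking Git-Annex at
all.  Here is a minimal sketch using only POSIX test; content_present is
a hypothetical helper and the demo imitates the layout with plain files
instead of a real annex.

```shell
#!/bin/sh
# Sketch: report whether the content behind an annexed symlink is present.
# content_present is a hypothetical helper; it relies only on POSIX `test`.
content_present() {
    # -L: the path is a symbolic link; -e: the link target exists.
    [ -L "$1" ] && [ -e "$1" ]
}

# Self-contained demo imitating the annex layout with plain files.
demo=$(mktemp -d)
mkdir -p "$demo/objects"
ln -s objects/KEY "$demo/sources.json"   # dangling link: content missing
if content_present "$demo/sources.json"; then before=present; else before=missing; fi
echo data > "$demo/objects/KEY"          # the content arrives
if content_present "$demo/sources.json"; then after=present; else after=missing; fi
echo "$before -> $after"
rm -rf "$demo"
```

This is only a file-system level check; it says nothing about which
remotes hold a copy.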

--8<---------------cut here---------------start------------->8---
machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ git annex get sources.json
get sources.json (from origin...) 
ok
(recording state in git...)

machine-B$ guix hash -rx .
0x8kiaprmjq6f02pdq155wlznxhzi871mk0la6sp944q854pcpn5

machine-B$ file sources.json
sources.json: symbolic link to 
.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Let’s remove the file on machine-B; for whatever reason.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex drop sources.json
drop sources.json ok
(recording state in git...)

machine-B$ file sources.json
sources.json: broken symbolic link to 
.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

Now assume that machine-A is unreachable.  Let’s get the file again on
machine-B.

--8<---------------cut here---------------start------------->8---
machine-B$ git annex get sources.json
get sources.json (from web...) 
ok
(recording state in git...)

machine-B$ file sources.json
sources.json: symbolic link to 
.git/annex/objects/jx/1j/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a/MD5-s2697572--a2b17d21f5a209b1763ad537a97d9c5a
--8<---------------cut here---------------end--------------->8---

As we see, since ’origin’ is unreachable, it fetches directly from the
web.  Well, on machine-B running:

    git annex sync && git annex get -A

first updates the keys and then fetches all the new content from
’origin’.  It eases the maintenance of backups, IMHO.
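
For a periodic backup, the two commands can be wrapped in a small
function, for instance called from cron.  This is only a sketch:
backup_annex is a hypothetical name, the repository path is an
assumption, and it does nothing more than chain the commands above via
git -C.

```shell
#!/bin/sh
# Sketch: refresh a backup clone by chaining the two commands quoted above.
# backup_annex is a hypothetical wrapper, not a git-annex command.
backup_annex() {
    repo=$1
    # First synchronise the Git side, i.e. update the keys...
    git -C "$repo" annex sync || return 1
    # ...then fetch the content of all keys (-A is short for --all).
    git -C "$repo" annex get -A
}

# Example call (the path is an assumption):
#   backup_annex "$HOME/backup"
```

Stopping when the sync fails avoids fetching content against an
outdated list of keys.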

The main advantages are: everything is versioned thanks to Git, and what
is stored locally is finely controlled.

Well, if some motivated Haskeller found it fun to implement NAR as a
backend, it would allow transparent substitution; from my understanding,
if the key contains the NAR hash, then it would be possible to bridge
with the Guix content-addressed system. :-)

Cheers,
simon


1: https://git-annex.branchable.com/special_remotes/
2: https://git-annex.branchable.com/backends/
3: https://git-annex.branchable.com/internals/key_format/


