gwl-devel

Re: [gwl-devel] Next steps for the GWL


From: zimoun
Subject: Re: [gwl-devel] Next steps for the GWL
Date: Thu, 6 Jun 2019 12:55:52 +0200

Hi,

On Thu, 6 Jun 2019 at 12:11, Ricardo Wurmus
<address@hidden> wrote:

> > One of the things I'd love to do
> > with GWL is to make it play well with git-annex, something that would
> > almost certainly be too specific for GWL itself.  For example
> >
> >   * Make data caching git-annex aware.  When deciding to recompute data
> >     files, GWL avoids computing the hash of data files, using scripts as
> >     the cheaper proxy, as you described in address@hidden
> >     But if the user is tracking data files with git-annex, getting the
> >     hash of data files becomes less expensive because we can ask
> >     git-annex for the hash it has already computed.
> >
> >   * Support getting annex data files on demand (i.e. 'git annex get') if
> >     they are needed as inputs.
>
> I wonder what the protocol should look like.  Should a workflow
> explicitly request a “git annex” file or should it be up to the person
> running the workflow, i.e. when “git annex” has been configured to be
> the cache backend it would simply look up the declared input/output
> files there.
>
> I suppose the answers would equally apply to using IPFS as a cache.
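The git-annex-aware hashing idea quoted above can be sketched as follows. This is only an illustration, not GWL code. It assumes a file tracked by git-annex in locked mode is a symlink into `.git/annex/objects/` whose last path component is the annex key, e.g. `SHA256E-s<size>--<hex digest><ext>`. The function names are hypothetical, and only the SHA256/SHA256E backends are handled. When the key carries a digest, the sketch reuses it. Otherwise it falls back to hashing the file content.

```python
import hashlib
import os
import re

# Hypothetical helper: matches keys of git-annex's SHA256 and SHA256E backends,
# e.g. "SHA256E-s11--<64 hex digits>.txt"; other backends are not handled.
ANNEX_KEY_RE = re.compile(r"^SHA256E?(?:-[A-Za-z]\d+)*--([0-9a-f]{64})")


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file's content with SHA-256, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def annex_sha256(path):
    """Return the SHA-256 recorded in a git-annex key, or None.

    Assumes the locked-file layout: `path` is a symlink into
    .git/annex/objects/ whose final component is the annex key.
    """
    if not os.path.islink(path):
        return None
    target = os.readlink(path)
    if ".git/annex/objects/" not in target.replace(os.sep, "/"):
        return None
    match = ANNEX_KEY_RE.match(os.path.basename(target))
    return match.group(1) if match else None


def data_file_digest(path):
    """Prefer the digest git-annex already computed; otherwise hash the file."""
    return annex_sha256(path) or sha256_of_file(path)
```

For an annexed input, checking whether the input changed then costs one `readlink()` instead of a full read of a possibly large data file. Unlocked annexed files are not symlinks, so they would need another route, for instance asking `git annex lookupkey` for the key.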

I agree that a mechanism such as `git-annex` would be nice.
But isn't it just a means of implementing the CAS (content-addressable
storage) that we discussed previously?
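To make the CAS idea concrete, here is a toy content-addressable store. It keeps each object under the SHA-256 of its content, so identical results are stored once and a lookup by digest either hits or misses. This is a minimal sketch that assumes a local directory as the backend. The class name and the two-character fan-out layout, which loosely follows git's loose-object directories, are hypothetical choices and not part of the GWL.

```python
import hashlib
import os
import tempfile


class ContentStore:
    """A toy content-addressable store on the local file system.

    Objects live at <root>/<first 2 hex digits>/<remaining 62 digits>.
    """

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, digest):
        return os.path.join(self.root, digest[:2], digest[2:])

    def put(self, data: bytes) -> str:
        """Store `data` and return its SHA-256 hex digest (the address)."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):  # identical content is stored only once
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # Write to a temporary file in the same directory, then rename it,
            # so readers never observe a partially written object.
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
            os.replace(tmp, path)
        return digest

    def contains(self, digest: str) -> bool:
        return os.path.exists(self._path(digest))

    def get(self, digest: str) -> bytes:
        with open(self._path(digest), "rb") as handle:
            return handle.read()
```

Under this design, choosing git-annex, IPFS or something else as the cache backend only changes where `put` and `get` keep the bytes. The addressing by content hash stays the same.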

I fully agree with the features and their description. Totally cool!
However, I am a bit reluctant about `git-annex` because it requires a
Haskell compiler, and Haskell is still very far from bootstrappable. I
am aware of Ricardo's bootstrapping attempt (AFAIK the only one), and
a Haskeller explains the difficulties here [1].

In my opinion, the GWL should stay on the path of end-to-end
reproducibility. So `git-annex` should only be a transitional means of
implementing the CAS (cache) while the Haskell bootstrap remains
unsolved, and I would find it more elegant to use the "data-oriented
IPFS": IPLD [2].


[1] https://www.joachim-breitner.de/blog/748-Thoughts_on_bootstrapping_GHC
[2] https://ipld.io/


All the best,
simon

