> Buddelmeijer <hugo@buddelmeijer.nl> writes:
>> Hi Konrad, Thibault and others,
>>
>> Konrad, is it perhaps possible for you to dig up this broken conda
>> environment file?
> Yes:
> https://gist.github.com/brospars/4671d9013f0d99e1c961482peopledab533c57
> That environment was set up in 2018 on a Linux machine, and then tested
> under macOS and Windows as well. It broke in early 2019.
Thanks. Those dependencies indeed do not contain the build hashes, so the file was probably created with "conda env export --no-build".
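Roughly, you can tell the two kinds of export apart by counting fields: a spec pinned on its build has three `=`-separated fields (`name=version=build`), while `--no-build` output has two (`name=version`). A minimal sketch, assuming the dependency entries are plain strings in exactly that form, with an optional `channel::` prefix. It does not handle `==`, ranges or wildcards. The names `has_build_string` and `exported_without_builds` are hypothetical:

```python
def has_build_string(spec: str) -> bool:
    # Drop an optional channel prefix such as "conda-forge::".
    name_version_build = spec.split("::")[-1]
    # "scipy=1.8.0=py39hee8e79c_1" -> 3 fields; "scipy=1.8.0" -> 2 fields.
    return len(name_version_build.split("=")) == 3


def exported_without_builds(dependencies: list) -> bool:
    # Non-string entries, such as the nested {"pip": [...]} mapping, are skipped.
    specs = [d for d in dependencies if isinstance(d, str)]
    return not any(has_build_string(s) for s in specs)


with_builds = ["scipy=1.8.0=py39hee8e79c_1"]
without_builds = ["scipy=1.8.0", {"pip": []}]

print(exported_without_builds(with_builds))     # False
print(exported_without_builds(without_builds))  # True
```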
I think such a file without build hashes is probably what you want when you are giving a course, because it allows students to install these exact versions of the packages, but built for their specific platform (e.g. Linux / macOS / Windows). However, it provides only limited reproducibility over time, as you noticed. I guess you'd want three sets of environment files for a course's conda environment:
1. With unpinned dependencies, so just "scipy", whenever possible. That way, you'd get the latest versions when rerunning the course. This requires frequent updates to the files to restrict/pin dependencies when necessary, e.g. "scipy<=1.8.0". This would be equivalent to a Guix manifest file without any channel information.
2. With dependencies pinned only on version, "scipy=1.8.0", like the one you shared. This should give you equivalent stacks on different platforms. Guix does not really have an equivalent, by design, since it is not multi-platform, although I suppose one could create a channel with many different versions of packages and have the manifest specify the ones used.
3. With dependencies pinned on build hash, "scipy=1.8.0=py39hee8e79c_1". This should give you exactly the same binaries every time, roughly equivalent to a Guix manifest combined with a channels file. Guix is still better, though, because its dependency graph is based on source code, which is easier to archive: there is less chance of missing binaries, and more determinism.
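The three levels differ only in how much of a fully pinned spec is kept. A minimal sketch, assuming specs of the exact form `name=version=build`. The function `relax_spec` and its `level` numbering, which follows the list, are hypothetical. Real conda match specs allow more syntax, such as ranges like "scipy<=1.8.0", than this handles:

```python
def relax_spec(spec: str, level: int) -> str:
    """Reduce a fully pinned spec 'name=version=build' to one of three levels.

    level 1: name only (unpinned)
    level 2: name=version (pinned on version, no build string)
    level 3: name=version=build (pinned on build, returned unchanged)
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"expected name=version=build, got {spec!r}")
    # Keep the first `level` fields and join them back with '='.
    return "=".join(parts[:level])


full = "scipy=1.8.0=py39hee8e79c_1"
for level in (1, 2, 3):
    print(level, relax_spec(full, level))
# 1 scipy
# 2 scipy=1.8.0
# 3 scipy=1.8.0=py39hee8e79c_1
```

The reduction only goes one way. To get from level 1 back to level 3, you have to rerun the solver, and its answer depends on what is published at that moment and on the platform.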
Guix distinguishes scenarios 1 and 3 more cleanly, by keeping the manifest and the channels separate.
(Let's ignore the pip packages in the conda environment file for now.)
>> It doesn't seem common to overwrite conda binaries. Conda takes some (not
>> enough?) measures to prevent the scenario Konrad describes. In particular,
>> the filenames include a 'hash' since conda 3 (~2014) [1]:
> Weird. We worked with official Miniconda downloads from early 2018, and
> our environment files contain no hashes.
Probably due to "--no-build" in "conda env export", or maybe the default was different back then.
> My conclusion so far is that conda can never attain long-term
> reproducibility, because it wants to be multi-platform. And that means
> that it doesn't control the foundations on which it has to build.
Perhaps now is the right time to move on. I started using conda when my colleagues and I used many different environments: Linux, Windows, macOS, and different versions thereof. Back then, Anaconda was great, because it was very hard to install everything otherwise.
However, nowadays everyone can run Linux, either directly, through WSL (Windows Subsystem for Linux), or through containers. Most people know how to do this, and it is integrated into IDEs and the like. So conda isn't really necessary anymore.
> From a user's point of view, a big problem with conda is the opacity of
> the machinery, which in addition changes all the time as you say. With
> Guix, I can understand how everything is built, and thus understand the
> potential obstacles to a rebuild many years later. With conda, I don't
> really know and my understanding is that the build machinery is not
> even completely public (for Anaconda at least).
I agree with you on a philosophical level; ultimately, understanding everything would be easier with Guix. But we aren't there yet: I don't understand most of the Guix package definitions I've looked at, probably because my Guile/Scheme skills are lacking.
Cheers,
Hugo