guix-devel

Re: distributed substitutes: file slicing


From: Florian Klink
Subject: Re: distributed substitutes: file slicing
Date: Mon, 26 Jun 2023 15:41:32 +0200

On 23-06-21 00:44:06, Csepp wrote:
> I have a question / suggestion about the distributed substitutes
> project: would downloads be split into uniformly sized chunks or could
> the sizes vary?
> Specifically, in an extreme case where an update introduced a single
> extra byte at the beginning of a file, would that result in completely
> new chunks?
>
> An alternative I've been thinking about is this:
> find the store references in a file and split it along these references,
> optionally apply further chunking to the non-reference blobs.
>
> It's probably best to do this at the NAR level??
>
> Storing reference offsets is already something that we should be doing to
> speed other operations up, so this could tie in nicely with that.

A bit late to the party, but I've been toying around with a different
model to represent contents inside store paths - see [tvix-store-docs]
for more details.

Essentially, tvix-store internally uses a model similar to git trees,
but with BLAKE3 as the digest for blobs (regular file contents).
Even with all that, you can still put on a NAR lens, and get back a
byte-by-byte identical NAR representation of a store path.
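
To make the "git-tree-like model plus NAR lens" idea concrete, here is a
minimal Python sketch. It is not tvix-store's actual code or schema: the
class and function names are hypothetical, SHA-256 stands in for BLAKE3
(which is not in the Python standard library), and the NAR framing follows
the commonly documented layout (length-prefixed strings padded to 8 bytes).

```python
import hashlib
import struct
from dataclasses import dataclass, field

# Blob store keyed by digest. ASSUMPTION: SHA-256 is only a stand-in for
# BLAKE3 so that this sketch runs with the standard library alone.
BLOBS = {}

def put_blob(data: bytes) -> bytes:
    digest = hashlib.sha256(data).digest()
    BLOBS[digest] = data
    return digest

@dataclass
class File:  # regular file: only the digest of its contents is stored
    digest: bytes
    executable: bool = False

@dataclass
class Symlink:
    target: bytes

@dataclass
class Directory:  # name -> File | Symlink | Directory
    entries: dict = field(default_factory=dict)

def nar_str(s: bytes) -> bytes:
    # 64-bit little-endian length, the bytes, then zero padding to 8 bytes.
    pad = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\0" * pad

def nar_node(node) -> bytes:
    out = nar_str(b"(") + nar_str(b"type")
    if isinstance(node, File):
        out += nar_str(b"regular")
        if node.executable:
            out += nar_str(b"executable") + nar_str(b"")
        out += nar_str(b"contents") + nar_str(BLOBS[node.digest])
    elif isinstance(node, Symlink):
        out += nar_str(b"symlink") + nar_str(b"target") + nar_str(node.target)
    else:
        out += nar_str(b"directory")
        for name in sorted(node.entries):  # entries are emitted in sorted order
            out += (nar_str(b"entry") + nar_str(b"(")
                    + nar_str(b"name") + nar_str(name)
                    + nar_str(b"node") + nar_node(node.entries[name])
                    + nar_str(b")"))
    return out + nar_str(b")")

def to_nar(root) -> bytes:
    """The "NAR lens": render the tree model back into a NAR byte stream."""
    return nar_str(b"nix-archive-1") + nar_node(root)
```

The point of the sketch is that the tree only holds names, flags and
digests, while file contents live in the blob store and are pulled in
when the NAR is rendered.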

Because BLAKE3 enables [verified streaming][bao], there's no need to
make granular chunking part of the information to encode - it can be a
transport concern only. It also allows easy "seeking" into different
parts of a store path, and due to content-addressability, easy partial
fetching.

I've been playing around with using a blob storage implementation
storing these blobs with content-defined chunking (and eventually
exposing more granular chunking data to clients).
Due to the "decomposition" of the NAR (storing blobs separately from the
"surrounding skeleton"), we always look at file contents separately.
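
As an illustration of why content-defined chunking copes with the
"one extra byte at the beginning" case from the quoted question, here is
a small Python sketch. It is a toy gear-style rolling hash, not the
chunker used by any particular blob store; the table, the 10-bit cut
condition and the function names are all assumptions for the example.

```python
import hashlib

# Deterministic table of 256 pseudo-random 64-bit values, one per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1
BITS = 10  # cut when the top 10 hash bits are zero (roughly 1 KiB chunks)

def cdc_chunks(data: bytes):
    """Split data at positions chosen by a rolling hash of the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each step shifts older bytes one bit further left, so after
        # 64 steps a byte no longer influences the 64-bit hash.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - BITS) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fixed_chunks(data: bytes, size: int = 1024):
    """Uniformly sized chunks, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]
```

With `fixed_chunks`, prepending one byte shifts every boundary, so every
chunk changes. With `cdc_chunks`, a boundary depends only on the
preceding 64 bytes, so boundaries past the start of the file move along
with the content and the chunks between them stay identical.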

I haven't yet run any benchmarks on whether it makes sense to "blank out"
store path references before ingesting, and to dynamically apply these
references on top afterwards, but I would be interested in some
discussion around such experiments.
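
A rough sketch of what such "blanking out" could look like, in Python.
This is only an illustration under stated assumptions: references are
taken to be `/gnu/store/` followed by 32 nix-base32 characters, the
placeholder is a run of `0` characters, and the function names are made
up for the example.

```python
import re

# ASSUMPTION: a reference is "/gnu/store/" plus a 32-character hash in the
# nix-base32 alphabet (digits and lowercase letters without e, o, t, u).
REF = re.compile(rb"/gnu/store/([0-9a-df-np-sv-z]{32})")
PLACEHOLDER = b"0" * 32

def blank_out(data: bytes):
    """Replace every store hash with a placeholder; return (blanked, refs).

    refs is a list of (offset, original_hash) pairs, enough to undo it.
    """
    refs = [(m.start(1), m.group(1)) for m in REF.finditer(data)]
    blanked = REF.sub(b"/gnu/store/" + PLACEHOLDER, data)
    return blanked, refs

def reapply(blanked: bytes, refs) -> bytes:
    """Write the recorded hashes back at their offsets."""
    buf = bytearray(blanked)
    for offset, store_hash in refs:
        buf[offset:offset + len(store_hash)] = store_hash
    return bytes(buf)
```

Two builds of a file that differ only in the hashes of their dependencies
then produce the same blanked blob, so only the small list of references
would need to be stored or transferred separately.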


flokli

--

[tvix-store-docs]: https://cs.tvl.fyi/depot/-/tree/tvix/store/docs
[bao]: https://github.com/oconnor663/bao
