gnunet-developers

Re: Encoding for Robust Immutable Storage (ERIS)


From: Christian Grothoff
Subject: Re: Encoding for Robust Immutable Storage (ERIS)
Date: Sat, 18 Jul 2020 13:59:11 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

Hello pukkamustard,

Interesting proposal, I can see the use for the verification capability.
For my taste, the block size is much too small. I understand 4k can make
sense for page tables and SATA, but looking at benchmarks 4k is still
too small to maximize SATA throughput. I would also worry about 4k for a
request size in any database or network protocol. The overheads per
request are still too big for modern hardware.  You could easily go to
8k, which could be justified with 9k jumbo frames for Ethernet and would
at least also utilize all of the bits in your paths.  The 32k of ECRS
are close to the 64k which are reportedly the optimum for modern M.2
media. IIRC Torrents even use 256k.  The overhead from padding may be
large for very small files if you go beyond 4k, but you should also
think in terms of absolute overhead: even a 3100% overhead doesn't
change the fact that the absolute overhead is tiny for a 1k file.
Furthermore, you should consider a trick we use in GNUnet-FS, which is
that we share *directories*, and for small files, we simply _inline_ the
full file data in the meta data of the file that is stored with the
directory or search result. That way you basically never have to
download tiny files as separate entities, and for files <32k we have
zero overhead.
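
To make the padding arithmetic above concrete, here is a minimal Python
sketch. It assumes a file is simply padded up to a whole number of
blocks and ignores any extra non-data blocks or mandatory padding bytes
a real encoding might add; the helper name `padding_overhead` is
hypothetical and not taken from ERIS or GNUnet.

```python
from typing import Tuple

KIB = 1024


def padding_overhead(file_size: int, block_size: int) -> Tuple[int, float]:
    """Return (padding bytes, padding as a percentage of the file size).

    Assumption: the file is padded up to the next multiple of block_size.
    """
    if file_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-file_size // block_size)  # ceiling division
    padding = blocks * block_size - file_size
    return padding, 100.0 * padding / file_size


# Compare the block sizes mentioned in the discussion for a 1k file.
for block_size in (4 * KIB, 8 * KIB, 32 * KIB, 64 * KIB):
    padding, percent = padding_overhead(1 * KIB, block_size)
    print(f"{block_size // KIB:>2}k blocks: "
          f"{padding:>6} bytes padding ({percent:.0f}%)")
```

The relative figure grows quickly with the block size, while the
absolute figure stays in the tens of kilobytes for a single small file.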

I'd be curious to see how much the two-pass encoding costs in practice
-- it might be less expensive than ECRS if you are lucky (hashing one
big block being cheaper than many small hash operations), or much more
expensive if you are unlucky (have to actually read the data twice from
disk). I am not sure that it is worth it merely to reduce the number of
hashes/keys in the non-data blocks. Would be good to have some data on
this, for various file sizes and platforms (to judge IO/RAM caching
effects).  As I said, I can't tell for sure if the 2nd pass is virtually
free or quite expensive -- and that is an important detail. Especially
with a larger block size, the overhead of an extra key in the non-data
blocks could be quite acceptable.
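
One way to start collecting such data is a small micro-benchmark along
the following lines. This is only a sketch under stated assumptions:
SHA-256 from Python's hashlib is a stand-in for whatever hash the
encoding actually uses, the data is held in RAM (so disk and cache
effects of a real second pass are not captured), and the helper names
are hypothetical.

```python
import hashlib
import os
import time
from typing import Callable, List


def hash_whole(data: bytes) -> bytes:
    """Hash the entire input in one operation."""
    return hashlib.sha256(data).digest()


def hash_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Hash each block_size chunk separately (the last chunk may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def time_call(fn: Callable, *args, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


data = os.urandom(4 * 1024 * 1024)  # 4 MiB of random in-memory test data
print(f"one big hash: {time_call(hash_whole, data):.4f}s")
for block_size in (4096, 8192, 32768, 65536):
    elapsed = time_call(hash_blocks, data, block_size)
    print(f"{block_size // 1024:>2}k blocks:   {elapsed:.4f}s")
```

To judge the IO side, the same comparison would have to read files
larger than RAM from disk, on each platform of interest.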

For 3.4 Namespaces, I would urge you to look at the GNU Name System
(GNS). My plan is to (eventually, when I have way too much time and
could actually re-do FS...) replace SBLOCKS and KBLOCKS of ECRS with
basically only GNS.

Anyway, please do keep us posted on major evolutions of the standard! I
doubt we'll adopt it with 4k blocks, but if that changes, adding the
verification capability wouldn't be a bad thing IMO.

happy hacking!

Christian

On 7/10/20 8:59 AM, pukkamustard wrote:
> 
> Hello GNUnet,
> 
> I'd like to request feedback, questions and comments on an encoding of
> content very much inspired by ECRS that I have been working on: Encoding
> for Robust Immutable Storage (ERIS)
> 
> https://openengiadina.net/papers/eris.html
> 
> The motivation is to use the encoding in social-network-like settings
> where short messages and interactions are encoded using ERIS (as RDF
> [1]).
> 
> There is one major difference to ECRS (and a couple smaller ones) that I
> would like to highlight:
> 
> 
> ** Verification capability
> 
> ERIS adds a verification capability. Holders of the verification
> capability can enumerate all blocks required to decode the content and
> verify integrity of the blocks without being able to decode the content.
> 
> This enables peers to cache the entire content without being able to
> read the content.
> 
> The verification capability is enabled by using two keys:
> 
> 1. A read key to encode the blocks holding content.
> 2. A verification key (which is deterministically derived from the read
>   key) to encode the intermediary nodes of the Merkle tree.
> 
> This makes the scheme slightly more complicated than ECRS and also
> requires a two-pass encoding (when using convergent encryption).
> 
> Nevertheless I believe this is a very important feature that maybe
> results in a better privacy/complexity/availability trade-off as alluded
> to in a previous thread
> (https://lists.gnu.org/archive/html/gnunet-developers/2020-05/msg00015.html).
> 
> 
> 
> ** Block size
> 
> Block size is chosen to be 4kB. This is an optimization towards small
> content (short messages and social interactions).
> 
> 
> ** URN
> 
> Encoded content can be referred to by a URN making it usable from
> existing Web (and RDF) settings. This could be added to ECRS.
> 
> 
> ** No namespacing / keyword search
> 
> There are currently no SBlock or KBlock like features. The idea is that
> these features can be built on-top of the base encoding (including
> SBlock and KBlock).
> 
> 
> 
> We have a little JavaScript demo
> (https://openengiadina.gitlab.io/js-eris/) as well as an implementation
> in Guile [2].
> 
> I'd be very happy for your insight and feedback.
> 
> Thanks!
> 
> -pukkamustard
> 
> 
> [1] https://openengiadina.net/papers/content-addressable-rdf.html
> [2] https://gitlab.com/openengiadina/data-model/
> 
> 

