Re: [GNUnet-developers] Idea for file storage in GNUnet

From: Milan Bouchet-Valat
Subject: Re: [GNUnet-developers] Idea for file storage in GNUnet
Date: Sat, 08 Dec 2012 16:16:04 +0100

On Thursday, 6 December 2012 at 23:29 +0100, Christian Grothoff wrote:
> Dear hypothesis,
> Thank you for your suggestion.  Let me first describe how I understood 
> your idea.  Basically, the idea is that GNUnet's file storage should not 
> occupy disk space, but leave it marked in the OS file system as "free" 
> (presumably because of redundancy, loss is not an issue). Then, when the 
> data is needed, GNUnet simply should check if the checksum is still 
> correct, and if so, serve it. That way, we could push drive utilization 
> to 100% without the user even noticing.
> Let me point out a few differences between your perception of the issues 
> with this and how I see them.  First of all, GNUnet already splits files 
> into blocks for storage, and the blocks are encrypted and 
> self-verifying, so we'd not even need to store a separate checksum. All 
> we would still need is an index which would allow us to quickly find the 
> offset on the disk that had the block (scanning a multi-TB disk for each 
> request is infeasible).  That index can still be big (say 5-10% of the 
> actual storage space) and would have to live in "reliable" storage (you 
> don't want a corrupt index), but we actually already have the 
> infrastructure for this in place (see OnDemandBlocks in the code, which 
> are essentially an index into an existing 'normal' file on disk).  So 
> that part is IMO *easier* than you might have thought.
> But there is another part, and that is getting to the data. Writing a 
> file and deleting it is easy, and your assessment that OSes don't really 
> delete it holds true in 99.9% of the cases. However, getting the 
> physical offset on disk while the file exists is already virtually 
> impossible if you're not inside the kernel. You can get the inode number 
> and device ID, but that's often just the meta-data and already not very 
> portable.  Getting the physical disk offset or the logical partition 
> offset -- I'm not aware of ANY API to get that.
> Even if we were able to get that offset (i.e. by having a custom kernel 
> module), we'd then need really, really sensitive access (/dev/sdaX) to 
> the raw disk to read it later. That is again not something a 
> security-sensitive application should take lightly. Now, I guess a SUID 
> wrapper to read at certain offsets if and only if the data stored there 
> matches a certain hash _might_ be doable, but it is still a pretty tough 
> proposal to get past a security audit (as, for example, an adversary 
> might just want to do a confirmation attack on an unrelated file).
> Finally, once you have deleted the file, you want to somehow make sure 
> that the OS doesn't re-use this space right away.  But re-use is 
> actually quite likely, so the moment you write your 2nd file this 
> way, you are somewhat likely to overwrite your first one.  So what you 
> would really want is a way to enumerate all of the unused blocks on 
> disk, and then directly write there (instead of using the indirect route 
> of first writing a normal file and then deleting it to make the space 
> appear unused). That would require detailed knowledge of the specific 
> file-system, and would again require OS-level (and file-system specific) 
> extensions to the system.
> Given this, I don't think an implementation along these lines has any 
> chance of being used in the real world.
> Now, there is a second possibility --- just use "normal" files, and then 
> if you notice that the disk starts to get full, delete them.  The main 
> difference would be that the user would see that the disk is full 
> (df...), and that the file-system would likely fragment more.  If the 
> process that watches the fullness of the disk is done well, the effect 
> for the end-user would still be otherwise the same.  That is most likely 
> much easier to implement and deploy.
A new feature that is currently being discussed for inclusion in the
Linux kernel is volatile ranges. The idea is that applications can mark
mmap()ed memory ranges and their associated on-disk storage as low
priority, meaning they will be discarded when the system comes under
pressure. This would mean GNUnet could have, in addition to the standard
permanent datastore, an optional storage that would grow as much as the
kernel allows it to. When space is needed, the memory pages and their
files would be automatically dropped, and the GNUnet process would be
told the data is gone the next time it tried to access the range.

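Roughly, the usage pattern under the proposed interface would look like the sketch below. This is illustrative pseudocode only: the `FALLOC_FL_MARK_VOLATILE`/`FALLOC_FL_UNMARK_VOLATILE` flags come from the 2012 volatile-ranges patch series and were never merged into mainline Linux, the flag values here are placeholders, the exact return convention varied across patch revisions (I assume unmark reports whether the range was purged), and `refetch_blocks_from_network` is a hypothetical helper — none of this runs on a stock kernel.

```c
/* Flags from the never-merged volatile-ranges patches; placeholder
 * values, defined locally only so the sketch reads as C. */
#define FALLOC_FL_MARK_VOLATILE    0x100
#define FALLOC_FL_UNMARK_VOLATILE  0x200

/* Mark a low-priority cache region: the kernel may discard it under
 * pressure without asking. */
fallocate(fd, FALLOC_FL_MARK_VOLATILE, offset, length);

/* Before using the data again, unmark it; a "purged" result tells the
 * application the kernel dropped the range, so GNUnet would simply
 * re-fetch the blocks from the network. */
int purged = fallocate(fd, FALLOC_FL_UNMARK_VOLATILE, offset, length);
if (purged)
  refetch_blocks_from_network(offset, length);  /* hypothetical helper */
```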
The whole thing is not set in stone AFAIK, but that might be of
interest. See:

