Re: [GNUnet-developers] Idea for file storage in GNUnet

gnunet-developers

From:	Christian Grothoff
Subject:	Re: [GNUnet-developers] Idea for file storage in GNUnet
Date:	Thu, 06 Dec 2012 23:29:28 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121028 Thunderbird/16.0.2

Dear hypothesys,

Thank you for your suggestion. Let me first describe how I understood your idea. Basically, the idea is that GNUnet's file storage should not occupy disk space, but leave it marked in the OS file system as "free" (presumably because of redundancy, loss is not an issue). Then, when the data is needed, GNUnet simply should check if the checksum is still correct, and if so, serve it. That way, we could push drive utilization to 100% without the user even noticing.

Let me point out a few differences between your perception of the issues with this and how I see them. First of all, GNUnet already splits files into blocks for storage, and the blocks are encrypted and self-verifying, so we'd not even need to store a separate checksum. All we would still need is an index which would allow us to quickly find the offset on the disk that had the block (scanning a multi-TB disk for each request is infeasible). That index can still be big (say 5-10% of the actual storage space) and would have to live in "reliable" storage (you don't want a corrupt index), but we actually already have the infrastructure for this in place (see OnDemandBlocks in the code, which are essentially an index into an existing 'normal' file on disk). So that part is IMO *easier* than you might have thought.
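
To make the index idea concrete, here is a minimal sketch in Python. It is not GNUnet's OnDemandBlocks code: the function names, the block size and the use of a plain SHA-512 content hash as the lookup key are assumptions made for illustration, and an ordinary file stands in for the disk. It shows the two pieces described above: an index from block key to offset, and a lookup that serves a block only if its content still hashes to its key.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # assumed block size for this sketch


def build_index(path, block_size=BLOCK_SIZE):
    """Map the hash of each block to its (offset, length) in the file."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            index[hashlib.sha512(block).hexdigest()] = (offset, len(block))
            offset += len(block)
    return index


def fetch_block(path, index, key):
    """Return the block stored under key, or None if missing or overwritten."""
    entry = index.get(key)
    if entry is None:
        return None
    offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    # Self-verifying: the key is the hash of the content, so no separate
    # checksum has to be stored next to the offset.
    if hashlib.sha512(data).hexdigest() != key:
        return None
    return data
```

The index itself (the dictionary here) is the part that would have to live in reliable storage; the data it points to may silently disappear, in which case fetch_block simply returns None.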

But there is another part, and that is getting to the data. Writing a file and deleting it is easy, and your assessment that OSes don't really delete it holds true in 99.9% of the cases. However, getting the physical offset on disk while the file exists is already virtually impossible if you're not inside the kernel. You can get the inode number and device ID, but that's often just the meta-data and already not very portable. Getting the physical disk offset or the logical partition offset -- I'm not aware of ANY API to get that.
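
As a small illustration of what is reachable from user space, the following Python sketch (file_identity is a name made up for this example) returns the device ID and inode number via os.stat. Both identify the file's metadata; neither says where the data blocks sit on the disk.

```python
import os


def file_identity(path):
    """Return (device ID, inode number, size in bytes) for path.

    These values come from the file's metadata. They do not reveal
    the physical or partition offset of the file's data.
    """
    st = os.stat(path)
    return st.st_dev, st.st_ino, st.st_size
```

Calling file_identity on any existing file gives three integers, and the meaning of the first two differs between platforms, which is the portability problem mentioned above.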

Even if we were able to get that offset (i.e. by having a custom kernel module), we'd then need really, really sensitive access (/dev/sdaX) to the raw disk to read it later. That is again not something a security-sensitive application should take lightly. Now, I guess a SUID wrapper to read at certain offsets if and only if the data stored there matches a certain hash _might_ be doable, but it is still a pretty tough proposal to get past a security audit (as, for example, an adversary might just want to do a confirmation attack on an unrelated file).

Finally, once you deleted the file, you want to somehow make sure that the OS doesn't re-use this space first again. But that is actually quite likely to be the case, so the moment you write your 2nd file this way, you are somewhat likely to overwrite your first file. So what you would really want is a way to enumerate all of the unused blocks on disk, and then directly write there (instead of using the indirect route of first writing a normal file and then deleting it to make the space appear unused). That would require detailed knowledge of the specific file-system, and would again require OS-level (and file-system specific) extensions to the system.

Given this, I don't think there is a chance to create an implementation that has a chance of being used in the real world.

Now, there is a second possibility --- just use "normal" files, and then if you notice that the disk starts to get full, delete them. The main difference would be that the user would see that the disk is full (df...), and that the file-system would likely fragment more. If the process that watches the fullness of the disk is done well, the effect for the end-user would still be otherwise the same. That is most likely much easier to implement and deploy.
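
A rough sketch of such a watcher step, in Python. The function name, the threshold parameter and the "oldest file first" eviction order are assumptions for illustration; nothing like this is claimed to exist in GNUnet. The disk_usage parameter defaults to shutil.disk_usage and exists only so the logic can be exercised without actually filling a disk.

```python
import os
import shutil


def evict_until_free(store_dir, min_free_fraction, disk_usage=shutil.disk_usage):
    """Delete the oldest files in store_dir until enough of the disk is free.

    Returns the list of paths that were deleted.
    """
    with os.scandir(store_dir) as entries:
        files = sorted(
            (e for e in entries if e.is_file()),
            key=lambda e: e.stat().st_mtime,
        )
    deleted = []
    for entry in files:
        usage = disk_usage(store_dir)
        # Stop as soon as the free share of the disk is large enough again.
        if usage.free / usage.total >= min_free_fraction:
            break
        os.remove(entry.path)
        deleted.append(entry.path)
    return deleted
```

A real watcher would call something like this periodically, and would also have to drop the deleted blocks from its index.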

Finally, a bigger question in my mind is if available disk space is really generally the issue. For me, bandwidth, latency, seek-speed and CPU usage have been concerns, the disk is pretty much the only resource that is virtually unlimited --- it would take months of download time over my Internet connection to fill my drive, and years to upload it (stupid DSL). So I'm afraid that while I think something could be done here, I'm not sure it makes sense to prioritize this.
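
To give a feel for the orders of magnitude, here is the arithmetic as a tiny Python helper. The drive size and line rates in the comments are invented for illustration only; they are not the actual figures behind the statement above.

```python
def days_to_transfer(size_bytes, rate_bits_per_second):
    """Days needed to move size_bytes over a link running at full rate."""
    seconds = size_bytes * 8 / rate_bits_per_second
    return seconds / 86400


# Hypothetical numbers: a 2 TB drive on a DSL line with
# 2 Mbit/s downstream and 0.25 Mbit/s upstream.
download_days = days_to_transfer(2e12, 2e6)    # roughly three months
upload_days = days_to_transfer(2e12, 0.25e6)   # roughly two years
```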


Happy hacking!

Christian

On 12/06/2012 10:03 PM, hypothesys wrote:

Hello GNUnet Developers,

First of all I apologize if this is not the correct place for discussing a
possible new feature to GNUnet and since I am not from the IT field I cannot
even attempt to implement it. Still, perhaps if you find this feature
valuable you would consider implementing it so I wanted to share it. Please
bear in mind that I am no expert and this may not be feasible for technical
reasons not obvious to me. In that case please say so and I will not take
more of your time.

Some time ago I had the idea that gnunet (as well as other projects) could
benefit from increased disk space for storage and that using the free space
on disk should be a technically possible if difficult task.

On many OS filesystems, a deleted file is not truly erased. In the FAT
filesystem, for example, the list of disk clusters occupied by the file is
erased from the file allocation table, marking those sectors as available.
I do not know how other filesystems handle this but, for the sake of
argument, let's say that a header is instead applied to the file, indicating
that the file's portion of the hard disk is available to be overwritten.

/header/ data block Nº1; /header/ data block Nº2; /header/ data block
Nº3;...

If gnunet was able to split the file data into data blocks (encrypted of
course) and subsequently delete the data, while keeping both a checksum for
the data block and record of its disk location, the free disk space of
computers on which gnunet was installed could be used for storage without
compromising normal functioning of said computer.

This program, perhaps to be named gnunet-str (storage) would at the moment
of storage of data, create a checksum for every encrypted data block and for
every "contiguous" data group, as follows:

/block1/block2/block3/block4/block5/block6/block7/block8...
=>checksum1/checksum2/checksum3/checksum4/...

but also

/block1/block2/block3/block4/block5/block6/block7/block8...=>checksum1+2/checksum3+4/checksum5+6/checksum7+8...

and also

/block1/block2/block3/block4/block5/block6/block7/block8...=>checksum1+2+3+4/checksum5+6+7+8/checksum9+10+11+12...

and continuing...

In this way, it would be possible to (quickly? - by going from the checksums
for the agglomerations of blocks to the individual blocks) ascertain which
data was corrupt (by usage of the main OS, or a disk defrag) and had to be
replaced. It would then signal to other GNUnet nodes "Of the data stored
only 70% (for example) is still not corrupted. I can share this 70% but give
me the 30% back, or new files to store in this space".
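
The layered checksums can be sketched as follows in Python. One substitution to note: instead of hashing the concatenated data of blocks 1+2, 3+4 and so on, each higher level here hashes the two checksums below it (a Merkle-tree style construction), which gives the same narrowing-down behaviour. The function names and the choice of SHA-256 are assumptions for this example.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return checksum levels: per block, per pair, per group of four, ..."""
    levels = [[_h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([_h(b"".join(prev[i:i + 2])) for i in range(0, len(prev), 2)])
    return levels


def find_corrupt(stored, blocks):
    """Return indices of blocks that no longer match the stored checksums.

    Assumes blocks has the same length as when stored was built.
    """
    if not blocks:
        return []
    current = build_levels(blocks)
    suspects = [0]  # start at the single top-level checksum
    for level in range(len(stored) - 1, -1, -1):
        bad = [i for i in suspects if stored[level][i] != current[level][i]]
        if level == 0:
            return bad
        # Only the children of mismatching groups need a closer look.
        width = len(stored[level - 1])
        suspects = [c for i in bad for c in (2 * i, 2 * i + 1) if c < width]
    return []
```

With the result, the intact share is 1 - len(bad) / len(blocks), which is the "70%" figure in the example above. This sketch recomputes every checksum before descending, so the descent only saves comparisons, not disk reads; the data still has to be read once to be checked.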

Such a solution would allow big amounts of storage - in theory, all free
space on the hard drive of the host computer. Due to its nature it would not
be possible to rely on the data not being compromised without implementing
redundancy. If this gnunet-str made x copies of file y for example, the
probability of data corruption and loss could be greatly diminished.
Tahoe-Lafs and gnunet are based on this principle (although I could be wrong
as I'm no expert), redundancy of storage between multiple peers on the net.
If this redundancy could also be implemented locally, the total storage for
GNUnet would increase.

As an alternative to providing a greater amount of data storage, perhaps such
a feature could instead be used to boost GNUnet's efficiency, as parts of a
file on a distant node could also be made available on more nodes,
diminishing the distance between the "asking node" and the node that actually
has the file.

Do you think such a feature could be useful for GNUnet? Once again do not
hesitate to say this idea is unfeasible for some reason, I just shared it in
the hopes of it being useful to an improved gnunet.

-- hypothesys
