Re: [GNUnet-developers] Idea for file storage in GNUnet

From: hypothesys
Subject: Re: [GNUnet-developers] Idea for file storage in GNUnet
Date: Thu, 6 Dec 2012 19:28:25 -0800 (PST)

Dear Christian,

First of all, thank you for your reply. Yes, in a nutshell that was the idea.
Although I did not know what was necessary to accomplish it, I expected it
to be a difficult task (now I see that its difficulty borders on the
impossible). The alternative you proposed, using "normal" files together
with a mechanism that removes them when the disk gets nearly full, seems a
much better idea. Not only is it much simpler to implement, it also makes
better use of the disk space, as redundancy would not be necessary.
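
To make the "remove files when the disk gets nearly full" mechanism more
concrete, here is a minimal Python sketch. It is only an illustration under
stated assumptions, not GNUnet code: the names choose_evictions, scan_store
and enforce_free_space are hypothetical, the stored blocks are assumed to be
ordinary files in a single directory, and "oldest first" is just one possible
eviction policy.

```python
import os
import shutil


def choose_evictions(files, free_bytes, min_free_bytes):
    """Pick files to delete, oldest first, until enough space is free.

    files: iterable of (path, size_bytes, mtime) tuples.
    Returns the list of paths that should be deleted.
    """
    to_delete = []
    for path, size, _mtime in sorted(files, key=lambda f: f[2]):
        if free_bytes >= min_free_bytes:
            break
        to_delete.append(path)
        free_bytes += size
    return to_delete


def scan_store(store_dir):
    """List (path, size, mtime) for each regular file in store_dir."""
    entries = []
    with os.scandir(store_dir) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                st = entry.stat(follow_symlinks=False)
                entries.append((entry.path, st.st_size, st.st_mtime))
    return entries


def enforce_free_space(store_dir, min_free_bytes):
    """Delete the oldest stored files while free space is below the threshold."""
    free = shutil.disk_usage(store_dir).free
    victims = choose_evictions(scan_store(store_dir), free, min_free_bytes)
    for path in victims:
        os.remove(path)
    return victims
```

A watcher process would call enforce_free_space periodically; the selection
logic is kept separate from the deletion so that it can be checked without
touching the disk.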

Such a program could also (by reading the GNUnet index) show the user the
disk space used by GNUnet and, by a simple subtraction, the disk space used
by the rest of the system. Disk fragmentation could be an issue, as it makes
for a slower user experience, but if both the current method of storage and
this alternative were available for the user to choose from, IMO GNUnet
could still benefit.
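
The subtraction could look roughly like the following Python sketch. It
assumes, purely for illustration, that the index can be read as a list of
block sizes in bytes; split_usage and report are hypothetical names, not
part of GNUnet.

```python
import shutil


def split_usage(total_used_bytes, index_sizes):
    """Split used disk space into the GNUnet share and everything else.

    index_sizes: sizes in bytes of the blocks listed in the (assumed) index.
    """
    gnunet_bytes = sum(index_sizes)
    # Clamp at zero in case the index is stale and overstates GNUnet's share.
    other_bytes = max(total_used_bytes - gnunet_bytes, 0)
    return gnunet_bytes, other_bytes


def report(path, index_sizes):
    """Return what the user would see for the disk that holds `path`."""
    usage = shutil.disk_usage(path)
    gnunet_bytes, other_bytes = split_usage(usage.used, index_sizes)
    return {"gnunet": gnunet_bytes, "other": other_bytes, "free": usage.free}
```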

I understand (or at least believe I do) your point that available storage
is not a priority because it is not the limiting factor. However, could this
increased storage not make for a potentially lower latency for the network?
After all, if there is more storage, the same data could be made available
from a greater number of nodes. As such, the number of network hops a single
data block has to travel between the "asking node" and the "data provider
node" diminishes, and so does latency, at least when in darknet.
I do not know whether this would be impossible due to the way GNUnet routes
data, or whether it is a misconception in my reasoning, but I assume GNUnet
must use some implementation of key-based routing and a DHT. Perhaps this
lower latency could pose problems from the anonymity POV, and I cannot
predict the security implications. Still, it would not mean that the
shortest route had to be taken, only that the minimum number of hops
necessary to transfer data would diminish. Probably it would mean that the
"world" of a small-world network would become smaller in proportion to the
total amount of storage available at each node.

It is probable that some of the above points are wrong, as once again this
is not my scientific field. Regardless, my point is that increased data
storage can have implications unrelated to storage if GNUnet's routing
algorithms can make use of it. If I am wrong, once again please don't
hesitate to say so.



Christian Grothoff wrote:
> Dear hypothesis,
> Thank you for your suggestion.  Let me first describe how I understood 
> your idea.  Basically, the idea is that GNUnet's file storage should not 
> occupy disk space, but leave it marked in the OS file system as "free" 
> (presumably because of redundancy, loss is not an issue). Then, when the 
> data is needed, GNUnet simply should check if the checksum is still 
> correct, and if so, serve it. That way, we could push drive utilization 
> to 100% without the user even noticing.
> Let me point out a few differences between your perception of the issues 
> with this and how I see them.  First of all, GNUnet already splits files 
> into blocks for storage, and the blocks are encrypted and 
> self-verifying, so we'd not even need to store a separate checksum. All 
> we would still need is an index which would allow us to quickly find the 
> offset on the disk that had the block (scanning a multi-TB disk for each 
> request is infeasible).  That index can still be big (say 5-10% of the 
> actual storage space) and would have to live in "reliable" storage (you 
> don't want a corrupt index), but we actually already have the 
> infrastructure for this in place (see OnDemandBlocks in the code, which 
> are essentially an index into an existing 'normal' file on disk).  So 
> that part is IMO *easier* than you might have thought.
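
As a minimal sketch of the index idea described above: each block is looked
up by its hash, the index records where in an ordinary file it lives, and the
block is only served if it still hashes to the same value. This is not
GNUnet's actual OnDemandBlocks code. The function names, the in-memory dict,
the block size and the use of SHA-512 over unencrypted data are all
assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # illustrative block size (an assumption)


def build_index(path, block_size=BLOCK_SIZE):
    """Map each block's hash to (path, offset, length) in an existing file."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            digest = hashlib.sha512(block).hexdigest()
            index[digest] = (path, offset, len(block))
            offset += len(block)
    return index


def read_block(index, digest):
    """Return the block for `digest`, or None if it is missing or changed."""
    entry = index.get(digest)
    if entry is None:
        return None
    path, offset, length = entry
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            block = f.read(length)
    except OSError:
        return None  # the file was deleted or is unreadable
    # Self-verification: the lookup key doubles as the checksum.
    if hashlib.sha512(block).hexdigest() != digest:
        return None
    return block
```

Because the lookup key is the hash of the block itself, no separate checksum
has to be stored; only the index needs to live in reliable storage.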
> But there is another part, and that is getting to the data. Writing a 
> file and deleting it is easy, and your assessment that OSes don't really 
> delete it holds true in 99.9% of the cases. However, getting the 
> physical offset on disk while the file exists is already virtually 
> impossible if you're not inside the kernel. You can get the inode number 
> and device ID, but that's often just the meta-data and already not very 
> portable.  Getting the physical disk offset or the logical partition 
> offset -- I'm not aware of ANY API to get that.
> Even if we were able to get that offset (i.e. by having a custom kernel 
> module), we'd then need really, really sensitive access (/dev/sdaX) to 
> the raw disk to read it later. That is again not something a 
> security-sensitive application should take lightly. Now, I guess a SUID 
> wrapper to read at certain offsets if and only if the data stored there 
> matches a certain hash _might_ be doable, but it is still a pretty tough 
> proposal to get past a security audit (as, for example, an adversary 
> might just want to do a confirmation attack on an unrelated file).
> Finally, once you deleted the file, you want to somehow make sure that 
> the OS doesn't re-use this space first again.  But that is actually 
> quite likely to be the case, so the moment you write your 2nd file this 
> way, you are somewhat likely to overwrite your first file.  So what you 
> would really want is a way to enumerate all of the unused blocks on 
> disk, and then directly write there (instead of using the indirect route 
> of first writing a normal file and then deleting it to make the space 
> appear unused). That would require detailed knowledge of the specific 
> file-system, and would again require OS-level (and file-system specific) 
> extensions to the system.
> Given this, I don't think there is a chance to create an implementation 
> that has a chance of being used in the real world.
> Now, there is a second possibility --- just use "normal" files, and then 
> if you notice that the disk starts to get full, delete them.  The main 
> difference would be that the user would see that the disk is full 
> (df...), and that the file-system would likely fragment more.  If the 
> process that watches the fullness of the disk is done well, the effect 
> for the end-user would still be otherwise the same.  That is most likely 
> much easier to implement and deploy.
> Finally, a bigger question in my mind is if available disk space is 
> really generally the issue.  For me, bandwidth, latency, seek-speed and 
> CPU usage have been concerns, the disk is pretty much the only resource 
> that is virtually unlimited --- it would take months of download time 
> over my Internet connection to fill my drive, and years to upload it 
> (stupid DSL).  So I'm afraid that while I think something could be done 
> here, I'm not sure it makes sense to prioritize this.
> Happy hacking!
> Christian
> On 12/06/2012 10:03 PM, hypothesys wrote:
>> Hello GNUnet Developers,
>> First of all I apologize if this is not the correct place for discussing
>> a possible new feature to GNUnet and since I am not from the IT field I
>> cannot even attempt to implement it. Still, perhaps if you find this
>> feature valuable you would consider implementing it so I wanted to share
>> it. Please bear in mind that I am no expert and this may not be feasible
>> for technical reasons not obvious to me. In that case please say so and I
>> will not take more of your time.
>> Some time ago I had the idea that gnunet (as well as other projects)
>> could benefit from increased disk space for storage and that using the
>> free space on disk should be a technically possible if difficult task.
>> On many OS filesystems, when a file is deleted, it is not truly erased.
>> In the FAT filesystem, for example, the list of disk clusters occupied by
>> the file is erased from the file allocation table, marking those sectors
>> available. On other filesystems I do not know how that is handled but,
>> for the sake of argument, let's say that a header is instead applied to
>> the file indicating that the file portion of the hard disk is available
>> to be overwritten.
>> /header/ data block Nº1; /header/ data block Nº2; /header/ data block Nº3;...
>> If gnunet was able to split the file data into data blocks (encrypted of
>> course) and subsequently delete the data, while keeping both a checksum
>> for the data block and record of its disk location, the free disk space
>> of computers on which gnunet was installed could be used for storage
>> without compromising normal functioning of said computer.
>> This program, perhaps to be named gnunet-str (storage) would at the
>> moment of storage of data, create a checksum for every encrypted data
>> block and for every "contiguous" data group, as follows:
>> /block1/block2/block3/block4/block5/block6/block7/block8...
>> =>checksum1/checksum2/checksum3/checksum4/...
>> but also
>> /block1/block2/block3/block4/block5/block6/block7/block8...
>> =>checksum1+2/checksum3+4/checksum5+6/checksum7+8...
>> and also
>> /block1/block2/block3/block4/block5/block6/block7/block8...
>> =>checksum1+2+3+4/checksum5+6+7+8/checksum9+10+11+12...
>> and continuing...
>> In this way, it would be possible to (quickly? - by going from the
>> checksums for the agglomerations of blocks to the individual blocks)
>> ascertain which data was corrupt (by usage of the main OS, or a disk
>> defrag) and had to be replaced. It would then signal to other GNUnet
>> nodes "Of the data stored only 70% (for example) is still not corrupted.
>> I can share this 70% but give me the 30% back, or new files to store in
>> this space".
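
The grouped-checksum scheme above can be sketched in Python as follows. This
is only an illustration of the scheme as described, under assumptions that
are not in the original message: SHA-256 stands in for "checksum", the blocks
are held in a list in memory, the number of block slots does not change, and
the names group_checksums, build_levels and find_corrupt are hypothetical.

```python
import hashlib


def _checksum(blocks):
    """Checksum over the concatenation of a run of blocks."""
    return hashlib.sha256(b"".join(blocks)).hexdigest()


def group_checksums(blocks, group_size):
    """Checksum each run of `group_size` consecutive blocks."""
    return [_checksum(blocks[i:i + group_size])
            for i in range(0, len(blocks), group_size)]


def build_levels(blocks):
    """Map group size (1, 2, 4, ...) to the checksums for that grouping."""
    levels = {}
    size = 1
    while True:
        levels[size] = group_checksums(blocks, size)
        if size >= len(blocks):
            break
        size *= 2
    return levels


def find_corrupt(blocks, levels):
    """Return indices of corrupt blocks, descending from the largest groups."""
    size = max(levels)
    suspects = list(range(len(levels[size])))  # group indices at this level
    while True:
        bad = [g for g in suspects
               if _checksum(blocks[g * size:(g + 1) * size]) != levels[size][g]]
        if size == 1:
            return bad
        # Only the halves of failed groups need to be checked at the next level.
        size //= 2
        suspects = [c for g in bad for c in (2 * g, 2 * g + 1)
                    if c < len(levels[size])]
```

Note that the first pass over the largest groups still has to read every
block once, so the grouping mainly helps to narrow down where the damage is
rather than to avoid reading the data.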
>> Such a solution would allow big amounts of storage - in theory, all free
>> space on the hard drive of the host computer. Due to its nature it would
>> not be possible to rely on the data not being compromised without
>> implementing redundancy. If this gnunet-str made x copies of file y for
>> example, the probability of data corruption and loss could be greatly
>> diminished. Tahoe-LAFS and gnunet are based on this principle (although I
>> could be wrong as I'm no expert), redundancy of storage between multiple
>> peers on the net. If this redundancy could also be implemented locally,
>> the total storage for GNUnet would increase.
>> Alternatively to providing a greater amount of data storage, perhaps such
>> a feature could instead be used to boost GNUnet's efficiency as parts of
>> a file on a distant node could also be made available on more nodes,
>> diminishing the distance between the "asking node" and the node that
>> actually has the file.
>> Do you think such a feature could be useful for GNUnet? Once again do not
>> hesitate to say this idea is unfeasible for some reason, I just shared it
>> in the hopes of it being useful to an improved gnunet.
>> -- hypothesys