Re: [GNUnet-developers] Idea for file storage in GNUnet

From: hypothesys
Subject: Re: [GNUnet-developers] Idea for file storage in GNUnet
Date: Fri, 7 Dec 2012 07:40:03 -0800 (PST)

Ah, I see (or I think I do...). Bandwidth would indeed still be the main
limiting factor; I am just pointing out that this feature could perhaps
simplify other, higher-impact tasks such as routing, by reducing the
overall network traffic for each data request and download.

Correct me if I am wrong, but IF a node is not at its quota limit (and
admittedly depending on bandwidth), GNUnet + gnunet-str could be used to
ensure that the storage quota is as high as the available bandwidth can
make use of.

Allow me to clarify what I meant by "normal/minimum threshold/allocate data
storage": I meant the dedicated data storage model currently in use by
GNUnet. The dynamic storage would be made available in addition to that one.
Named for discussion purposes "dynamic/maximum threshold/non-dedicated data
storage", it would grow the datastore until the first of these limits is
reached: for example, as suggested by LRN, 100-(N=20)% of available disk
storage, OR 20 GB left free on disk; AND, in any case, no more than the
Maximum Useful Quota due to bandwidth constraints.
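
To make the limits above concrete, here is a minimal sketch in Python (not
GNUnet code). The function name, its parameters and the idea of passing the
bandwidth-derived cap in as a plain number are all my own assumptions for
illustration; only the defaults of 20% and 20 GB come from LRN's suggestion.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def dynamic_quota_bytes(total_bytes, free_bytes,
                        reserve_percent=20, reserve_gib=20,
                        bandwidth_cap_bytes=None):
    """Hypothetical helper: extra (non-dedicated) bytes the datastore may use.

    free_bytes is assumed to be the space that would be free if the dynamic
    part of the datastore were empty.
    """
    # Space that must stay free: F% of the disk or G GiB, whichever is larger.
    reserve = max(total_bytes * reserve_percent // 100, reserve_gib * GIB)
    # Whatever is free beyond the reserve can be lent to the network.
    extra = max(0, free_bytes - reserve)
    # Optional cap: storing more than the bandwidth can serve is not useful.
    if bandwidth_cap_bytes is not None:
        extra = min(extra, bandwidth_cap_bytes)
    return extra
```

On a real machine the first two arguments could be taken from
shutil.disk_usage(path), which reports total and free bytes for the
partition holding the datastore.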

This extra dynamic storage could work as normal storage does, but might be
more useful for improving network performance. This could possibly be done
by:

1) Caching "hot"/popular data blocks - Improving performance and reducing
latency by cutting the number of hops data must travel when requested.

2) Caching "cold"/less popular data blocks - Still slightly improving
performance and reducing latency by cutting the number of hops data must
travel when requested, but mostly improving censorship-resistance and
preserving data diversity: with this "data flooding" effect an attacker
would have to take out many more GNUnet nodes at the same time to be able to
remove data from the network. If the attack is unsuccessful, the network
would simply reorganize the stored data, preserving the files.

3) A mixture, perhaps automatic and configurable, of 1) "hot" blocks and
2) "cold" blocks, giving the network the desired final balance of
performance/censorship resistance.
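
The mixture in 3) could be sketched roughly as follows, again in Python and
not as actual GNUnet code. Everything here is an assumption made for
illustration: the function name, the hot_fraction knob, and the use of
locally observed request counts as the measure of popularity. A real
implementation might well pick "cold" blocks by rarity in the network or at
random instead.

```python
def select_cached_blocks(request_counts, capacity, hot_fraction=0.5):
    """Hypothetical helper: pick which blocks the dynamic storage keeps.

    request_counts maps a block id to how often this node saw it requested.
    capacity is the number of blocks that fit in the dynamic storage.
    Returns (hot, cold): two lists of block ids.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    # Most requested first; ties broken by block id to keep results stable.
    ranked = sorted(request_counts, key=lambda b: (-request_counts[b], b))
    capacity = min(max(capacity, 0), len(ranked))
    hot_slots = int(capacity * hot_fraction)
    cold_slots = capacity - hot_slots
    hot = ranked[:hot_slots]
    # Cold blocks are the least requested of those not already kept as hot.
    remaining = ranked[hot_slots:]
    cold = remaining[-cold_slots:] if cold_slots > 0 else []
    return hot, cold
```

With hot_fraction=1.0 this degenerates to 1) and with hot_fraction=0.0 to
2), so a single setting would cover all three options.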

I hope I have made valid points, but even if not, both of you seem to find
at least some value in the presented idea :). If you wish, then please do
add it to the todo list, as I do not know how to do so. I hope you found my
contribution useful :)



Christian Grothoff-5 wrote:
> On 12/07/2012 01:18 PM, hypothesys wrote:
>> Dear LRN and Christian,
>> Thank you for your replies :). Regarding latency, peers running close to
>> quota and cached blocks I may, of course, not be understanding something
>> but still believe this could work.
>> First of all an ignorance-derived question regarding LRN's point that the
>> probability of finding a block of data in a random node not sharing it on
>> purpose would be small. Would this probability not also be additive (or
>> increase exponentially?) when the said random node, after checking against
>> the local data block index and not finding anything, relayed the data
>> request? Assuming the data distribution in the network is not too
>> non-uniform/asymmetrical I do not see why that would not be the case.
> Right, this is not an issue.  A bigger issue might be that having more 
> data at a node that already has gigabytes might not help, as the node 
> may not have a problem with finding answers for requests on its disk but 
> rather with its ability to transmit the results (due to bandwidth 
> limitations).  In any case, IF a node is at the quota limit, having more 
> disk space available somehow will obviously benefit performance  to some 
> degree.
>> Also, privacy issues derived from prioritizing downloading and publishing
>> blocks to the "normal/minimum threshold/allocated" data storage: Why would
>> this be necessary?
> I'm not sure I understand what you mean by the normal/minimum 
> threshold/allocate data storage.
>> GNUnet storage could operate as normal and this new
>> "dynamic/maximum threshold" storage serve not only to cache "hot" and
>> popular blocks, but also use a percentage of the dynamic storage to cache
>> copies of other nodes' "normal/minimum threshold/allocated" storage. In
>> this way both latency AND censorship-resistance would improve. It would
>> probably need a mechanism to re-organize/swap files in normal storage
>> depending on the priority/distribution of files throughout the network
>> though.
> I agree that GNUnet storage should be able to operate as normal with 
> this more 'dynamic' quota.  Only details like Bloomfilter (re)size(ing) 
> and actual resizing of the database would need to be looked at more 
> closely, not the actual mechanisms for space allocation.
>> I may be making a gross oversimplification here but it feels as if this
>> increased dynamic storage would add "capacity/ability/versatility" to
>> GNUnet, which could in turn, depending on the implementation, be used to
>> boost one, or several at the same time, feature(s) of GNUnet.
> There are many ways to 'boost' features of GNUnet; right now, maximizing 
> available disk space simply seems much less important to me compared to 
> known issues with bandwidth management (keyword: ATS), 
> ease-of-use/installation, routing (keyword: mesh) & various bugs.
> But I'm not opposed to adding this one to the list; just be aware that 
> we already have about 100 items on that list ;-).
> Happy hacking!
> Christian
>> Once again if I am wrong please say so. Not from the field ;)
>> Cheers,
>> hypothesys
>> Christian Grothoff-6 wrote:
>>> On 12/07/2012 08:34 AM, LRN wrote:
>>>> And "spare" is the problem. I can easily spare 20 or 40 gigabytes, but
>>>> 100 or 200 is somewhat trickier. I might have that kind of space now,
>>>> and be willing to give it to GNUnet, but i might want that space back
>>>> at some point. Not sure what GNUnet will do right now, if i shut down
>>>> my node, reduce the datastore size, then start the node up again.
>>>> Probably discard lowest-priority blocks until datastore shrinks to the
>>>> new limit?
>>> Yes, that's what it would do.  However, there is a caveat: the
>>> mysql/sqlite/postgres database that is involved might be happy to delete
>>> the records, but might not automatically reduce its file system space
>>> consumption.  So you may have to additionally trigger some
>>> database-specific routine to force the database to defragment/relinquish
>>> its allocation/garbage collect/whatever.
>>> Doing this may (temporarily) double your space requirements, depending
>>> on the database.  So this is an "implementation detail" that would make
>>> an automatic 'shrink if disk is full' implementation somewhat harder
>>> (but likely not impossible, as you can predict the necessary space for
>>> the reorganization).  Alternatively, one _may_ be able to use multiple
>>> database files and just delete one of those entirely once the quota is
>>> reached (this depends on the database backend that is being used).
>>>> Now, having a minimum space allocated to the datastore, and then just
>>>> using N% of the remaining free disk space for datastore too, while
>>>> it's available - that really makes the decision easier. If GNUnet is
>>>> then taught to use pre-allocated datastore for important blocks (files
>>>> being downloaded or published; what are privacy issues here?), that
>>>> would mean that your node will serve _your_ interests first, and will
>>>> use the free space available to serve the network as best as it can.
>>> I don't think there is a problem here.  We already have routines to
>>> shrink-to-quota which are triggered if we are above quota (due to
>>> additional insertions or due to quota being lowered).
>>>> It should maintain either F% of space free, or G gigabytes (whichever
>>>> is larger). Obviously, F and G are configurable (I'd say - default F
>>>> to 20, and G to 20; unless GNUnet daemon that would reclaim free space
>>>> would be a slowpoke, 20 gigabytes should give it enough time to react).
>>>> It should also be completely disabled for SSDs, IMO. Because they are
>>>> small to begin with, _and_ because their performance degrades greatly
>>>> as they are filled with data.
>>> I suspect those arguments may not hold for long as SSD technology
>>> progresses...
>>>> Thus the idea is the same as with CPU resources - you set up low and
>>>> high thresholds for CPU load that GNUnet can cause. It will go as high
>>>> as the high threshold when uncontested, and will go down to the low
>>>> threshold when other processes compete for CPU resources with GNUnet.
>>>> Same for storage - use large portion of available free space for
>>>> datastore (primarily - for migrated and cached blocks), but be ready
>>>> to discard all that, and go as low as the size of the pre-allocated
>>>> datastore.
>>> Well, LRN, if you think peers actually run close to quota, there is a
>>> nice GNUNET_UTIL-call for starters: GNUNET_DISK_get_blocks_available.
>>> Adjusting the quota option in datastore based on that should not be too
>>> hard for you; the real bitch will be testing the various backends to
>>> make sure that they actually reduce disk space consumption --- and I
>>> guess reliably finding out which partition MySQL/Postgres actually store
>>> their data on might also be not so easy...
>>> Happy hacking! ;-).
>>> -Christian
>>> _______________________________________________
>>> GNUnet-developers mailing list
>>> address@hidden
