gnunet-developers

Re: [GNUnet-developers] Replacing lookup


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Replacing lookup
Date: Mon, 10 Mar 2003 18:54:38 -0500
User-agent: KMail/1.4.3

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 10 March 2003 12:14 pm, Igor Wronsky wrote:
> A rant follows.

I've taken the liberty to summarize instead of quoting in full. :-)

> << GDBM sucks in performance for our setting>>
> << We need insert, delete, traverse, repair/check >>
> << Jan Marco's argument to use MySQL sounds
>       more and more reasonable with every hour spent
>       on re-inventing the Database-wheel >>

So far, I have to agree. Also, we should learn a bit from Freenet here: their 
home-grown (!) datastore was troublesome for Freenet, as far as I've heard. So 
maybe the problem is truly bigger than we thought. And even if we're 
doing better than Freenet did by starting with a well-tested library (gdbm/tdb), 
I'm happy to consider going to MySQL at this point. MySQL is 
GPLed, so that is no problem either.

> << The requirement to use MySQL may be too much
>       for some users >>

Well, gdbm/tdb, gnunet-check, long pause times, etc. are also going to be a 
problem, maybe even a bigger one. Installing the mysql binary should not be a 
real problem unless we talk about embedded devices. Also, we may keep the 
interface in a way that someone who really can't use MySQL can plug in another 
backend for the datastore. As long as we make it trivial to use 
(auto-initialization of the DB, user permissions, etc.), I think MySQL will 
not really stop anyone from using GNUnet. 
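
To give a rough idea of what "auto-initialization" could mean in practice, 
here is an untested sketch against the MySQL C API; the database name, table 
name and schema below are made up for illustration only:

#include <stdio.h>
#include <mysql/mysql.h>

/* Create the GNUnet database and datastore table on first start, so that
 * the user never has to touch the mysql client by hand.
 * (Schema is illustrative, not a proposal for the real layout.) */
static int initDatastore(const char * host,
                         const char * user,
                         const char * passwd) {
  MYSQL * conn = mysql_init(NULL);
  if (conn == NULL)
    return -1;
  if (mysql_real_connect(conn, host, user, passwd,
                         NULL, 0, NULL, 0) == NULL) {
    fprintf(stderr, "MySQL connect failed: %s\n", mysql_error(conn));
    mysql_close(conn);
    return -1;
  }
  if (mysql_query(conn, "CREATE DATABASE IF NOT EXISTS gnunet") ||
      mysql_select_db(conn, "gnunet") ||
      mysql_query(conn,
                  "CREATE TABLE IF NOT EXISTS datastore ("
                  " hash CHAR(40) NOT NULL PRIMARY KEY," /* hex-encoded key */
                  " prio INT NOT NULL,"
                  " value MEDIUMBLOB)")) {
    fprintf(stderr, "MySQL init failed: %s\n", mysql_error(conn));
    mysql_close(conn);
    return -1;
  }
  mysql_close(conn);
  return 0;
}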

> The interfaces would be loosely as follows. Internally,
> small-scale db modules like gdbm could still use the old lookup.c
> or whatnot but it must *not* be called from outside these functions.

Exactly. 

> insert(key,data,prio,indexinfo)
>   insert key to db as entry {key, data, prio}. If data=NULL,
>   just index with indexinfo (file#,offset).
>
> fetch(key,delta)
>   return {data}, increase prio related to this entry by delta

Implementations of this method will have to rely on some helper code to fetch 
on-demand encoded content. This way, the caller does not have to sort out all 
of these cases. And the bloom-filter should be applied before calling this 
method (since we need to distinguish between super and normal queries). 

> fetchrandom()
>   return a random key + data (for activemigration), don't
>   increase prio
>
> free(m)
>   delete those m entries from db that have smallest prio

Maybe not strictly "the smallest" but "a very low" prio. 
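
Just so that we are talking about the same thing, here is a rough C sketch of 
those four calls as a table of function pointers per backend; all names and 
types (HashCode160, IndexInfo, etc.) are illustrative stand-ins, not the 
final API:

/* Illustrative sketch only; types and names are stand-ins. */
typedef struct HashCode160 HashCode160;  /* whatever the key type really is */

typedef struct {
  int fileIndex;          /* which indexed file (file#) */
  unsigned int offset;    /* offset of the block within that file */
} IndexInfo;

typedef struct {
  /* store {key, data, prio}; if data == NULL, only record indexinfo */
  int (*insert)(const HashCode160 * key,
                const void * data,
                unsigned int len,
                unsigned int prio,
                const IndexInfo * indexinfo);

  /* return the data for key (NULL if absent) and bump its prio by delta */
  void * (*fetch)(const HashCode160 * key,
                  unsigned int * len,
                  int delta);

  /* return some random entry (for active migration); does not touch prio */
  void * (*fetchrandom)(HashCode160 * key,
                        unsigned int * len);

  /* drop m entries with (very) low priority to make room */
  int (*free)(unsigned int m);
} DatastoreBackend;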

> It seems to me that basically these four suffice for gnunet's
> current requirements. The main point is to get rid of the lookup
> calls findEntry/addEntry/delEntry by including the necessary
> stuff in parameters of these calls. We might include something
> like forAllDatabaseEntries, no problem, but the whole must be split
> in {bloomfilters,database} dichotomy. Database can be seen as
> a black box, a module like gdbm or such can internally use
> whatever lookup kludges it likes.
>
> Comments?

Bloom filters. Can I assume that you'll leave those "out of the picture" for 
the database module and that you just assume that some other piece of code 
will use them to reduce the DB accesses? If so, I'll probably look into 
changing the BF code to "grow" if needed (e.g. if we have more than 8 GB of 
content [user changes quota]). For that, we would need an iterator "over all 
entries" in the DB. 

> I've run some tests on MySQL w/ 370000 entries along with their
> 1k data and randomizing some (colliding too!) priorities for them
> (you might be interested that in this case the db takes 417 megs,
> so there is slight overhead). I won't quote the worst case performance
> capabilities of MySQL for different ops (typically no more than log(n),
> though), but just state the intuitive feeling: insert/retrieve is 'about
> instantaneous' and free(m) is linear in the number of deleted keys m
> (not linear in number of keys in db), thus deleting e.g. 1 meg of
> worst content goes in a jiffy. fetchrandom() can't be implemented both
> straightforwardly and efficiently (afaik?), but we could use the
> following kludge: for db of n keys uniformly distributed, we can
> get - by simple probability calculation on n and 256 - the upper
> bound for size k of a prefix that still gives us expected number
> of results 1 (e.g. 'abc%' is a 3 char prefix, and can be efficiently
> queried for). We just randomize a k byte prefix, and lookup with it
> and take a random one of the result set. If the set is empty,
> decrement k and redo. If the db is non-empty, this procedure is
> guaranteed always to return a key, and most of the time with
> just a single query.
>
> One thing that is certain is that free(m) and traverses will
> be insanely cheaper than using lookup+gdbm or plain gdbm,
> even if the actual fetch/insert were more costly or equal.
>
> More comments? Ideas? This is mainly an interface design
> question. With a good interface and little overhead outside it,
> rather nice solutions with highlevel dbmgrs should be obtainable,
> but still allowing the hack3rs to develop lowlevel solutions.
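
For what it's worth, the prefix kludge you describe could look roughly like 
the sketch below. With n keys uniformly distributed, a k-byte prefix matches 
about n / 256^k entries on average, so k = floor(log2(n) / 8) keeps the 
expected result set at one or more (for your 370000-entry test DB that gives 
k = 2). queryByPrefix is a made-up helper standing in for the underlying 
SELECT ... LIKE 'prefix%' query, and HashCode160 for the real key type:

#include <math.h>
#include <stdlib.h>

typedef struct {
  unsigned char bits[20];   /* stand-in definition of the key type */
} HashCode160;

/* hypothetical helper: return (up to max) keys starting with the given
 * k-byte prefix, backed by something like SELECT ... LIKE 'prefix%' */
int queryByPrefix(const unsigned char * prefix, unsigned int k,
                  HashCode160 * results, unsigned int max);

/* pick a random entry without scanning the whole table */
int fetchrandom(unsigned long n,          /* number of entries in the DB */
                HashCode160 * result) {
  unsigned char prefix[8];
  HashCode160 matches[64];
  unsigned int k;
  unsigned int i;
  int found;

  if (n == 0)
    return -1;                            /* empty DB: nothing to return */
  k = (unsigned int) (log((double) n) / log(256.0));
  while (1) {
    for (i = 0; i < k; i++)
      prefix[i] = (unsigned char) (rand() % 256);
    found = queryByPrefix(prefix, k, matches, 64);
    if (found > 0) {
      *result = matches[rand() % found];  /* random pick from the matches */
      return 0;
    }
    if (k == 0)
      break;                              /* k == 0 matches everything */
    k--;                                  /* empty result set: shorten prefix */
  }
  return -1;                              /* unreachable if n > 0 */
}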

I think this all sounds fairly nice, and it seems like you have already looked 
into (and benchmarked) some of the details, so I think I'll leave it in your 
capable hands -- Mantis has a sufficiently long buglist for me already :-)

Christian
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE+bSW+9tNtMeXQLkIRAgqXAJ9Lpf2OxhtmXjTwL+NZhzHrl+QtgwCfeEMn
TVTJvoeiedmXX372RTU/hFQ=
=GNls
-----END PGP SIGNATURE-----




