gnunet-developers
Re: [GNUnet-developers] Designing a gnunet directory app


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Designing a gnunet directory app
Date: Fri, 14 Jun 2002 13:06:42 -0500

On Friday 14 June 2002 05:40 am, you wrote:
> I was thinking more on the lines of 'unnecessary content timing out'
> than deletion or replacement. Imagine if someone inserts plain
> crap using a popular keyword. At least the base entry could stay there
> indefinitely, as the queries keep it alive? I don't know a solution
> to this one.

I don't think there is a real solution unless we introduce a feedback 
protocol that allows users to broadcast 'bad content' messages in order to 
get rid of it (moderation). The problem is that moderation and censorship 
are awfully close *and* that a (malicious?) server may decide to keep the 
content anyway.

> For content which can go stale (lists and such) would it ultimately
> be a bad idea to be able to include an optional existence termination date
> to the data, by the original sender? That way, a (benevolent) inserter
> could supply a date like "this file is valid until 20.jun.2002" style
> stamp, after which any node encountering the data like that would
> just coldly delete it? Or would it be too much overhead? At least the
> current system has a pretty small blocksize and if I understood
> correctly, the date would have to be in every block ... Perhaps it would be
> enough to just include the date to the root block, and delete
> that, and the rest would disappear naturally as they are not
> asked for anymore.

Well, this still does not solve the problem of malicious insertion, which is 
really the issue that you want to address. Also note that you cannot 
special-case root-blocks -- the design aims to make root-blocks and all 
other blocks indistinguishable from each other. This is a good thing since it 
makes attacks (partitioning, etc) harder. 

> > I would rather add an option to 'gnunet-insert-multi' (gnunet-insert
> > *has* all the information that gnunet-search prints) to create a
> > directory when the files are inserted.
>
> Ah. I see where you are aiming at. Instead of some people collecting
> keys posted by others, the original posters would insert directory
> entries at the same time. That is a nice idea, but then it's clear that
> the directory system needs some (sub)standardized hierarchy so
> that people can choose between lists of different styles of content
> without having to browse through all.

I don't think we need a hierarchy here. Of course, a directory could contain 
other directories, but there is no need to make it a hierarchy - a graph 
(think WWW) is better. And for that, all we need is directories containing 
references to other directories.
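
To make the graph idea concrete, here is a minimal sketch in Python. It is not GNUnet code: the in-memory DIRECTORIES table and the function name are made up for illustration, and a real client would fetch each directory from the network instead. The point is only that directories may reference each other freely (even in cycles) as long as the client remembers what it has already visited.

```python
from collections import deque

# Hypothetical stand-in for directories fetched from the network.
# Each directory lists its entries; an entry is either another
# directory (a key of DIRECTORIES) or a plain file name.
DIRECTORIES = {
    "cars": ["porsche.jpg", "pictures"],
    "pictures": ["cars", "elephants"],   # points back to "cars": a cycle
    "elephants": ["dumbo.jpg"],
}


def reachable_files(start):
    """Breadth-first walk over the directory graph, tolerating cycles."""
    seen = {start}
    queue = deque([start])
    files = []
    while queue:
        current = queue.popleft()
        for entry in DIRECTORIES[current]:
            if entry in DIRECTORIES:
                # A reference to another directory: visit it once only.
                if entry not in seen:
                    seen.add(entry)
                    queue.append(entry)
            else:
                files.append(entry)
    return files


print(reachable_files("cars"))
```

Entering at any directory reaches everything linked from it, which is all the "WWW-style" navigation needs.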

> This is because anonymous systems
> encourage people to publish all kinds of vile stuff. Not stopping to
> theorize on the tolerance of different people, let us just assume
> that the majority doesn't want to sort through hundreds of lists
> daily which might contain mostly offensive or uninteresting content.
>
> I think it would be enough to tag the inserted directory
> with some general group classification and have a "global directory"
> and "specific directories". Let's suppose someone posts 102
> pictures of sportscars. This should result in a directory list
> indexed with two keywords, "directory-global-[date]" and
> "directory-[group]-[date]". Searching could then result in like
>
> ---
> # bin/gnunet-search directory-global-14.jun.2002
> 872487347AA324329434ABC81247124823482342 1248248231 2045
> => pictures.cars : 102 entries : 14.jun.2002 <= (filename: ???, mimetype:
> unknown)
> 38923589329AABCB49294BBC3939423943943332 482832122 1386
> => pictures.elephants : 52 entries : 14.jun.2002 <= (filename: ???,
> mimetype: unknown)
> 35929238492384923489238492384182DBD72347 2348238 534
> => punk : 12 entries : 14.jun.2002 <= (filename: ???, mimetype: unknown)
> ---
>
> and in specific context just
>
> ---
> # bin/gnunet-search directory-pictures.cars-14.jun.2002
> 872487347AA324329434ABC81247124823482342 1248248231 2045
> => pictures.cars : 102 entries : 14.jun.2002 <= (filename: ???,
> mimetype: unknown)
> ----

You can always try to establish some standard naming conventions for keywords 
(like in the example above), but I would not make them mandatory -- in 
particular, if keywords are standardized, GNUnet no longer provides 
deniability for those keywords, since an adversary can use a guessing attack!
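
A rough sketch of such a guessing attack, in Python. The assumptions are loud ones: I pretend a query is identified by a plain hash of the keyword, and SHA-256 is only a stand-in for whatever hash the network actually uses; the function names are hypothetical. The keyword pattern is the one from the example above.

```python
import hashlib


def query_id(keyword):
    # Assumption: a query is identified by a hash of its keyword.
    # SHA-256 is a stand-in, not a statement about GNUnet's hash.
    return hashlib.sha256(keyword.encode("utf-8")).hexdigest()


def guess_keyword(observed_query, groups, dates):
    """Enumerate the standardized keyword space and compare hashes."""
    for group in groups:
        for date in dates:
            candidate = f"directory-{group}-{date}"
            if query_id(candidate) == observed_query:
                return candidate
    return None  # keyword is outside the standardized space


groups = ["punk", "pictures.elephants", "pictures.cars"]
dates = ["13.jun.2002", "14.jun.2002"]

observed = query_id("directory-pictures.cars-14.jun.2002")
print(guess_keyword(observed, groups, dates))
```

The search space is only (number of groups) x (number of dates), so the adversary recovers the keyword almost for free; a keyword the adversary cannot enumerate does not have this problem.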

> The point here would be that the global directory would/could
> mainly be used to know what groups are active right now, and
> people with specialized interests could limit their query to the
> directory of a particular group. If there was a more advanced
> gui, it could get the list of currently active groups by
> doing a query, and when inserting files the suitable group
> could be selected from a list. With a command line tool
> it could just be something like
>
> # gnunet-insert-multi -g [group] <files>
>
> What groups actually form could be left to evolution. ;)

I think you are mistaken in terms of how the query mechanism works. What you 
could do with groups is something similar to 'webrings' - whenever you insert 
a file, you add it to a 'group-directory' (which you publish every n files 
under the group-name). 
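
As a sketch of what I mean (Python, purely illustrative; the class, the callback and the batch size are invented here, and the publish step would really be an insertion under the group name as keyword):

```python
class GroupDirectory:
    """Collects inserted files and publishes a directory every n files."""

    def __init__(self, group, batch_size, publish):
        self.group = group
        self.batch_size = batch_size
        self.publish = publish      # hypothetical callback: (group, entries)
        self.pending = []

    def add(self, entry):
        self.pending.append(entry)
        if len(self.pending) >= self.batch_size:
            # Publish a snapshot of the batch under the group name,
            # then start collecting the next batch.
            self.publish(self.group, list(self.pending))
            self.pending.clear()


published = []
directory = GroupDirectory("pictures.cars", 3,
                           lambda group, entries: published.append((group, entries)))
for number in range(7):
    directory.add(f"car{number}.jpg")

print(len(published), "directories published,", len(directory.pending), "file pending")
```

Each published directory is just another result for the group keyword; nothing has to be replaced or deleted.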

> The thing with dates is simply to get only recent listings. If
> optional content timeout was implemented, we wouldn't need
> the queries to use dates. How to address adversarial time
> out dates then? ... Another way, the found listings could be
> filtered by the application - eventually though the number
> of results to parse would grow prohibitive. And how would
> the old, but popular content, stay directorized then? Perhaps
> gnunet-download could generate and insert reports of what
> was downloaded successfully. ;)

Dates are always bad because an adversary can manipulate them and they
can be used in partitioning attacks ("you were online at that time"). The 
problem with 'too many' results can always be addressed by using more obscure 
keywords, so I doubt that really applies.

> All this is of course speculation, or designing. I hope some scheme
> can be considered sufficiently good to be worth implementing. :D
>
> > > Should compression be used on the directories?
> >
> > I would make it an option to the user. Some directories will be too small
> > to yield any big gains - compression was already thought as an option for
> > gnunet-insert, but so far nobody did it and I still believe that the
> > users can do it manually up-front if they really want to. The problem
> > with compression for gnunet-insert is also that it conflicts with the
> > on-demand encoding (indexing vs. insertion!), which would not apply for
> > directories. Anyway, 'tar' can't be wrong, and there, it's an option :-)
>
> In a software like gnunet where the point is not to reinvent
> the wheel (or so I suppose ;) ) the directory compression/decompression
> on the fly could probably be really simply achieved with zlib.

Sure. Or bzip2 - I suspect that the CPU/space trade-off would go 
significantly towards saving space in this case.

> > > Naturally the directory listings should also be machine (eg gui)
> > > readable. The app should be able to retrieve newest up-to-date lists.
> > > Perhaps lists could be signed by sender, using a handle perhaps.
> >
> > Right, we should have some standard format that allows signing with a
> > pseudonym.
>
> This calls for public/private key pair. Where could the public
> key be reliably published? Would it be ok to include a hash
> of the public key to the signed content? The actual key could
> then be retrieved by downloading the hash, after which the
> signature could be checked. Of course now someone could claim to
> be someone else by just supplying his own hash to the message,
> but then it would differ from the previous insertions by
> the original author.

I would just publish the full public key (258 bytes is not that much) with 
the directory. The signature itself is another 256 bytes, so that's 514 bytes 
overhead - ok in my opinion. Otherwise you may be able to obtain the 
directory but fail to get the public key (because of course nobody ever 
checks...).
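
A minimal layout sketch in Python, just to show how little is involved. The field order (key, then signature, then body) and the function names are assumptions, not a proposed format, and the key and signature are treated as opaque bytes; actually creating and checking the signature is left out. The fixed overhead per directory is simply the sum of the two field sizes.

```python
PUBLIC_KEY_SIZE = 258   # size of the public key, as stated above
SIGNATURE_SIZE = 256    # size of the signature, as stated above
OVERHEAD = PUBLIC_KEY_SIZE + SIGNATURE_SIZE


def pack_directory(public_key, signature, body):
    """Prefix the directory body with the full public key and signature."""
    if len(public_key) != PUBLIC_KEY_SIZE:
        raise ValueError("unexpected public key size")
    if len(signature) != SIGNATURE_SIZE:
        raise ValueError("unexpected signature size")
    return public_key + signature + body


def unpack_directory(blob):
    """Split a packed directory back into (public_key, signature, body)."""
    if len(blob) < OVERHEAD:
        raise ValueError("blob too short to hold key and signature")
    public_key = blob[:PUBLIC_KEY_SIZE]
    signature = blob[PUBLIC_KEY_SIZE:OVERHEAD]
    body = blob[OVERHEAD:]
    return public_key, signature, body


# Placeholder bytes only; a real client would use an actual key pair.
blob = pack_directory(b"K" * PUBLIC_KEY_SIZE, b"S" * SIGNATURE_SIZE, b"entries...")
print(len(blob) - len(b"entries..."), "bytes of overhead")
```

Because the key travels with the directory, whoever has the directory can always check the signature without a second download.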

> > > Of course trivially done that would not be proof against attacks,
> > > but if most people are benevolent, it'd enable 'fan communities'
> > > to follow directories created by some famous persons. ;)
> >
> > I don't see why it should not be done in a safe way - gnunet uses pretty
> > much the strongest practical ciphers for signing certain messages, why
> > shouldn't we do the same here? The time it takes to sign should be
> > insignificant anyway.
>
> I think the question is how complicated the directory system
> should be or needs to be. Is it important enough to the overall
> network? In a way, the ability to search by keywords makes
> the need for directories smaller, but then again it may be
> difficult to get users to index their content intelligently.
> Perhaps the problem is somewhat equivalent (or a bit harder)
> than making the users post their stuff to a suitable group. ;)

I think all we need is an easy way to create directories for people that 
insert a ton of documents and another way for people to populate directories 
with files that they find. And then of course some support for directories in
the client.

> > I'm not sure what you mean with messaging capabilities, but looking into
> > existing designs is definitely the right approach. I don't really know
> > frost, any references?
>
<snip>
> Basically, frost implements messaging and file sharing on
> freenet by creating two kinds of files: message files and
> index files. A header is added to the message and it's
> inserted into freenet with a key like
>
> news.[group].[date].[daily_msg_counter]
>
> and frost polls for these keys to read messages posted
> by other users by incrementing daily_msg_counter until
> no more messages can be found. The problem is that
> if two users think daily_msg_counter is eg 5, both
> will insert at 6, and there are good chances that
> the insertions will not meet each other. So different
> people might see different messages and some messages
> can be entirely lost.

I see. Well, this problem sounds like it is inherent in the freenet-approach 
of 'unique keys' (there can only be one item of content matching a key, if 
you want to insert, you first have to check that the content does not yet 
exist in the network). This approach of course fails if the net ever gets 
split into two, and it has concurrency problems (as described above). 
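
A toy model of the difference, in Python. Everything here is a simplification for illustration: the two dictionaries stand for two halves of a network that did not see each other's insert, and "merging" stands for the halves learning about each other's content later.

```python
def merge_unique(half_a, half_b):
    """Unique keys: one item per key, so one of two colliding inserts is lost."""
    merged = dict(half_a)
    for key, content in half_b.items():
        merged.setdefault(key, content)   # existing content wins
    return merged


def merge_multi(half_a, half_b):
    """Non-unique keywords: every key holds a list, nothing is overwritten."""
    merged = {key: list(contents) for key, contents in half_a.items()}
    for key, contents in half_b.items():
        merged.setdefault(key, []).extend(contents)
    return merged


# Both users believe the counter is 5 and insert at 6.
key = "news.group.14.jun.2002.6"
unique = merge_unique({key: "message from A"}, {key: "message from B"})
print(unique[key])          # only one message survives

# With non-unique keywords the counter is not needed at all.
keyword = "news.group.14.jun.2002"
multi = merge_multi({keyword: ["message from A"]}, {keyword: ["message from B"]})
print(multi[keyword])       # both messages are found
```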

> The idea of index (directory) files is similar,
>
> idx.[group].[date].[daily_idx_counter]
>
> They contain group-specific keylists. When one or more files
> is inserted by frost, it creates such an index file.

Well, that would be like generating a standardized format for keys -- with 
the advantage that data is easier to find and the disadvantage that it is 
easier to guess what people are searching for (and thus the possibility to 
censor certain queries; say I don't like posting number 421, then I tell my 
node to drop all queries for 421. While the high degree of connection in 
GNUnet will make this attack a lot less effective compared to other networks, 
it's still a problem).

> > Polling? As in repeatedly query or what?
>
> Yes. The above should explain it. If user wants indexes
> or messages, he polls for them, typically max 3 days to the
> past, always starting at daily_idx 0 and increasing until
> no more can be found. If such a system was implemented for
> gnunet, at least the daily idx counters could be dropped
> because in gnunet keywords are not unique. Also the collision
> problem would not appear. The load to the system would
> probably be quite similar. In freenet it has caused trouble
> (fn developers have implemented a mechanism which puts keys
> to hold for a time if the content wasn't found on last query,
> in order to address the load generated by repeatedly
> polling for nonexisting material).

While I think it is ok to set some 'recommendations' for keywords
(read: you should add a keyword of this form AND use any other
format that you see fit), I think this is a different topic. Directories
are files that contain information about other files (including other 
directories), and that information identifies the file uniquely (see result 
of gnunet-search). Keywords for the search (input of gnunet-search, not 
output!) are a different topic! Let's try to keep these two separate. If you 
want to write an RFC for selecting keywords, write one, if you want to write 
an RFC for directories, fine. But they are 2 different problems.
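
To illustrate the directory side only: a sketch in Python that turns search output like the example quoted earlier into directory entries. I am assuming the three fields on the first line are hash, CRC and size (in that order), and that the description sits between "=>" and "<="; the type and function names are invented. Note that the keyword used for the search appears nowhere in the entry.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    file_hash: str      # assumed: first field of the result line
    crc: int            # assumed: second field
    size: int           # assumed: third field
    description: str


ID_LINE = re.compile(r"^([0-9A-Fa-f]+) (\d+) (\d+)$")
DESC_LINE = re.compile(r"^=> (.*?) <=")


def parse_search_output(text):
    """Pair each 'hash crc size' line with the description line after it."""
    entries = []
    pending = None
    for raw in text.splitlines():
        line = raw.strip()
        id_match = ID_LINE.match(line)
        if id_match:
            pending = (id_match.group(1), int(id_match.group(2)), int(id_match.group(3)))
            continue
        desc_match = DESC_LINE.match(line)
        if desc_match and pending is not None:
            entries.append(DirectoryEntry(*pending, desc_match.group(1)))
            pending = None
    return entries


sample = """\
872487347AA324329434ABC81247124823482342 1248248231 2045
=> pictures.cars : 102 entries : 14.jun.2002 <= (filename: ???, mimetype: unknown)
35929238492384923489238492384182DBD72347 2348238 534
=> punk : 12 entries : 14.jun.2002 <= (filename: ???, mimetype: unknown)
"""

for entry in parse_search_output(sample):
    print(entry.file_hash, entry.size, entry.description)
```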

> > > The thing is though that if some similar app
> > > isn't done intelligently for gnunet, it will be eventually made
> > > brainlessly by a third party and bad things will happen (on freenet
> > > there's now at least four ways to transmit and find files: private,
> > > not announced anywhere, "freesites", "frost" and "fmb". this
> > > has unnecessarily split the available resources between
> > > incompatible methods of announcement/retrieval/etc) :(
> >
> > Well, I'm definitely for trying to establish standards :-)
>
> Good. Here's something to specify. ;)
>
> - The hierarchy-or-flat -issue (and what should the result look like?)

I'm all for a graph! You take any (standardized or not) keyword to enter the 
graph and find a directory (the standardized keyword space can be a tree, the 
global keyword-space is naturally flat). From there, you find other 
directories and files -- and you can navigate like in the WWW. 

> - The keyword format for locating listings (w/ dates or nodates?)

I would tend to try to categorize by content, not by date. Dates can be 
misleading. Any other opinions on this one?

> - The actual listing format
>     - hash, crc, size, what else? same stuff as given by gnunet-search?

Description is definitely a good thing to have, what does just the hash tell 
you?

>     - signatures?

Certainly, and I would make it mandatory to keep it simple (you can of course 
make up a new public key for each directory). And include the public key.

>     - compression?

I would say optional and not for the first version, this complicates things 
and makes it harder to debug, too.

> > I would see it as an extension to the gnunet-filesharing library that can
> > then be used via options in the textui-clients or GUIs.
>
> That sounds good to me.


Christian


