[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [GNUnet-developers] Distributed Keyword Searching

From: Christian Grothoff
Subject: Re: [GNUnet-developers] Distributed Keyword Searching
Date: Tue, 15 Jul 2003 04:45:50 +0200

Having a non-encrypted identifier portion in the RBlock introduces security 
problems since malicious peers can tamper with them. Worse, you do not 
specify how you would union the RBlocks over the network (I may have file X 
under keyword A and you have file X under keyword B. How do you put these two 
results together such that X does not match "(A OR B) AND NOT (A AND B)"? 
Also, the current approach of searching for A and B individually and putting 
the boolean expressions back together at the client side seem to work nicely. 
"OR" is trivially implemented by doing multiple searches, and "NOT" could be 
implemented theoretically but was not in the current code since we would need 
the ability to "revoke" results (X matches when we get back A, but does no 
longer match when we get back B _later_). I felt that it was more reasonable 
for the user to see all search results instantly and that it would be 
counter-intuitive (and difficult to code) if we had to remove them from the 
screen again.  But "NOT" (and "OR") can certainly be done without touching 
anything but the AFS CLI/GUI code.   I'm just not sure if that would not 
complicate the interface more than what the additional expressiveness would 
be worth.


Am Dienstag, 15. Juli 2003 01:55 schrieb Steven Barker:
> > Well, in my previous bunch of ideas I gave up on distributed keyword
> searching because of the problem of identifying RBlocks encrypted with
> different keywords that point to the same data.  However, I've thought
> of how a way to solve that problem, which is detailed below.
> Anyway, here's the newly complete writeup about distributed searching:
> * Distributed boolean keyword searches.
> It should be possible to search AFS in a distributed manner for any
> combination of keywords using AND, OR and NOT boolean operators.  This
> would let you do a searches like "GNUnet AND 0.5.4a AND NOT rpm" or
> "anime OR (Japan AND animation)".  Of course some searches that would
> obviously return infinate or empty sets should be excluded.
> I don't think think this would be hard to implement.  A boolean Query
> message would simply describe the boolean expression in a compact
> format. One way would be to rearrange the expression so that all the
> NOT operators occur directly on keywords (i.e. "NOT (foo AND bar)"
> would become "NOT foo OR NOT bar").  The expression could then be seen
> as a binary tree, where the leaf nodes are the (possibly negated)
> keywords and all the other nodes are ANDs and ORs.  The representation
> of the expression in the boolean Query would be a pre-order traversal
> of this tree (you could also describe this as putting the expression
> in "reverse-polish" notation: "(a AND b) OR (c AND d)" would become
> "OR AND a b AND c d").  An expression with N (possibly negated)
> keywords would have N leaf nodes (the keywords) and N-1 branch nodes
> (the ANDs and ORs).  If we use a single byte to identify the type of a
> node in our pre-order traversal, we can write an expression with N
> keywords in 22N-1 bytes (2N-1 1-byte node identifiers and N 20-byte
> keyword hashes).  A boolean Query could even use the normal Query
> message type (with upto 47 keywords per query, if my math is right),
> and rely on recievers to notice the unusual (odd number) size to know
> it's a boolean Query.
> Replies to boolean Queries would simply be one 3HASH reply per
> non-negated keyword, per data match.  A node that recieves a Query
> could decide to distribute the work, rather than trying to find the
> results entirely on it's own.  It would simply split off one or more
> subtrees of the boolean expression and send them to neighboring
> nodes.  When results come back, it would check them against the part
> of the tree that it kept and finally send any results that matched
> the whole expression back to the originator.
> The issue that made me skip posting this plan with my previous list
> was how to check if RBlocks encrypted with different keywords pointed
> to the same data.  This can be solved if we allow some RBlocks to have
> an unencrypted "FileIdentifier" field.  This could either be done by
> rearranging the current RBlock header so as to make encrypting just
> part of it possible, or the FileIdentifier could simply be repeated in
> the unencrypted (currently padding) portion of the block.
> I would like to claim that allowing nodes to see what data is pointed
> to by some RBlocks will not make censorship easier.  This is because
> the node still does not know what keyword that RBlock is indexed by.
> A dictionary attack could find RBlocks for certain keys, but that can
> be done to fully encrypted RBlocks already.  In fact, the only
> intersting thing I can imagine an AFS node doing with a semi-encrypted
> RBlock is deciding to drop a reference if the data is points to cannot
> be found for an extended period of time.  I hope someone will point
> out any attacks I'm not seeing.

Content-Type: application/pgp-signature; charset="us-ascii"; name="Anhang: 1"
Content-Transfer-Encoding: 7bit

> > _______________________________________________
> GNUnet-developers mailing list
> address@hidden

reply via email to

[Prev in Thread] Current Thread [Next in Thread]