[lsd0004] branch master updated: refactor: new section on Bloom filters

gnunet-svn
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0004] branch master updated: refactor: new section on Bloom filters

From:	gnunet
Subject:	[lsd0004] branch master updated: refactor: new section on Bloom filters
Date:	Fri, 11 Mar 2022 04:58:56 +0100
This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0004.

The following commit(s) were added to refs/heads/master by this push:
     new a8d0df2  refactor: new section on Bloom filters
a8d0df2 is described below

commit a8d0df2b972d5a6d8938286e3758aecf874580ac
Author: Christian Grothoff <grothoff@gnunet.org>
AuthorDate: Fri Mar 11 04:58:50 2022 +0100

    refactor: new section on Bloom filters
---
 draft-schanzen-r5n.xml | 191 ++++++++++++++++++++++++++++---------------------
 1 file changed, 110 insertions(+), 81 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml
index d0651b2..f998220 100644
--- a/draft-schanzen-r5n.xml
+++ b/draft-schanzen-r5n.xml
@@ -733,44 +733,87 @@ bchar = *(ALPHA / DIGIT)
         </t>
       </section>
     </section>
+    <section anchor="bloom_filters" numbered="true" toc="default">
+      <name>Bloom Filters</name>
+      <t>
+       R<sup>5</sup>N uses Bloom filters in several places.  This section
+       gives some general background on Bloom filters and defines functions
+       on this data structure shared by the various use-cases in 
R<sup>5</sup>N.
+      </t>
+      <t>
+       FIXME: general intro.
+      </t>
+      <t>
+        Bloom filters are always initially empty, consisting only of zeroes.
+        There are two functions which can be invoked on the Bloom filter:
+        BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to
+        be added to the Bloom filter or queried against the set.
+        Any Bloom filter uses k=16 different hash functions each of which is
+        defined as follows:
+      </t>
+      <t>
+       FIXME: actually define the opertions...
+      </t>
+    </section>
     <section anchor="routing" numbered="true" toc="default">
       <name>Routing</name>
-      <section anchor="peer_storage">
-        <name>Peer Storage</name>
+      <t>
+        In order to select peers which are suitable destinations for
+        routing messages, R<sup>5</sup>N uses a hybrid approach:
+        Given an estimated network size N, the peer selection for the
+        first N hops is random. After the initial N hops, peer selection
+        follows an XOR-based peer distance calculation.
+      </t>
+      <t>
+        To enable routing, any R<sup>5</sup>N implementation must keep 
+       information about its current set of neighbours.
+        Upon receiving a connection notification from the Underlay through
+        <tt>PEER_CONNECTED</tt>, information on the new neighbour 
+        <bcp14>MUST</bcp14> be added, and similarly when
+        a disconnect is indicated by the Underlay through
+        <tt>PEER_DISCONNECTED</tt> the peer <bcp14>MUST</bcp14> be removed.
+      </t>
+      <t>
+        In order to achieve O(log n) routing performance,
+        the data structure for managing neighbours and their
+        metadata <bcp14>MUST</bcp14> be implemented using the k-buckets 
concept of
+        <xref target="Kademlia"/>  as defined in <xref 
target="routing_table"/>.
+        Maintenance of the routing table (after bootstrapping) is
+        described in <xref target="find_peer"/>.
+      </t>
+      <t>
+        Unlike <xref target="Kademlia"/>, routing decisions in
+        R<sup>5</sup>N are also influenced by a Bloom filter in the message
+        that prevents routing loops. This data structure is discussed in
+       <xref target="routing_bloomfilter"/>.  <xref 
target="routing_functions"/>
+        describes the key functions provided on top of these data structures.
+      </t>
+      <section anchor="routing_table">
+        <name>Routing Table</name>
         <t>
-          A R<sup>5</sup>N implementation must store the information on 
connected peers
-          and update changes accordingly in a local persistance component such
-          as a database.
-          Upon receiving a connection notification from the Underlay through
-          <tt>PEER_CONNECTED</tt>, information on the new peer is added to the
-          local peer storage.
-          When disconnect is indicated by the Underlay through
-          <tt>PEER_DISCONNECTED</tt> the peer <bcp14>MUST</bcp14> be removed 
from the local
-          peer storage.
-          In order to achieve O(log n) routing performance,
-          the data structure for managing connected peers and their
-          metadata <bcp14>MUST</bcp14> be implemented using the k-buckets 
concept of
-          <xref target="Kademlia"/>  as defined in <xref 
target="routing_table"/>.
+          The routing table consists of an array of k-buckets. Each
+          k-bucket contains a list of neighbours.
+          The i-th k-bucket stores neighbours whose 
+         identifiers are between distance 2^i and 2^(i+1) from the local peer.
+          System constraints will typically force an implementation to impose 
some
+          upper limit on the number of neighbours kept per k-bucket.
         </t>
-      </section>
-      <section anchor="routing_bloomfilter">
-        <name>Peer Bloom Filter</name>
         <t>
-          The peer Bloom filter is used to prevent circular routes.
-          Any peer which is forwarding GET or PUT messages
-          (<xref target="p2p_messages"/>) adds its own peer ID to the
-          message Bloom filter.
-          This allows other peers to lookup next hops while excluding already
-          traversed peers (<xref target="routing_table"/>).
+          Implementations <bcp14>SHOULD</bcp14> try to keep at least
+          5 entries per k-bucket.  Embedded systems that cannot manage
+          this number of connections <bcp14>MAY</bcp14> use connection-level
+          signalling to indicate that they are merely a client utilizing a
+          DHT and not able to participate in routing.  DHT peers receiving
+          such connections <bcp14>MUST NOT</bcp14> include connections to
+          such restricted systems in their k-buckets, thereby effectively 
+         excluding them when making routing decisions.
         </t>
         <t>
-          The Bloom filter is a 128-bit field.
-          It is initially empty, consisting only of zeroes.
-          There are two functions which can be invoked on the Bloom filter:
-          BF-SET(bf, e) and BF-TEST(bf, e) where "e" is an element which is to
-          be added to the Bloom filter or queried against the set.
-          Any Bloom filter uses k=16 different hash functions each of which is
-          defined as follows:
+          If a system hits constraints with respect to
+          the number of active connections, an implementation
+          <bcp14>MUST</bcp14> evict peers from those k-buckets with the
+          largest number of neighbours. The eviction strategy 
<bcp14>MUST</bcp14> be
+          to drop the shortest-lived connections first.
         </t>
       </section>
       <section anchor="find_peer">
@@ -778,9 +821,9 @@ bchar = *(ALPHA / DIGIT)
         <t>
           To build its routing table, a peer will send out requests
           asking for blocks of type HELLO using its own location as the key,
-          but filtering its own HELLO via the Bloom filter.
-          <!-- FIXME: CG: I think this last one is not implemented! -->
-          <!-- FIXME: CG: I also suspect we need to review how the block API 
filters HELLOs, to NOT use the full body / expiration time in the hash -->
+          but filtering its own HELLO via the Bloom filter described
+          in <xref target="result_bloomfilter"/>. 
+         <!-- FIXME: should we only filter our own, or ALL NEIGHBOURS? -->
           These requests <bcp14>MUST</bcp14> use the FindApproximate and 
DemultiplexEverywhere
           flags. FindApproximate will ensure that other peers will reply
           with keys they merely consider close-enough, while 
DemultiplexEverywhere
@@ -789,58 +832,44 @@ bchar = *(ALPHA / DIGIT)
         </t>
         <t>
           To facilitate peer discovery, each peer <bcp14>MUST</bcp14> 
broadcast its own
-          HELLO data to all peers in the routing table periodically.
-          <!-- FIXME: CG: specify frequency? -->
+          HELLO message to all peers in the routing table periodically.
+          The specific frequency <bcp14>MAY</bcp14> depend on available 
bandwidth
+          but <bcp14>SHOULD</bcp14> be a fraction of the expiration period.
           Whenever a peer receives such a HELLO message from another peer,
           it must cache it as long as that peer is in its routing table
           (or until the HELLO expires) and serve it in response to
           Peer Discovery requests.  Details about the format of the
-          HELLO message are given in section p2p_hello_wire.
+          HELLO message are given in <xref target="p2p_hello_wire"/>.
         </t>
       </section>
-      <section anchor="routing_table">
-        <name>Routing Table</name>
-        <t>
-          In order to select peers which are suitable destinations for
-          routing messages, R<sup>5</sup>N uses a hybrid approach:
-          Given an estimated network size N, the peer selection for the
-          first N hops is random. After the initial N hops, peer selection
-          follows an XOR-based peer distance calculation.
-        </t>
-        <t>
-          The routing table consists of an array of lists of connected peers.
-          The i-th list stores neighbours whose identifiers are between
-          distance 2^i and 2^(i+1) from the local peer.
-          System constraints will typically force an implementation to impose 
some
-          upper limit on the number of neighbours kept per k-bucket.
-        </t>
-        <t>
-          Implementations <bcp14>SHOULD</bcp14> try to keep at least
-          5 entries per k-bucket.  Embedded systems that cannot manage
-          this number of connections <bcp14>MAY</bcp14> use connection-level
-          signalling to indicate that they are merely a client utilizing a
-          DHT and not able to participate in routing.  DHT peers receiving
-          such connections <bcp14>MUST NOT</bcp14> include connections to
-          such restricted systems when making routing decisions.
-        </t>
-        <t>
-          If a system hits constraints with respect to
-          the number of active connections, an implementation
-          <bcp14>MUST</bcp14> evict peers from those k-buckets with the
-          largest number of peers. The eviction strategy <bcp14>MUST</bcp14> be
-          to drop the shortest-lived connections first.
-        </t>
+      <section anchor="routing_bloomfilter">
+        <name>Peer Bloom Filter</name>
         <t>
-          As the message traverses a random path through the network for the
+          As DHT GET and PUT messages traverse a random path through the 
network for the
           first N hops, it is essential that routing loops are avoided.
-          In R<sup>5</sup>N, a bloomfilter is used as part of the routing 
metadata in
-          messages. The bloomfilter is updates at each hop with the hops
+          In R<sup>5</sup>N, a Bloom filter is used as part of the routing 
metadata in
+          messages. The Bloom filter is updates at each hop with the hops
           peer identity.
           For the next hop selection in both the random and the deterministic
-          case, any peer which is in the bloomfilter for the respective message
+          case, any peer which is in the Bloom filter for the respective 
message
           is not included in the peer selection process.
         </t>
         <t>
+          The peer Bloom filter is used to prevent circular routes.
+          Any peer which is forwarding GET or PUT messages
+          (<xref target="p2p_messages"/>) adds its own peer ID to the
+          peer Bloom filter.
+          This allows other peers to (probabilistically) exclude already
+          traversed peers when searching for the next hops in the routing 
table.
+        </t>
+        <t>
+          The peer Bloom filter is always a 128-bit field. The elements
+         hashed into the Bloom filter are the 32 byte peer IDs.
+        </t>
+      </section>
+      <section anchor="routing_functions">
+        <name>Routing Functions</name>
+         <t>
           The R<sup>5</sup>N routing component <bcp14>MUST</bcp14> implement 
the following functions:
         </t>
         <dl>
@@ -860,7 +889,7 @@ bchar = *(ALPHA / DIGIT)
             routing table with the shortest XOR-distance to the key <tt>K</tt>.
             This means that for all other peers <tt>N'</tt> in the routing 
table
             <tt>GetDistance(N, K) &lt; GetDistance(N',K)</tt>.
-            peers in the bloomfilter <tt>B</tt> are not considered.
+            peers in the Bloom filter <tt>B</tt> are not considered.
           </dd>
           <dt>
             <tt>SelectRandompeer(B) -&gt; N</tt>
@@ -868,7 +897,7 @@ bchar = *(ALPHA / DIGIT)
           <dd>
             This function selects a random peer <tt>N</tt> from all connected
             peers.
-            peers in the bloomfilter <tt>B</tt> are not considered.
+            peers in the Bloom filter <tt>B</tt> are not considered.
           </dd>
           <dt>
             <tt>Selectpeer(K, H, B) -&gt; N</tt>
@@ -879,7 +908,7 @@ bchar = *(ALPHA / DIGIT)
             If <tt>H &lt; NETWORK_SIZE_ESTIMATE</tt>
             this function <bcp14>MUST</bcp14> return 
<tt>SelectRandompeer()</tt> and
             <tt>SelectClosestpeer(K)</tt> otherwise.
-            peers in the bloomfilter <tt>B</tt> are not considered.
+            peers in the Bloom filter <tt>B</tt> are not considered.
           </dd>
           <dt>
             <tt>IsClosestpeer(N, K, B) -&gt; true | false</tt>
@@ -887,7 +916,7 @@ bchar = *(ALPHA / DIGIT)
           <dd>
             checks if <tt>N</tt> is the closest peer for <tt>K</tt>
             (cf. <tt>SelectClosestpeer(K)</tt>).
-            peers in the bloomfilter <tt>B</tt> are not considered.
+            peers in the Bloom filter <tt>B</tt> are not considered.
           </dd>
         </dl>
       </section>
@@ -1235,7 +1264,7 @@ bchar = *(ALPHA / DIGIT)
             </dd>
             <dt>PEER_BF</dt>
             <dd>
-              A bloomfilter (for peer addresses) to stop circular routes.
+              A Bloom filter (for peer addresses) to stop circular routes.
             </dd>
             <dt>BLOCK_KEY</dt>
             <dd>
@@ -1371,7 +1400,7 @@ bchar = *(ALPHA / DIGIT)
             </dd>
             <dt>PEER_BF</dt>
             <dd>
-              A bloomfilter (for peer identities) to stop circular routes.
+              A Bloom filter (for peer identities) to stop circular routes.
             </dd>
             <dt>QUERY_HASH</dt>
             <dd>
@@ -1389,11 +1418,11 @@ bchar = *(ALPHA / DIGIT)
             </dd>
             <dt>BF_MUTATOR</dt>
             <dd>
-              The 32-bit bloomfilter mutator for the result bloomfilter.
+              The 32-bit Bloom filter mutator for the result Bloom filter.
             </dd>
             <dt>RESULT_BF</dt>
             <dd>
-              the variable-length result bloomfilter.
+              the variable-length result Bloom filter.
             </dd>
           </dl>
         </section>

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread]
Current Thread
[Next in Thread]
[lsd0004] branch master updated: refactor: new section on Bloom filters, gnunet <=
Prev by Date: [lsd0004] branch master updated: -fix ftbfs
Next by Date: [lsd0004] branch master updated: more on Bloom filters
Previous by thread: [lsd0004] branch master updated: -fix ftbfs
Next by thread: [lsd0004] branch master updated: more on Bloom filters
Index(es):
- Date
- Thread