[lsd0004] branch master updated: result filter spec

gnunet-svn
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0004] branch master updated: result filter spec

From:	gnunet
Subject:	[lsd0004] branch master updated: result filter spec
Date:	Sun, 13 Mar 2022 11:57:01 +0100
This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0004.

The following commit(s) were added to refs/heads/master by this push:
     new 6ecbe70  result filter spec
6ecbe70 is described below

commit 6ecbe708a25488f9a55e74ff0f1586db72126d2c
Author: Christian Grothoff <grothoff@gnunet.org>
AuthorDate: Sun Mar 13 11:56:52 2022 +0100

    result filter spec
---
 draft-schanzen-r5n.xml | 155 +++++++++++++++++++++++++++++--------------------
 1 file changed, 92 insertions(+), 63 deletions(-)

diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml
index 06e3f13..6096c24 100644
--- a/draft-schanzen-r5n.xml
+++ b/draft-schanzen-r5n.xml
@@ -372,10 +372,11 @@ Connectivity | |Underlay|  |Underlay|
           </dd>
           <dt>Result-Filter:</dt>
           <dd>
-            is a Bloom filter which allows applications to
-           probabilistically indicate results which are
+            is data for a <tt>Block-type</tt>-specific filter
+           which allows applications to
+           indicate results which are
            not relevant anymore to the
-            caller (see <xref target="result_bloomfilter"/>).
+            caller (see <xref target="result_filter"/>).
           </dd>
         </dl>
         <t>
@@ -659,7 +660,7 @@ Connectivity | |Underlay|  |Underlay|
       <t>
         Any implementation encountering a HELLO GET request 
<bcp14>MUST</bcp14> respond 
         with its own HELLO block except if that block is
-        filtered by the request's result filter (see <xref 
target="result_bloomfilter"/>).  
+        filtered by the request's result filter (see <xref 
target="result_filter"/>).  
         Implementations <bcp14>MAY</bcp14> respond 
         with additional valid HELLO blocks of other peers with keys
         closest to the key of the GET request.  A HELLO block is "valid"
@@ -762,16 +763,16 @@ bchar = *(ALPHA / DIGIT)
         32-bit integers in network byte order.
       </t>
       <t>
-        When adding an element to the bloom filter <tt>bf</tt> using
+        When adding an element to the Bloom filter <tt>bf</tt> using
         <tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
         <tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
         <tt>bf</tt> and set to 1.
       </t>
       <t>
-        When testing if an element may be in the bloom filter <tt>bf</tt> using
+        When testing if an element may be in the Bloom filter <tt>bf</tt> using
         <tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
         <tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
-        Otherwise, the element is not considered to be in the bloom filter.
+        Otherwise, the element is not considered to be in the Bloom filter.
       </t>
     </section>
     <section anchor="routing" numbered="true" toc="default">
@@ -841,7 +842,7 @@ bchar = *(ALPHA / DIGIT)
           To build its routing table, a peer will send out requests
           asking for blocks of type HELLO using its own location as the key,
           but filtering all of its neighbors via the Bloom filter described
-          in <xref target="result_bloomfilter"/>. 
+          in <xref target="result_filter"/>. 
           These requests <bcp14>MUST</bcp14> use the FindApproximate and 
DemultiplexEverywhere
           flags. FindApproximate will ensure that other peers will reply
           with keys they merely consider close-enough, while 
DemultiplexEverywhere
@@ -980,21 +981,21 @@ bchar = *(ALPHA / DIGIT)
          For each entry in the pending table, the DHT <bcp14>MUST</bcp14> track
          not only the query key and the origin, but also the
          extended query, requested block type and flags, and the
-         result Bloom filter.  If the query did not provide
-         a result Bloom filter, a fresh result Bloom filter
+         result filter.  If the query did not provide
+         a result filter, a fresh result filter
          <bcp14>MUST</bcp14> still be created to filter duplicate replies.
+         Details of how a result filter works depend on the
+         type, as described in <xref target="block_functions"/>.
        </t>
        <t>
          When a second query from the same origin for the
          same query hash is received, the DHT <bcp14>MUST</bcp14>
          attempt to merge the new request with the state for
-         the old request.  In particular, this means that if
-         the result Bloom filters have the same size and
-         mutator, they <bcp14>MUST</bcp14> be combined.  If
-         the result Bloom fitlers meta data differs, the
-         existing result Bloom filter <bcp14>MUST</bcp14> be
-         discarded and replaced with the incoming result
-         Bloom filter.
+         the old request.  If this is not possible, the
+         existing result filter <bcp14>MUST</bcp14> be
+         discarded and replaced with the result
+         filter of the incoming message.
+       </t>
        <t>
          We note that for local applications, a fixed limit on
          the number of concurrent requests may be problematic.
@@ -1536,63 +1537,61 @@ bchar = *(ALPHA / DIGIT)
             <dd>
               the variable-length extended query. Optional.
             </dd>
-            <dt>BF_MUTATOR</dt>
+            <dt>MUTATOR</dt>
             <dd>
-              The 32-bit Bloom filter mutator for the result Bloom filter.
+              The 32-bit mutator for the result filter.
             </dd>
-            <dt>RESULT_BF</dt>
+            <dt>RESULT_FILTER</dt>
             <dd>
-              the variable-length result Bloom filter, described in <xref 
target="result_bloomfilter"/>.
+              the variable-length result filter, described in <xref 
target="result_filter"/>.
             </dd>
           </dl>
         </section>
-       <section anchor="result_bloomfilter">
-          <name>Result Bloom Filter</name>
+       <section anchor="result_filter">
+          <name>Result Filter</name>
          <t>
-            The result Bloom filter is used to indicate to other peers which 
results
+            The result filter is used to indicate to other peers which results
             are not of interest when processing a <tt>GetMessage</tt>
             (<xref target="p2p_get"/>).
             Any peer which is processing <tt>GetMessage</tt>s and has a result
-            which matches the query key <bcp14>MUST</bcp14> check the result 
Bloom filter
+            which matches the query key <bcp14>MUST</bcp14> check the result 
filter
             and only send a reply message if the result does not test positive
-           under the result Bloom filter. Before forwarding the 
<tt>GetMessage</tt>, the
-           result Bloom filter <bcp14>MUST</bcp14> be updated to filter out 
all results
+           under the result filter. Before forwarding the <tt>GetMessage</tt>, 
the
+           result filter <bcp14>MUST</bcp14> be updated to filter out all 
results
            already returned by the local peer.
           </t>
          <t>
-            FIXME: say something about how to calculate the size of the result 
Bloom filter here!
-          </t>
-          <t>
-           Bloom filters are probabilistic data structures. Thus, especially
-           given the small size of the result Bloom filter, it is always 
possible
-           that a desireable result is filtered by the Bloom filter because of 
-           a false-positive match created by a collision in the hash values
-           between the desireable result and filtered items. 
+            How a result filter is implemented depends on the block type
+           as described in <xref target="block_functions"/>.
+           Result filters may be probabilistic data structures. Thus,
+           it is possible that a desireable result is filtered by a result
+           filter because of a false-positive test.
           </t>
           <t>
-           To address this problem, R<sup>5</sup>N uses a mutator value
-            to additionally randomize the process
-            when hashing results into the result Bloom filter. The mutator
+           To address this problem, R<sup>5</sup>N uses a <tt>MUTATOR</tt> 
value
+            which allows block implemenations that use probabilistic data
+           structures for result filters to additionally "randomize" the
+           computation of a probabilistic data structure while remaining
+           deterministic across peers.  The 32-bit <tt>MUTATOR</tt>
            value is set by the peer initiating the GET request, and changed
            every time the GET request is repeated by the initiator. Peers
            forwarding GET requests <bcp14>MUST</bcp14> not change the 
-           mutator value included in the <tt>GetMessage</tt> as they could not 
-           recalculate the Bloom filter with the new mutator value.
+           mutator value included in the <tt>GetMessage</tt> as they might not 
+           be able to recalculate the result filter with a different 
<tt>MUTATOR</tt>
+           value.
           </t>
          <t>
-           By including the mutator value in the hashing process, repeated
+           By properly including the <tt>MUTATOR</tt> value in a probabilistic 
process, repeated
            requests have statistically independent probabilities of creating
-           collisions in the result Bloom filter. Thus, even if for one request
-           a result Bloom filter collision may exclude a result as a 
false-positive
+           false-positives in a result filter. Thus, even if for one request
+           a result filter may exclude a result as a false-positive
            match, subsequent requests are likely to not have the same 
            false-positives.
           </t>
          <t>
-           How exactly a block result is hashed into the result Bloom filter
-           together with the mutator depends on the block type.  
-           For example, some block types may include full block
-           payload, certain parts of the block payload, or the block key
-           when hashing the block into the result Bloom filter.
+           How exactly a block result is added to a result filter
+           (together with the <tt>MUTATOR</tt>) <bcp14>MUST</bcp14> be
+           specified as part of the definition of a block type.  
           </t>
         </section>
         <section anchor="p2p_get_processing">
@@ -1604,8 +1603,7 @@ bchar = *(ALPHA / DIGIT)
           <ol>
             <li>
               The <tt>QUERY_KEY</tt> and <tt>XQUERY</tt> fields are validated
-              against the
-              requested <tt>BTYPE</tt> as defined by its respective
+              against the requested <tt>BTYPE</tt> as defined by its respective
               <tt>ValidateBlockQuery</tt> procedure.
               If validation
              function yields <tt>REQUEST_INVALID</tt>, the message 
<bcp14>MUST</bcp14> be discarded.
@@ -1641,7 +1639,7 @@ bchar = *(ALPHA / DIGIT)
                   <tt>RESULT_BF</tt>.  However, implementations also 
<bcp14>MUST</bcp14>
                  avoid an exhaustive search of their database, as there could 
be
                  cases where too many local results are filtered by the result 
-                 Bloom filter. To avoid denial of service attacks, 
implementations
+                 filter. To avoid denial of service attacks, implementations
                  <bcp14>MUST</bcp14> thus ensure that the cost of evaluating 
any
                  such query is reasonably small.                 
                 </li>
@@ -1833,8 +1831,10 @@ bchar = *(ALPHA / DIGIT)
                <li>
                   If the <tt>BTYPE</tt> is supported, result block 
<bcp14>MUST</bcp14>
                  be validated against the specific query using
-                 the respective <tt>FilterBlockResult</tt> function, possibly 
updating
-                 the result Bloom filter of the query in the process.
+                 the respective <tt>FilterBlockResult</tt> function. This 
function
+                 <bcp14>MUST</bcp14> update
+                 the result filter if a result is returned to the originator 
of the
+                 query.
                 </li>
                <li>
                  If the <tt>BTYPE</tt> is not supported, filtering of exact 
duplicate 
@@ -1933,8 +1933,17 @@ bchar = *(ALPHA / DIGIT)
                <dd>Block payload does not match the block type.
                </dd>
               </dl>            
+            </dd> 
+            <dt>SetupResultFilter(FilterSize, Mutator) -&gt; RF</dt>
+            <dd>
+             is used to setup an empty result filter.  The arguments
+             are the set of results that must be filtered at the
+             initiator, and a <tt>MUTATOR</tt> value which <bcp14>MAY</bcp14>
+             be used to deterministically re-randomize
+             probabilistic data structures.  The specification 
<bcp14>MUST</bcp14>
+             also include the wire format for BF.
             </dd>
-            <dt>FilterResult(Block, Key, RBF, XQuery) -&gt; 
(FilterEvaluationResult, RBF')</dt>
+            <dt>FilterResult(Block, Key, RF, XQuery) -&gt; 
(FilterEvaluationResult, RF')</dt>
             <dd>
              <t>
              is used to filter results against specific queries.  This function
@@ -1952,13 +1961,13 @@ bchar = *(ALPHA / DIGIT)
               <dt>FILTER_LAST</dt>
               <dd>Last possible valid result.</dd>
               <dt>FILTER_DUPLICATE</dt>
-              <dd>Valid result, but duplicate (was filtered by the result 
Bloom filter).</dd>
+              <dd>Valid result, but duplicate (was filtered by the result 
filter).</dd>
               <dt>FILTER_IRRELEVANT</dt>
               <dd>Block does not satisfy the constraints imposed by the 
XQuery.</dd>
               </dl>
              <t>
              If the main evaluation result is <tt>FILTER_MORE</tt>, the 
function also returns
-             and updated result Bloom filter where the block is added to the 
set of
+             and updated result filter where the block is added to the set of
              filtered replies.  An implementation is not expected to actually 
differenciate 
              between the <tt>FILTER_DUPLICATE</tt> and 
<tt>FILTER_IRRELEVANT</tt> return
              values: in both cases the block is ignored for this query.
@@ -2110,20 +2119,40 @@ gnunet+tcp://12.3.4.5/ \
            against the public key from the peer ID field.
           </t>
          <t>
-           To filter results of HELLO blocks
-           using the result Bloom filter, the
+           The result filter for HELLO blocks is implemented using a
+           Bloom filter. The <tt>K</tt>-value for the Bloom filter 
+           is always 16.  The size <tt>S</tt> of the Bloom filter in bytes 
depends on
+           the number of elements <tt>F</tt> known to be filtered at the
+           initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8 
(bytes).
+           Otherwise, <tt>S</tt> is set to the minimum of
+           2<sup>15</sup> and the lowest power
+           of 2 that is strictly larger than <tt>K*F/4</tt>
+           (in bytes).  The wire format of a HELLO block Bloom filter
+           is just the resulting byte array. In particular, <tt>K</tt> 
+           is not transmitted.
+          </t>
+         <t>
+           To filter results of HELLO blocks using the Bloom filter, the
            <tt>H_ADDRS</tt> field (as computed using SHA-512 over
-           the <tt>ADDRESSES</tt>) is XORed with the SHA-512
-           hash of the mutator (in network byte order).
+           the <tt>ADDRESSES</tt>) and XORed with the SHA-512
+           hash of the <tt>MUTATOR</tt> (in network byte order).
            The resulting value is then used when hashing into the
-           result Bloom filter. Consequently, HELLOs with
-           completely identical sets of addresses will be
-           filtered, but any small variation in the set of
+           Bloom filter as described in <xref target="bloom_filters" />. 
+           Consequently, HELLOs with completely identical sets of 
+           addresses will be filtered, but any small variation in the set of
            addresses will cause the block to no longer be
            filtered (with high probability).  The
            function thus always returns either
            <tt>FILTER_MORE</tt> or <tt>FILTER_DUPLICATE</tt>.
           </t>
+         <t>
+           HELLO result filters can be merged if the 
+           Bloom filters have the same size and
+           <tt>MUTATOR</tt> by setting all bits to 1 that are
+           set in either Bloom filter.  This is done whenever
+           a peer receives a query with the same <tt>MUTATOR</tt>,
+           predecessor and Bloom filter size.
+         </t>    
         </section>
         <section>
         <name>Persistence</name>

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread]
Current Thread
[Next in Thread]
[lsd0004] branch master updated: result filter spec, gnunet <=
Prev by Date: [lsd0004] branch master updated: add pending table description
Next by Date: [libmicrohttpd] 04/08: Disallowed successful responses for CONNECT requests
Previous by thread: [lsd0004] branch master updated: add pending table description
Next by thread: [libmicrohttpd] branch master updated (1a46bfd6 -> f2565a62)
Index(es):
- Date
- Thread