[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[lsd0004] branch master updated: result filter spec
From: |
gnunet |
Subject: |
[lsd0004] branch master updated: result filter spec |
Date: |
Sun, 13 Mar 2022 11:57:01 +0100 |
This is an automated email from the git hooks/post-receive script.
grothoff pushed a commit to branch master
in repository lsd0004.
The following commit(s) were added to refs/heads/master by this push:
new 6ecbe70 result filter spec
6ecbe70 is described below
commit 6ecbe708a25488f9a55e74ff0f1586db72126d2c
Author: Christian Grothoff <grothoff@gnunet.org>
AuthorDate: Sun Mar 13 11:56:52 2022 +0100
result filter spec
---
draft-schanzen-r5n.xml | 155 +++++++++++++++++++++++++++++--------------------
1 file changed, 92 insertions(+), 63 deletions(-)
diff --git a/draft-schanzen-r5n.xml b/draft-schanzen-r5n.xml
index 06e3f13..6096c24 100644
--- a/draft-schanzen-r5n.xml
+++ b/draft-schanzen-r5n.xml
@@ -372,10 +372,11 @@ Connectivity | |Underlay| |Underlay|
</dd>
<dt>Result-Filter:</dt>
<dd>
- is a Bloom filter which allows applications to
- probabilistically indicate results which are
+ is data for a <tt>Block-type</tt>-specific filter
+ which allows applications to
+ indicate results which are
not relevant anymore to the
- caller (see <xref target="result_bloomfilter"/>).
+ caller (see <xref target="result_filter"/>).
</dd>
</dl>
<t>
@@ -659,7 +660,7 @@ Connectivity | |Underlay| |Underlay|
<t>
Any implementation encountering a HELLO GET request
<bcp14>MUST</bcp14> respond
with its own HELLO block except if that block is
- filtered by the request's result filter (see <xref
target="result_bloomfilter"/>).
+ filtered by the request's result filter (see <xref
target="result_filter"/>).
Implementations <bcp14>MAY</bcp14> respond
with additional valid HELLO blocks of other peers with keys
closest to the key of the GET request. A HELLO block is "valid"
@@ -762,16 +763,16 @@ bchar = *(ALPHA / DIGIT)
32-bit integers in network byte order.
</t>
<t>
- When adding an element to the bloom filter <tt>bf</tt> using
+ When adding an element to the Bloom filter <tt>bf</tt> using
<tt>BF-SET(bf,e)</tt>, each integer <tt>n</tt> of the mapping
<tt>M(e)</tt> is interpreted as a bit offset <tt>n mod L</tt> within
<tt>bf</tt> and set to 1.
</t>
<t>
- When testing if an element may be in the bloom filter <tt>bf</tt> using
+ When testing if an element may be in the Bloom filter <tt>bf</tt> using
<tt>BF-TEST(bf,e)</tt>, each bit offset <tt>n mod L</tt> within
<tt>bf</tt> <bcp14>MUST</bcp14> have been set to 1.
- Otherwise, the element is not considered to be in the bloom filter.
+ Otherwise, the element is not considered to be in the Bloom filter.
</t>
</section>
<section anchor="routing" numbered="true" toc="default">
@@ -841,7 +842,7 @@ bchar = *(ALPHA / DIGIT)
To build its routing table, a peer will send out requests
asking for blocks of type HELLO using its own location as the key,
but filtering all of its neighbors via the Bloom filter described
- in <xref target="result_bloomfilter"/>.
+ in <xref target="result_filter"/>.
These requests <bcp14>MUST</bcp14> use the FindApproximate and
DemultiplexEverywhere
flags. FindApproximate will ensure that other peers will reply
with keys they merely consider close-enough, while
DemultiplexEverywhere
@@ -980,21 +981,21 @@ bchar = *(ALPHA / DIGIT)
For each entry in the pending table, the DHT <bcp14>MUST</bcp14> track
not only the query key and the origin, but also the
extended query, requested block type and flags, and the
- result Bloom filter. If the query did not provide
- a result Bloom filter, a fresh result Bloom filter
+ result filter. If the query did not provide
+ a result filter, a fresh result filter
<bcp14>MUST</bcp14> still be created to filter duplicate replies.
+ Details of how a result filter works depend on the
+ type, as described in <xref target="block_functions"/>.
</t>
<t>
When a second query from the same origin for the
same query hash is received, the DHT <bcp14>MUST</bcp14>
attempt to merge the new request with the state for
- the old request. In particular, this means that if
- the result Bloom filters have the same size and
- mutator, they <bcp14>MUST</bcp14> be combined. If
- the result Bloom fitlers meta data differs, the
- existing result Bloom filter <bcp14>MUST</bcp14> be
- discarded and replaced with the incoming result
- Bloom filter.
+ the old request. If this is not possible, the
+ existing result filter <bcp14>MUST</bcp14> be
+ discarded and replaced with the result
+ filter of the incoming message.
+ </t>
<t>
We note that for local applications, a fixed limit on
the number of concurrent requests may be problematic.
@@ -1536,63 +1537,61 @@ bchar = *(ALPHA / DIGIT)
<dd>
the variable-length extended query. Optional.
</dd>
- <dt>BF_MUTATOR</dt>
+ <dt>MUTATOR</dt>
<dd>
- The 32-bit Bloom filter mutator for the result Bloom filter.
+ The 32-bit mutator for the result filter.
</dd>
- <dt>RESULT_BF</dt>
+ <dt>RESULT_FILTER</dt>
<dd>
- the variable-length result Bloom filter, described in <xref
target="result_bloomfilter"/>.
+ the variable-length result filter, described in <xref
target="result_filter"/>.
</dd>
</dl>
</section>
- <section anchor="result_bloomfilter">
- <name>Result Bloom Filter</name>
+ <section anchor="result_filter">
+ <name>Result Filter</name>
<t>
- The result Bloom filter is used to indicate to other peers which
results
+ The result filter is used to indicate to other peers which results
are not of interest when processing a <tt>GetMessage</tt>
(<xref target="p2p_get"/>).
Any peer which is processing <tt>GetMessage</tt>s and has a result
- which matches the query key <bcp14>MUST</bcp14> check the result
Bloom filter
+ which matches the query key <bcp14>MUST</bcp14> check the result
filter
and only send a reply message if the result does not test positive
- under the result Bloom filter. Before forwarding the
<tt>GetMessage</tt>, the
- result Bloom filter <bcp14>MUST</bcp14> be updated to filter out
all results
+ under the result filter. Before forwarding the <tt>GetMessage</tt>,
the
+ result filter <bcp14>MUST</bcp14> be updated to filter out all
results
already returned by the local peer.
</t>
<t>
- FIXME: say something about how to calculate the size of the result
Bloom filter here!
- </t>
- <t>
- Bloom filters are probabilistic data structures. Thus, especially
- given the small size of the result Bloom filter, it is always
possible
- that a desireable result is filtered by the Bloom filter because of
- a false-positive match created by a collision in the hash values
- between the desireable result and filtered items.
+ How a result filter is implemented depends on the block type
+ as described in <xref target="block_functions"/>.
+ Result filters may be probabilistic data structures. Thus,
+ it is possible that a desireable result is filtered by a result
+ filter because of a false-positive test.
</t>
<t>
- To address this problem, R<sup>5</sup>N uses a mutator value
- to additionally randomize the process
- when hashing results into the result Bloom filter. The mutator
+ To address this problem, R<sup>5</sup>N uses a <tt>MUTATOR</tt>
value
+ which allows block implemenations that use probabilistic data
+ structures for result filters to additionally "randomize" the
+ computation of a probabilistic data structure while remaining
+ deterministic across peers. The 32-bit <tt>MUTATOR</tt>
value is set by the peer initiating the GET request, and changed
every time the GET request is repeated by the initiator. Peers
forwarding GET requests <bcp14>MUST</bcp14> not change the
- mutator value included in the <tt>GetMessage</tt> as they could not
- recalculate the Bloom filter with the new mutator value.
+ mutator value included in the <tt>GetMessage</tt> as they might not
+ be able to recalculate the result filter with a different
<tt>MUTATOR</tt>
+ value.
</t>
<t>
- By including the mutator value in the hashing process, repeated
+ By properly including the <tt>MUTATOR</tt> value in a probabilistic
process, repeated
requests have statistically independent probabilities of creating
- collisions in the result Bloom filter. Thus, even if for one request
- a result Bloom filter collision may exclude a result as a
false-positive
+ false-positives in a result filter. Thus, even if for one request
+ a result filter may exclude a result as a false-positive
match, subsequent requests are likely to not have the same
false-positives.
</t>
<t>
- How exactly a block result is hashed into the result Bloom filter
- together with the mutator depends on the block type.
- For example, some block types may include full block
- payload, certain parts of the block payload, or the block key
- when hashing the block into the result Bloom filter.
+ How exactly a block result is added to a result filter
+ (together with the <tt>MUTATOR</tt>) <bcp14>MUST</bcp14> be
+ specified as part of the definition of a block type.
</t>
</section>
<section anchor="p2p_get_processing">
@@ -1604,8 +1603,7 @@ bchar = *(ALPHA / DIGIT)
<ol>
<li>
The <tt>QUERY_KEY</tt> and <tt>XQUERY</tt> fields are validated
- against the
- requested <tt>BTYPE</tt> as defined by its respective
+ against the requested <tt>BTYPE</tt> as defined by its respective
<tt>ValidateBlockQuery</tt> procedure.
If validation
function yields <tt>REQUEST_INVALID</tt>, the message
<bcp14>MUST</bcp14> be discarded.
@@ -1641,7 +1639,7 @@ bchar = *(ALPHA / DIGIT)
<tt>RESULT_BF</tt>. However, implementations also
<bcp14>MUST</bcp14>
avoid an exhaustive search of their database, as there could
be
cases where too many local results are filtered by the result
- Bloom filter. To avoid denial of service attacks,
implementations
+ filter. To avoid denial of service attacks, implementations
<bcp14>MUST</bcp14> thus ensure that the cost of evaluating
any
such query is reasonably small.
</li>
@@ -1833,8 +1831,10 @@ bchar = *(ALPHA / DIGIT)
<li>
If the <tt>BTYPE</tt> is supported, result block
<bcp14>MUST</bcp14>
be validated against the specific query using
- the respective <tt>FilterBlockResult</tt> function, possibly
updating
- the result Bloom filter of the query in the process.
+ the respective <tt>FilterBlockResult</tt> function. This
function
+ <bcp14>MUST</bcp14> update
+ the result filter if a result is returned to the originator
of the
+ query.
</li>
<li>
If the <tt>BTYPE</tt> is not supported, filtering of exact
duplicate
@@ -1933,8 +1933,17 @@ bchar = *(ALPHA / DIGIT)
<dd>Block payload does not match the block type.
</dd>
</dl>
+ </dd>
+ <dt>SetupResultFilter(FilterSize, Mutator) -> RF</dt>
+ <dd>
+ is used to setup an empty result filter. The arguments
+ are the set of results that must be filtered at the
+ initiator, and a <tt>MUTATOR</tt> value which <bcp14>MAY</bcp14>
+ be used to deterministically re-randomize
+ probabilistic data structures. The specification
<bcp14>MUST</bcp14>
+ also include the wire format for BF.
</dd>
- <dt>FilterResult(Block, Key, RBF, XQuery) ->
(FilterEvaluationResult, RBF')</dt>
+ <dt>FilterResult(Block, Key, RF, XQuery) ->
(FilterEvaluationResult, RF')</dt>
<dd>
<t>
is used to filter results against specific queries. This function
@@ -1952,13 +1961,13 @@ bchar = *(ALPHA / DIGIT)
<dt>FILTER_LAST</dt>
<dd>Last possible valid result.</dd>
<dt>FILTER_DUPLICATE</dt>
- <dd>Valid result, but duplicate (was filtered by the result
Bloom filter).</dd>
+ <dd>Valid result, but duplicate (was filtered by the result
filter).</dd>
<dt>FILTER_IRRELEVANT</dt>
<dd>Block does not satisfy the constraints imposed by the
XQuery.</dd>
</dl>
<t>
If the main evaluation result is <tt>FILTER_MORE</tt>, the
function also returns
- and updated result Bloom filter where the block is added to the
set of
+ and updated result filter where the block is added to the set of
filtered replies. An implementation is not expected to actually
differenciate
between the <tt>FILTER_DUPLICATE</tt> and
<tt>FILTER_IRRELEVANT</tt> return
values: in both cases the block is ignored for this query.
@@ -2110,20 +2119,40 @@ gnunet+tcp://12.3.4.5/ \
against the public key from the peer ID field.
</t>
<t>
- To filter results of HELLO blocks
- using the result Bloom filter, the
+ The result filter for HELLO blocks is implemented using a
+ Bloom filter. The <tt>K</tt>-value for the Bloom filter
+ is always 16. The size <tt>S</tt> of the Bloom filter in bytes
depends on
+ the number of elements <tt>F</tt> known to be filtered at the
+ initiator. If <tt>F</tt> is zero, the size <tt>S</tt> is just 8
(bytes).
+ Otherwise, <tt>S</tt> is set to the minimum of
+ 2<sup>15</sup> and the lowest power
+ of 2 that is strictly larger than <tt>K*F/4</tt>
+ (in bytes). The wire format of a HELLO block Bloom filter
+ is just the resulting byte array. In particular, <tt>K</tt>
+ is not transmitted.
+ </t>
+ <t>
+ To filter results of HELLO blocks using the Bloom filter, the
<tt>H_ADDRS</tt> field (as computed using SHA-512 over
- the <tt>ADDRESSES</tt>) is XORed with the SHA-512
- hash of the mutator (in network byte order).
+ the <tt>ADDRESSES</tt>) and XORed with the SHA-512
+ hash of the <tt>MUTATOR</tt> (in network byte order).
The resulting value is then used when hashing into the
- result Bloom filter. Consequently, HELLOs with
- completely identical sets of addresses will be
- filtered, but any small variation in the set of
+ Bloom filter as described in <xref target="bloom_filters" />.
+ Consequently, HELLOs with completely identical sets of
+ addresses will be filtered, but any small variation in the set of
addresses will cause the block to no longer be
filtered (with high probability). The
function thus always returns either
<tt>FILTER_MORE</tt> or <tt>FILTER_DUPLICATE</tt>.
</t>
+ <t>
+ HELLO result filters can be merged if the
+ Bloom filters have the same size and
+ <tt>MUTATOR</tt> by setting all bits to 1 that are
+ set in either Bloom filter. This is done whenever
+ a peer receives a query with the same <tt>MUTATOR</tt>,
+ predecessor and Bloom filter size.
+ </t>
</section>
<section>
<name>Persistence</name>
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [lsd0004] branch master updated: result filter spec,
gnunet <=