gnunet-svn
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lsd0003] branch master updated: chapter 2


From: gnunet
Subject: [lsd0003] branch master updated: chapter 2
Date: Sat, 12 Jun 2021 17:22:43 +0200

This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0003.

The following commit(s) were added to refs/heads/master by this push:
     new 2e15bfd  chapter 2
2e15bfd is described below

commit 2e15bfdc45ea02d516126f2947e8d3ffb409d084
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Sat Jun 12 17:19:59 2021 +0200

    chapter 2
---
 draft-summermatter-set-union.xml | 114 +++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 51 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index 66fdfab..87793e6 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -174,16 +174,17 @@
         <section anchor="background" numbered="true" toc="default">
             <name>Background</name>
             <section anchor="bf" numbered="true" toc="default">
-                <name>Bloom Filters</name>
+                <name>Bloom Filter</name>
                 <t>
-                    A Bloom filter (BF) is a space-efficient datastructure to 
test if an element is part of a set of elements.
-                    Elements are identified by an element ID.
-                    Since a BF is a probabilistic datastructure, it is 
possible to have false-positives: when asked
-                    if an element is in the set, the answer from a BF is 
either "no" or "maybe".
+                  A Bloom filter (BF) is a space-efficient probabilistic
+                  datastructure to test if an element is part of a set of 
elements.
+                  Elements are identified by an element ID.
+                  Since a BF is a probabilistic datastructure, it is possible 
to have false-positives: when asked
+                  if an element is in the set, the answer from a BF is either 
"no" or "maybe".
                 </t>
                 <t>
                     A BF consists of L buckets. Every bucket is a binary value 
that can be either 0 or 1. All buckets are initialized
-                    to 0.  A mapping function M is used to map each ID of each 
element from the set to a subset of k buckets.  M is non-injective
+                    to 0.  A mapping function M is used to map each ID of each 
element from the set to a subset of k buckets.  In the original proposal by 
Bloom, M is non-injective
                     and can thus map the same element multiple times to the 
same bucket.
                     The type of the mapping function can thus be described by 
the following mathematical notation:
                 </t>
@@ -211,13 +212,11 @@
                     To check if an element may be in the set, one tests if all 
buckets under the map M are set to 1.
                 </t>
                 <t>
-                    Further in this document a bitstream output by the mapping 
function is represented by
-                    a set of numeric values for example (0101) = (2,4).
                     In the BF the buckets are set to 1 if the corresponding 
bit in the bitstream is 1.
-                    If there is a collision and a bucket is already set to 1, 
the bucket stays 1.
+                    If there is a collision and a bucket is already set to 1, 
the bucket stays at 1.
                 </t>
                 <t>
-                    In the following example the element M(element) = (1,3) 
has been added:
+                    In the following example the element e0 with M(e0) = {1,3} 
has been added:
                 </t>
                     <figure anchor="figure_bf_insert_0">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -228,8 +227,10 @@
                      ]]></artwork>
                     </figure>
                 <t>
-                    It is easy to see that the M(element) = (0,3) could be in 
the BF below and M(element) = (0,2) cannot be
-                    in the BF below:
+                  It is easy to see that an element e1 with M(e1) = {0,3}
+                  could have been added to the BF below, while an element e2
+                  with M(e2) = {0,2} cannot be in the set represented by the
+                  BF below:
                 </t>
 
                 <figure anchor="figure_bf_contains">
@@ -255,14 +256,18 @@
             <section anchor="cbf" numbered="true" toc="default">
                 <name>Counting Bloom Filter</name>
                 <t>
-                  A Counting Bloom Filter (CBF) is an extension of the <xref 
target="bf" format="title" />. In the CBF, buckets are
-                  unsigned numbers instead of binary values.  This allows the 
removal of an element from the CBF.
+                  A Counting Bloom Filter (CBF) is a variation on the idea
+                  of a <xref target="bf" format="title" />. With a CBF, 
buckets are
+                  unsigned numbers instead of binary values.
+                  This allows the removal of an element from the CBF.
                 </t>
                 <t>
-                  Adding an element to the CBF is similar to the adding 
operation of the BF. However, instead of setting the bucket on hit to 1 the
-                  numeric value stored in the bucket is increased by 1. For 
example if two colliding elements M(element1) = (1,3) and
-                    M(element2) = (0,3) are added to the CBF, bucket 0 and 1 
are set to 1 and bucket 3 (the colliding bucket) is set
-                    to 2:
+                  Adding an element to the CBF is similar to the adding 
operation of the BF.
+                  However, instead of setting the buckets to 1 the
+                  numeric value stored in the bucket is increased by 1.
+                  For example, if two colliding elements M(e1) = {1,3} and
+                  M(e2) = {0,3} are added to the CBF, bucket 0 and 1 are set
+                  to 1 and bucket 3 (the colliding bucket) is set to 2:
                 </t>
                 <figure anchor="figure_cbf_insert_0">
                     <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -273,13 +278,15 @@
                      ]]></artwork>
                 </figure>
                 <t>
-                    The counter stored in the bucket is also called the order 
of the bucket.
+                  The counter stored in the bucket is also called the order of 
the bucket.
                 </t>
                 <t>
-                    To remove an element form the CBF the counters of all 
buckets the element is mapped to are decreased by 1.
+                  To remove an element form the CBF the counters of all buckets
+                  the element is mapped to are decreased by 1.
                 </t>
                 <t>
-                    Removing M(element2) = (1,3) from the CBF above:
+                  For example, removing M(e2) = {1,3} from the CBF above
+                  results in:
                 </t>
                 <figure anchor="figure_cbf_remove_0">
                     <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -290,15 +297,19 @@
                      ]]></artwork>
                 </figure>
                 <t>
-                  In practice, the number of bits available for the counters 
is usually finite. For example, given a 4-bit
-                  counter, a CBF bucket would overflow 16 elements are mapped 
to the same bucket. To efficiently
-                  handle this case, the maximum value (15 in our example) is 
considered to represent "infinity". Once the
+                  In practice, the number of bits available for the counters
+                  is often finite. For example, given a 4-bit
+                  counter, a CBF bucket would overflow 16 elements are mapped
+                  to the same bucket. To handle this case, the maximum value
+                  (15 in our example) is considered to represent "infinity". 
Once the
                   order of a bucket reaches "infinity", it is no longer 
incremented or decremented.
                 </t>
                 <t>
-                  The parameters L and k and the number of bits allocated to 
the counters depend on the set size.
-                  An IBF will degenerate when subjected to insert and remove 
iterations of different elements, and eventually all
-                  buckets will reach "infinity".  The speed of the degradation 
will depend on the choice of L and k in
+                  The parameters L and k and the number of bits allocated to 
the counters
+                  SHOULD depend on the set size.
+                  A CBF will degenerate when subjected to insert and remove 
iterations of
+                  different elements, and eventually all buckets will reach 
"infinity".
+                  The speed of the degradation will depend on the choice of L 
and k in
                   relation to the number of elements stored in the IBF.
                 </t>
             </section>
@@ -309,34 +320,34 @@
             <t>
                 An Invertible Bloom Filter (IBF) is a further extension of the 
<xref target="cbf" format="title" />.
                 An IBF extends the <xref target="cbf" format="title" /> with 
two more operations:
-                decode and set difference. This two extra operations are 
useful to efficiently extract
+                decode and set difference. This two extra operations are key 
to efficiently obtain
                 small differences between large sets.
             </t>
             <section anchor="ibf_structure" numbered="true" toc="default">
                 <name>Structure</name>
                 <t>
-                    An IBF consists of a mapping function M and
-                    L buckets that each store a signed
-                    counter and an XHASH.  An XHASH is the XOR of various
-                    hash values.  As before, the
-                    values used for k, L and the number of bits used
-                    for the signed counter and the XHASH depend
-                    on the set size and various other trade-offs,
-                    including the CPU architecture.
-                </t>
-                <t>
-                    If the IBF size is too small or the mapping
-                    function does not spread out the elements
-                    uniformly, the signed counter can overflow or
-                    underflow. As with the CBF, the "maximum" value is
-                    thus used to represent "infinite".  As there is no
-                    need to distinguish between overflow and
-                    underflow, the most canonical representation of
-                    "infinite" would be the minimum value of the
-                    counter in the canonical 2-complement
-                    interpretation.  For example, given a 4-bit
-                    counter a value of -8 would be used to represent
-                    "infinity".
+                  An IBF consists of an injective mapping function M mapping
+                  elements to k out of L buckets. Each of the L buckets stores
+                  a signed COUNTER, an IDSUM and an XHASH.
+                  An IDSUM is the XOR of various element IDs.
+                  An XHASH is the XOR of various hash values.
+                  As before, the values used for k, L and the number of bits 
used
+                  for the signed counter and the XHASH depend
+                  on the set size and various other trade-offs.
+                </t>
+                <t>
+                  If the IBF size is too small or the mapping
+                  function does not spread out the elements
+                  uniformly, the signed counter can overflow or
+                  underflow. As with the CBF, the "maximum" value is
+                  thus used to represent "infinite".  As there is no
+                  need to distinguish between overflow and
+                  underflow, the most canonical representation of
+                  "infinite" would be the minimum value of the
+                  counter in the canonical 2-complement
+                  interpretation.  For example, given a 4-bit
+                  counter a value of -8 would be used to represent
+                  "infinity".
                 </t>
                     <figure anchor="figure_ibf_structure">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
@@ -737,7 +748,8 @@ FUNCTION id_calculation (element,ibf_salt):
                 <section anchor="ibf_format_bucket_identification" 
numbered="true" toc="default">
                     <name>Mapping Function</name>
                     <t>
-                        The mapping function M as described above in the 
figure <xref target="bf_mapping_function_math" format="default" />
+                      For an IBF, it is beneficial to use an injective mapping 
function M.
+                      The mapping function M as described above in the figure 
<xref target="bf_mapping_function_math" format="default" />
                         decides in which buckets the ID and HASH have to be 
binary XORed to. In practice
                         the following algorithm is used:
                     </t>

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]