gnunet-svn
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnunet-scheme] 148/324: utils: tokeniser: Split message streams into in


From: gnunet
Subject: [gnunet-scheme] 148/324: utils: tokeniser: Split message streams into individual messages.
Date: Tue, 21 Sep 2021 13:23:08 +0200

This is an automated email from the git hooks/post-receive script.

maxime-devos pushed a commit to branch master
in repository gnunet-scheme.

commit 7eabf920d71545902624632892e3687c84f1d51b
Author: Maxime Devos <maximedevos@telenet.be>
AuthorDate: Fri Jul 2 19:55:29 2021 +0200

    utils: tokeniser: Split message streams into individual messages.
    
    * Makefile.am
      (modules): Add new module (gnu gnunet utils tokeniser).
      (SCM_TESTS): Add new test tests/tokeniser.scm.
    * README.org (Message queues): Describe new module.
    * gnu/gnunet/util/struct.scm (/:message-header): Note that
      the new module depends on the exact layout.
    * gnu/gnunet/utils/tokeniser.scm
      (make-tokeniser, tokeniser?, <tokeniser>): New record type.
      (&interrupted-tokeniser-volation)
      make-interrupted-tokeniser-violation)
      interrupted-tokeniser-violation?): New condition type.
      (&kaput-tokeniser-error, make-tokeniser-error,
      kaput-tokeniser-error?): New condition type.
      (add-bytevector!): New procedure.
    * tests/tokeniser.scm
      (fluffed-bytevector, no-return/overly-small, no-return/done)
      (no-handle/message, merge-bytevectors, catch-errors): New procedures
      for tests.
      (choose-message, choose-many-message, positions->ranges)
      (choose-split-positions): New QuickCheck generators for tests.
      ($messages-and-ranges): New QuickCheck arbitrary for tests.
      ("[prop] complete messages are passed through")
      ("message fragmented on header/data boundary reassembled")
      ("message fragmented in size field and after message header, some data")
      ("[prop] all fragmented & multiple messages received")
      ("overly small message error (complete header)")
      ("overly small message error (header split in size field)")
      ("huge message, split early")
      ("re-entrancy from message handler is detected (complete message)")
      ("tokeniser becomes kaput, split after size field")
      ("tokeniser becomes kaput, split inside size field"): New tests.
      (huge-bv): New variable.
    * tests/utils.scm (calls-in-tail-position?): New buggy procedure for
      tests.
---
 Makefile.am                    |   4 +-
 README.org                     |   4 +
 gnu/gnunet/util/struct.scm     |   4 +
 gnu/gnunet/utils/tokeniser.scm | 283 +++++++++++++++++++++++++
 tests/tokeniser.scm            | 471 +++++++++++++++++++++++++++++++++++++++++
 tests/utils.scm                |  29 ++-
 6 files changed, 793 insertions(+), 2 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index 4ab6680..7dda556 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -47,6 +47,7 @@ modules = \
   gnu/gnunet/utils/hat-let.scm \
   gnu/gnunet/utils/netstruct.scm \
   gnu/gnunet/utils/platform-enum.scm \
+  gnu/gnunet/utils/tokeniser.scm \
   \
   gnu/gnunet/config/parser.scm \
   gnu/gnunet/config/value-parser.scm \
@@ -107,7 +108,8 @@ SCM_TESTS = \
   tests/config-value-parser.scm \
   tests/config-expander.scm \
   tests/config-db.scm \
-  tests/netstruct.scm
+  tests/netstruct.scm \
+  tests/tokeniser.scm
 
 SCM_TESTS_ENVIRONMENT = \
   GUILE_AUTO_COMPILE=0 \
diff --git a/README.org b/README.org
index d2a6979..a02d613 100644
--- a/README.org
+++ b/README.org
@@ -88,6 +88,10 @@
 
    + TODO actual queues?  Maybe we don't need them?
    + TODO filling the queues
+
+   Message queue implementations based on streams I/O can use
+   (gnu gnunet utils tokeniser), to split the message stream
+   into separate messages.
 ** Configuration                                                  :test:good:
    + gnu/gnunet/config/parser.scm: Parse configuration files.
    + gnu/gnunet/config/expand.scm: Perform variable expansion.
diff --git a/gnu/gnunet/util/struct.scm b/gnu/gnunet/util/struct.scm
index abec52b..98b6d10 100644
--- a/gnu/gnunet/util/struct.scm
+++ b/gnu/gnunet/util/struct.scm
@@ -36,6 +36,10 @@
              (documentation
               "This is represented as an array of uint32 in GNUnet"))))
 
+    ;; If this definition is ever changed (breaking compatibility with
+    ;; C GNUnet), make sure to change (gnu gnunet utils tokeniser)
+    ;; appropriately as well -- the field types and positions are
+    ;; hardcoded there.
     (define-type /:message-header
       (structure/packed
        (synopsis "Header for all communications")
diff --git a/gnu/gnunet/utils/tokeniser.scm b/gnu/gnunet/utils/tokeniser.scm
new file mode 100644
index 0000000..aa8a041
--- /dev/null
+++ b/gnu/gnunet/utils/tokeniser.scm
@@ -0,0 +1,283 @@
+;; This file is part of scheme-GNUnet, a Scheme port of GNUnet .
+;;  Copyright (C) 2010, 2016, 2017 GNUnet e.V.
+;;  Copyright (C) 2021 Maxime Devos <maximedevos@telenet.be>
+;;
+;; GNUnet is free software: you can redistribute it and/or modify it
+;; under the terms of the GNU Affero General Public License as published
+;; by the Free Software Foundation, either version 3 of the License,
+;;  or (at your option) any later version.
+;;
+;; GNUnet is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; Affero General Public License for more details.
+;;
+;; You should have received a copy of the GNU Affero General Public License
+;;  along with this program.  If not, see <http://www.gnu.org/licenses/>.
+;;
+;; SPDX-License-Identifier: AGPL3.0-or-later
+
+;; C file: util/mst.c
+;; Brief: convenience functions for handling inbound message buffers
+;; Author: Christian Grothoff
+;; Adapted to Scheme by Maxime Devos
+;;
+;; The most prominent use would be the implementation of message queues
+;; over stream sockets, where the separate messages need to be split
+;; from each other.  However, it is used in some other places as
+;; well.
+;;
+;; The Scheme implementation does not support the 'purge' and 'one-shot'
+;; modes. 'purge' should be simple to implement though and 'one-shot'
+;; could be implemented with delimited continuations.
+
+(define-library (gnu gnunet utils tokeniser)
+  (export make-tokeniser
+         tokeniser?
+
+         &interrupted-tokeniser-violation
+         make-interrupted-tokeniser-violation
+         interrupted-tokeniser-violation?
+
+         &kaput-tokeniser-error
+         make-kaput-tokeniser-error
+         kaput-tokeniser-error?
+
+         add-bytevector!)
+  (import (only (rnrs base)
+               define and < assert begin quote lambda
+               >= integer? exact? <= expt = cond
+               let + - eq? > * min if)
+         (only (rnrs conditions)
+               define-condition-type condition make-who-condition
+               &violation &error)
+         (only (rnrs exceptions)
+               raise)
+         (only (rnrs bytevectors)
+               endianness bytevector-copy! bytevector? bytevector-length
+               bytevector-u8-ref bytevector-u8-set! bytevector-u16-ref
+               bytevector-u16-set! make-bytevector)
+         (only (rnrs records syntactic)
+               define-record-type)
+         (only (srfi srfi-26) cut)
+         (only (guile) lambda*)
+         (only (gnu gnunet util struct)
+               /:message-header)
+         (only (gnu gnunet netstruct syntactic)
+               sizeof)
+         (only (gnu gnunet utils hat-let) let^))
+  (begin
+    (define-condition-type &interrupted-tokeniser-violation &violation
+      make-interrupted-tokeniser-violation interrupted-tokeniser-violation?)
+    (define-condition-type &kaput-tokeniser-error &error
+      make-kaput-tokeniser-error kaput-tokeniser-error?)
+
+    (define-record-type
+       (<tokeniser> make-tokeniser tokeniser-state?)
+      ;; Current buffer.
+      ;;
+      ;; Alternatively, when @code{add-bytevector-copy!} is
+      ;; being called, this is temporarily set to @code{#f},
+      ;; to detect re-entrancy. And if a message with size
+      ;; less than the message header is found, then it is set
+      ;; to @code{#t}, marking the tokeniser as ‘kaput’.
+      (fields (mutable buffer tokeniser-buffer set-tokeniser-buffer!)
+             ;; Number of bytes in the buffer.
+             (mutable position tokeniser-position set-tokeniser-position!))
+      (protocol
+       (lambda (%make)
+        (lambda* (#:key
+                  (initial-size (sizeof /:message-header '())))
+          "Make an empty tokeniser.  A buffer of size @var{initial-size}
+will be pre-allocated.  This size must be an exact natural and it
+might be adjusted."
+          (assert (and (integer? initial-size) (exact? initial-size)
+                       (>= initial-size 0)))
+          (%make (make-bytevector
+                  (cond ((<= initial-size (sizeof /:message-header '()))
+                         (sizeof /:message-header '()))
+                        ((>= initial-size (expt 2 16))
+                         (expt 2 16))
+                        (#t initial-size)))
+                 0))))
+      (opaque #t)
+      (sealed #t))
+
+    (define (add-bytevector! tok bv offset length
+                            handle/message
+                            return/done
+                            return/overly-small)
+      "Feed up to @var{length} bytes from the bytevector @var{bv}
+starting at @var{offset} to the tokeniser @var{tok}.
+
+When a complete message is assembled, the callback @var{handle/message}
+is called with an appropriate bytevector region.  This bytevector region
+is part of the passed bytevector range (@var{bv}, @var{offset}, @var{length})
+or the tokeniser's internal buffer.
+
+If a message size was overly small, i.e., smaller than its header,
+then @var{return/overly-small} is called in tail position with the
+specified message type (as an integer) and message size.  In that case,
+@var{tok} will be marked as kaput.  As the message type is not always
+available, sometimes @code{#false} will be pased instead.
+
+On success, @code{return/done} is called in tail position without
+arguments.
+
+This procedure may only be called if @var{tok} isn't kaput,
+and it may not be called re-entrantly.  In the former case,
+a @code{&kaput-tokeniser-error} is raised.  In the latter case,
+a @code{&interrupted-tokeniser-violation} may be raised
+but this cannot be guaranteed."
+      ;; ^ mainly due to parallelism reasons
+      (define set-buffer! (cut set-tokeniser-buffer! tok <>))
+      (define set-position! (cut set-tokeniser-position! tok <>))
+      (define mark-kaput! (cut set-tokeniser-buffer! tok #t))
+      (define (maybe-reallocate/no-move buffer minimal-size)
+       "Return a fresh bytevector or the bytevector @var{buffer} of
+at least size @var{minimal-size}.  Avoid allocations."
+       (if (<= minimal-size (bytevector-length buffer))
+           buffer
+           (make-bytevector minimal-size)))
+      ;; Possibilities:
+      ;;  (a) @var{length} is zero. Then there's nothing to do!
+      ;;      The other possibilities will assume @var{length}
+      ;;      is at least one.
+      ;;
+      ;;  (b) If the tokeniser buffer is empty and @var{bv} starts
+      ;;      with a complete message, then call the processor
+      ;;      on the message and continue.
+      ;;
+      ;;  (c) If the tokeniser buffer is empty and @var{bv} starts
+      ;;      with an incomplete message, then copy the partial message
+      ;;      to the tokeniser buffer (reallocating it if necessary)
+      ;;      and stop.
+      ;;
+      ;;      If the message size is known, it is considered necessary
+      ;;      for the tokeniser buffer to be at least that size.
+      ;;
+      ;;  (d) If the tokeniser buffer is non-empty and the message size
+      ;;      of the partial message in that buffer is unknown,
+      ;;      then determine the size, if necessary reallocate the tokeniser
+      ;;      buffer to be at least that size, copy the size into the
+      ;;      buffer and continue.
+      ;;
+      ;;      The message won't be complete yet, as a 'type' field always
+      ;;      comes after the 'size' field in the message header.
+      ;;
+      ;;  (e) If the tokeniser buffer is non-empty and the message size
+      ;;      of the partial message in that buffer is known,
+      ;;      then copy the remainder of the message into the tokeniser
+      ;;      buffer (as far as possible).
+      ;;
+      ;;      If this makes the message complete, then process the message
+      ;;      and continue.  If the message isn't complete, then just stop,
+      ;;      as all of @var{bv} has been copied.
+      (define (continue buffer position bv offset length)
+       ;; The buffer is set before calls to return/done or return/overly-small.
+       ;; The position is set after it changes and before the tail-iteration
+       ;; into 'continue'.
+       (cond ((= length 0)
+              (set-buffer! buffer)
+              (return/done)) ; possibility (a)
+             ((and (= position 0)
+                   ;; possibility (c), length unknown
+                   (< length 2))
+              (bytevector-u8-set! buffer position
+                                  (bytevector-u8-ref bv offset))
+              (set-buffer! buffer)
+              (set-position! 1)
+              (return/done))
+             ((= position 0) ; and (>= length 2)
+              (let ((size (bytevector-u16-ref bv offset (endianness big))))
+                (cond ((< size (sizeof /:message-header '()))
+                       (mark-kaput!)
+                       (return/overly-small
+                        (and (>= length 4)
+                             ;; + 2: skip the "size" field and read
+                             ;; the 'type' field
+                             (bytevector-u16-ref bv (+ offset 2)
+                                                 (endianness big)))
+                        size))
+                      ;; possibility (b)
+                      ((<= size length)
+                       (handle/message bv offset size)
+                       (continue buffer position bv (+ offset size)
+                                 (- length size)))
+                      ;; Now, (< length size) -- possibility (c).
+                      (#t
+                       (let ((buffer
+                              ;; Re-allocate the buffer if required.
+                              (maybe-reallocate/no-move buffer size)))
+                         ;; Write the partial message to the buffer
+                         (bytevector-copy! bv offset buffer 0 length)
+                         (set-buffer! buffer)
+                         (set-position! (+ position length))
+                         (return/done))))))
+             ((>= position 2) ; possibility (e)
+              (let^ ((! size (bytevector-u16-ref buffer 0 (endianness big)))
+                     (!! (<= (sizeof /:message-header '()) size))
+                     (!! (<= size (bytevector-length buffer)))
+                     (!! (< position size))
+                     ;; How many bytes must be copied?
+                     (! extra (min length (- size position)))
+                     ;; Copy the bytes.
+                     (_ (bytevector-copy! bv offset buffer position extra))
+                     (! position (+ position extra))
+                     (!! (<= position size))
+                     ;; do not set the buffer yet, such that
+                     ;; re-entrancy from the 'handle/message' callback
+                     ;; can be detected.
+                     (? (< position size)
+                        ;; Message is not yet complete --> stop.
+                        (assert (= length extra))
+                        (set-buffer! buffer)
+                        ;; some bytes have been copied
+                        (set-position! position)
+                        (return/done)))
+                    ;; Message is complete --> process it and continue
+                    ;; (there may be other messages as well!)
+                    (handle/message buffer 0 size)
+                    (set-position! 0)
+                    (continue buffer 0 bv (+ offset extra) (- length extra))))
+             ;; (< position 2), possibility (d)
+             (#t
+              (let^ ((! size/byte-0 (bytevector-u8-ref buffer 0))
+                     (! size/byte-1 (bytevector-u8-ref bv offset))
+                     (! size (+ (* (expt 2 8) size/byte-0)
+                                size/byte-1))
+                     (? (< size (sizeof /:message-header '()))
+                        (mark-kaput!)
+                        (return/overly-small
+                         (and (>= length 3)
+                              (bytevector-u16-ref bv (+ offset 1)
+                                                  (endianness big)))
+                         size))
+                     (! buffer (maybe-reallocate/no-move buffer size)))
+                    (bytevector-u16-set! buffer 0 size (endianness big))
+                    (set-position! 2)
+                    (continue buffer 2 bv (+ offset 1) (- length 1))))))
+      (let^ ((! buffer (tokeniser-buffer tok))
+            (! position (tokeniser-position tok))
+            (? (eq? buffer #t)
+               (raise (condition
+                       (make-who-condition 'add-bytevector!)
+                       (make-kaput-tokeniser-error))))
+            (? (eq? buffer #f)
+               (raise (condition
+                       (make-who-condition 'add-bytevector!)
+                       (make-interrupted-tokeniser-violation))))
+            (!! (and (bytevector? buffer)
+                     (integer? position)
+                     (exact? position)
+                     (integer? offset)
+                     (exact? offset)
+                     (integer? length)
+                     (exact? length)
+                     (<= (+ offset length) (bytevector-length bv))
+                     (<= 0 position)
+                     (< position (bytevector-length buffer)))))
+           ;; The buffer will be restored at the call to
+           ;; 'return/done' or 'return/overly-small'.
+           (set-buffer! #f)
+           (continue buffer position bv offset length)))))
diff --git a/tests/tokeniser.scm b/tests/tokeniser.scm
new file mode 100644
index 0000000..c26d272
--- /dev/null
+++ b/tests/tokeniser.scm
@@ -0,0 +1,471 @@
+;;   This file is part of scheme-GNUnet, a partial Scheme port of GNUnet.
+;;   Copyright (C) 2021 Maxime Devos <maximedevos@telenet.be>
+;;
+;;   scheme-GNUnet is free software: you can redistribute it and/or modify it
+;;   under the terms of the GNU Affero General Public License as published
+;;   by the Free Software Foundation, either version 3 of the License,
+;;   or (at your option) any later version.
+;;
+;;   scheme-GNUnet is distributed in the hope that it will be useful, but
+;;   WITHOUT ANY WARRANTY; without even the implied warranty of
+;;   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;;   Affero General Public License for more details.
+;;
+;;   You should have received a copy of the GNU Affero General Public License
+;;   along with this program.  If not, see <http://www.gnu.org/licenses/>.
+;;
+;;   SPDX-License-Identifier: AGPL-3.0-or-later
+(import (tests utils)
+       (quickcheck)
+       (quickcheck property)
+       (quickcheck arbitrary)
+       (quickcheck generator)
+       (gnu gnunet utils tokeniser)
+       (gnu gnunet utils bv-slice)
+       (srfi srfi-1)
+       (srfi srfi-8)
+       (srfi srfi-43)
+       (only (ice-9 control) let/ec)
+       (ice-9 match)
+       (only (system foreign)
+             pointer->bytevector bytevector->pointer)
+       (only (rnrs base) assert)
+       (only (rnrs exceptions) guard)
+       (only (rnrs conditions)
+             assertion-violation? condition-who)
+       (rnrs bytevectors)
+        (gnu gnunet netstruct syntactic)
+       (gnu gnunet util struct))
+
+(define (fluffed-bytevector %size %off fluff)
+  ;; Returned bytevector is a complete message.
+  (let* ((size (+ %size (sizeof /:message-header '())))
+        (bv (make-bytevector (+ %off size)))
+        (s (bv-slice/read-write bv)))
+    (bytevector-copy! fluff 0 bv 0
+                     (min (bytevector-length fluff)
+                          (bytevector-length bv)))
+    (set%! /:message-header '(size)
+          (slice-slice s %off (sizeof /:message-header '()))
+          size)
+    (values bv %off size)))
+
+(test-begin "tokeniser")
+
+(define (no-return/overly-small . _)
+  (error "unexpected call to return/overly-small"))
+
+(define (no-return/done . _)
+  (error "unexpected call to return/done"))
+
+(define (no-handle/message . _)
+  (error "unexpected call to handle/message"))
+
+;; Some bugs this found:
+;;  * in some places, the 'offset' argument was ignored
+;;    and always the first or first two bytes of 'bv'
+;;    in 'continue' in 'add-bytevector!' would be used.
+;;  * some incorrect assertions in the tokeniser code
+;;  * when a message was fragmented (between header and data),
+;;    the data was not copied
+;;  * the type of a message was calculated incorrectly
+;;    whe ‘overly small message errors’ are reported
+;;  * the type of a message could not be calculated
+;;    for some fragmented messages, if the first 'length'
+;;    was 1 and the second 'length' was 3.
+
+(test-assert "[prop] complete messages are passed through"
+  (quickcheck
+   (property
+     ((%size $natural)
+      (%off  $natural)
+      (fluff $bytevector))
+     (receive (bv offset size)
+        (fluffed-bytevector %size %off fluff)
+       (let ((handled? #f))
+        (add-bytevector!
+         (make-tokeniser)
+         bv offset size
+         (lambda (bv2 offset2 length)
+           (assert (not handled?))
+           (assert (eq? bv bv2))
+           (assert (= offset offset2))
+           (assert (= length size))
+           (set! handled? #t))
+         (lambda _ handled?)
+         no-return/overly-small))))))
+
+;; Test fragmented messages and multiple messages
+;; are properly handled.
+
+(define choose-message
+  (generator-let*
+   ((length (choose-one/weighted
+            ;; Very small
+            `((1 . ,(choose-integer 4 5))
+              (1 . ,(choose-integer 5 6))
+              ;; Some length
+              (2 . ,(choose-integer 4 9)))))
+    ;; Arbitrary 'type' field and data
+    (filler (choose-bytevector (- length 2))))
+   (let ((bv (make-bytevector length)))
+     (bytevector-u16-set! bv 0 length (endianness big))
+     (bytevector-copy! filler 0 bv 2 (bytevector-length filler))
+     (generator-return bv))))
+
+;; Generate a list of message bytevectors
+(define choose-many-messages
+  (sized-generator
+   (cut choose-list choose-message <>)))
+
+(define (merge-bytevectors messages)
+  (define size (reduce + 0 (map bytevector-length messages)))
+  (define bv (make-bytevector size))
+  (let loop ((offset 0) (messages messages))
+    (if (null? messages)
+       bv
+       (let* ((head (car messages))
+              (tail (cdr messages))
+              (message-size (bytevector-length head)))
+         (bytevector-copy! head 0 bv offset message-size)
+         (loop (+ offset message-size) tail)))))
+
+;; Try to occassionally split message in annoying places,
+;; and avoid splitting at message boundaries.
+(define (choose-split-positions messages)
+  (let loop ((offset 0) (messages messages))
+    (if (null? messages)
+       (generator-return '())
+       (let* ((head (car messages))
+              (tail (cdr messages))
+              (message-size (bytevector-length head))
+              (data-splittable? (> message-size 5)))
+         (generator-let*
+          ((rest-positions
+            (loop (+ offset message-size) tail))
+           (data-split-positions
+            (if data-splittable?
+                (generator-lift
+                 list
+                 (choose-integer 4 message-size))
+                (generator-return '())))
+           (end-split-positions
+            (choose-one/weighted
+             `((2 . ,(generator-return '()))
+               (1 . ,(generator-return (list message-size))))))
+           (head-split-positions
+            (choose-one/weighted
+             `((3 . ,(generator-return '()))  ; don't split header
+               (2 . ,(generator-return '(1))) ; split inside size field
+               (2 . ,(generator-return '(2))) ; split between size field and 
type
+               (1 . ,(generator-return '(1 2))))))) ; both of above
+          (let* ((all-positions
+                  (append head-split-positions data-split-positions
+                          end-split-positions))
+                 (fixed-positions
+                  (map (cut + <> offset) all-positions)))
+            (generator-return
+             (append fixed-positions rest-positions))))))))
+
+;; A list of (start . length).
+;; Starts at the minimal 'start', and ends at 'end' (exclusive)
+(define* (positions->ranges positions end)
+  (match positions
+    (() `((,end . 0)))
+    ((start) `((,start . ,(- end start))))
+    ((start next . rest)
+     `((,start . ,(- next start))
+       ,@(positions->ranges `(,next ,@rest) end)))))
+
+(define $messages-and-ranges
+  (arbitrary
+   (gen (generator-let*
+        ((messages choose-many-messages)
+         (bv (generator-return
+              (merge-bytevectors messages)))
+         (split-positions
+          (choose-split-positions messages))
+         (ranges
+          (generator-return
+           (positions->ranges (cons 0 split-positions)
+                              (bytevector-length bv)))))
+        (generator-return
+         `#(,messages ,bv ,ranges))))
+   (xform #f))) ; unneeded
+
+;; A simplified test failure case of
+;; "[prop] all fragmented & multiple messages received".
+;; The issue was that (1 2 3 4) was not copied.
+(test-equal "message fragmented on header/data boundary reassembled"
+  #vu8(0 8 50 50 1 2 3 4)
+  (let ((tok (make-tokeniser))
+       ;; Message size: 8
+       (received? #f)
+       (bv #vu8(0 8 50 50 1 2 3 4)))
+    (add-bytevector! tok bv 0 4
+                    no-handle/message (const #t) no-return/overly-small)
+    (add-bytevector! tok bv 4 4
+                    (lambda (bv offset length)
+                      ;; These two assertions are actually an implementation
+                      ;; detail, and test no overly large allocations are
+                      ;; made.
+                      (assert (= 0 offset))
+                      (assert (= length (bytevector-length bv)))
+                      (assert (not received?))
+                      (set! received? (bytevector-copy bv)))
+                    (const #t) no-return/overly-small)
+    received?))
+
+;; Found when debugging a test failure of
+;; "[prop] all fragmented & multiple messages received".
+;; The bug was a missing set-position! call.
+(test-equal "message fragmented in size field and after message header, some 
data"
+  #vu8(0 6 236 197 216 19)
+  (let ((tok (make-tokeniser))
+       (received? #f)
+       (bv #vu8(0 6 236 197 216 19)))
+    ;; copy the zero
+    (add-bytevector! tok bv 0 1
+                    no-handle/message (const #t) no-return/overly-small)
+    ;; copy the rest of the message header
+    (add-bytevector! tok bv 1 3
+                    no-handle/message (const #t) no-return/overly-small)
+    ;; copy the data
+    (add-bytevector! tok bv 4 2
+                    (lambda (bv offset length)
+                      ;; see previous test case
+                      (assert (= 0 offset))
+                      (assert (= length (bytevector-length bv)))
+                      (assert (not received?))
+                      (set! received? (bytevector-copy bv)))
+                    (const #t)
+                    no-return/overly-small)
+    received?))
+
+;; And return/done is called in tail position.
+(test-assert "[prop] all fragmented & multiple messages received"
+  (quickcheck
+   (property
+     ((messages-and-ranges $messages-and-ranges))
+     (match messages-and-ranges
+       (#(messages bv ranges)
+       (assert (= (apply + (map cdr ranges))
+                  (bytevector-length bv)))
+       (guard (e ((assertion-violation? e)
+                  ;; 2: don't include 'make-stack' or
+                  ;; this guard
+                  (display-backtrace (make-stack #t 2) (current-error-port))
+                  (print-exception (current-error-port) #f '%exception (list 
e))
+                  #f))
+         (let ((tok (make-tokeniser))
+               (remove-message!
+                (lambda (bv offset length)
+                  (define bv/range
+                    (pointer->bytevector
+                     (bytevector->pointer bv offset)
+                     length))
+                  ;; Sanity check
+                  (assert (<= 0 offset))
+                  (assert (<= (+ offset length) (bytevector-length bv)))
+                  (let/ec ec
+                    (pair-for-each
+                     (match-lambda
+                       (((and message (set! set-message!)) . rest)
+                        (when (and (bytevector? message)
+                                   (bytevector=? message bv/range))
+                          (set-message! #f) ; mark it as received
+                          (ec))))
+                     messages) ; stop searching
+                    (assert (and #f
+                                 "message not added but still received"))))))
+           (for-each
+            (match-lambda
+              ((start . length)
+               (assert
+                (calls-in-tail-position?
+                 (lambda (return/done)
+                   (add-bytevector! tok bv start length
+                                    remove-message!
+                                    (lambda () (return/done))
+                                    no-return/overly-small))))))
+            ranges)))
+       ;; All messages should have been received.
+       (not (any identity messages)))))))
+
+;; The type was read at an incorrect offset.
+(test-equal "overly small message error (complete header)"
+  (map (lambda (n)
+        `(#t ; in tail position
+          ,(+ (* 256 n) (+ n 1)) ; message type
+          ,n)) ; message size
+       (iota 4))
+  (map (lambda (n)
+        (call-with-values
+            (lambda ()
+              (calls-in-tail-position?
+               (lambda (return/overly-small)
+                 (add-bytevector! (make-tokeniser)
+                                  (u8-list->bytevector
+                                   ;; n (+ n 1): arbitrary message type.
+                                   ;; Two separate values are used for
+                                   ;; the two halves of the u16, to
+                                   ;; detect little / big endianness issues.
+                                   ;;
+                                   ;; GNUnet usually (always?) uses
+                                   ;; big-endian.
+                                   (list 0 n n (+ n 1)))
+                                  0 4
+                                  no-handle/message
+                                  no-return/done
+                                  return/overly-small))))
+          list))
+       ;; 4: size of message header
+       ;; iota makes a list '(0 1 2 3)
+       (iota 4)))
+
+;; A bounds check at the call to return/overly-small
+;; was overly strict, resulting in the message type being missing.
+(test-equal "overly small message error (header split in size field)"
+  (map (lambda (n)
+        `(#t ; in tail position
+          ,(+ (* 256 (+ n 1)) n) ; message type
+          ,n))
+       (iota 4))
+  (map (lambda (n)
+        (let ((tok (make-tokeniser))
+              (bv (u8-list->bytevector
+                   ;; see previous test case for why (+ n 1) n
+                   (list 0 n (+ n 1) n))))
+          (add-bytevector! tok bv 0 1
+                           no-handle/message
+                           (const #t)
+                           no-return/overly-small)
+          (call-with-values
+              (lambda ()
+                (calls-in-tail-position?
+                 (lambda (return/overly-small)
+                   (add-bytevector! tok bv 1 3
+                                    no-handle/message
+                                    no-return/done
+                                    return/overly-small))))
+            list)))
+       (iota 4))) ; see previous test case for why (iota 4)
+
+;; All the previous tests use 'small' messages. That is,
+;; the message sizes were always < 256.  However, messages
+;; with size >= 256 definitely exist.
+;;
+;; This test detects the mutation
+;;   (bytevector-u8-ref bv offset) --> 0
+;; in (! size/byte-0 [...]).
+
+(define huge-bv
+  (let ((bv (make-bytevector #xfffe 17)))
+    (bytevector-u16-set! bv 0 #xfffe (endianness big))
+    bv))
+
+;; Tests:
+;;   * the whole message is received
+;;   * return/done is called in tail position
+(test-equal "huge message, split early"
+  (map (const #t) (iota 16))
+  (map (lambda (split-position)
+        (let ((tok (make-tokeniser))
+              (received? #f))
+          (receive (in-tail-position?)
+              (calls-in-tail-position?
+               (lambda (return/done)
+                 (add-bytevector! tok huge-bv 0 split-position
+                                  no-handle/message
+                                  return/done
+                                  no-return/overly-small)))
+            (assert in-tail-position?))
+          (receive (in-tail-position?)
+              (calls-in-tail-position?
+               (lambda (return/done)
+                 (add-bytevector! tok huge-bv split-position
+                                  (- #xfffe split-position)
+                                  (lambda (bv offset length)
+                                    (assert (not received?))
+                                    ;; really an implementation detail,
+                                    ;; but no bytevector-range-copy
+                                    ;; exists.
+                                    (assert (= 0 offset))
+                                    (assert (= length (bytevector-length bv)))
+                                    (set! received?
+                                          (bytevector-copy bv)))
+                                  return/done
+                                  no-return/overly-small)))
+            (assert in-tail-position?))
+          (equal? huge-bv received?)))
+       (iota 16)))
+
+(define (catch-errors thunk)
+  (guard (e ((interrupted-tokeniser-violation? e)
+            `(,(condition-who e) . interrupted))
+           ((kaput-tokeniser-error? e)
+            `(,(condition-who e) . kaput)))
+    (thunk)))
+
+(test-equal "re-entrancy from message handler is detected (complete message)"
+  '(add-bytevector! . interrupted)
+  (let ((tok (make-tokeniser)))
+    (catch-errors
+     (lambda ()
+       (add-bytevector! tok #vu8(0 4 0 0) 0 4
+                       (lambda (bv offset length)
+                         (add-bytevector! tok #vu8(0 4 1 1) 0 4
+                                          no-handle/message
+                                          no-return/done
+                                          no-return/overly-small)
+                         (assert #f))
+                       no-return/done
+                       no-return/overly-small)))))
+
+(test-equal "tokeniser becomes kaput, split after size field"
+  '(add-bytevector! . kaput)
+  (let ((tok (make-tokeniser))
+       (bv #vu8(0 3)))
+    (receive (tail? type size)
+       (calls-in-tail-position?
+        (lambda (return/overly-small)
+          (add-bytevector! tok bv 0 2 no-handle/message
+                           no-return/done
+                           return/overly-small)))
+      (assert (eq? #f type))
+      (assert (= size 3))
+      (assert tail?))
+    (catch-errors
+     (lambda ()
+       (add-bytevector! tok #vu8(0) 0 1
+                       no-handle/message no-return/done no-return/overly-small)
+       (error "unreachable")))))
+
+(test-equal "tokeniser becomes kaput, split inside size field"
+  '(add-bytevector! . kaput)
+  (let ((tok (make-tokeniser))
+       (bv #vu8(0 3 4 5)))
+    (receive (tail?)
+       (calls-in-tail-position?
+        (lambda (return/done)
+          (add-bytevector! tok bv 0 1 no-handle/message
+                           return/done
+                           no-return/overly-small)))
+      (assert tail?))
+    (receive (tail? type size)
+       (calls-in-tail-position?
+        (lambda (return/overly-small)
+          (add-bytevector! tok bv 1 2 no-handle/message
+                           no-return/done
+                           return/overly-small)))
+      (assert tail?)
+      (assert (= size 3))
+      (assert (eq? type #f)))
+    (catch-errors
+     (lambda ()
+       (add-bytevector! tok bv 2 2
+                       no-handle/message no-return/done
+                       no-return/overly-small)
+       (error "unreachable")))))
+
+(test-end "tokeniser")
diff --git a/tests/utils.scm b/tests/utils.scm
index e6ba993..0d91a9a 100644
--- a/tests/utils.scm
+++ b/tests/utils.scm
@@ -16,7 +16,9 @@
 ;;
 ;; SPDX-License-Identifier: AGPL3.0-or-later
 (define-module (tests utils)
-  #:export (conservative-gc?))
+  #:use-module (srfi srfi-8)
+  #:use-module ((rnrs base) #:select (assert))
+  #:export (conservative-gc? calls-in-tail-position?))
 
 ;; Current versions of guile (at least 3.0.5) use a conservative
 ;; garbage collector, so some tests concerning garbage collection
@@ -27,3 +29,28 @@
   (if (equal? "yes" (getenv "TOLERATE_CONSERVATIVE_COLLECTORS"))
       #t
       #f))
+
+(define (calls-in-tail-position? proc)
+  "Does @var{proc} calls its argument in tail position?
+Additionally, return the values returned to the argument
+of @var{proc} in-order. @var{proc} should not return multiple
+times."
+  (receive (continuation . arguments)
+      (let ((t (make-prompt-tag 'tail-position?)))
+       (call-with-prompt t
+         (lambda ()
+           (proc (lambda args (apply abort-to-prompt t args))))
+         (lambda _ (apply values _))))
+    (apply values
+          (= 1 (stack-length (make-stack continuation)))
+          arguments)))
+
+;; Some basic checks
+(assert (calls-in-tail-position? (lambda (thunk) (thunk))))
+;; TODO figure out why these fail ...
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (thunk) 1))))
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (+ 1 (thunk))))))
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (for-each thunk '("bla" 
"bla"))))))

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]