[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnunet-scheme] 148/324: utils: tokeniser: Split message streams into in
From: |
gnunet |
Subject: |
[gnunet-scheme] 148/324: utils: tokeniser: Split message streams into individual messages. |
Date: |
Tue, 21 Sep 2021 13:23:08 +0200 |
This is an automated email from the git hooks/post-receive script.
maxime-devos pushed a commit to branch master
in repository gnunet-scheme.
commit 7eabf920d71545902624632892e3687c84f1d51b
Author: Maxime Devos <maximedevos@telenet.be>
AuthorDate: Fri Jul 2 19:55:29 2021 +0200
utils: tokeniser: Split message streams into individual messages.
* Makefile.am
(modules): Add new module (gnu gnunet utils tokeniser).
(SCM_TESTS): Add new test tests/tokeniser.scm.
* README.org (Message queues): Describe new module.
* gnu/gnunet/util/struct.scm (/:message-header): Note that
the new module depends on the exact layout.
* gnu/gnunet/utils/tokeniser.scm
(make-tokeniser, tokeniser?, <tokeniser>): New record type.
(&interrupted-tokeniser-volation)
make-interrupted-tokeniser-violation)
interrupted-tokeniser-violation?): New condition type.
(&kaput-tokeniser-error, make-tokeniser-error,
kaput-tokeniser-error?): New condition type.
(add-bytevector!): New procedure.
* tests/tokeniser.scm
(fluffed-bytevector, no-return/overly-small, no-return/done)
(no-handle/message, merge-bytevectors, catch-errors): New procedures
for tests.
(choose-message, choose-many-message, positions->ranges)
(choose-split-positions): New QuickCheck generators for tests.
($messages-and-ranges): New QuickCheck arbitrary for tests.
("[prop] complete messages are passed through")
("message fragmented on header/data boundary reassembled")
("message fragmented in size field and after message header, some data")
("[prop] all fragmented & multiple messages received")
("overly small message error (complete header)")
("overly small message error (header split in size field)")
("huge message, split early")
("re-entrancy from message handler is detected (complete message)")
("tokeniser becomes kaput, split after size field")
("tokeniser becomes kaput, split inside size field"): New tests.
(huge-bv): New variable.
* tests/utils.scm (calls-in-tail-position?): New buggy procedure for
tests.
---
Makefile.am | 4 +-
README.org | 4 +
gnu/gnunet/util/struct.scm | 4 +
gnu/gnunet/utils/tokeniser.scm | 283 +++++++++++++++++++++++++
tests/tokeniser.scm | 471 +++++++++++++++++++++++++++++++++++++++++
tests/utils.scm | 29 ++-
6 files changed, 793 insertions(+), 2 deletions(-)
diff --git a/Makefile.am b/Makefile.am
index 4ab6680..7dda556 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -47,6 +47,7 @@ modules = \
gnu/gnunet/utils/hat-let.scm \
gnu/gnunet/utils/netstruct.scm \
gnu/gnunet/utils/platform-enum.scm \
+ gnu/gnunet/utils/tokeniser.scm \
\
gnu/gnunet/config/parser.scm \
gnu/gnunet/config/value-parser.scm \
@@ -107,7 +108,8 @@ SCM_TESTS = \
tests/config-value-parser.scm \
tests/config-expander.scm \
tests/config-db.scm \
- tests/netstruct.scm
+ tests/netstruct.scm \
+ tests/tokeniser.scm
SCM_TESTS_ENVIRONMENT = \
GUILE_AUTO_COMPILE=0 \
diff --git a/README.org b/README.org
index d2a6979..a02d613 100644
--- a/README.org
+++ b/README.org
@@ -88,6 +88,10 @@
+ TODO actual queues? Maybe we don't need them?
+ TODO filling the queues
+
+ Message queue implementations based on streams I/O can use
+ (gnu gnunet utils tokeniser), to split the message stream
+ into separate messages.
** Configuration :test:good:
+ gnu/gnunet/config/parser.scm: Parse configuration files.
+ gnu/gnunet/config/expand.scm: Perform variable expansion.
diff --git a/gnu/gnunet/util/struct.scm b/gnu/gnunet/util/struct.scm
index abec52b..98b6d10 100644
--- a/gnu/gnunet/util/struct.scm
+++ b/gnu/gnunet/util/struct.scm
@@ -36,6 +36,10 @@
(documentation
"This is represented as an array of uint32 in GNUnet"))))
+ ;; If this definition is ever changed (breaking compatibility with
+ ;; C GNUnet), make sure to change (gnu gnunet utils tokeniser)
+ ;; appropriately as well -- the field types and positions are
+ ;; hardcoded there.
(define-type /:message-header
(structure/packed
(synopsis "Header for all communications")
diff --git a/gnu/gnunet/utils/tokeniser.scm b/gnu/gnunet/utils/tokeniser.scm
new file mode 100644
index 0000000..aa8a041
--- /dev/null
+++ b/gnu/gnunet/utils/tokeniser.scm
@@ -0,0 +1,283 @@
+;; This file is part of scheme-GNUnet, a Scheme port of GNUnet .
+;; Copyright (C) 2010, 2016, 2017 GNUnet e.V.
+;; Copyright (C) 2021 Maxime Devos <maximedevos@telenet.be>
+;;
+;; GNUnet is free software: you can redistribute it and/or modify it
+;; under the terms of the GNU Affero General Public License as published
+;; by the Free Software Foundation, either version 3 of the License,
+;; or (at your option) any later version.
+;;
+;; GNUnet is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; Affero General Public License for more details.
+;;
+;; You should have received a copy of the GNU Affero General Public License
+;; along with this program. If not, see <http://www.gnu.org/licenses/>.
+;;
+;; SPDX-License-Identifier: AGPL3.0-or-later
+
+;; C file: util/mst.c
+;; Brief: convenience functions for handling inbound message buffers
+;; Author: Christian Grothoff
+;; Adapted to Scheme by Maxime Devos
+;;
+;; The most prominent use would be the implementation of message queues
+;; over stream sockets, where the separate messages need to be split
+;; from each other. However, it is used in some other places as
+;; well.
+;;
+;; The Scheme implementation does not support the 'purge' and 'one-shot'
+;; modes. 'purge' should be simple to implement though and 'one-shot'
+;; could be implemented with delimited continuations.
+
+(define-library (gnu gnunet utils tokeniser)
+ (export make-tokeniser
+ tokeniser?
+
+ &interrupted-tokeniser-violation
+ make-interrupted-tokeniser-violation
+ interrupted-tokeniser-violation?
+
+ &kaput-tokeniser-error
+ make-kaput-tokeniser-error
+ kaput-tokeniser-error?
+
+ add-bytevector!)
+ (import (only (rnrs base)
+ define and < assert begin quote lambda
+ >= integer? exact? <= expt = cond
+ let + - eq? > * min if)
+ (only (rnrs conditions)
+ define-condition-type condition make-who-condition
+ &violation &error)
+ (only (rnrs exceptions)
+ raise)
+ (only (rnrs bytevectors)
+ endianness bytevector-copy! bytevector? bytevector-length
+ bytevector-u8-ref bytevector-u8-set! bytevector-u16-ref
+ bytevector-u16-set! make-bytevector)
+ (only (rnrs records syntactic)
+ define-record-type)
+ (only (srfi srfi-26) cut)
+ (only (guile) lambda*)
+ (only (gnu gnunet util struct)
+ /:message-header)
+ (only (gnu gnunet netstruct syntactic)
+ sizeof)
+ (only (gnu gnunet utils hat-let) let^))
+ (begin
+ (define-condition-type &interrupted-tokeniser-violation &violation
+ make-interrupted-tokeniser-violation interrupted-tokeniser-violation?)
+ (define-condition-type &kaput-tokeniser-error &error
+ make-kaput-tokeniser-error kaput-tokeniser-error?)
+
+ (define-record-type
+ (<tokeniser> make-tokeniser tokeniser-state?)
+ ;; Current buffer.
+ ;;
+ ;; Alternatively, when @code{add-bytevector-copy!} is
+ ;; being called, this is temporarily set to @code{#f},
+ ;; to detect re-entrancy. And if a message with size
+ ;; less than the message header is found, then it is set
+ ;; to @code{#t}, marking the tokeniser as ‘kaput’.
+ (fields (mutable buffer tokeniser-buffer set-tokeniser-buffer!)
+ ;; Number of bytes in the buffer.
+ (mutable position tokeniser-position set-tokeniser-position!))
+ (protocol
+ (lambda (%make)
+ (lambda* (#:key
+ (initial-size (sizeof /:message-header '())))
+ "Make an empty tokeniser. A buffer of size @var{initial-size}
+will be pre-allocated. This size must be an exact natural and it
+might be adjusted."
+ (assert (and (integer? initial-size) (exact? initial-size)
+ (>= initial-size 0)))
+ (%make (make-bytevector
+ (cond ((<= initial-size (sizeof /:message-header '()))
+ (sizeof /:message-header '()))
+ ((>= initial-size (expt 2 16))
+ (expt 2 16))
+ (#t initial-size)))
+ 0))))
+ (opaque #t)
+ (sealed #t))
+
+ (define (add-bytevector! tok bv offset length
+ handle/message
+ return/done
+ return/overly-small)
+ "Feed up to @var{length} bytes from the bytevector @var{bv}
+starting at @var{offset} to the tokeniser @var{tok}.
+
+When a complete message is assembled, the callback @var{handle/message}
+is called with an appropriate bytevector region. This bytevector region
+is part of the passed bytevector range (@var{bv}, @var{offset}, @var{length})
+or the tokeniser's internal buffer.
+
+If a message size was overly small, i.e., smaller than its header,
+then @var{return/overly-small} is called in tail position with the
+specified message type (as an integer) and message size. In that case,
+@var{tok} will be marked as kaput. As the message type is not always
+available, sometimes @code{#false} will be pased instead.
+
+On success, @code{return/done} is called in tail position without
+arguments.
+
+This procedure may only be called if @var{tok} isn't kaput,
+and it may not be called re-entrantly. In the former case,
+a @code{&kaput-tokeniser-error} is raised. In the latter case,
+a @code{&interrupted-tokeniser-violation} may be raised
+but this cannot be guaranteed."
+ ;; ^ mainly due to parallelism reasons
+ (define set-buffer! (cut set-tokeniser-buffer! tok <>))
+ (define set-position! (cut set-tokeniser-position! tok <>))
+ (define mark-kaput! (cut set-tokeniser-buffer! tok #t))
+ (define (maybe-reallocate/no-move buffer minimal-size)
+ "Return a fresh bytevector or the bytevector @var{buffer} of
+at least size @var{minimal-size}. Avoid allocations."
+ (if (<= minimal-size (bytevector-length buffer))
+ buffer
+ (make-bytevector minimal-size)))
+ ;; Possibilities:
+ ;; (a) @var{length} is zero. Then there's nothing to do!
+ ;; The other possibilities will assume @var{length}
+ ;; is at least one.
+ ;;
+ ;; (b) If the tokeniser buffer is empty and @var{bv} starts
+ ;; with a complete message, then call the processor
+ ;; on the message and continue.
+ ;;
+ ;; (c) If the tokeniser buffer is empty and @var{bv} starts
+ ;; with an incomplete message, then copy the partial message
+ ;; to the tokeniser buffer (reallocating it if necessary)
+ ;; and stop.
+ ;;
+ ;; If the message size is known, it is considered necessary
+ ;; for the tokeniser buffer to be at least that size.
+ ;;
+ ;; (d) If the tokeniser buffer is non-empty and the message size
+ ;; of the partial message in that buffer is unknown,
+ ;; then determine the size, if necessary reallocate the tokeniser
+ ;; buffer to be at least that size, copy the size into the
+ ;; buffer and continue.
+ ;;
+ ;; The message won't be complete yet, as a 'type' field always
+ ;; comes after the 'size' field in the message header.
+ ;;
+ ;; (e) If the tokeniser buffer is non-empty and the message size
+ ;; of the partial message in that buffer is known,
+ ;; then copy the remainder of the message into the tokeniser
+ ;; buffer (as far as possible).
+ ;;
+ ;; If this makes the message complete, then process the message
+ ;; and continue. If the message isn't complete, then just stop,
+ ;; as all of @var{bv} has been copied.
+ (define (continue buffer position bv offset length)
+ ;; The buffer is set before calls to return/done or return/overly-small.
+ ;; The position is set after it changes and before the tail-iteration
+ ;; into 'continue'.
+ (cond ((= length 0)
+ (set-buffer! buffer)
+ (return/done)) ; possibility (a)
+ ((and (= position 0)
+ ;; possibility (c), length unknown
+ (< length 2))
+ (bytevector-u8-set! buffer position
+ (bytevector-u8-ref bv offset))
+ (set-buffer! buffer)
+ (set-position! 1)
+ (return/done))
+ ((= position 0) ; and (>= length 2)
+ (let ((size (bytevector-u16-ref bv offset (endianness big))))
+ (cond ((< size (sizeof /:message-header '()))
+ (mark-kaput!)
+ (return/overly-small
+ (and (>= length 4)
+ ;; + 2: skip the "size" field and read
+ ;; the 'type' field
+ (bytevector-u16-ref bv (+ offset 2)
+ (endianness big)))
+ size))
+ ;; possibility (b)
+ ((<= size length)
+ (handle/message bv offset size)
+ (continue buffer position bv (+ offset size)
+ (- length size)))
+ ;; Now, (< length size) -- possibility (c).
+ (#t
+ (let ((buffer
+ ;; Re-allocate the buffer if required.
+ (maybe-reallocate/no-move buffer size)))
+ ;; Write the partial message to the buffer
+ (bytevector-copy! bv offset buffer 0 length)
+ (set-buffer! buffer)
+ (set-position! (+ position length))
+ (return/done))))))
+ ((>= position 2) ; possibility (e)
+ (let^ ((! size (bytevector-u16-ref buffer 0 (endianness big)))
+ (!! (<= (sizeof /:message-header '()) size))
+ (!! (<= size (bytevector-length buffer)))
+ (!! (< position size))
+ ;; How many bytes must be copied?
+ (! extra (min length (- size position)))
+ ;; Copy the bytes.
+ (_ (bytevector-copy! bv offset buffer position extra))
+ (! position (+ position extra))
+ (!! (<= position size))
+ ;; do not set the buffer yet, such that
+ ;; re-entrancy from the 'handle/message' callback
+ ;; can be detected.
+ (? (< position size)
+ ;; Message is not yet complete --> stop.
+ (assert (= length extra))
+ (set-buffer! buffer)
+ ;; some bytes have been copied
+ (set-position! position)
+ (return/done)))
+ ;; Message is complete --> process it and continue
+ ;; (there may be other messages as well!)
+ (handle/message buffer 0 size)
+ (set-position! 0)
+ (continue buffer 0 bv (+ offset extra) (- length extra))))
+ ;; (< position 2), possibility (d)
+ (#t
+ (let^ ((! size/byte-0 (bytevector-u8-ref buffer 0))
+ (! size/byte-1 (bytevector-u8-ref bv offset))
+ (! size (+ (* (expt 2 8) size/byte-0)
+ size/byte-1))
+ (? (< size (sizeof /:message-header '()))
+ (mark-kaput!)
+ (return/overly-small
+ (and (>= length 3)
+ (bytevector-u16-ref bv (+ offset 1)
+ (endianness big)))
+ size))
+ (! buffer (maybe-reallocate/no-move buffer size)))
+ (bytevector-u16-set! buffer 0 size (endianness big))
+ (set-position! 2)
+ (continue buffer 2 bv (+ offset 1) (- length 1))))))
+ (let^ ((! buffer (tokeniser-buffer tok))
+ (! position (tokeniser-position tok))
+ (? (eq? buffer #t)
+ (raise (condition
+ (make-who-condition 'add-bytevector!)
+ (make-kaput-tokeniser-error))))
+ (? (eq? buffer #f)
+ (raise (condition
+ (make-who-condition 'add-bytevector!)
+ (make-interrupted-tokeniser-violation))))
+ (!! (and (bytevector? buffer)
+ (integer? position)
+ (exact? position)
+ (integer? offset)
+ (exact? offset)
+ (integer? length)
+ (exact? length)
+ (<= (+ offset length) (bytevector-length bv))
+ (<= 0 position)
+ (< position (bytevector-length buffer)))))
+ ;; The buffer will be restored at the call to
+ ;; 'return/done' or 'return/overly-small'.
+ (set-buffer! #f)
+ (continue buffer position bv offset length)))))
diff --git a/tests/tokeniser.scm b/tests/tokeniser.scm
new file mode 100644
index 0000000..c26d272
--- /dev/null
+++ b/tests/tokeniser.scm
@@ -0,0 +1,471 @@
+;; This file is part of scheme-GNUnet, a partial Scheme port of GNUnet.
+;; Copyright (C) 2021 Maxime Devos <maximedevos@telenet.be>
+;;
+;; scheme-GNUnet is free software: you can redistribute it and/or modify it
+;; under the terms of the GNU Affero General Public License as published
+;; by the Free Software Foundation, either version 3 of the License,
+;; or (at your option) any later version.
+;;
+;; scheme-GNUnet is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;; Affero General Public License for more details.
+;;
+;; You should have received a copy of the GNU Affero General Public License
+;; along with this program. If not, see <http://www.gnu.org/licenses/>.
+;;
+;; SPDX-License-Identifier: AGPL-3.0-or-later
+(import (tests utils)
+ (quickcheck)
+ (quickcheck property)
+ (quickcheck arbitrary)
+ (quickcheck generator)
+ (gnu gnunet utils tokeniser)
+ (gnu gnunet utils bv-slice)
+ (srfi srfi-1)
+ (srfi srfi-8)
+ (srfi srfi-43)
+ (only (ice-9 control) let/ec)
+ (ice-9 match)
+ (only (system foreign)
+ pointer->bytevector bytevector->pointer)
+ (only (rnrs base) assert)
+ (only (rnrs exceptions) guard)
+ (only (rnrs conditions)
+ assertion-violation? condition-who)
+ (rnrs bytevectors)
+ (gnu gnunet netstruct syntactic)
+ (gnu gnunet util struct))
+
+(define (fluffed-bytevector %size %off fluff)
+ ;; Returned bytevector is a complete message.
+ (let* ((size (+ %size (sizeof /:message-header '())))
+ (bv (make-bytevector (+ %off size)))
+ (s (bv-slice/read-write bv)))
+ (bytevector-copy! fluff 0 bv 0
+ (min (bytevector-length fluff)
+ (bytevector-length bv)))
+ (set%! /:message-header '(size)
+ (slice-slice s %off (sizeof /:message-header '()))
+ size)
+ (values bv %off size)))
+
+(test-begin "tokeniser")
+
+(define (no-return/overly-small . _)
+ (error "unexpected call to return/overly-small"))
+
+(define (no-return/done . _)
+ (error "unexpected call to return/done"))
+
+(define (no-handle/message . _)
+ (error "unexpected call to handle/message"))
+
+;; Some bugs this found:
+;; * in some places, the 'offset' argument was ignored
+;; and always the first or first two bytes of 'bv'
+;; in 'continue' in 'add-bytevector!' would be used.
+;; * some incorrect assertions in the tokeniser code
+;; * when a message was fragmented (between header and data),
+;; the data was not copied
+;; * the type of a message was calculated incorrectly
+;; whe ‘overly small message errors’ are reported
+;; * the type of a message could not be calculated
+;; for some fragmented messages, if the first 'length'
+;; was 1 and the second 'length' was 3.
+
+(test-assert "[prop] complete messages are passed through"
+ (quickcheck
+ (property
+ ((%size $natural)
+ (%off $natural)
+ (fluff $bytevector))
+ (receive (bv offset size)
+ (fluffed-bytevector %size %off fluff)
+ (let ((handled? #f))
+ (add-bytevector!
+ (make-tokeniser)
+ bv offset size
+ (lambda (bv2 offset2 length)
+ (assert (not handled?))
+ (assert (eq? bv bv2))
+ (assert (= offset offset2))
+ (assert (= length size))
+ (set! handled? #t))
+ (lambda _ handled?)
+ no-return/overly-small))))))
+
+;; Test fragmented messages and multiple messages
+;; are properly handled.
+
+(define choose-message
+ (generator-let*
+ ((length (choose-one/weighted
+ ;; Very small
+ `((1 . ,(choose-integer 4 5))
+ (1 . ,(choose-integer 5 6))
+ ;; Some length
+ (2 . ,(choose-integer 4 9)))))
+ ;; Arbitrary 'type' field and data
+ (filler (choose-bytevector (- length 2))))
+ (let ((bv (make-bytevector length)))
+ (bytevector-u16-set! bv 0 length (endianness big))
+ (bytevector-copy! filler 0 bv 2 (bytevector-length filler))
+ (generator-return bv))))
+
+;; Generate a list of message bytevectors
+(define choose-many-messages
+ (sized-generator
+ (cut choose-list choose-message <>)))
+
+(define (merge-bytevectors messages)
+ (define size (reduce + 0 (map bytevector-length messages)))
+ (define bv (make-bytevector size))
+ (let loop ((offset 0) (messages messages))
+ (if (null? messages)
+ bv
+ (let* ((head (car messages))
+ (tail (cdr messages))
+ (message-size (bytevector-length head)))
+ (bytevector-copy! head 0 bv offset message-size)
+ (loop (+ offset message-size) tail)))))
+
+;; Try to occassionally split message in annoying places,
+;; and avoid splitting at message boundaries.
+(define (choose-split-positions messages)
+ (let loop ((offset 0) (messages messages))
+ (if (null? messages)
+ (generator-return '())
+ (let* ((head (car messages))
+ (tail (cdr messages))
+ (message-size (bytevector-length head))
+ (data-splittable? (> message-size 5)))
+ (generator-let*
+ ((rest-positions
+ (loop (+ offset message-size) tail))
+ (data-split-positions
+ (if data-splittable?
+ (generator-lift
+ list
+ (choose-integer 4 message-size))
+ (generator-return '())))
+ (end-split-positions
+ (choose-one/weighted
+ `((2 . ,(generator-return '()))
+ (1 . ,(generator-return (list message-size))))))
+ (head-split-positions
+ (choose-one/weighted
+ `((3 . ,(generator-return '())) ; don't split header
+ (2 . ,(generator-return '(1))) ; split inside size field
+ (2 . ,(generator-return '(2))) ; split between size field and
type
+ (1 . ,(generator-return '(1 2))))))) ; both of above
+ (let* ((all-positions
+ (append head-split-positions data-split-positions
+ end-split-positions))
+ (fixed-positions
+ (map (cut + <> offset) all-positions)))
+ (generator-return
+ (append fixed-positions rest-positions))))))))
+
+;; A list of (start . length).
+;; Starts at the minimal 'start', and ends at 'end' (exclusive)
+(define* (positions->ranges positions end)
+ (match positions
+ (() `((,end . 0)))
+ ((start) `((,start . ,(- end start))))
+ ((start next . rest)
+ `((,start . ,(- next start))
+ ,@(positions->ranges `(,next ,@rest) end)))))
+
+(define $messages-and-ranges
+ (arbitrary
+ (gen (generator-let*
+ ((messages choose-many-messages)
+ (bv (generator-return
+ (merge-bytevectors messages)))
+ (split-positions
+ (choose-split-positions messages))
+ (ranges
+ (generator-return
+ (positions->ranges (cons 0 split-positions)
+ (bytevector-length bv)))))
+ (generator-return
+ `#(,messages ,bv ,ranges))))
+ (xform #f))) ; unneeded
+
+;; A simplified test failure case of
+;; "[prop] all fragmented & multiple messages received".
+;; The issue was that (1 2 3 4) was not copied.
+(test-equal "message fragmented on header/data boundary reassembled"
+ #vu8(0 8 50 50 1 2 3 4)
+ (let ((tok (make-tokeniser))
+ ;; Message size: 8
+ (received? #f)
+ (bv #vu8(0 8 50 50 1 2 3 4)))
+ (add-bytevector! tok bv 0 4
+ no-handle/message (const #t) no-return/overly-small)
+ (add-bytevector! tok bv 4 4
+ (lambda (bv offset length)
+ ;; These two assertions are actually an implementation
+ ;; detail, and test no overly large allocations are
+ ;; made.
+ (assert (= 0 offset))
+ (assert (= length (bytevector-length bv)))
+ (assert (not received?))
+ (set! received? (bytevector-copy bv)))
+ (const #t) no-return/overly-small)
+ received?))
+
+;; Found when debugging a test failure of
+;; "[prop] all fragmented & multiple messages received".
+;; The bug was a missing set-position! call.
+(test-equal "message fragmented in size field and after message header, some
data"
+ #vu8(0 6 236 197 216 19)
+ (let ((tok (make-tokeniser))
+ (received? #f)
+ (bv #vu8(0 6 236 197 216 19)))
+ ;; copy the zero
+ (add-bytevector! tok bv 0 1
+ no-handle/message (const #t) no-return/overly-small)
+ ;; copy the rest of the message header
+ (add-bytevector! tok bv 1 3
+ no-handle/message (const #t) no-return/overly-small)
+ ;; copy the data
+ (add-bytevector! tok bv 4 2
+ (lambda (bv offset length)
+ ;; see previous test case
+ (assert (= 0 offset))
+ (assert (= length (bytevector-length bv)))
+ (assert (not received?))
+ (set! received? (bytevector-copy bv)))
+ (const #t)
+ no-return/overly-small)
+ received?))
+
+;; And return/done is called in tail position.
+(test-assert "[prop] all fragmented & multiple messages received"
+ (quickcheck
+ (property
+ ((messages-and-ranges $messages-and-ranges))
+ (match messages-and-ranges
+ (#(messages bv ranges)
+ (assert (= (apply + (map cdr ranges))
+ (bytevector-length bv)))
+ (guard (e ((assertion-violation? e)
+ ;; 2: don't include 'make-stack' or
+ ;; this guard
+ (display-backtrace (make-stack #t 2) (current-error-port))
+ (print-exception (current-error-port) #f '%exception (list
e))
+ #f))
+ (let ((tok (make-tokeniser))
+ (remove-message!
+ (lambda (bv offset length)
+ (define bv/range
+ (pointer->bytevector
+ (bytevector->pointer bv offset)
+ length))
+ ;; Sanity check
+ (assert (<= 0 offset))
+ (assert (<= (+ offset length) (bytevector-length bv)))
+ (let/ec ec
+ (pair-for-each
+ (match-lambda
+ (((and message (set! set-message!)) . rest)
+ (when (and (bytevector? message)
+ (bytevector=? message bv/range))
+ (set-message! #f) ; mark it as received
+ (ec))))
+ messages) ; stop searching
+ (assert (and #f
+ "message not added but still received"))))))
+ (for-each
+ (match-lambda
+ ((start . length)
+ (assert
+ (calls-in-tail-position?
+ (lambda (return/done)
+ (add-bytevector! tok bv start length
+ remove-message!
+ (lambda () (return/done))
+ no-return/overly-small))))))
+ ranges)))
+ ;; All messages should have been received.
+ (not (any identity messages)))))))
+
+;; The type was read at an incorrect offset.
+(test-equal "overly small message error (complete header)"
+ (map (lambda (n)
+ `(#t ; in tail position
+ ,(+ (* 256 n) (+ n 1)) ; message type
+ ,n)) ; message size
+ (iota 4))
+ (map (lambda (n)
+ (call-with-values
+ (lambda ()
+ (calls-in-tail-position?
+ (lambda (return/overly-small)
+ (add-bytevector! (make-tokeniser)
+ (u8-list->bytevector
+ ;; n (+ n 1): arbitrary message type.
+ ;; Two separate values are used for
+ ;; the two halves of the u16, to
+ ;; detect little / big endianness issues.
+ ;;
+ ;; GNUnet usually (always?) uses
+ ;; big-endian.
+ (list 0 n n (+ n 1)))
+ 0 4
+ no-handle/message
+ no-return/done
+ return/overly-small))))
+ list))
+ ;; 4: size of message header
+ ;; iota makes a list '(0 1 2 3)
+ (iota 4)))
+
+;; A bounds check at the call to return/overly-small
+;; was overly strict, resulting in the message type being missing.
+(test-equal "overly small message error (header split in size field)"
+ (map (lambda (n)
+ `(#t ; in tail position
+ ,(+ (* 256 (+ n 1)) n) ; message type
+ ,n))
+ (iota 4))
+ (map (lambda (n)
+ (let ((tok (make-tokeniser))
+ (bv (u8-list->bytevector
+ ;; see previous test case for why (+ n 1) n
+ (list 0 n (+ n 1) n))))
+ (add-bytevector! tok bv 0 1
+ no-handle/message
+ (const #t)
+ no-return/overly-small)
+ (call-with-values
+ (lambda ()
+ (calls-in-tail-position?
+ (lambda (return/overly-small)
+ (add-bytevector! tok bv 1 3
+ no-handle/message
+ no-return/done
+ return/overly-small))))
+ list)))
+ (iota 4))) ; see previous test case for why (iota 4)
+
+;; All the previous tests use 'small' messages. That is,
+;; the message sizes were always < 256. However, messages
+;; with size >= 256 definitely exist.
+;;
+;; This test detects the mutation
+;; (bytevector-u8-ref bv offset) --> 0
+;; in (! size/byte-0 [...]).
+
+(define huge-bv
+ (let ((bv (make-bytevector #xfffe 17)))
+ (bytevector-u16-set! bv 0 #xfffe (endianness big))
+ bv))
+
+;; Tests:
+;; * the whole message is received
+;; * return/done is called in tail position
+(test-equal "huge message, split early"
+ (map (const #t) (iota 16))
+ (map (lambda (split-position)
+ (let ((tok (make-tokeniser))
+ (received? #f))
+ (receive (in-tail-position?)
+ (calls-in-tail-position?
+ (lambda (return/done)
+ (add-bytevector! tok huge-bv 0 split-position
+ no-handle/message
+ return/done
+ no-return/overly-small)))
+ (assert in-tail-position?))
+ (receive (in-tail-position?)
+ (calls-in-tail-position?
+ (lambda (return/done)
+ (add-bytevector! tok huge-bv split-position
+ (- #xfffe split-position)
+ (lambda (bv offset length)
+ (assert (not received?))
+ ;; really an implementation detail,
+ ;; but no bytevector-range-copy
+ ;; exists.
+ (assert (= 0 offset))
+ (assert (= length (bytevector-length bv)))
+ (set! received?
+ (bytevector-copy bv)))
+ return/done
+ no-return/overly-small)))
+ (assert in-tail-position?))
+ (equal? huge-bv received?)))
+ (iota 16)))
+
+(define (catch-errors thunk)
+ (guard (e ((interrupted-tokeniser-violation? e)
+ `(,(condition-who e) . interrupted))
+ ((kaput-tokeniser-error? e)
+ `(,(condition-who e) . kaput)))
+ (thunk)))
+
+(test-equal "re-entrancy from message handler is detected (complete message)"
+ '(add-bytevector! . interrupted)
+ (let ((tok (make-tokeniser)))
+ (catch-errors
+ (lambda ()
+ (add-bytevector! tok #vu8(0 4 0 0) 0 4
+ (lambda (bv offset length)
+ (add-bytevector! tok #vu8(0 4 1 1) 0 4
+ no-handle/message
+ no-return/done
+ no-return/overly-small)
+ (assert #f))
+ no-return/done
+ no-return/overly-small)))))
+
+(test-equal "tokeniser becomes kaput, split after size field"
+ '(add-bytevector! . kaput)
+ (let ((tok (make-tokeniser))
+ (bv #vu8(0 3)))
+ (receive (tail? type size)
+ (calls-in-tail-position?
+ (lambda (return/overly-small)
+ (add-bytevector! tok bv 0 2 no-handle/message
+ no-return/done
+ return/overly-small)))
+ (assert (eq? #f type))
+ (assert (= size 3))
+ (assert tail?))
+ (catch-errors
+ (lambda ()
+ (add-bytevector! tok #vu8(0) 0 1
+ no-handle/message no-return/done no-return/overly-small)
+ (error "unreachable")))))
+
+(test-equal "tokeniser becomes kaput, split inside size field"
+ '(add-bytevector! . kaput)
+ (let ((tok (make-tokeniser))
+ (bv #vu8(0 3 4 5)))
+ (receive (tail?)
+ (calls-in-tail-position?
+ (lambda (return/done)
+ (add-bytevector! tok bv 0 1 no-handle/message
+ return/done
+ no-return/overly-small)))
+ (assert tail?))
+ (receive (tail? type size)
+ (calls-in-tail-position?
+ (lambda (return/overly-small)
+ (add-bytevector! tok bv 1 2 no-handle/message
+ no-return/done
+ return/overly-small)))
+ (assert tail?)
+ (assert (= size 3))
+ (assert (eq? type #f)))
+ (catch-errors
+ (lambda ()
+ (add-bytevector! tok bv 2 2
+ no-handle/message no-return/done
+ no-return/overly-small)
+ (error "unreachable")))))
+
+(test-end "tokeniser")
diff --git a/tests/utils.scm b/tests/utils.scm
index e6ba993..0d91a9a 100644
--- a/tests/utils.scm
+++ b/tests/utils.scm
@@ -16,7 +16,9 @@
;;
;; SPDX-License-Identifier: AGPL3.0-or-later
(define-module (tests utils)
- #:export (conservative-gc?))
+ #:use-module (srfi srfi-8)
+ #:use-module ((rnrs base) #:select (assert))
+ #:export (conservative-gc? calls-in-tail-position?))
;; Current versions of guile (at least 3.0.5) use a conservative
;; garbage collector, so some tests concerning garbage collection
@@ -27,3 +29,28 @@
(if (equal? "yes" (getenv "TOLERATE_CONSERVATIVE_COLLECTORS"))
#t
#f))
+
+(define (calls-in-tail-position? proc)
+ "Does @var{proc} calls its argument in tail position?
+Additionally, return the values returned to the argument
+of @var{proc} in-order. @var{proc} should not return multiple
+times."
+ (receive (continuation . arguments)
+ (let ((t (make-prompt-tag 'tail-position?)))
+ (call-with-prompt t
+ (lambda ()
+ (proc (lambda args (apply abort-to-prompt t args))))
+ (lambda _ (apply values _))))
+ (apply values
+ (= 1 (stack-length (make-stack continuation)))
+ arguments)))
+
+;; Some basic checks
+(assert (calls-in-tail-position? (lambda (thunk) (thunk))))
+;; TODO figure out why these fail ...
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (thunk) 1))))
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (+ 1 (thunk))))))
+#;
+(assert (not (calls-in-tail-position? (lambda (thunk) (for-each thunk '("bla"
"bla"))))))
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.
- [gnunet-scheme] 87/324: mq: define message queue module, (continued)
- [gnunet-scheme] 87/324: mq: define message queue module, gnunet, 2021/09/21
- [gnunet-scheme] 96/324: doc: Fix typo in README.org., gnunet, 2021/09/21
- [gnunet-scheme] 102/324: tests: config-parser: Don't generate inexact numbers., gnunet, 2021/09/21
- [gnunet-scheme] 116/324: utils: hat-let: Fix inline procedure definitions., gnunet, 2021/09/21
- [gnunet-scheme] 117/324: mq: Define envelope data type, again., gnunet, 2021/09/21
- [gnunet-scheme] 126/324: netstruct: Verify there is a setter (not a reader) in set%!., gnunet, 2021/09/21
- [gnunet-scheme] 128/324: netstruct: Raise &unwritable, not an &unreadable, in set%!., gnunet, 2021/09/21
- [gnunet-scheme] 131/324: tests: mq: Work-around guile-fibers bug., gnunet, 2021/09/21
- [gnunet-scheme] 143/324: tests: Extract conservative-gc? in a library., gnunet, 2021/09/21
- [gnunet-scheme] 150/324: Merge branch 'master' into proper-mq, gnunet, 2021/09/21
- [gnunet-scheme] 148/324: utils: tokeniser: Split message streams into individual messages.,
gnunet <=
- [gnunet-scheme] 149/324: mq: Delete unused and obsolete message-io module., gnunet, 2021/09/21
- [gnunet-scheme] 155/324: enum: Fix compilation error on Guile 3.0.7., gnunet, 2021/09/21
- [gnunet-scheme] 160/324: enum: symbol->value: Return #f if the symbol doesn't exist., gnunet, 2021/09/21
- [gnunet-scheme] 159/324: Makefile.am: Correct file name of test., gnunet, 2021/09/21
- [gnunet-scheme] 154/324: mq: Implement 'inject-error!'., gnunet, 2021/09/21
- [gnunet-scheme] 156/324: mq-impl/stream: Implement on top of ports., gnunet, 2021/09/21
- [gnunet-scheme] 162/324: Merge branch 'proper-mq', gnunet, 2021/09/21
- [gnunet-scheme] 161/324: guix: Patch 'guile' to fix some bugs., gnunet, 2021/09/21
- [gnunet-scheme] 152/324: utils: tokeniser: Implement 'add-from-port!'., gnunet, 2021/09/21
- [gnunet-scheme] 157/324: README.org: Note Guile 3.0.7 cannot be used for compilation., gnunet, 2021/09/21