chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] Benchmark for "values" - interesting results


From: Arthur Maciel
Subject: [Chicken-hackers] Benchmark for "values" - interesting results
Date: Mon, 20 May 2013 18:58:00 -0300


Dear Joerg, certainly I'm not the best person to delve into details about whether this should be done or not. But recalling some requests I've made to this list in order to change chicken's core for the sake of 'optimizations' I realized that it is better to change it only if we touch a critical point that can't be bypassed in other ways.

Is it really critical to you and other users?

Not specifically related to your post, but to my own memory and to not miss the (awful) humor: "Optimizing your chicken to output faster fiddling with its cloaca is certainly doable, but trying to feed it better is less painful, more sane and at least decent".

I hope someone can help you better.

Cheers,
Arthur

Date: 20 May 2013 22:40:49 +0200
From: J?rg F. Wittenberger      <address@hidden>
To: address@hidden
Subject: [Chicken-hackers] Benchmark for "values" - interesting
        results
Message-ID: <address@hidden>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi,

so far I lived in the believe that returning multiple values would be rather
slow under Chicken.  This goes back to a remark from Felix, which is several
years old.

(However I personally love using multiple values mostly for clarity of the
code; but also because I know that some Scheme implementations have
zero overhead, even less that explicitly passing a receiver continuation.)

Turns out that Felix is correct even years later.

Maybe it's worth to consider to change the way arguments and return values
are passed in Chicken.  I know this would be the usual hell to implement.
But: I'm kinda familiar with RScheme's internals.  It does not waste
a fresh location for each parameter, thereby lowering the load in
garbage collection.  It just has a (list of) vector-like object holding
the parameters and return values.  Passing parameters (including the
[hidden] current continuation) is a matter of storing them in the array
and adjusting the current position and current top.
Returns are handled likewise.

Doing so in Chicken should be possible as well.  (Though the FFI code would
have to copy them onto the return stack to keep the interface as is.)
This should save a lot of minor gc's since most procedures return no
more values than they consume.  And call/cc is rare enough.


To the benchmark I did: I took "partition" from SRFI-1 to be a nice
example and ran the following tests.  The resulting times are in
the "profile.ods" file attached.  Code attached as "partition.scm".

A) In csi
B) compiled with -O3
C) compiled with -O5

1) Labeled "native/values" in the spreadsheet.
   The code as copied from srfi-1.scm.
2) "adhoc 1": using an ad-hoc alternative of values/call-with-values
   in portable Scheme
3) "adhoc 2": a second ad-hoc alternative, minor modifications.
4) "Novals/1": Converted into CPS avoiding "values" and "receive"
    a.k.a. "call-with-values".
5) "Novals/2": A slightly re-arranged version from (4) saving a "let".
6) "Iterative": a procedural version; this one drops the "share common
   tail" property.  (IMHO not required by SRFI, just done by the code.)

To summarize the results:

* Version (1) is the looser here.
* (2) and (3) are slightly (around 10%) faster.
  Under -O5 the difference disappear mostly.
  - That is, (2) and (3) run slightly slower under -O5 while (1) gains a
little.
  - Notice that both ad-hoc implementations are *only two lines of code*,
    one for "values" and one for "call-with-values".  Since they are shorter
    code wise and still a little faster than (1), I wonder why we need
    the native C code version at all.
* (4) is about twice as fast as (1).
  - Felix is correct.
* (5) is about 2% faster than (4), but not always.
* (6) beats them all hands down.  Almost 20x as fast as (1) in csi.
  Almost 9x with -O3 and slightly more than 9x with -O5.


As an immediate consequence I'd suggest to just drop this into srfi-1.scm
in place of the "partition" code:


;; In constract to the SRFI-1 implementation, this one does not share
;; a common tail.
(define (partition pred lst)
  (let ((t (cons #f '()))
        (f (cons #f '())))
    (let ((tl t) (fl f))
      (do ((lst lst (cdr lst)))
          ((null? lst) (values (cdr t) (cdr f)))
        (let ((elt (car lst)))
          (if (pred elt)
              (let ((p (cons elt (cdr tl))))
                (set-cdr! tl p)
                (set! tl p))
              (let ((p (cons elt (cdr fl))))
                (set-cdr! fl p)
                (set! fl p))))))))


Best regards

/J?rg

reply via email to

[Prev in Thread] Current Thread [Next in Thread]