chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] Any thoughts on performance woes?


From: Felix Winkelmann
Subject: Re: [Chicken-hackers] Any thoughts on performance woes?
Date: Thu, 09 Apr 2015 00:24:45 +0200 (CEST)

>> This is a terribly written program. It uses 3-element lists as vectors
>> (including higher-order "vector" arithmetic using "map") and allocates
>> like hell. The compiler can not do much with this code, and it
>> produces CPS calls everywhere.
> 
> I take it you are referring to the {add, sub, scale, mul, dot,
> squared-length, normal} functions? It seems clear to me that there's
> plenty of room for optimization in them, but I wonder if you're being a
> bit hasty to dismiss the naive implementation so readily. IMHO, use of
> `map` to transform lists is sort of idiomatically lisp.

Using map to transform lists is indeed, but using lists to store
fixed-size collections that are expected to be heavily used is
probably not, I'd say. On the other hand, this benchmark appears to be
intentionally inefficient to stress the GC by allocating a lot.

> 
> That is to say, in this case there are alternatives for numerical
> computation, but transforming a list in a similar manner ought to be
> commonplace in lispy programmes. If it is this sort of behaviour that
> drives the GC into agony then I expect that there are many Chicken
> programmes that may benefit from any form of optimization that could be
> had from `map`.

That's right, of course. Nobody would complain if we tried to catch
this special case. But every additional optimization complicates the
optimizer (which is already pretty hairy) and increases the
possibility of bugs. Still, "(map <op> ...)" must allocate, unless one
can infer some lifetime information about the argument (or result)
lists.

CHICKEN may pay an extra penalty by producing relatively bad code
for certain loops, like those introduced by an already existing optimization
of "map". So, for example:

(define (foo x y)
  (map + x y))

(foo (read) (read))

gets translated and optimized to:

csc -debug 7 x.scm -O5
[optimized]
(lambda (k205)
  (let ((k207 (##core#lambda
                (t33)
                (let ((k209 (##core#lambda
                              (t34)
                              (let ((k260 (##core#lambda
                                            (t36)
                                            (let ((k262 (##core#lambda (t37) 
(k205 (##core#undefined)))))
                                              (let ((k264 (##core#lambda (r265) 
(r265 k262))))
                                                (##sys#implicit-exit-handler 
k264))))))
                                (let ((k268 (##core#lambda
                                              (a267)
                                              (let ((k271 (##core#lambda
                                                            (a270)
                                                            (let ((x1 a267))
                                                              (let ((g919 '()))
                                                                (let ((g1020 
#f))
                                                                  (let 
((map-loop524 (##core#undefined)))
                                                                    (let ((t218 
(set! map-loop524
                                                                                
  (lambda (k220 g1725 g1826)
                                                                                
    (let ((r255 (##core#inline "C_i_pairp" g1725)))
                                                                                
      (let ((r225 (##core#cond
                                                                                
                    r255
                                                                                
                    (##core#inline "C_i_pairp" g1826)
                                                                                
                    #f)))
                                                                                
        (if r225
                                                                                
          (let ((a248 (##core#inline "C_slot" g1725 0)))
                                                                                
            (let ((a251 (##core#inline "C_slot" g1826 0)))
                                                                                
              (let ((a245 (##core#inline_allocate
                                                                                
                            ("C_a_i_plus" 4)
                                                                                
                            a248
                                                                                
                            a251)))
                                                                                
                (let ((g628 (##core#inline_allocate
                                                                                
                              ("C_a_i_cons" 3)
                                                                                
                              a245
                                                                                
                              '())))
                                                                                
                  (let ((k229 (##core#lambda
                                                                                
                                (t29)
                                                                                
                                (let ((t231 (set! g1020 g628)))
                                                                                
                                  (let ((a235 (##core#inline "C_slot" g1725 1)))
                                                                                
                                    (let ((a238 (##core#inline "C_slot" g1826 
1)))
                                                                                
                                      (map-loop524 k220 a235 a238)))))))
                                                                                
                    (if g1020
                                                                                
                      (k229 (##core#inline "C_i_setslot" g1020 1 g628))
                                                                                
                      (let ((t244 (set! g919 g628))) (k229 t244))))))))
                                                                                
          (let ((f_221274 g919)) (k220 f_221274)))))))))
                                                                      
(map-loop524 k260 x1 a270)))))))))
                                                (read k271)))))
                                  (read k268))))))
                  (##core#callunit "eval" k209)))))
    (##core#callunit "library" k207)))

Note the unnecessary lambda "k229", which could be beta-reduced in the
conditional following it (and the conditional simplified), with some
effort, removing the need for a CPS call and the related allocating of
a continuation.

Any takers?

Moreover, code produced by CHICKEN already allocates a lot
(activation-frames are heap-allocated), so allocation-heavy code will
increase this amount, in other words, CHICKEN allocates more than
non-CPS implementations. Allocation is already quite fast, which
compensates the cost in code that allocates in a "normal" manner (yes,
I know: whatever that means...)


cheers
felix



reply via email to

[Prev in Thread] Current Thread [Next in Thread]