[Chicken-hackers] Floating point performance


From: Peter Bex
Subject: [Chicken-hackers] Floating point performance
Date: Thu, 18 Apr 2019 18:28:11 +0200
User-agent: NeoMutt/20170113 (1.7.2)

Hi guys,

I came across this post[1] and could not resist writing something like it
in CHICKEN:

(import srfi-4 (chicken fixnum) (chicken flonum))

(let* ((size (* 32000000))
       (v (make-f64vector size)))
  (time
   (do ((i 0 (fx+ i 1))
        (sum 0.0 (fp+ sum (f64vector-ref v i))))
       ((fx= i size) (print sum)))))

The above code is actually slower with fp+ than with plain +!

I checked, and the reason is that the generated C contains a goto loop
if we use +, because that compiles to C_s_a_i_plus(), which is inlineable.

Now, fp+ is only inlineable if the scrutinizer can prove that it's adding
flonums; otherwise it falls back to a CPS call.  I'm sure we can change
that relatively easily by making it into an inline function that uses
check_flonum or so.  We could rename the current one to C_a_u_i_flonum_plus,
which is more correct anyway since it's unsafe and may crash when given
another kind of object.
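
To make the proposal concrete, here is a rough sketch in plain C of what
"a safe inline wrapper around the unsafe primitive" could look like.  It
is only an illustration under stated assumptions: the object layout and
every name in it (toy_word, toy_flonum, toy_check_flonum, toy_u_flonum_plus,
toy_flonum_plus, toy_sum) are made up for this example and are not the
definitions in chicken.h or runtime.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy object model, assumed for illustration only (not CHICKEN's real
   layout): a word is either a fixnum (low bit set) or a pointer to a
   boxed flonum whose first field is a type tag. */
typedef intptr_t toy_word;

enum { TOY_FLONUM_TAG = 0x55 };

typedef struct {
    int tag;
    double value;
} toy_flonum;

static inline toy_word toy_fix(long n) { return (toy_word)n * 2 + 1; }
static inline int toy_is_fixnum(toy_word x) { return (x & 1) != 0; }

static inline int toy_is_flonum(toy_word x) {
    return x != 0 && !toy_is_fixnum(x) &&
           ((const toy_flonum *)x)->tag == TOY_FLONUM_TAG;
}

/* Box a double into caller-provided storage.  The slot stands in for the
   allocation area an inline allocating primitive would be handed. */
static inline toy_word toy_box(toy_flonum *slot, double d) {
    assert(slot != NULL);
    slot->tag = TOY_FLONUM_TAG;
    slot->value = d;
    return (toy_word)slot;
}

/* Unsafe variant: trusts the caller completely.  Given a fixnum it would
   dereference a bogus pointer. */
static inline toy_word toy_u_flonum_plus(toy_flonum *slot, toy_word a, toy_word b) {
    return toy_box(slot, ((const toy_flonum *)a)->value +
                         ((const toy_flonum *)b)->value);
}

/* Type check.  This sketch simply aborts on failure; a real runtime would
   signal a Scheme-level error instead. */
static inline void toy_check_flonum(toy_word x, const char *loc) {
    if (!toy_is_flonum(x)) {
        fprintf(stderr, "%s: bad argument type - not a flonum\n", loc);
        abort();
    }
}

/* Safe variant: check both arguments, then delegate to the unsafe one.
   It is still an inline function, so a caller can remain a plain C loop. */
static inline toy_word toy_flonum_plus(toy_flonum *slot, toy_word a, toy_word b) {
    toy_check_flonum(a, "fp+");
    toy_check_flonum(b, "fp+");
    return toy_u_flonum_plus(slot, a, b);
}

/* The summation loop from above, written against the safe variant.  The
   old sum is read before the slot is overwritten, so one slot suffices. */
static double toy_sum(const double *v, size_t n) {
    toy_flonum acc, elem;
    toy_word sum = toy_box(&acc, 0.0);
    for (size_t i = 0; i < n; i++) {
        toy_word x = toy_box(&elem, v[i]);
        sum = toy_flonum_plus(&acc, sum, x);
    }
    return ((const toy_flonum *)sum)->value;
}
```

The point of the split is that the checked entry point costs two tag
tests per call but never leaves straight-line C, while the unchecked one
stays available for code where the argument types are already known.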

Of course this means several more intrinsics will have to be added as
safe versions for each of the specific flonum operators.  Thoughts?

I also wonder why the scrutinizer can't detect that these are always
flonum arguments.  Probably because it's a recursive loop?  These
local vars never escape, so it should be possible to make assumptions
about them.

For completeness, I also tried named let:

(let* ((size (* 32000000))
       (v (make-f64vector size)))
  (time
   (let lp ((i 0)
            (sum 0.0))
     (if (fx= i size)
         (print sum)
         (lp (fx+ i 1) (fp+ sum (f64vector-ref v i)))))))

That's only slower due to a C_trace call.  With -d0 and -O3 it produces
more or less the same code as the do loop.

With -O5 we get into more interesting territory, as we get unboxed
flonum references as a result of choosing the unsafe fp+ call.
We could get the same speedup at safe optimization levels if the
scrutinizer were able to deduce the types here.
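
As a rough picture of what "unboxed" buys here, the following plain C
sketch contrasts the two loop shapes.  It is an assumption-laden
illustration, not the compiler's actual output: toy_cell, sum_boxed and
sum_unboxed are hypothetical names, and a C compiler may well optimize
both functions to the same machine code.

```c
#include <stddef.h>

/* Hypothetical boxed flonum: a tagged cell holding the double. */
typedef struct {
    int tag;
    double value;
} toy_cell;

/* Boxed shape: the running sum lives in a cell, and every element is
   wrapped in a cell before the addition reads it back out. */
static double sum_boxed(const double *v, size_t n) {
    toy_cell acc = { 0x55, 0.0 };
    for (size_t i = 0; i < n; i++) {
        toy_cell elem = { 0x55, v[i] };        /* box the element */
        acc.value = acc.value + elem.value;    /* unbox, add, re-box */
    }
    return acc.value;
}

/* Unboxed shape: once both operands are known to be flonums, the sum can
   stay in a raw double for the whole loop. */
static double sum_unboxed(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

Both functions compute the same result; the difference is only in how
much wrapping and unwrapping sits between the vector and the addition.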

Cheers,
Peter

[1] https://jackmott.github.io//programming/2016/07/22/making-obvious-fast.html
