Re: [Chicken-hackers] Floating point performance
From: megane
Subject: Re: [Chicken-hackers] Floating point performance
Date: Fri, 19 Apr 2019 18:16:43 +0300
User-agent: mu4e 1.0; emacs 25.1.1
Peter Bex <address@hidden> writes:
[...]
>
> Of course this means several more intrinsics will have to be added as
> safe versions for each of the specific flonum operators. Thoughts?
I can't see a reason not to, except that it's a lot of code to write.
>
> I also wonder why the scrutinizer can't detect that these are always
> flonum arguments. Probably because it's a recursive loop? These
> local vars never escape, so it should be possible to make assumptions
> about them.
Here's the relevant -debug 2 output:
(##core#app
 (let ((doloop1617 (##core#undefined)))
   (let ((t20 (set! doloop1617
                (##core#lambda
                 (i18 sum19)
                 (if (chicken.fixnum#fx= i18 size13)
                     (chicken.base#print sum19)
                     (##core#app
                      doloop1617
                      (chicken.fixnum#fx+ i18 '1)
                      (chicken.flonum#fp+
                       sum19
                       (srfi-4#f64vector-ref v14 i18))))))))
     (let () doloop1617)))
 '0
 '0.0)
For the fp+ call to be specialized, both arguments must be inferred to
be floats. The second one is; 'sum19', however, is not.
In the scrutinizer, function arguments always have type * at the start
of a function's body. Their types are then refined by predicates in
'if' branches or by calls to enforcing functions. Here, before the fp+
call, 'sum19' is only used as an argument to print, which doesn't refine
anything.
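For illustration only, here is a minimal sketch of the predicate-based refinement described above, assuming CHICKEN 5 and assuming the scrutinizer treats flonum? as a float predicate. The helper name f64vector-sum/checked and the sample vector are hypothetical, not taken from the original benchmark. Because of the explicit (flonum? sum) test, 'sum' should have type float in that branch, so both fp+ arguments are known floats there. The extra check runs on every iteration, so this demonstrates the mechanism rather than proposing a fix.

```scheme
(import (chicken base) (chicken fixnum) (chicken flonum) srfi-4)

;; Hypothetical helper, not from the original benchmark: sums an f64vector
;; with a named let, mirroring the shape of the expanded loop.
(define (f64vector-sum/checked v)
  (let ((size (f64vector-length v)))
    (let loop ((i 0) (sum 0.0))
      (cond ((fx= i size) sum)
            ;; In this branch the flonum? predicate has succeeded, so the
            ;; scrutinizer can treat sum as a float when it reaches fp+.
            ((flonum? sum)
             (loop (fx+ i 1) (fp+ sum (f64vector-ref v i))))
            (else (error "sum is not a flonum" sum))))))

(print (f64vector-sum/checked (f64vector 1.0 2.0 3.5)))
```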
To make this code specialize two things are needed:
1. Infer more specific types for (recursive) functions.
2. Prove that a function is always called with correct arguments.
If we can do both, we can effectively re-walk the function with its
arguments assumed to have the correct types. This should cause all
calls to be specialized.
Feature 1 is hard and a lot of work, but doable. It needs, for example,
support for unifying two type variables.
Feature 2 is probably not that hard for simple cases like the one above,
and handling just the simple cases might be enough.
[...]
>
> That's only slower due to a C_trace call. With -d0 it produces
> more or less identical code with -O3.
I do get a nice speedup with -O3 if I annotate the sum inside the fp+
call: (the float sum).
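A sketch of the annotated loop, assuming CHICKEN 5 with `the` imported from the (chicken type) module. The wrapper procedure f64vector-sum and the sample vector are hypothetical. The original source is only visible through the expansion above, where the loop prints the sum instead of returning it. The only change to the loop itself is wrapping 'sum' in (the float sum) inside the fp+ call.

```scheme
(import (chicken base) (chicken fixnum) (chicken flonum) (chicken type) srfi-4)

;; Hypothetical wrapper around the benchmark loop; the original defines v and
;; size elsewhere and prints the result instead of returning it.
(define (f64vector-sum v)
  (let ((size (f64vector-length v)))
    (do ((i 0 (fx+ i 1))
         ;; (the float sum) declares that sum is a flonum, so the scrutinizer
         ;; sees two float arguments to fp+ and can specialize the call.
         (sum 0.0 (fp+ (the float sum) (f64vector-ref v i))))
        ((fx= i size) sum))))

(print (f64vector-sum (f64vector 1.0 2.0 3.5)))
```

With -O3 this is the variant that gave the speedup mentioned above; without the annotation, 'sum' stays at type * inside the loop.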