Re: optimizing totalorder

bug-gnulib

From:	Adhemerval Zanella Netto
Subject:	Re: optimizing totalorder
Date:	Mon, 16 Oct 2023 20:47:02 -0300
User-agent:	Mozilla Thunderbird

On 15/10/23 11:59, Bruno Haible wrote:
> With the new benchmark in place, I measured the run time of
>   - the glibc 2.35 implementation of totalorder,
>   - the gnulib implementation (picked by configuring with
>       gl_cv_func_totalorder_in_libm=no gl_cv_func_totalorder_no_libm=no \
>       gl_cv_func_totalorderf_in_libm=no gl_cv_func_totalorderf_no_libm=no \
>       gl_cv_func_totalorderl_in_libm=no gl_cv_func_totalorderl_no_libm=no \
>   - the gnulib implementation with some disabled NaN tests.
>     This change (see attached patch) is correct: it still passes the unit
>     tests.
> 
> Here are the running times (on x86_64) of "./bench-totalorder fdl 1000000":
> 
>                      f       d       l
> 
> glibc              1.816   1.671   2.078
> gnulib             1.445   1.425   8.690
> gnulib with patch  1.798   1.974  14.032
> 
> Conclusion:
>   * My patch is a slowdown. It apparently "optimized" the fast path away. :-D
>   * The gnulib implementation is significantly faster than glibc, except for
>     the long-double case. I'll redo the measurements on various CPU types and
>     then tell the glibc people...

Most of the old math implementations in libm do not take into consideration
recent compiler optimizations, such as builtins for NaN/Inf checks, and they
also tend to favor integer code over floating point. More recent
implementations, like the ones provided by the ARM Optimized Routines and the
new hypot/fmod/exp10, try to improve on this by leveraging the compiler, using
better algorithms, and favoring FP.
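To make the integer-vs-FP distinction concrete, here is a rough sketch of both
styles for double. It is not the glibc or gnulib code. The names total_key,
totalorder_int and totalorder_fp are hypothetical, the functions take values
rather than pointers (unlike the standard totalorder), and the sketch assumes
IEEE 754 binary64 doubles. NaNs are ordered by their raw bit patterns, which
gives the totalOrder sNaN/qNaN order only on CPUs where a set quiet bit means
"quiet" (e.g. x86_64, not legacy MIPS or PA-RISC).

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64, the same size as int64_t.  */
static_assert (sizeof (double) == sizeof (int64_t), "double must be 64 bits");

/* Map the bits of X to a signed integer whose ordering matches the
   IEEE 754 totalOrder predicate (NaNs ordered by their bit patterns).  */
static int64_t
total_key (double x)
{
  int64_t i;
  memcpy (&i, &x, sizeof i);
  /* Negative values: flip the 63 magnitude bits, so that a larger
     magnitude gives a smaller key and -0.0 (key -1) sorts below
     +0.0 (key 0).  */
  return i < 0 ? i ^ INT64_MAX : i;
}

/* Integer-only style: always compare the bit patterns.  */
int
totalorder_int (double x, double y)
{
  return total_key (x) <= total_key (y);
}

/* FP-first style: ordinary FP comparisons for the common non-NaN case
   (isnan is typically expanded inline by the compiler), with an integer
   fallback only when a NaN is involved.  */
int
totalorder_fp (double x, double y)
{
  if (isnan (x) || isnan (y))
    return total_key (x) <= total_key (y);
  if (x != y)
    return x < y;
  /* x == y: only +0.0 vs -0.0 can differ, and -0.0 orders below +0.0.  */
  return signbit (x) || !signbit (y);
}
```

The FP-first variant only touches the bit representation when a NaN shows up;
whether that actually beats the integer-only variant depends on the compiler,
the flags and the CPU, so it would need to be measured.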

Also, did you use the same compiler flags and environment that distros usually
do? On simple algorithms like this, -fstack-protector can be quite a hit, as
can PLT overhead.

I also checked the resulting code, and it is larger than the glibc one for
double (375 vs 375 using the same compiler and flags), but that should not
really matter.

> 
> Kudos to you, Paul, for an implementation that is not only standards-compliant
> and portable, but also faster than glibc!

If you can, it would be good to have such an improvement in glibc as well. For
math code we have some benchmarks in benchtests.

> 
> Bruno
>
