octave-maintainers

Re: mixed type operations in Octave


From: Jaroslav Hajek
Subject: Re: mixed type operations in Octave
Date: Wed, 10 Sep 2008 08:09:01 +0200

On Tue, Sep 9, 2008 at 10:38 PM, John W. Eaton <address@hidden> wrote:
> On  9-Sep-2008, Jaroslav Hajek wrote:
>
> | Addition and subtraction can be done using just the native operations.
> | For instance, addition can be done like this:
> |
> | template <typename T>
> | T add (T x, T y)
> | {
> |   T u;
> |   if (x >= 0)
> |     {
> |       u = std::numeric_limits<T>::max () - x;
> |       if (y > u)
> |         ftruncate = true;
> |       else
> |         u = y;
> |     }
> |   else
> |     {
> |       u = std::numeric_limits<T>::min () - x;
> |       if (y < u)
> |         ftruncate = true;
> |       else
> |         u = y;
> |     }
> |   return x + u;
> | }
>
> Where is ftruncate declared and how is it used?

I converted the single int flag into individual boolean flags, one
per exception (truncate, nan, non-int). After all, they were always
used separately.
It is also faster to raise a flag with `flag = true' (a single write)
than with `int_flag |= mask' (a read, an OR, and a write). The single
write also behaves atomically in practice, and is thus thread-safe.
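
In sketch form, the two flag styles compare like this (the names here
are mine for illustration, not the actual identifiers in the patch):

  // Per-exception flags: raising one is a single store.
  static bool ftruncate = false;  // overflow truncated to the range limit
  static bool fnan = false;       // NaN converted to an integer
  static bool fnonint = false;    // non-integer value rounded

  // Bitmask alternative: raising a flag is a read-modify-write.
  enum { FL_TRUNCATE = 1, FL_NAN = 2, FL_NONINT = 4 };
  static int int_flags = 0;

  inline void raise_truncate_bool () { ftruncate = true; }          // one write
  inline void raise_truncate_mask () { int_flags |= FL_TRUNCATE; }  // read, OR, write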

>
> Maybe you would also want to check x == 0 and y == 0?  Something like

Why? Those cases are already handled correctly by the code above:
with y == 0 (or x == 0) no overflow can occur, so the general path
returns x + y unchanged. But I can try the early-exit variant and
report the benchmark results.

>
>  template <typename T>
>  T
>  add (const T& x, const T& y)
>  {
>    if (x == 0)
>      return y;
>    else if (y == 0)
>      return x;
>    else if (x > 0)
>      {
>         static T mx = std::numeric_limits<T>::max ();
>         return (y > mx - x) ? mx : x + y;
>      }
>    else
>      {
>         static T mn = std::numeric_limits<T>::min ();
>         return (y < mn - x) ? mn : x + y;
>      }
>  }
>
> ?
>
> | or like this (it relies on the bit-level properties of two's
> | complement signed integer representation, and is significantly
> | faster):
> |
> | template <typename T>
> | T add (T x, T y)
> | {
> |   T u = x + y;
> |   T ux = u ^ x, uy = u ^ y;
> |   if ((ux & uy) < 0)
> |     {
> |       u = std::numeric_limits<T>::max () + signbit (~u);
> |       ftruncate = true;
> |     }
> |   return u;
> | }
>
> I'd think that which version is faster would depend on the compiler
> and hardware, at least to some degree.

I bet version 2 is faster on any current architecture (where it works
at all). With modest optimization it compiles down to machine code
with a single conditional jump, which is taken only on overflow and
is therefore handled gracefully by branch prediction. The question
is, of course, how much faster.
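
Filled out as a self-contained sketch (the integer signbit helper and
the file-scope ftruncate are my assumptions, mirroring the snippet
above; std::signbit itself is for floating point), version 2 might
read:

  #include <limits>

  static bool ftruncate = false;

  // Integer sign helper: 1 if the sign bit is set, 0 otherwise.
  template <typename T>
  inline T signbit (T x) { return x < 0 ? 1 : 0; }

  // Assumes two's complement wrap-around on signed overflow.
  template <typename T>
  T add (T x, T y)
  {
    T u = x + y;                // may wrap on overflow
    T ux = u ^ x, uy = u ^ y;
    // Overflow iff the result's sign differs from the signs of both
    // operands; this is the only conditional jump, and it is rarely
    // taken, so branch prediction swallows it.
    if ((ux & uy) < 0)
      {
        // u < 0:  positive overflow, signbit (~u) == 0, yields max;
        // u >= 0: negative overflow, max + 1 wraps around to min.
        u = std::numeric_limits<T>::max () + signbit (~u);
        ftruncate = true;
      }
    return u;
  }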

> And I'd really rather not use
> bit twiddling tricks that are not guaranteed to work for all
> arithmetic models unless they are accompanied by some configure checks
> so that this code won't cause trouble in the future if/when someone
> tries to port Octave to some exotic system.
>

Exactly my idea - I wanted to add a configure check and define a flag
only where the trick is usable.
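
The test program behind such a configure check could be as simple as
the following sketch (the autoconf plumbing is omitted; the volatile
qualifiers, my own addition, keep the compiler from folding the
overflowing expression away at compile time):

  #include <climits>

  // Exits with 0 iff signed arithmetic wraps around two's complement
  // style (INT_MAX + 1 == INT_MIN) and -1 has all low bits set.  A
  // configure script would compile and run this, defining a
  // preprocessor symbol on success.
  int main ()
  {
    volatile int mx = INT_MAX;
    volatile int mn = INT_MIN;
    volatile int m1 = -1;
    return (mx + 1 == mn && (m1 & 3) == 3) ? 0 : 1;
  }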

> | Both versions seem to be faster than going through double, which
> | is the current implementation.
> | While version 1 is a million percent portable, version 2 may not
> | be if the target machine does not use two's complement
> | representation for signed integers (i.e., where signed arithmetic
> | is bitwise identical to unsigned). I don't know of any such
> | architecture that we actually care about. In fact, even the
> | autoconf manual says that assuming two's complement is, for
> | practical purposes, safe.
> | OTOH, on most, if not all, architectures, version 2 will work and
> | is still considerably faster than version 1.
> |
> | Multiplication is best done by promoting to a wider integer type,
> | multiplying, and fitting the result back into range. Again,
> | avoiding the int-to-real conversions is a performance win. If a
> | 128-bit int is not available, 64-bit multiplication can be done
> | using bit shifts.
>
> Before making claims about what is faster, I guess I'd like to see
> some actual numbers for the different versions on several different
> architectures.
>

You'll get them. I contributed the benchmark to OctaveForge, and I'll
ask people to try the patch and report the results they see.
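
For concreteness, the promote-multiply-saturate scheme quoted above
might look like this for 32-bit operands (a sketch; mul and ftruncate
are names of my choosing, and the shift-based decomposition needed
for 64-bit operands without a 128-bit type is omitted):

  #include <limits>
  #include <stdint.h>

  static bool ftruncate = false;

  // Promote to the doubly-wide type, multiply exactly, then saturate.
  int32_t mul (int32_t x, int32_t y)
  {
    int64_t w = static_cast<int64_t> (x) * y;   // exact in 64 bits
    if (w > std::numeric_limits<int32_t>::max ())
      {
        ftruncate = true;
        return std::numeric_limits<int32_t>::max ();
      }
    else if (w < std::numeric_limits<int32_t>::min ())
      {
        ftruncate = true;
        return std::numeric_limits<int32_t>::min ();
      }
    return static_cast<int32_t> (w);
  }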


> BTW, how much faster are we talking about here?

I already did some benchmarks, but I have changed the code
significantly since then, so I don't have up-to-date numbers right
now.

From the old results, I see that when adding two signed 32-bit
integers there was a 113% speed-up with version 1 over the current
implementation, and a 173% speed-up with version 2, where speed-up is
measured as `old_time / new_time - 1' (in %) - so 113% means version
1 ran roughly 2.1 times as fast. These numbers may not correspond
exactly to the code above. Also, especially with version 2, the
result depends somewhat on how often overflow occurs (normally it is
infrequent, which I think gives version 2 a further advantage on
architectures with branch prediction).
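
For reference, a minimal harness in the spirit of those measurements
might be driven like this (a sketch only - the loop size, the
function-pointer indirection, and both add implementations are my
stand-ins, not the actual OctaveForge benchmark):

  #include <climits>
  #include <cstddef>
  #include <cstdio>
  #include <ctime>
  #include <vector>

  static bool ftruncate = false;

  // Current approach: round-trip through double, then fit to range.
  static int add_via_double (int x, int y)
  {
    double t = double (x) + double (y);
    if (t > double (INT_MAX)) { ftruncate = true; return INT_MAX; }
    if (t < double (INT_MIN)) { ftruncate = true; return INT_MIN; }
    return int (t);
  }

  // Proposed approach: version 1 from above, in native arithmetic.
  static int add_native (int x, int y)
  {
    int u;
    if (x >= 0)
      { u = INT_MAX - x; if (y > u) ftruncate = true; else u = y; }
    else
      { u = INT_MIN - x; if (y < u) ftruncate = true; else u = y; }
    return x + u;
  }

  static double time_loop (int (*f) (int, int), const std::vector<int>& a,
                           const std::vector<int>& b, std::vector<int>& c)
  {
    std::clock_t t0 = std::clock ();
    for (std::size_t i = 0; i < a.size (); i++)
      c[i] = f (a[i], b[i]);
    return double (std::clock () - t0) / CLOCKS_PER_SEC;
  }

  int main ()
  {
    std::vector<int> a (10000000, 1), b (10000000, 2), c (a.size ());
    double t_old = time_loop (add_via_double, a, b, c);
    double t_new = time_loop (add_native, a, b, c);
    // speed-up as defined above: old_time / new_time - 1, in percent
    std::printf ("speed-up: %.0f%%\n", 100 * (t_old / t_new - 1));
    return 0;
  }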

>
> Although speed is nice, I think it would be better to spend time
> working on missing features first.
>
Well, as I said before, this did start as a hunt for a missing
feature (int64 arithmetic). I realized that the present method (going
through doubles) would not work for 64-bit ints, and once I had
written addition and subtraction for them, I could just as well make
the code a template and use it for the other integer types,
especially if it improved performance - which it did.


Maybe you're right, and all this may really be quite useless. Perhaps
someone can comment on whether they use (or want to use) integer
arithmetic in Octave, and whether they care about its speed?


-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

