[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [lmi] Optimized integral power
From: |
Greg Chicares |
Subject: |
Re: [lmi] Optimized integral power |
Date: |
Thu, 22 Dec 2016 23:17:41 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0 |
On 2016-12-22 14:45, Vadim Zeitlin wrote:
> On Thu, 22 Dec 2016 02:36:49 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> Please take a look at commit 569b775d3ffc4afc526a83cb1a769f6d604101a8
> GC> and let me know if you see any reason to keep the std::pow() alternative,
> GC> now in a #if 0 ... #endif block. I'm inclined to remove it soon.
>
> I agree, I don't see how std::pow() could do better than what the new code
https://www.sgi.com/tech/stl/power.html#3
| This is in fact not the minimum possible number of multiplications:
| it is possible to compute the fifteenth power of x using only five
| multiplications, but power(x, 15) uses six.
But we needn't worry about that. As an amusing digression, though...
Instead of looking up that example, I was going to say X^9:
X3 = X*X*X 2 total multiplications
X9 = X3*X3 3 " "
whereas with binary powers:
X2 = X*X 1 total multiplications
X4 = X2*X2 2 total multiplications
X8 = X4*X4 3 total multiplications
X9 = X*X8 4 total multiplications
Am I missing something, or is Matt Austern's example not minimal
as I would have expected it to be, just because that's the way I
thought his mind might work?
Preliminarily, let
P = the power to which a number is to be raised
N = a number base (3 for ternary, e.g.)
RN = the base-N representation of P
NOD = the number of digits in a particular Rb
SOD = the sum of the digits in a particular Rb
QN = the total number of multiplications required to
calculate x^P with N-ary power precalculations
For any base-N representation RN, the first digit is free because
it's multiplied by N^0, which is unity. The next digit costs (N-1)
multiplications, and each subsequent digit costs one more. Then the
number of multiplications in a base-N Russian peasant algorithm
(ignoring tiny P for which RN < 2) would be A+B+C, where
A = N - 1
B = NOD(RN) - 2
C = SOD(RN) - 1
because
- the first digit costs nothing
- the second digit costs A
- K digits cost (K-2)=B (we already took care of the first two)
- and to form the final product we must select SOD(RN) of the
powers already calculated, and multiplying K numbers take
(K-1)=C multiplications
Thus
QN(P,N) = A+B+C, or, more directly,
= (N - 1) + (NOD(RN) - 2) + (SOD(RN) - 1)
= N + NOD(RN) + SOD(RN) - 4
For P=9:
A B C | A B + C | total
9 = 1001 binary: (2-1) + (4-2) + (2-1) = 1 + 2 + 1 = 4
9 = 100 ternary: (3-1) + (3-2) + (1-1) = 2 + 1 + 0 = 3
Testing the simplified formula:
N + NOD + SOD - 4 = QN
9 = 1001 binary: 2 + 4 + 2 - 4 = 4
9 = 100 ternary: 3 + 3 + 1 - 4 = 3
Let's try 35 to see whether that disproves the formula:
N + NOD + SOD - 4 = QN
35 = 100011 binary: 2 + 6 + 3 - 4 = 7
35 = 1022 ternary: 3 + 4 + 5 - 4 = 8
35 = 120 quinary: 5 + 3 + 3 - 4 = 7
...and work that example out as a demonstration:
X2 = X*X 1 total multiplications
X4 = X2*X2 2
X8 = X4*X4 3
X16 = X8*X8 4
X32 = X16*X16 5
X35 = X32*X2*X 7 = Q2(35)
X3 = X*X*X 2
X9 = X3*X3 3
X27 = X9*X9 4
X35 = X27*X3*X3*X*X 8 = Q3(35)
X5 = X*X*X*X*X 4
X25 = X5*X5 5
X35 = X25*X5*X5 7 = Q5(35)
It seems interesting that the thirty-fifth power can be done no more
efficiently with quinary than with binary grouping of multiplications.
But that's not really startling: after all, the 323rd power can be
calculated more efficiently than by multiplying X by itself 17 times,
and that intermediate value by itself 19 times, because the 17th power
takes less work with binary grouping. That leads to questions like:
what is the lowest exponent P for which Z-ary grouping is better than
for any lower-order grouping--i.e., QZ(P) < QX(P) for all X < Z? and
is there a maximum Z for which that question has an answer? But we'll
postpone such matters for Ugolyok's birthday, which begins within the
hour in the Zulu timezone.
> does (well, in theory some CPU could have a dedicated instruction for
> performing this operation, but in practice I don't know of any such
> instruction).
Like those wonderful REP instructions on the 8086.
> On a completely unrelated note, I've looked at stl_extensions.hpp while
> reviewing this and noticed copy_n() and iota() functions in it. It's true
> that they were only available in some extensions of C++98 STL, but now they
> are part of C++11 standard, so I think they should be removed now and the
> corresponding std versions should be used instead. This looks especially
> simple to do because each of these functions is only used once (in
> ihs_acctval.cpp for copy_n and in gpt_test.cpp for iota). As always, I'm
> ready to make a patch (or two) if it could be useful.
What I was going to say, until I started wondering why Austern picked
15 as a counterexample above, was that I had hesitated to make this
change for fear it would collide with C++11-ization work that you might
already have done locally, but--concluding that you haven't--I've done
it (including is_sorted(), too), and finally pushed it a moment ago.
It seemed better for me to do it myself because it's quite mechanical
and I can use the vim practice more than you.
- Re: [lmi] MinGW-w64 anomaly?, (continued)
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/25
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/25
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/26
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/26
- Re: [lmi] MinGW-w64 anomaly?, Greg Chicares, 2016/12/27
- Re: [lmi] MinGW-w64 anomaly?, Vadim Zeitlin, 2016/12/27
- [lmi] Optimized integral power [Was: MinGW-w64 anomaly?], Greg Chicares, 2016/12/21
- Re: [lmi] Optimized integral power, Vadim Zeitlin, 2016/12/22
- Re: [lmi] Optimized integral power,
Greg Chicares <=
- Re: [lmi] Optimized integral power, Vadim Zeitlin, 2016/12/22
- Re: [lmi] Optimized integral power, Greg Chicares, 2016/12/23
- Re: [lmi] libstdc++ anomaly? [was: MinGW-w64 anomaly?], Vadim Zeitlin, 2016/12/19
- [lmi] gcc -flto [Was: libstdc++ anomaly?], Greg Chicares, 2016/12/24
- Re: [lmi] gcc -flto, Vadim Zeitlin, 2016/12/24
- Re: [lmi] gcc -flto, Greg Chicares, 2016/12/24
- [lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto], Greg Chicares, 2016/12/27
- [lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto], Greg Chicares, 2016/12/27
- Re: [lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto], Vadim Zeitlin, 2016/12/27
- Re: [lmi] gcc -fprofile-generate and -fprofile-use [Was: gcc -flto], Greg Chicares, 2016/12/27