[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: How to Generate a Long String of the Same Character
From: |
Wolfgang Laun |
Subject: |
Re: How to Generate a Long String of the Same Character |
Date: |
Tue, 20 Jul 2021 11:49:05 +0200 |
On Mon, 19 Jul 2021 at 19:34, Neil R. Ormos <ormos-gnulists17@ormos.org>
wrote:
> Wolfgang Laun wrote:
>
> > The results for the four versions:
> > *rec* 0m1,436s
> > *dbl* 0m2.322s
> > *rpt* 0m13.543s
> > *sub* 0m27.290s
>
> I was a little surprised that the recursive
> algorithm was so much faster in Wolfgang's tests.
>
Why? gawk programs execute on a virtual machine with a CIS and a couple of
stacks. A function call isn't much worse than a goto.
It is mainly the number of interpreter instructions that counts (e.g., h =
h h x; is better than h = h h; h = h x;)
function neil(n, s, l, s0l){
> l=1;
> s0l=length(s);
> while (l*2<=n) {
> l=l+l;
> s=s s;
> };
> if (l<n) s=s substr(s, 1, (n-l)*s0l);
> return s;
> };
>
> You need to add
if( n == 0 ) return "";
as the first instruction in neil. You can try to optimize one 2+l from the
loop. But I still get results where your non-recursive function is somewhat
slower than my recursive one.
I have noticed that gawk 5.1.0 appears to execute both versions a little
faster than 5.0.1, but the difference remains. I can send you all the
details about my environment but I don't think that this would tell you
anything noteworthy.
(I have never been looking at gawk internals before, so all of the
statements below are quite unreliable.)
A somewhat enlightening procedure is to read a dump of the interpreter
code. *rec *results in 32 VM instructions whereas *neil *results in 49. The
salient numbers are the number of instructions executed for each
iteration. *rec
*loops over 22 or 24 VM instructions, the while in *neil *loops over 14
instructions; out of the remainder of 35 instructions 26 are executed with
each call. Iteration count in *rec *is one less for 2^(n-1)+1 to 2^n-1.
Some instructions are remarkably "heavy", length() for a string is one.
Timing the single call
x = srep( 300000000, "abc" );
also shows that *rec *is faster.
/usr/bin/time is very unreliable. I have done all runs on a machine where a
browser but no other program (except demons) is running. I don't know what
emacs or eclipse do when they are just sitting there, enjoying their idle
time. But if they affect the results of /usr/bin/time, it should not be
partisan.
All of this doesn't really explain why the results differ in this
irrational way.
Cheers
Wolfgang