bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#37241: large performance gap when start+inc specified with 'seq'


From: L A Walsh
Subject: bug#37241: large performance gap when start+inc specified with 'seq'
Date: Wed, 04 Sep 2019 13:42:56 -0700
User-agent: Thunderbird


On 2019/09/03 18:51, Pádraig Brady wrote:
> Yes we could be better here.
> Attached is a fairly simple improvement:
> 
> $ time seq.new 1 1 1e8 >/dev/null
> real  0m1.516s
> 
> $ time seq.new 1 2 1e8 >/dev/null
> real  0m0.834s
> 
> $ time seq.orig 1 2 1e8 >/dev/null
> real  0m40.435s
> 
> It might be improved further with BCD addition of the step string,
> but this should be good for now.
---
        Thanks, um, do you know what the time would have been
on your machine of the original, non-explicit case, i.e.:

time seq.new 1e8 >/dev/null

I'm wondering if it is about the same, now, as
the "1 1 1e8" case, as the original question I had was
why the case doing half as many iterations with
"1 2 1e8" took almost 33x as long as the "1e8" case (or,
perhaps logically equivalent) why
"1 1 1e8" wouldn't be the same as "1e8" by itself.

I had a bit of a problem following the logic of the original source
and as to how it determined to use the simpler algorithm over
the more general.

Logically, I thought "1 1 1e8" and "1e8" should have take about the
same amount of time as the first fit the defaults of the 2nd (1e8)
case.  
If that was the same, then the case of
"1 2 1e8", logically, should take half as long (because it was
half as many iterations).  In cases, where all values could be 
determined to be simple integers in the range 1<=X<=1e8 
with integer start and step sizes and where the limit was
'<=' than the native int size of the machine could be
done with a native, non-complex library call.

Other than not being able to use 'INC <REG>' (native inc on a register),
but needing to do an add-immediate 'ADD <REG>,CONS' the algorithms would
be the same.

So, logically, I'd think for all items within the native add/sub 
word size, the timing would be close to the same, wouldn't it?

I'll admit to the work being of questionable use, but not so much
less useful than using/needed 'seq' in the first place.  I.e. if 
coreutils was going to include a special binary to produce sequences
in the first place, it should probably be very efficient so a
user would never feel a need for a specialized version for native
integers.

The reason for that would be so if the work done in making the sequence
was significant enough, it could be split by 'N' physical cores
in a machine, one could see about a {90..100}*N Speed-boost over
the original.

Example of such: 
"Some version" of the 'factor-parallel.sh' test
in the coreutils 'tests/misc' dir, which I thought might actually
do something like that...
:-)














reply via email to

[Prev in Thread] Current Thread [Next in Thread]