[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow
From: |
Rik |
Subject: |
[Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2 |
Date: |
Mon, 16 Aug 2021 12:20:20 -0400 (EDT) |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36 |
Follow-up Comment #11, bug #60928 (project octave):
I'm narrowing in on the cause of the problem which now seems to be in the
gather phase.
First, I created benchmark data so there would be no differences between
various test runs do to actual differences in the data.
The N-D array is of size 5x5x5x5000. The baseline times for sorting along
each dimension were
## Baseline
DIM1: 0.072642
DIM2: 2.146125
DIM3: 0.478395
DIM4: 0.137138
Next, I shut off the special case for dimension 1 and forced everything
through the gather, sort, scatter path. Timings were:
## Everything through second code path
DIM1: 11.184590
DIM2: 2.272407
DIM3: 0.468403
DIM4: 0.146637
Clearly, it was very, very bad for dimensions 1 and 2.
Next I used a std::chrono::high_resolution_clock to bracket the gather, sort,
and scatter blocks of code. For the first three dimensions, the only thing
that differs is the stride
octave_idx_type ns = dv(dim);
octave_idx_type iter = dv.numel () / ns;
octave_idx_type stride = 1;
for (int i = 0; i < dim; i++)
stride *= dv(i);
The variable ns (number of elements per sort) is 5. The number of loop
iterations is 125,000. Only the stride changes from 1 to 5 to 125 depending
on dimension.
The sort and scatter phases always takes ~1 microsecond regardless of
dimension (1, 2, or 3). However, the gather phase can take a maximum of
~250, ~75 , ~9 microseconds. Clearly something is going on in this phase.
The instrumented code is
auto start = high_resolution_clock::now();
octave_idx_type offset = j;
octave_idx_type offset2 = 0;
while (offset >= stride)
{
offset -= stride;
offset2++;
}
offset += offset2 * stride * ns;
// gather and partition out NaNs.
// FIXME: impact on integer types noticeable?
octave_idx_type kl = 0;
octave_idx_type ku = ns;
for (octave_idx_type i = 0; i < ns; i++)
{
T tmp = ov[i*stride + offset];
if (sort_isnan<T> (tmp))
buf[--ku] = tmp;
else
buf[kl++] = tmp;
}
auto stop = high_resolution_clock::now ();
auto duration = duration_cast<microseconds> (stop - start);
std::cout << "gather: " << duration.count () << std::endl;
I'm not sure what the while loop for calculation of offset and offset2 is
doing, but I don't like the look of it. That's the next benchmark timing to
run.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60928>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/14
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/14
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Michael Leitner, 2021/08/15
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Michael Leitner, 2021/08/15
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2,
Rik <=
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, anonymous, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Michael Leitner, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, anonymous, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, anonymous, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, anonymous, 2021/08/16
- [Octave-bug-tracker] [bug #60928] Performance of sort unexpectedly slow for DIM=2, Rik, 2021/08/17