[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Making __contourc__.cc (in Octave) more efficient
From: |
Ben Abbott |
Subject: |
Re: Making __contourc__.cc (in Octave) more efficient |
Date: |
Sun, 30 Jun 2013 13:16:59 +0800 |
On Jun 30, 2013, at 9:40 AM, Michael Rodby wrote:
> There is a pattern in __contourc__.cc that I don’t understand. The
> mark_facets function mainly consists of two loops which each mark edges going
> in one direction (horizontal or vertical). The part I don’t understand is why
> the code is split into two loops; it seems to me that a single loop would
> work just as well, with half the overhead. Since loop overhead is the
> majority of the time this function takes, it would run almost twice as fast
> if it were one loop rather than two.
>
> Here is the original function:
>
> static void
> mark_facets (const Matrix& Z, charMatrix& mark, double lvl)
> {
> unsigned int nr = mark.rows ();
> unsigned int nc = mark.cols ();
>
> double f[4];
>
> for (unsigned int c = 0; c < nc; c++)
> for (unsigned int r = 0; r < nr; r++)
> {
> f[0] = Z(r, c) - lvl;
> f[1] = Z(r, c+1) - lvl;
> f[3] = Z(r+1, c) - lvl;
> f[2] = Z(r+1, c+1) - lvl;
>
> for (unsigned int i = 0; i < 4; i++)
> if (fabs(f[i]) < DBL_EPSILON)
> f[i] = DBL_EPSILON;
>
> if (f[1] * f[2] < 0)
> mark(r, c) += 2;
>
> if (f[0] * f[3] < 0)
> mark(r, c) += 8;
> }
>
> for (unsigned int r = 0; r < nr; r++)
> for (unsigned int c = 0; c < nc; c++)
> {
> f[0] = Z(r, c) - lvl;
> f[1] = Z(r, c+1) - lvl;
> f[3] = Z(r+1, c) - lvl;
> f[2] = Z(r+1, c+1) - lvl;
>
> for (unsigned int i = 0; i < 4; i++)
> if (fabs(f[i]) < DBL_EPSILON)
> f[i] = DBL_EPSILON;
>
> if (f[0] * f[1] < 0)
> mark(r, c) += 1;
>
> if (f[2] * f[3] < 0)
> mark(r, c) += 4;
> }
> }
>
> And my suggested revised version:
>
> static void
> mark_facets (const Matrix& Z, charMatrix& mark, double lvl)
> {
> unsigned int nr = mark.rows ();
> unsigned int nc = mark.cols ();
>
> double f[4];
>
> for (unsigned int c = 0; c < nc; c++)
> for (unsigned int r = 0; r < nr; r++)
> {
> f[0] = Z(r, c) - lvl;
> f[1] = Z(r, c+1) - lvl;
> f[3] = Z(r+1, c) - lvl;
> f[2] = Z(r+1, c+1) - lvl;
>
> for (unsigned int i = 0; i < 4; i++)
> if (fabs(f[i]) < DBL_EPSILON)
> f[i] = DBL_EPSILON;
>
> if (f[0] * f[1] < 0)
> mark(r, c) += 1;
>
> if (f[1] * f[2] < 0)
> mark(r, c) += 2;
>
> if (f[2] * f[3] < 0)
> mark(r, c) += 4;
>
> if (f[3] * f[0] < 0)
> mark(r, c) += 8;
> }
> }
>
> Could someone either enlighten me as to what I am missing, or confirm that
> this change would speed up this function without changing what it does?
>
> Since Octave uses column-major order, I used the nested loop with columns in
> the outside loop and rows in the inside loop, to take best advantage of
> memory caching. Since this is my first foray into Octave source code, please
> let me know if this is a valid strategy. If so, the similar nested loop in
> cntr() should also be reversed.
>
> -mkr
I've attached a diff of Michael's suggestion. I did a quick fgrep of
scripts/plot. The functions contour(), contour3(), surfc(), and meshc() each
call __contourc__. There are no demos for meshc(). All other demos look to
work correctly to me.
I'm currently chasing a problem with legend(). I'm not sure if/when I'll be
able to get back to this (I've only had a superficial look).
Michael, can you submit a patch to the tracker so that it doesn't get lost.
We'd prefer a mercurial changeset (which include s ChangeLog), but if you don't
have the time, please use include the attached diff. Thanks.
https://savannah.gnu.org/patch/?group=octave
Ben
__contourc__.diff
Description: Binary data