Making __contourc__.cc (in Octave) more efficient

There is a pattern in __contourc__.cc that I don’t understand. The mark_facets function mainly consists of two loops which each mark edges going in one direction (horizontal or vertical). The part I don’t understand is why the code is split into two loops; it seems to me that a single loop would work just as well, with half the overhead. Since loop overhead is the majority of the time this function takes, it would run almost twice as fast if it were one loop rather than two.

Here is the original function:

static void

mark_facets (const Matrix& Z, charMatrix& mark, double lvl)

{

unsigned int nr = mark.rows ();

unsigned int nc = mark.cols ();

double f[4];

for (unsigned int c = 0; c < nc; c++)

for (unsigned int r = 0; r < nr; r++)

{

f[0] = Z(r, c) - lvl;

f[1] = Z(r, c+1) - lvl;

f[3] = Z(r+1, c) - lvl;

f[2] = Z(r+1, c+1) - lvl;

for (unsigned int i = 0; i < 4; i++)

if (fabs(f[i]) < DBL_EPSILON)

f[i] = DBL_EPSILON;

if (f[1] * f[2] < 0)

mark(r, c) += 2;

if (f[0] * f[3] < 0)

mark(r, c) += 8;

}

for (unsigned int r = 0; r < nr; r++)

for (unsigned int c = 0; c < nc; c++)

{

f[0] = Z(r, c) - lvl;

f[1] = Z(r, c+1) - lvl;

f[3] = Z(r+1, c) - lvl;

f[2] = Z(r+1, c+1) - lvl;

for (unsigned int i = 0; i < 4; i++)

if (fabs(f[i]) < DBL_EPSILON)

f[i] = DBL_EPSILON;

if (f[0] * f[1] < 0)

mark(r, c) += 1;

if (f[2] * f[3] < 0)

mark(r, c) += 4;

}

And my suggested revised version:

static void

mark_facets (const Matrix& Z, charMatrix& mark, double lvl)

  unsigned int nr = mark.rows ();

  unsigned int nc = mark.cols ();

  double f[4];

for (unsigned int c = 0; c < nc; c++)

for (unsigned int r = 0; r < nr; r++)

        f[0] = Z(r, c) - lvl;

        f[1] = Z(r, c+1) - lvl;

        f[3] = Z(r+1, c) - lvl;

        f[2] = Z(r+1, c+1) - lvl;

        for (unsigned int i = 0; i < 4; i++)

          if (fabs(f[i]) < DBL_EPSILON)

            f[i] = DBL_EPSILON;

        if (f[0] * f[1] < 0)

          mark(r, c) += 1;

        if (f[1] * f[2] < 0)

          mark(r, c) += 2;

        if (f[2] * f[3] < 0)

          mark(r, c) += 4;

        if (f[3] * f[0] < 0)

          mark(r, c) += 8;

Could someone either enlighten me as to what I am missing, or confirm that this change would speed up this function without changing what it does?

Since Octave uses column-major order, I used the nested loop with columns in the outside loop and rows in the inside loop, to take best advantage of memory caching. Since this is my first foray into Octave source code, please let me know if this is a valid strategy. If so, the similar nested loop in cntr() should also be reversed.

-mkr

From:	Michael Rodby
Subject:	Making __contourc__.cc (in Octave) more efficient
Date:	Sat, 29 Jun 2013 15:40:00 -1000