[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Five other functions that are in Matlab core ported to Octave
From: |
David Bateman |
Subject: |
Re: Five other functions that are in Matlab core ported to Octave |
Date: |
Thu, 06 Sep 2007 23:04:33 +0200 |
User-agent: |
Thunderbird 1.5.0.7 (X11/20060921) |
John W. Eaton wrote:
>
> I agree that there is probably a lot of room for improvementhere, but
> any change to subsasgn should wait until the object branch is merged
> (after 3.0).
Well, if there is some minimal changes we might do that speed things up
we should probably try. I compiled a version of octave for use with
gprof and ran
m = 1000; n = 1000;
col = ones (m, 1);
a = zeros(m, n);
for i = 1: n;
a(:,i) = col;
end
quit
and then did a flat profile on it. The top consuming functions are
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
25.00 0.20 0.20 1000000 0.00 0.00
Array<double>::elem(int, i
nt)
21.25 0.37 0.17 4958 0.03 0.03
file_ops::tilde_expand(std
::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
13.75 0.48 0.11 1000 0.11 0.31 int
assignN<double, double
>(Array<double>&, Array<double> const&, double const&)
13.75 0.59 0.11 2 55.00 55.01
fill_matrix(octave_value_l
ist const&, int, char const*)
2.50 0.61 0.02 2 10.00 10.00 Array<octave_value
(*)(oct
ave_base_value&, octave_value_list const&, octave_base_value
const&)>::resize_an
d_fill(int, int, int, octave_value (* const&)(octave_base_value&,
octave_value_l
ist const&, octave_base_value const&))
The top call is Array<double>::elem !!!! The next is associated with the
path search at startup and the third call is the assign function itself.
Notice that ::assignN and not ::assign2 is called.
Looking at the ::assign functions they call lhs.elem(...). If we instead
call lhs.make_unique() once and then use lhs.xelem(...) we can already
get a significant improvement. However Array<T>::make_unique is
protected and so to support g++ 3.2 compilers it would have to be made
public rather than make the ::assign functions a friend of Array<T>.
Looking further there is code like
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = lhs_idx.elem (i);
lhs.xelem (ii) = rhs.elem (i);
}
that might be written as
if (lhs_idx.is_colon ())
{
for (octave_idx_type i = 0; i < n; i++)
lhs.xelem (i) = rhs.elem (i);
}
else
{
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = lhs_idx.elem (i);
lhs.xelem (ii) = rhs.elem (i);
}
or even
for (octave_idx_type j = 0; j < m; j++)
{
octave_idx_type jj = idx_j.elem (j);
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
lhs.xelem (ii, jj) = xrhs.elem (k++);
}
}
that might be written as
for (octave_idx_type j = 0; j < m; j++)
{
octave_idx_type off = dims(0) *
idx_j.elem (j);
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
lhs.xelem (ii + off) = xrhs.elem (k++);
}
}
though it makes no sense to do that in ::assign2 as I'll address later.
Making these changes and then running the attached script with an
unaltered version of Octave I get
[t1,t2] = testsubsassgn([100,200,500,1000],[100,200,500,1000],5)
t1 =
4.0000e-04 8.0000e-04 1.7998e-03 3.5996e-03
4.0000e-04 9.9980e-04 1.9996e-03 4.1992e-03
5.9980e-04 1.3996e-03 2.9994e-03 5.9990e-03
7.9980e-04 1.7998e-03 4.5994e-03 8.5988e-03
t2 =
8.0000e-04 1.7996e-03 4.1994e-03 8.7988e-03
1.3998e-03 2.7996e-03 7.1988e-03 1.4398e-02
2.9996e-03 5.9992e-03 1.5398e-02 3.0595e-02
5.7992e-03 1.1598e-02 2.8796e-02 5.7591e-02
with the attached patch I get
[t1,t2] = testsubsassgn([100,200,500,1000],[100,200,500,1000],5)
t1 =
2.0000e-04 8.0000e-04 1.5996e-03 3.3994e-03
4.0000e-04 7.9980e-04 1.7996e-03 3.5994e-03
4.0000e-04 5.9980e-04 1.9996e-03 4.3994e-03
4.0000e-04 9.9980e-04 2.7996e-03 5.7992e-03
t2 =
7.9980e-04 1.7996e-03 4.1994e-03 8.5988e-03
1.3998e-03 2.9996e-03 6.9990e-03 1.4398e-02
2.9994e-03 6.3990e-03 1.5398e-02 3.0395e-02
5.7990e-03 1.1598e-02 2.8596e-02 5.7991e-02
the times are in seconds for 5 runs and 100x100, 100x200, ... 500x1000,
1000x1000 matrix constructed like "for i = 1:n, a(:,i) = ...; end" for
the first and "col=[1:n]"; for i = 1:n, a(col,i) = ...; end;" for the
second. The reason for the second test is to see the cost of the test
for is_colon on the cases where the index isn't a colon.
For the colon case the speed can be enormous.. Dividing one by the other
gives
0.50000 1.00000 0.88877 0.94438
1.00000 0.79996 0.89998 0.85716
0.66689 0.42855 0.66667 0.73336
0.50013 0.55551 0.60869 0.67442
The numerically indexed case has a few speed-ups, but some small
slow-downs as well. Dividing the t2 values give.
0.99975 1.00000 1.00000 0.97727
1.00000 1.07144 0.97225 1.00001
0.99993 1.06664 1.00003 0.99345
0.99997 0.99998 0.99307 1.00694
so the numerical indexing is basically identical and the differences are
probably just measurement error. Given that the "a(:,i)" indexing case
is really very common and that there can be almost a doubling in speed I
think the attached patch should probably be applied.. I also ran "make
check" with no issues with the attached change (sorry no changelog,
though will create one later)..
Another thing I noticed is that in ::assign there is the code
return (lhs.ndims () == 2
&& (n_idx == 1
|| (n_idx < 3
&& rhs.ndims () == 2
&& rhs.rows () == 0 && rhs.columns () == 0)))
? assign2 (lhs, rhs, rfv) : assignN (lhs, rhs, rfv);
essentially assign2 is only called 2-D matrices with empty matrices for
the RHS or for 2-D matrices indexed with 1 index vector. The first some
case is trivial and just calls maybe_delete_elements, which is also in
assignN. So the above might be simplified to
return (lhs.dims () == 2 && n_idx == 1) ? assign2 (lhs, rhs, rfv) :
assignN (lhs, rhs, rfv);
in fact then perhaps assign2 might be significantly simplified, though
I'd rather not do that just before a 3.0 release. In any case the above
change makes no difference on the speed.
D.
Index: liboctave/Array.cc
===================================================================
RCS file: /usr/local/cvsroot/octave/liboctave/Array.cc,v
retrieving revision 1.159
diff -c -r1.159 Array.cc
*** liboctave/Array.cc 6 Sep 2007 12:08:44 -0000 1.159
--- liboctave/Array.cc 6 Sep 2007 20:59:53 -0000
***************
*** 1355,1361 ****
iidx++;
else
{
! new_data[ii] = elem (i);
ii++;
}
}
--- 1355,1361 ----
iidx++;
else
{
! new_data[ii] = xelem (i);
ii++;
}
}
***************
*** 1441,1447 ****
iidx++;
else
{
! new_data[ii] = elem (i);
ii++;
}
--- 1441,1447 ----
iidx++;
else
{
! new_data[ii] = xelem (i);
ii++;
}
***************
*** 1558,1564 ****
else
{
for (octave_idx_type i = 0; i < nr; i++)
! new_data[nr*jj+i] = elem (i, j);
jj++;
}
}
--- 1558,1564 ----
else
{
for (octave_idx_type i = 0; i < nr; i++)
! new_data[nr*jj+i] = xelem (i, j);
jj++;
}
}
***************
*** 1621,1627 ****
else
{
for (octave_idx_type j = 0; j < nc; j++)
! new_data[new_nr*j+ii] = elem (i, j);
ii++;
}
}
--- 1621,1627 ----
else
{
for (octave_idx_type j = 0; j < nc; j++)
! new_data[new_nr*j+ii] = xelem (i, j);
ii++;
}
}
***************
*** 1919,1925 ****
octave_idx_type kidx
= ::compute_index (temp_result_idx,
new_lhs_dim);
! new_data[kidx] = elem (result_idx);
}
increment_index (result_idx, lhs_dims);
--- 1919,1925 ----
octave_idx_type kidx
= ::compute_index (temp_result_idx,
new_lhs_dim);
! new_data[kidx] = xelem (result_idx);
}
increment_index (result_idx, lhs_dims);
***************
*** 1977,1983 ****
}
else
{
! new_data[ii++] = elem (lhs_ra_idx);
}
increment_index (lhs_ra_idx, lhs_dims);
--- 1977,1983 ----
}
else
{
! new_data[ii++] = xelem (lhs_ra_idx);
}
increment_index (lhs_ra_idx, lhs_dims);
***************
*** 2506,2511 ****
--- 2506,2513 ----
}
else
{
+ lhs.make_unique ();
+
if (rhs_len == n || rhs_len == 1)
{
octave_idx_type max_idx = lhs_idx.max () + 1;
***************
*** 2515,2534 ****
if (rhs_len == n)
{
! for (octave_idx_type i = 0; i < n; i++)
{
! octave_idx_type ii = lhs_idx.elem (i);
! lhs.elem (ii) = rhs.elem (i);
}
}
else if (rhs_len == 1)
{
RT scalar = rhs.elem (0);
! for (octave_idx_type i = 0; i < n; i++)
{
! octave_idx_type ii = lhs_idx.elem (i);
! lhs.elem (ii) = scalar;
}
}
else
--- 2517,2552 ----
if (rhs_len == n)
{
! if (lhs_idx.is_colon ())
{
! for (octave_idx_type i = 0; i < n; i++)
! lhs.xelem (i) = rhs.elem (i);
! }
! else
! {
! for (octave_idx_type i = 0; i < n; i++)
! {
! octave_idx_type ii = lhs_idx.elem (i);
! lhs.xelem (ii) = rhs.elem (i);
! }
}
}
else if (rhs_len == 1)
{
RT scalar = rhs.elem (0);
! if (lhs_idx.is_colon ())
{
! for (octave_idx_type i = 0; i < n; i++)
! lhs.xelem (i) = scalar;
! }
! else
! {
! for (octave_idx_type i = 0; i < n; i++)
! {
! octave_idx_type ii = lhs_idx.elem (i);
! lhs.xelem (ii) = scalar;
! }
}
}
else
***************
*** 2546,2555 ****
if (lhs_dims.all_zero ())
{
lhs.resize_no_fill (rhs_len);
for (octave_idx_type i = 0; i < rhs_len; i++)
! lhs.elem (i) = rhs.elem (i);
}
else if (rhs_len != lhs_len)
(*current_liboctave_error_handler)
--- 2564,2575 ----
if (lhs_dims.all_zero ())
{
+ lhs.make_unique ();
+
lhs.resize_no_fill (rhs_len);
for (octave_idx_type i = 0; i < rhs_len; i++)
! lhs.xelem (i) = rhs.elem (i);
}
else if (rhs_len != lhs_len)
(*current_liboctave_error_handler)
***************
*** 2669,2674 ****
--- 2689,2696 ----
if (n > 0 && m > 0)
{
+ lhs.make_unique ();
+
MAYBE_RESIZE_LHS;
RT scalar = xrhs.elem (0, 0);
***************
*** 2679,2685 ****
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii, jj) = scalar;
}
}
}
--- 2701,2707 ----
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii, jj) = scalar;
}
}
}
***************
*** 2688,2693 ****
--- 2710,2717 ----
&& (rhs_nr == 1 || rhs_nc == 1)
&& n * m == rhs_nr * rhs_nc)
{
+ lhs.make_unique ();
+
MAYBE_RESIZE_LHS;
if (n > 0 && m > 0)
***************
*** 2700,2712 ****
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii, jj) = xrhs.elem (k++);
}
}
}
}
else if (n == rhs_nr && m == rhs_nc)
{
MAYBE_RESIZE_LHS;
if (n > 0 && m > 0)
--- 2724,2738 ----
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii, jj) = xrhs.elem (k++);
}
}
}
}
else if (n == rhs_nr && m == rhs_nc)
{
+ lhs.make_unique ();
+
MAYBE_RESIZE_LHS;
if (n > 0 && m > 0)
***************
*** 2717,2723 ****
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii, jj) = xrhs.elem (i, j);
}
}
}
--- 2743,2749 ----
for (octave_idx_type i = 0; i < n; i++)
{
octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii, jj) = xrhs.elem (i, j);
}
}
}
***************
*** 2862,2881 ****
}
else if (len == rhs_nr * rhs_nc)
{
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii) = xrhs.elem (i);
}
}
else if (rhs_is_scalar)
{
RT scalar = rhs.elem (0, 0);
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii) = scalar;
}
}
else
--- 2888,2927 ----
}
else if (len == rhs_nr * rhs_nc)
{
! lhs.make_unique ();
!
! if (idx_i.is_colon ())
! {
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = xrhs.elem (i);
! }
! else
{
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii) = xrhs.elem (i);
! }
}
}
else if (rhs_is_scalar)
{
+ lhs.make_unique ();
+
RT scalar = rhs.elem (0, 0);
! if (idx_i.is_colon ())
{
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = scalar;
! }
! else
! {
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii) = scalar;
! }
}
}
else
***************
*** 2934,2941 ****
else if (n_idx == 1)
{
idx_vector iidx = idx(0);
! if (! (iidx.is_colon ()
|| (iidx.one_zero_only ()
&& iidx.orig_dimensions () == lhs.dims ())))
(*current_liboctave_warning_with_id_handler)
--- 2980,2988 ----
else if (n_idx == 1)
{
idx_vector iidx = idx(0);
+ int iidx_is_colon = iidx.is_colon ();
! if (! (iidx_is_colon
|| (iidx.one_zero_only ()
&& iidx.orig_dimensions () == lhs.dims ())))
(*current_liboctave_warning_with_id_handler)
***************
*** 2959,2980 ****
}
else if (len == rhs.length ())
{
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = iidx.elem (i);
! lhs.elem (ii) = rhs.elem (i);
}
}
else if (rhs_is_scalar)
{
RT scalar = rhs.elem (0);
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = iidx.elem (i);
! lhs.elem (ii) = scalar;
}
}
else
--- 3006,3047 ----
}
else if (len == rhs.length ())
{
! lhs.make_unique ();
!
! if (iidx_is_colon)
{
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = rhs.elem (i);
! }
! else
! {
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = iidx.elem (i);
! lhs.xelem (ii) = rhs.elem (i);
! }
}
}
else if (rhs_is_scalar)
{
RT scalar = rhs.elem (0);
! lhs.make_unique ();
!
! if (iidx_is_colon)
{
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = scalar;
! }
! else
! {
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = iidx.elem (i);
! lhs.xelem (ii) = scalar;
! }
}
}
else
***************
*** 3131,3136 ****
--- 3198,3205 ----
if (rhs_is_scalar)
{
+ lhs.make_unique ();
+
if (n_idx < orig_lhs_dims_len)
lhs = lhs.reshape (lhs_dims);
***************
*** 3146,3156 ****
octave_idx_type len = frozen_len(0);
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = iidx.elem (i);
! lhs.elem (ii) = scalar;
}
}
else if (lhs_dims_len == 2 && n_idx == 2)
--- 3215,3233 ----
octave_idx_type len = frozen_len(0);
! if (iidx.is_colon ())
{
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = scalar;
! }
! else
! {
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = iidx.elem (i);
! lhs.xelem (ii) = scalar;
! }
}
}
else if (lhs_dims_len == 2 && n_idx == 2)
***************
*** 3161,3173 ****
octave_idx_type i_len = frozen_len(0);
octave_idx_type j_len = frozen_len(1);
! for (octave_idx_type j = 0; j < j_len; j++)
{
! octave_idx_type jj = idx_j.elem (j);
! for (octave_idx_type i = 0; i < i_len; i++)
{
! octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii, jj) = scalar;
}
}
}
--- 3238,3264 ----
octave_idx_type i_len = frozen_len(0);
octave_idx_type j_len = frozen_len(1);
! if (idx_i.is_colon())
{
! for (octave_idx_type j = 0; j < j_len; j++)
{
! octave_idx_type off = new_dims (0) *
! idx_j.elem (j);
! for (octave_idx_type i = 0; i < i_len; i++)
! lhs.xelem (i + off) = scalar;
! }
! }
! else
! {
! for (octave_idx_type j = 0; j < j_len; j++)
! {
! octave_idx_type off = new_dims (0) *
! idx_j.elem (j);
! for (octave_idx_type i = 0; i < i_len; i++)
! {
! octave_idx_type ii = idx_i.elem (i);
! lhs.xelem (ii + off) = scalar;
! }
}
}
}
***************
*** 3181,3187 ****
{
Array<octave_idx_type> elt_idx = get_elt_idx
(idx, result_idx);
! lhs.elem (elt_idx) = scalar;
increment_index (result_idx, frozen_len);
}
--- 3272,3278 ----
{
Array<octave_idx_type> elt_idx = get_elt_idx
(idx, result_idx);
! lhs.xelem (elt_idx) = scalar;
increment_index (result_idx, frozen_len);
}
***************
*** 3203,3208 ****
--- 3294,3301 ----
}
else
{
+ lhs.make_unique ();
+
if (n_idx < orig_lhs_dims_len)
lhs = lhs.reshape (lhs_dims);
***************
*** 3216,3226 ****
octave_idx_type len = frozen_len(0);
! for (octave_idx_type i = 0; i < len; i++)
{
! octave_idx_type ii = iidx.elem (i);
! lhs.elem (ii) = rhs.elem (i);
}
}
else if (lhs_dims_len == 2 && n_idx == 2)
--- 3309,3327 ----
octave_idx_type len = frozen_len(0);
! if (iidx.is_colon ())
{
! for (octave_idx_type i = 0; i < len; i++)
! lhs.xelem (i) = rhs.elem (i);
! }
! else
! {
! for (octave_idx_type i = 0; i < len; i++)
! {
! octave_idx_type ii = iidx.elem (i);
! lhs.xelem (ii) = rhs.elem (i);
! }
}
}
else if (lhs_dims_len == 2 && n_idx == 2)
***************
*** 3232,3246 ****
octave_idx_type j_len = frozen_len(1);
octave_idx_type k = 0;
! for (octave_idx_type j = 0; j < j_len; j++)
{
! octave_idx_type jj = idx_j.elem (j);
! for (octave_idx_type i = 0; i < i_len; i++)
{
! octave_idx_type ii = idx_i.elem (i);
! lhs.elem (ii, jj) = rhs.elem (k++);
}
}
}
else
{
--- 3333,3363 ----
octave_idx_type j_len = frozen_len(1);
octave_idx_type k = 0;
! if (idx_i.is_colon())
{
! for (octave_idx_type j = 0; j < j_len; j++)
{
! octave_idx_type off = new_dims (0) *
! idx_j.elem (j);
! for (octave_idx_type i = 0;
! i < i_len; i++)
! lhs.xelem (i + off) = rhs.elem (k++);
}
}
+ else
+ {
+ for (octave_idx_type j = 0; j < j_len; j++)
+ {
+ octave_idx_type off = new_dims (0) *
+ idx_j.elem (j);
+ for (octave_idx_type i = 0; i < i_len;
i++)
+ {
+ octave_idx_type ii = idx_i.elem (i);
+ lhs.xelem (ii + off) = rhs.elem (k++);
+ }
+ }
+ }
+
}
else
{
***************
*** 3252,3258 ****
{
Array<octave_idx_type> elt_idx = get_elt_idx
(idx, result_idx);
! lhs.elem (elt_idx) = rhs.elem (i);
increment_index (result_idx, frozen_len);
}
--- 3369,3375 ----
{
Array<octave_idx_type> elt_idx = get_elt_idx
(idx, result_idx);
! lhs.xelem (elt_idx) = rhs.elem (i);
increment_index (result_idx, frozen_len);
}
Index: liboctave/Array.h
===================================================================
RCS file: /usr/local/cvsroot/octave/liboctave/Array.h,v
retrieving revision 1.100
diff -c -r1.100 Array.h
*** liboctave/Array.h 6 Sep 2007 12:08:45 -0000 1.100
--- liboctave/Array.h 6 Sep 2007 20:59:53 -0000
***************
*** 110,115 ****
--- 110,120 ----
//--------------------------------------------------------------------
+ public:
+
+ // !!! WARNING !!! -- these should be protected, not public. You
+ // should not access these methods directly!
+
void make_unique (void)
{
if (rep->count > 1)
***************
*** 130,137 ****
rep->fill (val);
}
- public:
-
typedef T element_type;
// !!! WARNING !!! -- these should be protected, not public. You
--- 135,140 ----
function [t1, t2] = testsubsassgn (m, n, nrun)
if (nargin < 3)
nrun = 3;
endif
if (nargin < 2)
n = [10, 100];
endif
if (nargin < 1)
m = [10, 100];
endif
tcat = zeros (length(m), length(n));
tsubs = zeros (length(m), length(n));
for i = 1 : length(m)
col = ones (m(i), 1);
col2 = [1:m(i)];
for j = 1 : length(n)
for k = 1 : nrun
b = zeros (m(i), n(j));
t0 = cputime ();
for l = 1 : n (j)
b(:,l) = col;
endfor
t1 (i, j) = cputime () - t0;
b = zeros(m(i), n(j));
t0 = cputime();
for l = 1 : n (j)
b(col2,l) = col;
endfor
t2 (i, j) = cputime () - t0;
endfor
t1 (i, j) /= nrun;
t2 (i, j) /= nrun;
endfor
endfor
endfunction