octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

strcmp optimized + question


From: Jaroslav Hajek
Subject: strcmp optimized + question
Date: Fri, 20 Feb 2009 11:17:59 +0100

hi,

I have recently implemented a mechanism for octave_cell to cache the
string array value after requested. This got the issorted function
back working for cell arrays (after John's removal of
Array<octave_value> sorting code), and also improves efficiency for
various methods treating expecting a cell array to be a cell array of
strings.

In particular, the following changeset:
http://hg.savannah.gnu.org/hgweb/octave/rev/c3445f1c8cb4
reuses the cache in strcmp.

Here's a short benchmark, that creates an array of 2000 random 20-char
strings, and then matches 5000 strings against them using strcmp. This
simulates something like searching in a pre-built dictionary.

# create 2000 random strings of length 20.
a = char (65+floor (26*rand (2000, 20)));
a = cellstr (a);

# pick 5000 at random
idx = ceil (length (a) * rand (1, 5000));
b = a(idx);

# match each of them against the original array
tic;
for i = 1:5000
  m = strcmp (a, b{i});
endfor
toc

with a recent tip, I get:

Elapsed time is 7.10118 seconds.

while with the newest I get:

Elapsed time is 0.669188 seconds.

as you can see, strcmp is approximately 10-times faster on subsequent
calls (the cost of the first call will be roughly equivalent). The
downside is that the cache will occupy additional memory. Does anyone
see this as a problem? I don't think Octave is much used for heavy
string processing, and anyway using a cell array is not particularly
efficient for lots of strings per se...

I promised a question, so here it comes:
In Octave, as well as M*b 2007a, strcmp silently ignores cells that
are not strings. However, there's no mention about this in the
documentation, neither in Octave's nor Matlab's. The Matlab manual
says the function expects cell arrays of strings. Should Octave warn
if that is not the case? That would be incompatible but it seems sane
to me. Otherwise, maybe we should update the docs.

cheers

-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


reply via email to

[Prev in Thread] Current Thread [Next in Thread]