octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Vectorizing strcmp


From: John W. Eaton
Subject: Vectorizing strcmp
Date: Wed, 15 Sep 2004 13:18:13 -0400

On 15-Sep-2004, Keith Goodman <address@hidden> wrote:

| On the one hand, it's nice that strcmp is a user-defined
| function---it's easy to look at the code to see what it does.

Perhaps it is easier because it is a user-defined function, but with
Octave, unlike the other leading brand, you can always look at all of
the source code for any of the functions.

| On the other hand, the reason I'm looking at the code is that it is slow.
| 
| I'm using strcmp(s1,s2) where s1 is a cell array of strings (around
| 1000X1) and s2 is a string.
| 
| Currently, in 2.1.58, strcmp does
| 
| n = length(t1);
| for i = 1:n
|    retval(i,:) = strcmp (t1{i}, s2);
| endfor
| 
| which recursively calls strcmp for two strings and where t1 is s1(:).
| The for loop is slow.
| 
| Short of making strcmp a built-in function, one way to speed it up is
| to replace the for loop with a repmat like this
| 
| c1 = char(t1);
| [rc1, cc1] = size(c1);
| ns2 = size(s2,2);
| blank = ' ';
| ms2 = repmat([s2 repmat(blank,1,max(cc1 - ns2, 0))], rc1, 1);
| mc1 = [c1 repmat(blank,1,max(ns2 - cc1, 0))];

| retval2 = sum(mc1==ms2, 2)==size(mc1, 2);

Wouldn't

  all (mc1 == ms2, 2)

be better here?

| The second code fragment is about 10 times faster than the first for
| length(s1)==1000. For length(s1)==10 the speed is about the same. And
| for length(s1) < 10 the first approach is faster.
| 
| For me the added complexity in the code is more than offset by the
| gain in speed.

I think we need to be careful about using repmat in this way.  If your
cell array is large and the comparison string is fairly long, it could
easily generate a matrix many megabytes in size.  Following that with
code that generates additional temporary arrays could cause trouble.
Maybe it could be done in blocks rather than all at once?

Or, perhaps it should be a builtin function.

jwe



reply via email to

[Prev in Thread] Current Thread [Next in Thread]