octave-maintainers

Re: character matrix inputs to string functions


From: Andrew Janke
Subject: Re: character matrix inputs to string functions
Date: Thu, 20 Feb 2020 14:28:39 -0500



On 2/20/20 1:22 PM, Rik wrote:
On 02/20/2020 09:00 AM, address@hidden wrote:

On Thursday, 20 February 2020 14.35.19 WET Nicholas Jankowski wrote:
can you give a code example of what produces an error in matlab but not
octave? i may be misunderstanding your earlier comments.
c = ['1 2 3 '; '4 5 6 ']

strrep(c, '2', '0')

The call to strrep fails in Matlab because c is not a char row vector (it is a
2x6 char matrix), while it succeeds in Octave (with a counterintuitive result
IMHO). My proposal is for Octave to raise an error here as well.

[m,n] = size(c);

If m != 1 and n != 1, then throw an error. I hope that this now makes sense. :-)

Regards,
-- José Matos

There is a much larger issue which should be resolved and that is how character matrix inputs should be handled by all string functions.

Consider this example,

octave:1> cstr = { 'Hello World' ; 'Goodbye World '}
cstr =
{
   [1,1] = Hello World
   [2,1] = Goodbye World
}

octave:2> strrep (cstr, 'World', 'Jane')
ans =
{
   [1,1] = Hello Jane
   [2,1] = Goodbye Jane
}

This does just what you would think.  Now try the same thing with a character matrix.

octave:3> chmat = char ('Hello World', 'Goodbye World')
chmat =

Hello World
Goodbye World

octave:4> strrep (chmat, 'World', 'Jane')
ans =

Hello World
Goodbye World

There is no substitution because Octave stores arrays in column-major order, so the internal algorithm sees the string "HGeolo..." rather than the two rows.  In any case, the average user is going to be quite surprised by the apparent failure of the strrep function.  Restricting character input to a row vector (1xN) restores the correct behavior of the function.
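To make the column-major point concrete, here is a small sketch that linearizes the same matrix by hand. It only illustrates how the storage order interleaves the rows; it is not a claim about how strrep is implemented internally.

```octave
% Linearizing a char matrix walks it column by column, so the two
% rows end up interleaved character by character.
chmat = char ('Hello World', 'Goodbye World');   % 2x13, first row blank-padded
flat = chmat(:).';                               % 1x26 row vector in storage order
disp (flat(1:6))                                 % prints HGeolo
idx = strfind (flat, 'World');                   % empty: 'World' is never contiguous
disp (isempty (idx))                             % prints 1
```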

octave:5> strrep (chmat(1,:), 'World', 'Jane')
ans = Hello Jane

So, I think we (Octave community) need to make a decision about how we want to handle character matrix inputs and then propagate this change to all of the m-files in scripts/strings.

One obvious possibility is simply to follow Matlab and increase the level of input validation to reject character matrices.  The validation code is pretty simple:

if (ischar (input))
  if (! isrow (input))
    error ("fcn_name: input must be a character string or cell array of strings");
  endif
elseif (iscellstr (input))
  ...
else
  error ("fcn_name: input must be a character string or cell array of strings");
endif
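As a rough sketch of how that check might be packaged and exercised, the snippet below wraps it in a helper. The name validate_string_input and its fcn_name argument are made up for illustration and do not exist in Octave.

```octave
1;  % script-file marker: lets the function below be defined in the same file

% Hypothetical helper, not part of Octave: accept a char row vector or a
% cellstr, and reject everything else (including 2-D char matrices).
function validate_string_input (fcn_name, input)
  if (ischar (input))
    if (! isrow (input))
      error ("%s: input must be a character string or cell array of strings", fcn_name);
    endif
  elseif (! iscellstr (input))
    error ("%s: input must be a character string or cell array of strings", fcn_name);
  endif
endfunction

validate_string_input ("strrep", "Hello World");          % passes silently
validate_string_input ("strrep", {"Hello", "Goodbye"});   % passes silently

rejected = false;
try
  validate_string_input ("strrep", char ("Hello", "Goodbye"));  % 2x7 char matrix
catch err
  rejected = true;
  disp (err.message);
end_try_catch
```

One thing to keep in mind with this approach: isrow ('') is false because the empty string is 0x0, so the check as written would also reject an empty string. Whether that is wanted would need to be decided separately.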

But Octave does try to see itself as a superset of Matlab.  We don't have to follow them slavishly.  In this case we could change the input validation to detect the character matrix and call the function recursively.  For example, this code converts the character matrix to a cell array of strings, executes strrep, and converts the output back to a character matrix.

if (ischar (input))
  if (! isrow (input))
    retval = char (strrep (cellstr (input), pattern, replacement));
    return;
  endif
endif

And it works:

octave:7> char (strrep (cellstr (chmat), 'World', 'Jane'))
ans =

Hello Jane
Goodbye Jane
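Put together as a standalone function, the row-wise approach might look like the sketch below. The wrapper name strrep_rowwise is hypothetical, and only the 2-D case is considered here.

```octave
1;  % script-file marker: lets the function below be defined in the same file

% Hypothetical wrapper, not an existing Octave function: apply strrep to
% each row of a char matrix and return a char matrix again.
function retval = strrep_rowwise (str, ptn, rep)
  if (ischar (str) && ! isrow (str) && ! isempty (str))
    % cellstr drops each row's trailing blanks; char re-pads the results
    % to the width of the longest row.
    retval = char (strrep (cellstr (str), ptn, rep));
  else
    % Row vectors, cellstr inputs, and everything else go straight through.
    retval = strrep (str, ptn, rep);
  endif
endfunction

chmat = char ('Hello World', 'Goodbye World');
out = strrep_rowwise (chmat, 'World', 'Jane');
disp (out)          % shows the two rows with 'World' replaced by 'Jane'
disp (size (out))   % 2 rows, width of the longest replaced row
```

Note that the output width can differ from the input width, since the replacement may shorten or lengthen the longest row.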

Anyone want to comment on which approach they like and why?

--Rik

In my view, a 2-D char matrix is a special case, which represents a list or vector of "strings" as a blank-padded 2-D array, and should be treated as an alternate physical representation of the same sort of thing that a cellstr vector represents. Lots of Matlab and existing Octave functions operate on them this way.

So in this case, I'd suggest that doing the strrep substitution on a per-row basis would be the right and useful thing to do.
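A quick sketch of that equivalence, using only cellstr and char: the padded matrix and the cellstr vector convert back and forth, with the caveat that cellstr discards each row's trailing blanks.

```octave
% A blank-padded char matrix and a cellstr vector carry the same list of
% strings; cellstr strips trailing blanks and char pads them back.
chmat = char ('Hello World', 'Goodbye World');   % 2x13 blank-padded matrix
c = cellstr (chmat);                             % 2x1 cell, trailing blanks removed
lens = cellfun (@numel, c).';                    % per-row lengths after trimming
back = char (c);                                 % re-padded to the widest row
same = isequal (back, chmat);                    % round trip preserves this input
disp (lens)
disp (same)
```

The round trip is exact here because the longest row has no trailing blanks. A matrix whose every row ends in blanks would come back narrower, so per-row processing via cellstr treats trailing blanks as padding rather than data.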

I don't think there's a big Matlab compatibility concern: calls that throw errors in Matlab aren't behavior that Matlab defines or that user code is likely to rely on. So it's unlikely that defining useful behavior here would break any user code.

Cheers,
Andrew


