octave-maintainers

Re: [Bug] strread() elaborated format strings


From: Philip Nienhuis
Subject: Re: [Bug] strread() elaborated format strings
Date: Mon, 14 May 2012 21:12:37 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Hi Júlio:

Júlio Hoffimann wrote:
> Thanks Philip, that's why I didn't find where %[^] was being handled in
> the code. I know John wants to rewrite some of these I/O functions in
> pure C++, but if I have time to do a quick fix in strread.m, should I
> add something around line 473? Is that the section where we need a new
> branch for dealing with the mentioned format specifier?

I'm not quite sure whether %[] format specifiers can be implemented efficiently in strread's current form.

Last summer I tried to implement it, but it turned out to be a messy affair full of gotchas and corner cases, and consequently lots of if clauses and thus very slow code. (I actually needed %[] and %[^] myself, but luckily I found ways to avoid them. Plus, at work we do have Matlab.) In addition, IIRC Rik later found that Octave's regexp (based on PCRE) is relatively slow, and I think each %[] would need one or two calls to regexp().
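To illustrate why a compiled implementation could avoid per-specifier regexp() calls: a %[abc] scanset accepts a run of characters from the set, and %[^abc] a run of characters not in it, roughly what a regexp() pattern like [abc]+ or [^abc]+ would match. The C++ sketch below is hypothetical (match_scanset is not an Octave function); it assumes C scanf-style scanset semantics and ignores ranges such as a-z.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: a compiled %[...] / %[^...] matcher.
// `set` is the text between the brackets, e.g. "abc" or "^abc".
// Ranges such as "a-z" are NOT handled in this sketch.
// Returns how many characters of `text`, starting at `pos`, the scanset
// accepts; 0 means the conversion fails (scanf-style scansets need at
// least one matching character).
std::size_t match_scanset(const std::string& set,
                          const std::string& text,
                          std::size_t pos = 0) {
    bool negated = !set.empty() && set[0] == '^';
    std::string chars = negated ? set.substr(1) : set;
    std::size_t n = 0;
    while (pos + n < text.size()) {
        bool in_set = chars.find(text[pos + n]) != std::string::npos;
        if (in_set == negated) break;  // character rejected by the scanset
        ++n;
    }
    return n;
}
```

For example, match_scanset("^,", "abc,def") returns 3: it consumes "abc" and stops at the comma. A single linear pass per field like this is what a binary textscan could do instead of building and running a regular expression.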

Nevertheless, if you really need %[], I'm happy to look into it again. The code for splitting the data into columns is much more reliable these days, so perhaps %[] can be made to work now.

The very best option would be to implement a binary (compiled) textscan as the workhorse for strread (instead of vice versa). A while ago John sent me a rough textscan.c framework, but I lack C++ proficiency (and I suppose John lacks time). So the question remains whether it is worthwhile at all to invest in strread.m again, given the plans for a binary textscan().

Anyway, if you're in a hurry, be my guest and give it a try.
Note: there are currently some pending strread.m fixes in the bug tracker; see bugs #36356, #36392 and #36398 (the last one should be rebased). (I can't push those myself because my hgrc/Mercurial setup got fubarred repeatedly, and I have neither the time nor the appetite to fix it again.)

Some guidelines (don't be put off):
First, you'd have to adapt the format string parsing code (lines 284-309 in my patched strread.m; see the bug numbers above) to correctly parse and isolate %[] specifiers. That shouldn't be too hard. Next, you'll have to adapt the format string matching code in lines 450-530 and the column-splitting code in lines 532-618. The column splitting is where I expect you to spend many an evening. (But who knows...) Then, further below, you'd have to add a stanza that processes the %[] specifiers for every matching column. That is probably a breeze once the column splitting is right. Finally, a fair number of test cases should be added, covering all imaginable corner cases. Have Matlab at hand for comparison.
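The first step, isolating %[] specifiers in the format string, could look roughly like the hypothetical C++ tokenizer below (split_format is an illustrative name, not part of strread.m). It assumes C scanf rules, where a ']' right after '[' or '[^' belongs to the set, and it does not handle Matlab's sized conversions such as %d8.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: split a format string into literal text and
// conversion specifiers, keeping each %[...] / %[^...] intact as one token.
std::vector<std::string> split_format(const std::string& fmt) {
    std::vector<std::string> tokens;
    std::string literal;
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%') {             // accumulate literal text
            literal += fmt[i++];
            continue;
        }
        if (!literal.empty()) {          // flush pending literal text
            tokens.push_back(literal);
            literal.clear();
        }
        std::size_t start = i++;         // position of '%'
        if (i < fmt.size() && fmt[i] == '*') ++i;  // optional skip flag
        while (i < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[i])))
            ++i;                         // optional field width
        if (i < fmt.size() && fmt[i] == '.') {     // optional precision
            ++i;
            while (i < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[i])))
                ++i;
        }
        if (i >= fmt.size())
            throw std::invalid_argument("incomplete specifier at end of format");
        if (fmt[i] == '[') {             // scanset: find the closing ']'
            ++i;
            if (i < fmt.size() && fmt[i] == '^') ++i;
            if (i < fmt.size() && fmt[i] == ']') ++i;  // leading ']' is literal
            while (i < fmt.size() && fmt[i] != ']') ++i;
            if (i >= fmt.size())
                throw std::invalid_argument("unterminated %[ specifier");
        }
        ++i;                             // consume conversion char or ']'
        tokens.push_back(fmt.substr(start, i - start));
    }
    if (!literal.empty()) tokens.push_back(literal);
    return tokens;
}
```

For example, split_format("%s %[^,],%d") yields the tokens "%s", " ", "%[^,]", "," and "%d", so the later matching and column-splitting stages can treat each %[] as a single unit.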

> Bug reported: https://savannah.gnu.org/bugs/index.php?36464

Thanks,
I'll first add a check for all not-(yet)-implemented Matlab format specifiers, with an error message. Only then will I start thinking about %[] (unless you or someone else beats me to it).
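A pre-check of that kind might look like the following C++ sketch. check_format is a hypothetical name, and the set of unsupported conversion characters is an assumption passed in by the caller (here "[" for %[...] and %[^...]); the real list of not-yet-implemented Matlab specifiers would have to be compiled separately.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical pre-check: throw an error naming the first conversion whose
// type character appears in `unsupported` (an assumed, caller-supplied list).
void check_format(const std::string& fmt, const std::string& unsupported) {
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%') { ++i; continue; }
        std::size_t start = i++;
        if (i < fmt.size() && fmt[i] == '%') { ++i; continue; }  // "%%" = literal '%'
        // Skip the optional '*' flag and width/precision digits.
        while (i < fmt.size() && (fmt[i] == '*' || fmt[i] == '.' ||
                                  std::isdigit(static_cast<unsigned char>(fmt[i]))))
            ++i;
        if (i >= fmt.size()) break;  // dangling '%': left to the real parser
        if (unsupported.find(fmt[i]) != std::string::npos)
            throw std::invalid_argument("format specifier '" +
                                        fmt.substr(start, i - start + 1) +
                                        "' at position " + std::to_string(start) +
                                        " is not (yet) implemented");
        if (fmt[i] == '[') {  // supported scanset: skip its body (leading ']' is literal)
            ++i;
            if (i < fmt.size() && fmt[i] == '^') ++i;
            if (i < fmt.size() && fmt[i] == ']') ++i;
            while (i < fmt.size() && fmt[i] != ']') ++i;
        }
        ++i;  // step past the conversion character (or the closing ']')
    }
}
```

For example, check_format("%s %[^,]", "[") throws std::invalid_argument with the message "format specifier '%[' at position 3 is not (yet) implemented", while check_format("%s %d", "[") returns normally.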

Philip

