From: Markus Bergholz
Subject: Re: xlsread in Octave 3.6.4
Date: Mon, 2 Sep 2013 11:38:50 +0200

On Sun, Sep 1, 2013 at 11:42 PM, PhilipNienhuis <address@hidden> wrote:
Markus Bergholz wrote
> now it's faster than matlab!!
> matlab takes ~100 seconds
> xlsxread in octave ~80 seconds
> http://p.osuv.de/index.php/ZuBLam/ (autodelete after 5 days)
> i will push my modifications later.
>
>
> On Sun, Jun 2, 2013 at 10:25 PM, Markus Bergholz <markuman@> wrote:
>
>>
>>
>>
>> On Sun, May 12, 2013 at 9:26 PM, Philip Nienhuis <pr.nienhuis@> wrote:
>>
>>> Markus Bergholz wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, May 8, 2013 at 10:06 AM, PhilipNienhuis <pr.nienhuis@> wrote:
>>>>
>>>> Markus Bergholz wrote
>>>> > I haven't followed this thread and its issue, but i've written an
>>>> > xlsxread function which doesn't need java.
>>>> > but it's very very rudimentary, works just with linux and is a
>>>> > quick&dirty write-down.
>>>> > furthermore, you have to remove the string-analysis part if your
>>>> > sheet doesn't contain strings.
>>>> > but maybe it helps someone else, or someone wants to improve it, or
>>>> > someone rewrites it in c/c++ as an oct file to get it even faster than
>>>> > matlab (for me it's still faster than the java stuff atm).
>>>> >
>>>> > http://git.osuv.de/Octave/tree/functions/xlsxread.m
>>>> >
>>>>
>>>> The Java based options are relatively slow as they offer maximum
>>>> flexibility as regards data types.
>>>>
>>>> Before venturing into COM/ActiveX and Java based solutions for the io
>>>> pkg 4 years ago I looked at a few other solutions, similar to yours.
>>>> IIRC the most promising one was posted in an OpenWatcom news group.
>>>> All of them (i.e. the "free solutions") suffered from the same
>>>> limitations: lack of flexibility, lack of documentation, dependency
>>>> on some very specific development framework, and/or bound to specific
>>>> .xls formats (BIFF5, BIFF8, OOXML, what not).
>>>>
>>>> If you want I can look if your code can somehow be absorbed in the
>>>> io pkg as a sort of fall-back option.
>>>>
>>>>
>>>> i don't think that this is a good idea :D as i said, it just works with
>>>> linux (i'm using sed and unzip through the 'system' command). furthermore,
>>>> i made my own tmp-dir quick&dirty (mktemp -d would be better). aaaaaand
>>>> so on :)
>>>>
>>>> To that end it needs a suitable license
>>>>
>>>>
>>>> i don't care about the licence as long as it's a free licence.
>>>>
>>>> and someone should support/maintain it (my C/C++ skills are
>>>> rudimentary).
>>>>
>>>> Philip
>>>>
>>>>
>>>> my c/c++ skills are rudimentary too :)
>>>> if you like, we could code together on an xlsxread function, e.g. on
>>>> github.
>>>> it is not so difficult but it is extremely time-consuming to parse the
>>>> shitty ms xml format!! (i haven't read any specs yet, just did some lousy
>>>> reverse engineering).
>>>>
>>>
>>> Weighing the amount of work needed to build a good, robust and
>>> fool-proof C++/C-based xlsread backend versus already having available
>>> a well-tested choice of working (albeit relatively slow [1]) solutions,
>>> I just fail to see the benefits of reinventing the wheel.
>>>
>>> Just for the record & to emphasize an important aspect, I myself don't
>>> use xlsread (or xlswrite), I usually invoke the much more flexible
>>> xlsopen-xls2oct-[parsecell-]oct2xls-xlsclose sequences. So we'd be
>>> talking about another interface in xlsopen/xls2oct/xlsclose rather than
>>> xlsread.
>>>
>>> Philip
>>>
>>> [1] OpenOffice / LibreOffice are really fast for large spreadsheets, I
>>> doubt a 2-person amateur team can beat the OOo/LO devs as regards speed
>>> tuning; the only problem is start-up time of OOo/LO.
>>> Oh and there's a currently unsolvable Java-UNO issue outlined when you
>>> use it for the first time.
>>> BTW a while ago I had a try with Starbasic (& ActiveX) invoking
>>> LibreOffice for spreadsheet I/O. I already had some success, but I had
>>> to put it away due to lack of time. Maybe next summer I can look at it
>>> again.
>>> Maybe that can be made cross-platform too.
>>>
>>
>>
>> I've done a rewrite of my xlsxread function and pushed it to github:
>> https://github.com/markuman/xlsxread/
>> it is ~10% faster now (still faster than the java version, but still
>> slow!)
>> Theoretically this could work on windows now too, but the unzip command
>> in octave doesn't accept the .xlsx extension:
>> warning: unrecognized file type, .xlsx
>> So i have to use a system command again (see lines 47-51 of
>> https://github.com/markuman/xlsxread/blob/master/xlsxread.m )
>> strings are not recognized atm either, so it's still limited.
>> if someone has an idea how to improve it, i'd like to see some forks :D
>>
>>
>>
>>
>>
>
>
> --
> icq: 167498924
> XMPP|Jabber:
>
> _______________________________________________
> Help-octave mailing list
> Help-octave@
> https://mailman.cae.wisc.edu/listinfo/help-octave
> Hi Markus,
>
> Tonight I had a brief glance at your code and tried a few command lines from
> your .m files. Nice stuff.
> I encountered a few hurdles (e.g., no unzip binaries in the MXE builds for
> Windows) but OK that was easily solved.

yes, this is already fixed. see: http://savannah.gnu.org/bugs/index.php?39148

> A first try, concerning a simple xlsx file from my test suite with one text
> string inside a square, otherwise numerical cell range, breaks in the
> reshape stage because your regexp line doesn't recognize and thus skips
> <f></f> tags that AFAICS seem to be used for booleans (rather than <v></v>
> tags).
> Note that the enclosing <c...> (column) tags indicate the cell type, so in
> principle text strings can be extracted as well.

yes :) it's all not supported atm.

> I'd expect a next hurdle to be "merged" cells. But maybe that is easy.
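
For readers following along, the cell-type idea discussed above can be sketched as follows. This is only an illustration in Python (not Octave, and not the code from the xlsxread repository). It assumes the worksheet lives at `xl/worksheets/sheet1.xml` and the string table at `xl/sharedStrings.xml`, which is the usual layout but is strictly determined by the workbook's relationship files. The function name `read_cells` is made up for this sketch. As far as I know, in SpreadsheetML the `<c>` element is a cell, its `t` attribute gives the type (`s` = shared string index, `b` = boolean, absent or `n` = number), and `<f>` holds a formula whose cached result sits in the sibling `<v>`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet and shared-string parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}


def read_cells(xlsx_bytes, sheet="xl/worksheets/sheet1.xml"):
    """Return {cell_reference: value} for one worksheet (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split into several <t> runs.
                shared.append("".join(t.text or "" for t in si.iter("{%s}t" % MAIN)))
        sheet_root = ET.fromstring(zf.read(sheet))

    cells = {}
    for c in sheet_root.iter("{%s}c" % MAIN):
        v = c.find("m:v", NS)
        if v is None or v.text is None:
            continue  # empty cell, or a type this sketch does not handle
        kind = c.get("t", "n")
        if kind == "s":        # index into the shared-string table
            cells[c.get("r")] = shared[int(v.text)]
        elif kind == "b":      # boolean stored as 0 or 1
            cells[c.get("r")] = v.text == "1"
        elif kind == "n":      # plain number (also cached formula results)
            cells[c.get("r")] = float(v.text)
        else:                  # e.g. "str" or "e": keep the raw text
            cells[c.get("r")] = v.text
    return cells
```

Reading through an XML parser like this is slower than a single regexp pass, which matches the speed-versus-robustness trade-off raised in this thread. Merged cells are not handled here.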
> It is probably not so hard to properly parse the xml worksheet files so that
> text strings and booleans + probably formulas are read. But I am sure it
> will induce a speed penalty.

yes, it will :)
my very first quick and dirty version did one sed command to parse line by line.
this is the easiest but slowest (but still faster than java!) way to parse it.
i made the last changes ~3 months ago: https://github.com/markuman/xlsxread/
but i've never pushed my last commit with a 10% working range-read regexp part
(that's another breaking part).
So xlsxread is always on my mind, but i did roughly nothing in my semester break ;)
In ~2-3 weeks i'll be more active again.

> All in all I think the blazing speed you claim (a claim I believe as-is)
> comes at the cost of robustness and some flexibility. To be able to be
> included in the io package I think some of the speed has to be sacrificed to
> get some more robust code that won't provoke too many bug reports.
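
The range-read step mentioned in this exchange comes down to turning a spreadsheet range such as `B2:D10` into row and column indices. Below is a minimal sketch of that conversion, again in Python purely for illustration. The names `col_to_index` and `parse_range` are hypothetical and are not taken from the xlsxread repository, and absolute references like `$B$2` are not handled.

```python
import re


def col_to_index(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        # Column labels are base-26 numbers with digits A..Z = 1..26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_range(rng):
    """Split 'B2:D10' into (first_row, first_col, last_row, last_col), 1-based."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)(?::([A-Za-z]+)(\d+))?", rng.strip())
    if m is None:
        raise ValueError("not a valid range: %r" % rng)
    c1, r1, c2, r2 = m.groups()
    if c2 is None:
        # A single cell such as 'C3' is treated as a 1x1 range.
        c2, r2 = c1, r1
    return int(r1), col_to_index(c1), int(r2), col_to_index(c2)
```

With indices in hand, a reader can skip cells whose reference falls outside the requested rectangle instead of post-filtering the whole sheet.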
> BTW I saw str2num being used to convert text to doubles. Any reason for
> that? I ask because str2double is known to be much faster.
>
> I don't know when I can have another look. Your code is promising though;
> I'd like to amend and include it in the near future in the io package.
> But to that end I hope you can make up your mind about the license. Would
> you agree with GPL 3? I don't know if the current "do what the f**k you want
> to" license is compatible with GPL 3 and thus compatible with the rest of
> the io package.

GPL3 is fine too for me.
feel free to fork it on github and commit it with a new licence and the str2double replacement :P

> Philip
--
View this message in context: http://octave.1599824.n4.nabble.com/xlsread-in-Octave-3-6-4-tp4652046p4656979.html
Sent from the Octave - General mailing list archive at Nabble.com.
--
icq: 167498924
XMPP|Jabber: address@hidden