

From: Philip Nienhuis
Subject: [Octave-patch-tracker] [patch #7969] Organized textread.m for better error notification
Date: Sat, 09 Mar 2013 23:52:38 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Follow-up Comment #1, patch #7969 (project octave):

Thanks Júlio for cleaning up some cruft.

Some remarks:

- I don't think it is wise, nor polite, to silently ignore an "improper"
("crazy") user-specified setting for endofline; there should always be a
suitable warning if any user setting is deemed inappropriate.
But more to the point, why limit the user's choice of allowable EOLs in the
first place?
E.g., Matlab has no strict requirements for what an EOL should look like;
recent ML textread docs only say: 'a single character <not specified!> or
"\r\n"'.
If you read that "single character" as meaning only "\r" or "\n", I'd agree
that is the obvious interpretation, but it's not strictly what the docs say.
FYI, I've used textread to read old (if not fossil) "direct access" Fortran
(IIRC) files where line endings comprised special bytes (often null bytes) or
byte sequences. Admittedly a bit arcane but it does show there's a use for
unintuitive EOLs. (It may look weird, but IMO there's nothing wrong with trying
to invoke textread or textscan to recover old data from almost-binary data
files that were produced with long-gone applications. Think of database files
where "traditional" EOLs can't be used to delimit records because some fields
can contain multi-line text.)
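
To make the point about non-traditional EOLs concrete, here is a minimal
sketch of splitting a buffer into records on an arbitrary user-supplied EOL
byte sequence. It is written in C++ only because that is what a compiled
backend would use. The name split_records and the decision to reject nothing
but an empty EOL are assumptions made for illustration; this is not code from
textread.m, strread.m or the patch.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: split `data` into records at every occurrence of `eol`.
// `eol` may be any non-empty byte sequence, including one that contains
// '\0' bytes; std::string stores an explicit length, so nulls are preserved.
std::vector<std::string> split_records (const std::string& data,
                                        const std::string& eol)
{
  // Complain loudly about an unusable setting instead of silently ignoring it.
  if (eol.empty ())
    throw std::invalid_argument ("split_records: EOL must not be empty");

  std::vector<std::string> records;
  std::string::size_type start = 0;

  for (;;)
    {
      std::string::size_type pos = data.find (eol, start);
      if (pos == std::string::npos)
        break;
      records.push_back (data.substr (start, pos - start));
      start = pos + eol.size ();
    }

  // Keep a trailing record that has no terminating EOL.
  if (start < data.size ())
    records.push_back (data.substr (start));

  return records;
}
```

With the EOL set to two null bytes, a record that itself contains "\n" stays
in one piece, which is the database-file situation described above.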

- As to performance: I don't think there's much speed to gain in textread.m
(and for that matter, textscan.m). They're just front-ends.
If you really want to increase performance, please consider writing a
compiled (binary) textscan.cc.
AFAIK the intention was (and is) to have a compiled (x)textscan.cc as the
backend for strread, textread and textscan itself.
JWE posted a skeleton some time ago (around August 4, 2011 in a message with
subject "strread.m").

The current workhorse strread.m is a dinosaur; I think it was set up the way
it was because that was the only way it could be efficiently vectorized; yet
that setup imposes several limitations. Over time, adding more options made it
a complicated animal until Rik and I agreed that spending more effort on it
was pointless; we're all waiting for a compiled backend.
Some initial profiling last year showed that, as expected, most of the CPU
time is lost in the various nested if-elseif-else-endif sequences and
cellfun() operations on data columns that are unavoidable because text files
have virtually unlimited variability.
If you're not put off by this (hopefully not), I'm at your disposal because
lots if not most of the current code in strread.m was added by me.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?7969>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



