On Fri, Sep 3, 2010 at 3:58 AM, CdeMills <address@hidden> wrote:
I'm happy to see discussions about the dataframe concept. Regarding the cell
output format, the problem is as follows. After selecting some sub-range of
size m x n, we may or may not need some extra information in order to rebuild
a full dataframe from it. If we don't care, a cell array of size m x n is just
fine; this is the purpose of df.as.cell. If we do care, then headers are added:
two lines with the column names and types as strings, and two columns with the
ID as an unsigned int and the row name as a string. This compound table cannot
simply be dereferenced further, as the real content is offset by two lines and
two columns.
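To make the offset concrete, here is a minimal Octave sketch of such a compound table, assuming a 2x2 content block; the variable names (data, colnames, coltypes, ids, rownames) are hypothetical, and leaving the 2x2 top-left corner empty is an assumption, not necessarily what df.cell does.

```octave
% Hypothetical 2x2 dataframe content; names and types are illustrative only.
data     = {1.5, 'a'; 2.5, 'b'};
colnames = {'x', 'label'};
coltypes = {'double', 'char'};
ids      = uint32([1; 2]);
rownames = {'r1'; 'r2'};

[m, n] = size(data);
% Two header lines (names, types) plus two leading columns (ID, row name).
% Assumption: the top-left 2x2 corner is simply left empty.
C = cell(m + 2, n + 2);
C(1, 3:end)     = colnames;
C(2, 3:end)     = coltypes;
C(3:end, 1)     = num2cell(ids);
C(3:end, 2)     = rownames;
C(3:end, 3:end) = data;

% Content element (i, j) now sits at C(i + 2, j + 2), so an index meant for
% the data does not address the same cell when applied to C.
v = C{1 + 2, 1 + 2};
```

This is why sub-ranging has to happen before the headers are attached: indexing C directly with data-level indices would pick up header cells instead of content.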
First off, I want to say that I really appreciate that you are working on this, because it's sorely needed. I'm going to look at the code. My initial reaction is that it would be more flexible to store each column as a vector and the headings as a separate vector, but it sounds like, as currently implemented, everything is stored in a single cell array.
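A rough Octave sketch of that column-wise layout, purely illustrative: the struct fields names and cols are hypothetical, not part of the existing dataframe code. Each column keeps its own type, and selecting rows becomes indexing each stored vector.

```octave
% Hypothetical column-wise layout: each column is its own vector (so types can
% differ per column) and the headings live in a separate cell array of names.
df.names = {'x', 'n'};
df.cols  = {[1.5; 2.5; 3.5], uint32([10; 20; 30])};

% Selecting a row sub-range reduces to indexing every stored vector.
rows = [1 3];
sub.names = df.names;
sub.cols  = cellfun(@(v) v(rows), df.cols, 'UniformOutput', false);
```

With this design the headings never share index space with the data, so no offset arithmetic is needed.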
The operations of sub-ranging and cell conversion thus cannot
be performed separately; this is the logic behind
df.cell(some range). It results in a single call to subsref, with all the
accessors packed together.
As I said, a dataframe should mimic a matrix as closely as possible. If
a=randn(3, 3), then you can't write
cell(a); you have to explicitly call 'mat2cell'. This is why something like
cell(df) looks a bit strange to me, besides the problem of not separating
sub-ranging from output type conversion.
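For reference, a short Octave illustration of the distinction: cell() only allocates an empty cell array of a given size, while num2cell and mat2cell actually convert a numeric matrix.

```octave
a = randn(3, 3);

% cell() allocates an empty cell array of the requested size; it does not
% convert the contents of a numeric matrix.
e = cell(3, 3);

% Explicit conversions: num2cell puts each element in its own cell, while
% mat2cell splits the matrix into blocks from row and column partitions.
c1 = num2cell(a);                    % 3x3 cell, one scalar per cell
c2 = mat2cell(a, [1 1 1], [1 1 1]);  % same layout via mat2cell
c3 = mat2cell(a, [2 1], 3);          % 2x1 cell: a(1:2,:) and a(3,:)
```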
Ah, I see. cell() does behave differently from, say, double() or uint32(), in that it doesn't convert its argument into a cell array. I know it's annoying, but in the end I think it may be better to use a notation closer to mat2cell(). I do think it would be nice to actually use '{}' indexing for something. Nothing requires that '{}' actually return a cell array, unless the Octave parser does something weird (in which case we should fix the parser).
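A small Octave illustration of that point using a plain cell array: brace indexing already returns the content rather than a cell, and the request reaches subsref as a struct with type '{}', which an overloaded subsref in a dataframe class could interpret however it chooses (that class is hypothetical here).

```octave
C = {1, 'two'; [3 4], 5};

x = C{1, 2};   % brace indexing returns the content itself: the char 'two'
y = C(1, 2);   % paren indexing returns a 1x1 cell array

% The same brace request expressed as the index struct that subsref receives.
s = substruct('{}', {1, 2});
z = subsref(C, s);
```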
--judd