Re: How to load data files with mixed str & numerical data with headers
From: Philip Nienhuis
Subject: Re: How to load data files with mixed str & numerical data with headers "*.txt.gz" or "*.xml.tgz"
Date: Mon, 20 Aug 2012 15:45:52 -0700 (PDT)
Shoumei wrote
>
> http://octave.1599824.n4.nabble.com/file/n4642993/GSE1-2.txt GSE1-2.txt
> The extracted data files are txt files suffixed with ".soft". I need to
> extract from the files the data matrices, usually repeated 44Kxn
> tables/matrices with a start line ' !series_matrix_table_begin' and an end
> line ' !series_matrix_table_end'. The string data are 44Kx1 and the
> numerical data are 44Kx(n-1). I could possibly ignore the other ! comments.
> I know the exact matrix size for each of the repeated matrices.
> The example file attached included data of two samples (GSM1&GSM2), each
> with a 2x2 matrix .
>
That file format looks easy to read. Dataloggers we use at work yield more
or less the same file structure (header followed by data sections between
"something-like-begin-data" and "something-like-end-data" lines) and we have
several Matlab scripts for reading those into a struct.
The file header is parsed line by line into separate fields, the data
sections into dedicated data fields (often numeric arrays, sometimes cell
arrays).
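
To illustrate the approach, here is a minimal Octave sketch of the data-section part only. It is not one of our actual scripts. The function name read_marked_sections is made up for this example, and it assumes the table rows are tab-separated, that the first column holds the ID string, and that every row in a section has the same number of columns. Header lines outside the markers are simply skipped, and the demo file contents are invented.

```octave
1;  % script-file marker so the function can be defined inline (Octave)

function sections = read_marked_sections (fname, begin_tag, end_tag)
  % Collect every block of lines found between begin_tag and end_tag.
  % Returns a struct array with fields "ids" (cellstr) and "data" (numeric).
  sections = struct ("ids", {}, "data", {});
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("read_marked_sections: cannot open %s", fname);
  end
  in_table = false;
  while (true)
    ln = fgetl (fid);
    if (~ischar (ln))            % fgetl returns -1 at end of file
      break;
    end
    ln = strtrim (ln);           % tolerate a leading space before the marker
    if (strcmp (ln, begin_tag))
      in_table = true;
      ids = {};
      rows = {};
    elseif (strcmp (ln, end_tag))
      if (in_table)
        sections(end+1) = struct ("ids", {ids}, "data", vertcat (rows{:}));
      end
      in_table = false;
    elseif (in_table && ~isempty (ln))
      fields = regexp (ln, '\t', 'split');        % assumption: tab-separated
      ids{end+1, 1} = fields{1};                  % first column: ID string
      rows{end+1, 1} = str2double (fields(2:end)); % rest: numbers (NaN if not)
    end
  end
  fclose (fid);
end

% Demo on a tiny invented file with two samples, each a 2x2 table.
tmp = [tempname() ".txt"];
fid = fopen (tmp, "w");
fprintf (fid, "!Sample_title = GSM1\n");
fprintf (fid, " !series_matrix_table_begin\n");
fprintf (fid, "id_a\t1.5\nid_b\t2.5\n");
fprintf (fid, " !series_matrix_table_end\n");
fprintf (fid, "!Sample_title = GSM2\n");
fprintf (fid, " !series_matrix_table_begin\n");
fprintf (fid, "id_a\t3\nid_b\t4\n");
fprintf (fid, " !series_matrix_table_end\n");
fclose (fid);

s = read_marked_sections (tmp, "!series_matrix_table_begin", ...
                          "!series_matrix_table_end");
delete (tmp);
disp (s(1).ids);
disp (s(2).data);
```

Reading line by line keeps memory use modest, since only the rows of the current section are held before they are stacked. If the table has its own column-header row, its numeric fields would come out as NaN and would need to be dropped or parsed separately.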
> The complete txt file could not be loaded in excel or word.
>
What MS-Office version did you use?
As I wrote, LibreOffice Calc 3.4 and later versions should be able to read
very large files, although your computer's RAM may be a limiting factor.
> The other tables in the sample file between "!platform_table_begin" and
> "!platform_table_end" are info related to each ID/string. This info I
> could process from other files, so they are ignored for the time being.
> When I dealt with small txt files I usually opened them with Excel and saved
> the string (ID) as csv files and the numerical data as txt files. I had
> trouble comparing the strings with other string files if I saved the whole
> file as csv and used "csv2cell" to load the data.
>
You should save only the individual data sections as .csv and read them with
csv2cell. The sections can then be combined in a struct, for example.
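
A rough sketch of what I mean, assuming the io package from Octave Forge is installed (that is where csv2cell lives). The file names GSM1.csv and GSM2.csv and their contents are invented for the demo, and each file is assumed to hold the ID string in the first column and numbers in the remaining columns.

```octave
% csv2cell is provided by the Octave Forge io package
pkg load io

% Write two small invented section files so the example is self-contained.
names = {"GSM1", "GSM2"};
contents = {"id_a,1.5\nid_b,2.5\n", "id_a,3\nid_c,4\n"};
for k = 1:numel (names)
  fid = fopen (fullfile (tempdir (), [names{k} ".csv"]), "w");
  fprintf (fid, contents{k});
  fclose (fid);
end

% Read each section and store it under its own field of one struct.
data = struct ();
for k = 1:numel (names)
  fname = fullfile (tempdir (), [names{k} ".csv"]);
  c = csv2cell (fname);                         % mixed cell: strings and numbers
  data.(names{k}).ids = c(:, 1);                % ID strings stay a cellstr
  data.(names{k}).values = cell2mat (c(:, 2:end)); % numeric part as a matrix
  delete (fname);
end

% Comparing ID strings between sections works directly on the cellstr columns.
common = intersect (data.GSM1.ids, data.GSM2.ids);
disp (common);
```

Keeping the IDs as a cell array of strings and the numbers as a plain matrix avoids comparing whole mixed rows, which is probably where the string comparison went wrong.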
> I was not able to use "xlsread" with mixed string and numerical xls data
> initially, so I did not persist in using it.
>
Remarkable... spreadsheets are made for exactly that. But they are usually
fairly inefficient as far as RAM usage is concerned (because of, among other
things, the formatting overhead).
Philip
--
View this message in context:
http://octave.1599824.n4.nabble.com/How-to-load-data-files-with-mixed-str-numerical-data-with-headers-txt-gz-or-xml-tgz-tp4642969p4642997.html
Sent from the Octave - General mailing list archive at Nabble.com.