From: João Rodrigues
Subject: Re: Import large field-delimited file with strings and numbers
Date: Mon, 08 Sep 2014 18:54:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
On 08-09-2014 17:49, Philip Nienhuis wrote:
>> The file I want to read has around 35 million rows, 15 columns and takes 200 MB of disk space: csv2cell would simply eat up all memory and the computer stopped responding.
>> <snip>
>> Yet, csv2cell is orders of magnitude faster. I will break the big file into chunks (using fileread, strfind to determine newlines and fprintf) and then apply csv2cell chunk-wise.
>
> Why do you need to break it up for csv2cell? AFAICS that reads the entire file and directly translates the data into "values" in the output cell array, using very little temporary storage (the latter quite unlike textscan/strread). It does read the entire file twice, once to assess the required dimensions for the cell array, and a second (more intensive) pass to actually read the data.
I tried to feed it small chunks of increasing size and found that it behaved well until it received a chunk of 500 million rows (at which point memory use went through the stratosphere).
So I opted for the clumsy solution of breaking the file into small pieces and spoon-feeding them to csv2cell.
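
For reference, here is a minimal sketch of that chunk-wise approach (fileread + strfind to locate newlines, fprintf into temporary files, then csv2cell on each piece). The file name, separator and chunk size are only placeholders, and it assumes the io package is installed:

  pkg load io                               # csv2cell comes from the io package

  fname = "2013.annual.singlefile.csv";     # placeholder file name
  txt   = fileread (fname);                 # whole file as one char array
  nl    = strfind (txt, "\n");              # positions of all newlines
  rows_per_chunk = 1e6;                     # placeholder chunk size

  data  = {};
  start = 1;
  for k = rows_per_chunk : rows_per_chunk : numel (nl)
    stop = nl(k);
    tmp  = tempname ();
    fid  = fopen (tmp, "w");
    fprintf (fid, "%s", txt(start:stop));   # dump this chunk to a temp file
    fclose (fid);
    data  = [data; csv2cell(tmp, ",")];     # let csv2cell parse the small file
    delete (tmp);
    start = stop + 1;
  endfor
  if (start <= numel (txt))                 # trailing partial chunk, if any
    tmp = tempname ();
    fid = fopen (tmp, "w");
    fprintf (fid, "%s", txt(start:end));
    fclose (fid);
    data = [data; csv2cell(tmp, ",")];
    delete (tmp);
  endif

(The first chunk still carries the header row; all chunks have the same number of columns, so the vertical concatenation works.)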
But then I found something interesting: if I saved a cell array with 35 million rows and only 3 columns in gzip format, it took very little disk space (20 MB or so), but when I tried to load it again it took forever and ate up GBs of memory.
Bottom line: I think it has to do with the way Octave allocates memory to cells, which is not very efficient (as opposed to dense or sparse numerical data, which it handles very well).
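
A quick way to see that per-cell overhead (the sizes here are just illustrative, and much smaller than my data so it runs quickly):

  x = rand (1e6, 3);   # dense matrix: 3e6 doubles, about 24 MB
  c = num2cell (x);    # same values as a cell array: each element carries its
                       # own bookkeeping, so memory use is several times larger
  whos x c             # compare the reported byte counts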
I managed to solve the problem I had, thanks to the help of you guys. However, I think it would be nice if future versions of Octave shipped with something akin to ulimit enabled by default, to prevent a single process from eating up all available memory.
If someone wants to check this issue, the data I am working with is public: http://www.bls.gov/cew/data/files/*/csv/*_annual_singlefile.zip where * = 1990:2013. http://www.bls.gov/cew/datatoc.htm explains the content.