From: | Joao Rodrigues |
Subject: | Re: How do you select only specific rows based on the values in a specific column? |
Date: | Sun, 26 Oct 2014 09:40:27 +0100 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 |
On 10/26/2014 02:29 AM, Thompson, Robert M - (rmt1) wrote:
> I have a huge source file of a million lines, like: (cartographic data)
>
> 0.015625 89.996094 0.018000
> 0.046875 89.996094 0.018000
> 0.078125 89.996094 0.018000
>
> I was using C to pare the source file down into a smaller file based on values in first and second column. The evaluation was like, e.g., keep this row if column 1 is greater than 0.20000 and column 2 is >= 89.00000. But now I want to cut out the C middleman and import the million-line source file directly into Octave. But also select only the rows with first or second columns matching criteria, before I consume great amounts of memory on records I will not be using.

I may be mistaken, but I don't think what you want to do is possible: either you import the whole lot and then let Octave parse the content, which will be fast but requires importing everything, or you import one row at a time C-style (fopen, fscanf, fclose) and test it, which has no memory overhead but is very slow.
If what you have is a million rows, I would go for the first option. C-style reading is only worth it if the file is small. Octave has many import functions, each suited to a particular context. If your file contains only numerical data and is in ASCII, then I would first try
    tic; a = dlmread(XYZ); toc

If it takes a lot of time, then try breaking the original file into chunks and importing one chunk at a time, or try other import functions (check the io package; I found that csv2cell was amazingly fast).
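The chunked approach can also be done without splitting the file on disk. What follows is only a sketch under some assumptions: the file is whitespace-separated with exactly three numeric columns, the filter is the one described in the quoted question, and the temporary file, the chunk size, and the names chunk_rows and kept are made up for illustration. It reads a block of rows with fscanf, keeps the matching rows, and discards the rest before reading the next block.

```octave
% Build a tiny stand-in file. The first row is from the quoted post; the
% other two are made-up values so the filter has rows to keep and reject.
fname = tempname();
fid = fopen(fname, 'w');
rows = [0.015625 89.996094 0.018;
        0.234375 89.996094 0.018;
        0.234375 88.500000 0.018];
fprintf(fid, '%f %f %f\n', rows');   % transpose so values print row by row
fclose(fid);

chunk_rows = 2;        % rows per read; a real file might use e.g. 1e5
kept = zeros(0, 3);    % matching rows accumulate here

fid = fopen(fname, 'r');
while true
  % fscanf fills its result column by column, so read 3-by-N, then transpose
  block = fscanf(fid, '%f', [3, chunk_rows])';
  if isempty(block)
    break;             % nothing left to read
  end
  % Keep rows where column 1 > 0.2 and column 2 >= 89
  mask = block(:,1) > 0.2 & block(:,2) >= 89;
  kept = [kept; block(mask, :)];
end
fclose(fid);
delete(fname);
```

Only one block plus the rows kept so far are in memory at any time, so peak memory depends on chunk_rows and on how selective the filter is, not on the total file size.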
After a is loaded into Octave, use Doug's suggestion to keep only the desired rows.
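Doug's suggestion is not quoted in this message, so the following is just one common way to do the row selection, logical indexing, and may differ from what he proposed. The first three rows are the sample data from the question; the last two are made-up values added so that the filter has something to keep and something to reject. The names keep and selected are illustrative.

```octave
% Stand-in for the matrix returned by dlmread
a = [0.015625 89.996094 0.018000;
     0.046875 89.996094 0.018000;
     0.078125 89.996094 0.018000;
     0.234375 89.996094 0.018000;    % made-up row, passes both tests
     0.234375 88.500000 0.018000];   % made-up row, fails the column 2 test

% Logical mask: true where column 1 > 0.2 AND column 2 >= 89
keep = a(:,1) > 0.2 & a(:,2) >= 89;

% Index with the mask to retain only the matching rows
selected = a(keep, :);
```

Assigning the result back to a (a = a(keep, :)) lets Octave release the rows that are no longer needed.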