From: Ben Abbott
Subject: Re: Loading a large and unusually formatted dataset into an Octave matrix
Date: Wed, 19 Jun 2013 01:23:59 +0000 (GMT)
> On 06/14/2013 02:04 PM, Elliot Gorokhovsky wrote:
>> Hello! I am a new Octave user and I am trying to predict the price of
>> bitcoins 15 minutes in advance via neural networks for use on the
>> website btcoracle.com <http://btcoracle.com>. I have about a gigabyte of
>> data that looks like this:
>>
>> {"data":1279408157,"price":"0.04951","amount":"20","price_int_":"4951","amount_int":"2000000000","tid":"1","price_currency":"USD","item":"BTC","trade_type":""},
>> {"data":1279424586,"price":"0.05941","amount":"50.01","price_int_":"5941","amount_int":"5001000000","tid":"2","price_currency":"USD","item":"BTC","trade_type":""},
>> {"data":1279475336,"price":"0.08080","amount":"5","price_int_":"8080","amount_int":"500000000","tid":"3","price_currency":"USD","item":"BTC","trade_type":""}
>
> I responded with a=fscanf(FD,"%d %d").
> Eh, make it a=fscanf(FD,"%f %f") of course:
> command="perl -F'\"' -lane 'print \"$F[5] $F[9]\"' /tmp/bitcoin";
> a=reshape(fscanf(popen(command,'r'),"%f"),2,[]);
>
> -lane is a very useful Perl idiom: it loops over every input line and
> splits it into words stored in the array @F ($F[0], $F[1], etc.).
> -F'"' changes the word separator to the double quote. After that, some
> judicious quoting of special characters, et voilà.
>
> As usual, I got bitten by *scanf code failing silently unless the format
> string is perfect. Does anyone have good tips on debugging issues like
> that? For example, a way to figure out how far into the input the format
> string matched?
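One way to see how far a template got is to ask sscanf for its extra outputs: the second output, count, is the number of conversions that succeeded, and the third, errmsg, holds any error text. The sketch below uses an abbreviated record from the question (an assumption; the real lines carry more fields) and compares a numeric-only template, a template that spells out the literal text around each number, and the same template with a deliberate typo. If count is lower than the number of conversions in the template, the mismatch lies at or just before conversion count+1.

```octave
% Abbreviated record from the question (assumption: one JSON-like record per line).
line = '{"data":1279408157,"price":"0.04951","amount":"20"}';

% A purely numeric template fails on the leading '{' and silently converts nothing.
[raw_val, raw_count] = sscanf (line, '%f %f');
printf ("'%%f %%f' template: %d conversion(s)\n", raw_count);

% Spelling out the literal text around each number lets all three conversions succeed.
good = '{"data":%d,"price":"%f","amount":"%f"}';
[val, count] = sscanf (line, good);
printf ("full template: %d conversion(s)\n", count);

% A typo ("prize") stops the scan early; count shows how many conversions matched first.
bad = '{"data":%d,"prize":"%f","amount":"%f"}';
[bad_val, bad_count, bad_errmsg] = sscanf (line, bad);
printf ("typo template: %d conversion(s), errmsg = '%s'\n", bad_count, bad_errmsg);
```

The same outputs are available from fscanf, so checking count against the expected number of values after a bulk read is a cheap guard against a silent partial match.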
From a bash prompt, your Perl command works as expected.
perl -F'"' -lane 'print "$F[5] $F[9]"' bitcoin.txt
0.04951 20
0.05941 50.01
0.08080 5
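For readers without Perl, the same field split can be mirrored inside Octave with strsplit. This is a swapped-in technique, not something proposed in the thread: each line is split on the double quote with delimiter collapsing turned off (so empty fields are kept, as with Perl's split), and Perl's 0-based $F[5] and $F[9] become the 1-based fields 6 and 10. The records are the question's three sample lines.

```octave
% Sample records from the question, one string per cell.
lines = {
  '{"data":1279408157,"price":"0.04951","amount":"20","price_int_":"4951","amount_int":"2000000000","tid":"1","price_currency":"USD","item":"BTC","trade_type":""},'
  '{"data":1279424586,"price":"0.05941","amount":"50.01","price_int_":"5941","amount_int":"5001000000","tid":"2","price_currency":"USD","item":"BTC","trade_type":""},'
  '{"data":1279475336,"price":"0.08080","amount":"5","price_int_":"8080","amount_int":"500000000","tid":"3","price_currency":"USD","item":"BTC","trade_type":""}'
};

pa = zeros (numel (lines), 2);   % preallocate one row per record: [price, amount]
for k = 1:numel (lines)
  % Split on '"' without collapsing adjacent delimiters, like perl -F'"'.
  f = strsplit (lines{k}, '"', "collapsedelimiters", false);
  % Perl's $F[5] and $F[9] are f{6} and f{10} with Octave's 1-based indexing.
  pa(k, :) = [str2double(f{6}), str2double(f{10})];
endfor
disp (pa);
```

For a gigabyte of input a per-line loop like this will be much slower than letting Perl do the splitting, so it is mainly useful for checking the field positions on a few lines.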
The snippet below works for me.
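cmd = "perl -F'\"' -lane 'print \"$F[5] $F[9]\"' bitcoin.txt";
fid = popen (cmd, "r");              % popen returns a file id for the pipe
unwind_protect
  while (ischar (s = fgets (fid)))   % fgets returns -1 (not a char) at EOF
    fputs (stdout, s);
  endwhile
unwind_protect_cleanup
  pclose (fid);                      % always close the pipe, even on error
end_unwind_protect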
If you modify this code to load a lot of data (using sscanf() in place of fputs()), the array(s) will be resized on each call to sscanf(), which is inefficient. Something like the code below will be faster.
cmd = "perl -F'\"' -lane 'print \"$F[5] $F[9]\"' bitcoin.txt > data.txt";
[status, output] = system (cmd);
if (status != 0)
  error ("perl failed: %s", output);
endif
data = load ("data.txt");   % two-column matrix: price, amount
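If you would rather keep the popen loop than go through a temporary file, the cost of resizing on every sscanf() can also be reduced by growing the buffer geometrically (doubling its capacity whenever it fills) and trimming it at the end. This is a swapped-in technique, not the approach suggested above. To stay runnable without Perl, the sketch reads a small temporary file created with tempname as a hypothetical stand-in for the Perl output; the comments mark where popen and pclose would go instead.

```octave
% Hypothetical stand-in for the Perl output: two numbers per line in a temp file.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "0.04951 20\n0.05941 50.01\n0.08080 5\n");
fclose (fid);

fid = fopen (fname, "r");            % with the pipe: fid = popen (cmd, "r");
unwind_protect
  buf = zeros (2, 2);                % tiny initial capacity so the doubling runs here
  n = 0;                             % number of rows filled so far
  while (ischar (s = fgetl (fid)))   % fgetl returns -1 at end of input
    row = sscanf (s, "%f %f");
    if (numel (row) != 2)
      continue;                      % skip lines that did not parse
    endif
    n += 1;
    if (n > rows (buf))
      buf(2 * rows (buf), 2) = 0;    % double the capacity; new rows are zero-filled
    endif
    buf(n, :) = row.';
  endwhile
unwind_protect_cleanup
  fclose (fid);                      % with the pipe: pclose (fid);
end_unwind_protect

data = buf(1:n, :);                  % trim the unused capacity
delete (fname);
disp (data);
```

Doubling keeps the number of reallocations logarithmic in the number of rows. For a single-pass read, fscanf (fid, "%f", [2, Inf]).' returns all pairs at once as an N-by-2 matrix, which is essentially what the reshape(fscanf(...)) line in the quoted message does.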
Ben