Re: Discrepancy between actual RAM usage by Octave and "whos" output
From: Andrew Janke
Subject: Re: Discrepancy between actual RAM usage by Octave and "whos" output
Date: Tue, 30 Oct 2018 16:51:31 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 10/30/18 3:38 PM, Mike Miller wrote:
> On Sun, Oct 28, 2018 at 11:05:44 -0500, PhilipNienhuis wrote:
>> What is the reason that actual RAM usage is so much larger than suggested by
>> "whos"? Where does the overhead come from?
> Cell arrays have a lot of memory overhead. Compare the memory used by
> Octave instantiating
>     A = rand (1e7, 10);
> with the memory used for
>     A = num2cell (rand (1e7, 10));
> On my system, I see about 850 MB used for the first, 3.9 GB for the
> second.
>> A corollary is that, esp. for unwary users, "whos" actually gives deceiving
>> results.
> Yes, especially if you are using cell arrays with a large number of
> cells.
Digging a little deeper: If Octave works like Matlab (and I think it
does), the "Bytes" reported by "whos" reflect only the memory used by
the raw primitive values (i.e. doubles, ints, chars) inside the arrays,
and not any of the overhead in Octave's internal array-management data
structures. Cells have high overhead because each individual cell
element contains an entire Octave array. (Same for objects that are not
coded as planar-organized.)
You can kind of see this by doing a "whos" on empty compound data types,
which all report as 0 bytes, when they clearly are using some memory.
    octave:1> a_struct = struct;
    octave:2> a_cell = {};
    octave:3> an_object = containers.Map;
    octave:4> whos
    Variables in the current scope:
       Attr Name        Size    Bytes  Class
       ==== ====        ====    =====  =====
            a_cell       0x0        0  cell
            a_struct     1x1        0  struct
            an_object    1x1        0  containers.Map
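The same point can be sketched at a smaller scale for the matrix versus num2cell comparison quoted above. This is only an illustration: the variable names and sizes are made up, and it assumes that whos called with an output argument returns a struct with a "bytes" field. If only raw values are counted, both figures should come out the same, even though the cell version needs far more real memory.

```octave
% Illustrative sizes, scaled down from the 1e7-by-10 example in the thread.
A = rand (1000, 10);          % one array holding 10000 doubles
C = num2cell (A);             % 10000 cells, each holding a 1x1 double
info_A = whos ("A");          % whos with an output argument returns a struct
info_C = whos ("C");
printf ("matrix: %d bytes, cell: %d bytes\n", info_A.bytes, info_C.bytes);
```

Comparing these numbers against what the operating system reports for the Octave process is what exposes the per-cell overhead that "whos" leaves out.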
Conversely, because it doesn't check for memory shared between arrays
via copy-on-write (CoW), whos() can also over-report memory usage for cells and other
compound types. (The memory arrangement is a directed acyclic graph, and
the memory-counting algorithm isn't checking for already-visited nodes.)
But this is less likely in practice.
    octave:1> x = rand(1e7,1);
    octave:2> cx = { x x x x x x x };
    octave:3> ccx = repmat( { x }, [1e5 1]);
    octave:4> cccx = repmat( { ccx }, [1e3 1]);
    octave:6> ccccx = repmat( { cccx }, [1e3 1]);
    octave:7> whos
    Variables in the current scope:
       Attr Name             Size                Bytes  Class
       ==== ====             ====                =====  =====
            ccccx          1000x1  8000000000000000000  cell
            cccx           1000x1     8000000000000000  cell
            ccx          100000x1        8000000000000  cell
            cx                1x7            560000000  cell
            x          10000000x1             80000000  double
Actual memory usage here is about 100 MB.
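A scaled-down sketch of the same effect, with made-up sizes, compares the figure from whos against a naive per-cell sum of 8 bytes per double. It assumes, as described above, that every cell's contents are counted in full whether or not the data is shared.

```octave
x = rand (1000, 1);               % 8000 bytes of doubles
cx = repmat ({x}, [50 1]);        % 50 cells that all hold the same x
info = whos ("cx");               % struct with the reported byte count
expected = 50 * numel (x) * 8;    % naive per-cell sum: 8 bytes per double
printf ("whos: %d bytes, naive sum: %d bytes\n", info.bytes, expected);
```

If the two numbers agree, the reported total is simply the per-element count multiplied out, with no allowance for sharing.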
You can exploit this behavior to do a form of low-rent compression on
low-cardinality cellstr or struct-organized object arrays:
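    function out = canonicalize(x)
      % ux: distinct values; Jndx: position of each element of x within ux
      [ux,~,Jndx] = unique(x);
      % Index back into ux so that repeated values can share one copy
      out = reshape(ux(Jndx), size(x));
    end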
After a "foo = canonicalize(foo)", foo will contain the same values, but
have only one copy of each distinct value in memory.
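A minimal usage sketch on a small cellstr follows. The sample data is invented, and the steps of canonicalize are inlined so that the snippet runs on its own. It only checks that the contents survive the round trip. Since whos counts shared data repeatedly, the saving should not be expected to show up in its "Bytes" column.

```octave
names = {"alpha"; "beta"; "alpha"; "alpha"; "beta"};   % made-up sample data
[u, ~, j] = unique (names);            % same steps as canonicalize, inlined
compact = reshape (u(j), size (names));
assert (isequal (compact, names));     % contents are unchanged
printf ("%d elements, %d distinct values\n", numel (compact), numel (u));
```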
Cheers,
Andrew