lynx-dev Re: [PATCH 2.8.5-dev14] *Really* large tables


From: Ilya Zakharevich
Subject: lynx-dev Re: [PATCH 2.8.5-dev14] *Really* large tables
Date: Thu, 10 Apr 2003 00:24:38 -0700
User-agent: Mutt/1.4i

Leonid Pauzner asks:

   I think the main footprint is not due to (re)allocation of memory
   but to copying it from one place to another.
   Just a guess.  Tables have two kinds of memory:

   "array of cells + something" (==row), and
   "array of pointers to rows"

   Rows can be of different sizes (usually <<80); we need not realloc rows:
   we can work nicely with a pair (array of cells, number of cells in this array).
   And cell is a fixed-size record.

   "Array of pointers to rows" can be of arbitrary size, 500000 entries in your
   example. Instead of reallocating this huge array we can add another level of
   indirection: split this array into chunks (pools) of a fixed size, and
   maintain a small array of pointers to those chunks (VM usually does this :).

   Everything except the small array may utilize HTPool
   (say, HText->Stbl->pool, a parse-time temporary pool).

   Does this naive picture have anything to do with reality?
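
(Just to make the two-level idea concrete, here is a rough C sketch of
such a chunked row array.  The names and sizes are invented for
illustration; this is not Lynx's actual layout.)

  #include <stdlib.h>

  typedef struct { int start, len; } Row;     /* stand-in for a row descriptor */

  #define ROWS_PER_CHUNK 1024

  typedef struct {
      Row   **chunks;     /* small, growable array of chunk pointers */
      size_t  nchunks;    /* chunks allocated so far */
      size_t  nrows;      /* total rows stored */
  } RowTable;

  /* Append one row; only the tiny `chunks' array is ever realloc()ed.
     500000 rows need fewer than 500 chunk pointers. */
  static Row *new_row(RowTable *t)
  {
      size_t c = t->nrows / ROWS_PER_CHUNK;

      if (c == t->nchunks) {                  /* need one more chunk */
          Row **p = realloc(t->chunks, (c + 1) * sizeof *p);
          if (p == NULL)
              return NULL;
          t->chunks = p;
          t->chunks[c] = malloc(ROWS_PER_CHUNK * sizeof(Row));
          if (t->chunks[c] == NULL)
              return NULL;
          t->nchunks = c + 1;
      }
      return &t->chunks[c][t->nrows++ % ROWS_PER_CHUNK];
  }

Growing the table this way never copies row data, only the (tiny)
chunk-pointer array.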

With the supplied file 

  http://ilyaz.org/software/tmp/table_2col_bold_it_500000.html.gz

the footprint on my system (EMX malloc(), slowish, but quite dense) is
107M.  Same result with my malloc (aka the Perl one) - known for
extreme efficiency (as usual, "at least in some situations" ;-).  So
it looks "as good as it gets".  Let us break it down:

One gets a footprint of 80 bytes per line if the same text is not put
into a table.  This is 40M.

To process the table: eventually we need a large array of "row
descriptors".  Each one takes 36 bytes.  The minimal footprint for this
is 18M.  The algorithm used (reallocation with the size increased by 50%
each time) may leave a "trail" of reallocated chunks up to 2 times as
big - so the maximal footprint is 56M (but see below).
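
(For concreteness, a sketch of that grow-by-50% pattern, with invented
names rather than the actual Lynx code.  Whenever realloc() has to
move the block, the old allocation goes back to the allocator but may
be too small to be reused for the next, larger array; those leftover
chunks are the "trail" mentioned above.)

  #include <stdlib.h>

  typedef struct { int start, len, flags[7]; } RowDesc;  /* 36 bytes with 4-byte ints */

  typedef struct {
      RowDesc *rows;
      size_t   used, allocated;
  } RowArray;

  /* Make room for one more row descriptor, growing the array by 50%
     each time it fills up. */
  static int row_array_push(RowArray *a, RowDesc r)
  {
      if (a->used == a->allocated) {
          size_t want = a->allocated ? a->allocated + a->allocated / 2 : 16;
          RowDesc *p = realloc(a->rows, want * sizeof *p);
          if (p == NULL)
              return -1;
          a->rows = p;
          a->allocated = want;
      }
      a->rows[a->used++] = r;
      return 0;
  }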

The cells take circa 20 bytes per cell.  The algorithm used allocates
them in several arenas, so there is very little overhead.  This gives an
extra 20M.
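
(Again only as an illustration of arena-style allocation of fixed-size
cell records; the names are invented, and this is not the HTPool API.)

  #include <stdlib.h>

  typedef struct { int pos, len, colspan, rowspan, state; } CellDesc;  /* 20 bytes with 4-byte ints */

  #define CELLS_PER_ARENA 4096

  typedef struct CellArena {
      struct CellArena *next;                 /* arenas form a linked list */
      size_t            used;
      CellDesc          cells[CELLS_PER_ARENA];
  } CellArena;

  /* Hand out one cell record; a new arena is malloc()ed only when the
     current one is full, so the per-cell overhead stays negligible. */
  static CellDesc *cell_alloc(CellArena **head)
  {
      if (*head == NULL || (*head)->used == CELLS_PER_ARENA) {
          CellArena *a = malloc(sizeof *a);
          if (a == NULL)
              return NULL;
          a->next = *head;
          a->used = 0;
          *head = a;
      }
      return &(*head)->cells[(*head)->used++];
  }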

So far, so good.  So the footprint to process the table is:

  a) to set the text: 40M

  b) to store the temporary table structures: 38M..76M;

  c) to reset the text with new "indentations": ???

My experiments show that the overhead of the "trail" should be pretty
small.  With -DREUSE_ROWS_AS_CELLS_POOLS, instead of leaving the trail,
the old arrays of row data are reused as pools to store cell data.
This does not lead to any savings (if anything, the opposite), so the
allocator's reuse of the memory chunks left behind by realloc() must
already be quite good.
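
(Sketched very roughly, with invented names rather than the actual
patch code, the idea behind that option is: grow the row array by hand
and hand the old block to the cell allocator as a pool instead of
freeing it.)

  #include <stdlib.h>
  #include <string.h>

  /* Grow a block from old_bytes to new_bytes; instead of free()ing the
     old block, hand it to some other allocator (e.g. a cell pool) for
     reuse via the supplied callback. */
  static void *grow_and_recycle(void *old, size_t old_bytes, size_t new_bytes,
                                void (*recycle_as_pool)(void *block, size_t bytes))
  {
      void *p = malloc(new_bytes);

      if (p == NULL)
          return NULL;
      memcpy(p, old, old_bytes);
      recycle_as_pool(old, old_bytes);        /* reuse instead of free(old) */
      return p;
  }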

So: I would expect that the memory usage before step "c" should be
close to 80M (the minimum with the current data layout).  My
conjecture is that it is step "c" which wastes the remaining memory.

Another data point: with 1 column of non-bold, non-italic data (so
there is no step "c"), the memory footprint is 80M (with the expected
minimum close to 70M).

Hope this helps,
Ilya

