sks-devel
Re: [Sks-devel] SKS RAM usage gone haywire


From: Yaron Minsky
Subject: Re: [Sks-devel] SKS RAM usage gone haywire
Date: Sun, 15 Feb 2009 10:05:19 -0500

Ari is right: there's nothing inherent about the algorithm that should require an ever-growing use of memory.  OCaml itself is very careful about reclaiming unreferenced memory, but that of course does not preclude a memory leak in the code.

So far, I have no real clue as to what is going wrong.  I could imagine that the caching at some level is overly aggressive.  There are a number of configuration variables that control how much caching there is.  Some of these are explicit cache sizes that are used by the actual DB, and some of it is caching that the prefix-tree data structure does on its own.  For instance, there is a bound (defaulted to 1000) on the number of in-memory nodes of the prefix-tree.
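
To illustrate what a bound on in-memory nodes means in practice, here is a minimal sketch of a size-capped node cache.  It is written in Python rather than OCaml and is not the SKS code: the class name, the `load_node` callback and the least-recently-used eviction policy are all assumptions made for illustration.  The point is only that if such a bound is honored, the memory held by cached nodes stays capped, so unbounded growth would have to come from somewhere else.

```python
from collections import OrderedDict


class BoundedNodeCache:
    """Keep at most max_nodes entries in memory (hypothetical, not the SKS API)."""

    def __init__(self, max_nodes=1000, load_node=None):
        self.max_nodes = max_nodes
        # load_node is an assumed callback that fetches a node from disk.
        self.load_node = load_node
        self._nodes = OrderedDict()

    def get(self, key):
        if key in self._nodes:
            # Mark the node as most recently used.
            self._nodes.move_to_end(key)
            return self._nodes[key]
        node = self.load_node(key)
        self._nodes[key] = node
        if len(self._nodes) > self.max_nodes:
            # Evict the least recently used node (assumed policy).
            self._nodes.popitem(last=False)
        return node

    def __len__(self):
        return len(self._nodes)
```

However many distinct nodes are requested, `len(cache)` never exceeds `max_nodes`.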

The idea that some weird query or a server in an unusual state is exercising some bug that blows up the memory utilization seems possible as well.

Has anyone confirmed if it's the db or recon process that is blowing up in memory?  That would help figure out what's going on.  For instance, it's pretty unlikely that a query from a web-crawler would cause the recon process to explode in size.
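
One way to answer that on a Linux host is to compare the virtual size and the resident set size of each sks process.  The following is a rough sketch, not a tested tool: it assumes a Linux-style /proc filesystem and assumes the processes can be recognized by "sks" appearing in their command line (for example "sks db" and "sks recon").  The function names are made up for this example.

```python
import re
from pathlib import Path


def parse_status(text):
    """Pull the name, VmSize and VmRSS (in kB) out of /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()

    def kb(name):
        m = re.match(r"(\d+)\s*kB", fields.get(name, ""))
        return int(m.group(1)) if m else None

    return {"name": fields.get("Name"), "vsz_kb": kb("VmSize"), "rss_kb": kb("VmRSS")}


def sks_processes(proc_root="/proc"):
    """Yield (pid, cmdline, info) for processes whose command line mentions sks."""
    root = Path(proc_root)
    if not root.is_dir():
        return
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
            info = parse_status((entry / "status").read_text())
        except OSError:
            # The process exited or is not readable by this user.
            continue
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        # Assumption: matching on the substring "sks" is good enough here.
        if "sks" in cmdline:
            yield int(entry.name), cmdline, info


if __name__ == "__main__":
    for pid, cmdline, info in sks_processes():
        print(pid, cmdline, "VSZ kB:", info["vsz_kb"], "RSS kB:", info["rss_kb"])
```

A large VmSize with a small VmRSS points at mapped but mostly unpaged data; a large and growing VmRSS in one of the two processes says which one to look at.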

y

On Sat, Feb 14, 2009 at 7:14 PM, Ari Trachtenberg <address@hidden> wrote:
From memory, the theory predicts that recon running time and memory
should grow roughly linearly with delta; only interaction should
increase.  In practice, everything depends on the actual implementation
and how careful OCaml is in reclaiming unused memory.  Maybe Yaron can chime in.
       -Ari

Phil Pennock wrote:
On 2009-02-14 at 17:45 -0500, Daniel Kahn Gillmor wrote:
Maybe someone who knows the source and/or is proficient with the use of
valgrind could assess whether sks recon is actually leaky?

I had been running without *noticing* any increase for some time and am
inclined to believe that it's a change in observed behaviour.

I saw recon size go to 3GB again, but the RSS was only 11MB, so not so
painful.  Thus I'm inclined to think that most of this is DB backing
(/pending/sks/PTree/ptree mmap'ing) and therefore mostly not paged in
and harmless.  So, what has changed the working set?

In trying to visit my peers' stats pages, one has no data (DB recent
restart) and one has ... 25503 keys.  However, I added that peer in
November, shortly after I myself set up my server.  So unless bazon.ru
only recently lost its keys, that looks less likely.

I begin to wonder if recon is sub-optimal with a large delta of keys to
send and also to wonder if I should bump "learn to read OCaml" up my
priority list -- I'm managing to navigate the sks source faster already,
but I'm still mostly in the dark.

I'm fairly sure that the only other recentish change in my setup is
innocent; I set up db_recover to run weekly, but that's on a Saturday
and since I didn't set $PATH to include the tools, automatic runs
wouldn't work until I fixed that today so it has only happened the first
time when I wrote the wrapper script and I restarted the DB server
shortly thereafter anyway because I'd played with sks dump before
discovering that it couldn't be done online.

-Phil


------------------------------------------------------------------------


_______________________________________________
Sks-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/sks-devel
