From: Nathaniel Smith
Subject: HOWTO: benchmarking monotone (was Re: [Monotone-devel] "memory exhausted" error for 'mtn list status' command)
Date: Fri, 28 Jul 2006 02:16:24 -0700
User-agent: Mutt/1.5.12-2006-07-14

On Wed, Jul 26, 2006 at 07:09:23PM -0700, Drakie Awita wrote:
> Hi,
> 
> I've been a happy mtn user for a while, and the upgrade to v0.27 is ok
> except for one problem: if I have some large amount of un-versioned
> files/sub-directories (thousands of them) under my versioned project
> tree, then the 'mtn list unknown' command errors out with a "memory
> exhausted" message:
> 
> # mtn list unknown
> mtn: fatal: std::runtime_error: Memory exhausted
> mtn:
> mtn: this is almost certainly a bug in monotone.
> mtn: please send this error message, the output of 'mtn --full-version',
> mtn: and a description of what you were doing to address@hidden
> mtn: wrote debugging log to ....../_MTN/debug
> mtn: if reporting a bug, please include this file

The benchmark framework I've been working on -- called
"mtn_benchmark", for boring reasons that probably don't make sense to
anyone but me -- is reaching the point where it's ready for real use,
so I thought maybe I'd take this bug report and do a worked example on
how to use the new framework to attack it.

First, we have to get the framework itself.  It's in branch
net.venge.monotone.contrib.benchmark, so if you pull nvm* you already
have it; if not,
  $ mtn pull venge.net net.venge.monotone.contrib.benchmark*
and then in any case,
  $ mtn co -r h:net.venge.monotone.contrib.benchmark benchmark
to get the code into the 'benchmark' dir.

There are some C helper programs:
  $ cd benchmark
  $ make
Running 'make' will want to use 'sudo', so it can build a suid binary
that knows how to clear your OS disk caches (only on Linux 2.6.16+, so
far).
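
For the curious: on Linux 2.6.16+, clearing the disk caches boils down
to writing to /proc/sys/vm/drop_caches, which needs root, hence the
suid build.  A rough Python sketch of the idea (not the actual C helper
from the benchmark tree):

  import os

  os.system("sync")                # flush dirty data to disk first
  f = open("/proc/sys/vm/drop_caches", "w")
  f.write("3\n")                   # 3 = drop pagecache + dentries + inodes
  f.close()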

You also need to make sure you have python 2.4 (for the 'subprocess'
module), and a compatible version of SciPy (for its binomial variate
generation function -- stock python doesn't come with a way to sample
from a binomial distribution, and writing one by hand is annoying).
You can check for SciPy with 'python2.4 -c "import scipy"'; if it
doesn't error out, you have scipy.
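
Concretely, the sort of thing the framework needs SciPy for looks
roughly like this (a sketch; the exact calls used in
mtn_benchmark/random_content.py may differ):

  # e.g., deciding how many of 100 files a random revision should touch,
  # when each file independently has a 5% chance of being edited:
  from scipy.stats import binom
  num_edited = binom.rvs(100, 0.05)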

Now, we want to investigate 'ls unknown' in a directory with lots of
unknown files.  One could imagine all sorts of parameters that might
matter -- does it matter how many known files are present?  What about
ignore settings, how do they affect things?  To keep it simple,
though, we're going to just look at a fresh workspace containing only
unknown files -- basically what you'd have immediately after running
'setup' in some tree you were going to import.

So we want to add a benchmark that sets up such a directory, then
measures the memory usage when running 'ls unknown'.  No such
benchmark exists, so we get to write one.  (Well, it does now, since I
committed it while writing this, but you can pretend, or add an 'ls
ignored' benchmark or something.)  The way to do this is to add a
benchmark class to the mtn_benchmark/benchmarks.py file.

A benchmark class is just a python class that has a 'run' method.
This method takes one argument, generally called 'vcs' -- this is an
object that represents the system being benchmarked.  To run
monotone, we call methods on the vcs object.  (In principle, we could
end up running some other VCS's commands, if someone implemented this
interface for that system.)  See mtn_benchmark/mtn.py for the full
list of methods we can use.

Benchmark objects can also have another method called 'setup', which
also takes a single 'vcs' argument.  The difference between 'setup'
and 'run' is that anything you do in 'run' will automatically be
instrumented, while things done in 'setup' will not be.  Every
benchmark gets its own scratch directory to work in; 'setup' writes
whatever is needed to this directory, then 'run' uses it to run
whatever commands are actually of interest.  This scratch directory is
always the process working directory when 'run' or 'setup' are called.

As an example, let's look at the InitialPull benchmark.  This
benchmark measures doing a 'pull' into an empty database.  The code
looks like:

class InitialPull(object):
    def __init__(self, repo):
        self.requirements = [repo]
        self.repo = repo

    def setup(self, vcs):
        vcs.init_repo("target")

    def run(self, vcs):
        vcs.pull(self.repo.repo_path(), "target")

Aside from the 'setup' and 'run' methods mentioned earlier, this also
has a standard python constructor, called '__init__'.  This is so we
can adapt this benchmark to different situations -- for instance, we
might want to test how long it takes to pull a particular real-world
database, or databases of different sizes, etc.  Instead of
hard-coding some particular database in the benchmark, we require that
when the user creates an InitialPull benchmark, they provide us with a
repo to use.  This is an object meeting the interface described in
mtn_benchmark/repo.py.  The __init__ method stashes the repo object
for later, and also adds it to its "requirements" list.  This is
because the repo will need to do some setup itself, and by adding it
to the 'requirements' list, we ensure that its setup method will be
called before ours is.

That means that in our own 'setup' method, we don't have to do
anything with the source repo, we just have to create the empty db we
will be pulling into.  ("target" is the filename in the benchmark
scratch directory.)

Finally, in the run method, we ask the vcs object to do a pull from
the provided repo, into our new empty repo.
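
To make the shape of that interface concrete, here is a guess at what
a minimal repo object wrapping an already-built database might look
like; everything beyond repo_path() is an assumption on my part, the
real contract is in mtn_benchmark/repo.py, and ExistingRepo already
covers this case properly:

class PrebuiltRepo(object):        # hypothetical; use ExistingRepo instead
    def __init__(self, path):
        self.path = path
        self.requirements = []     # nothing else needs setting up first
    def setup(self, vcs):          # assumed signature, matching benchmarks
        pass                       # the database already exists on disk
    def repo_path(self):
        return self.path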

Now, how do we actually run this benchmark?  Let's look at the
command-line, imaginatively called "benchmark.py" in the root of your
checkout.  An example command line might look like:

  $ python2.4 benchmark.py \
    -m mtn=/usr/bin/mtn \
    -b pull="InitialPull(SimpleRandomRepo(num_files=100, num_revisions=100))" \
    --cache mycache \
    myscratch myresults

One line at a time, what this says is:
  -- use the python2.4 interpreter to run the benchmark.py script
  -- we're benchmarking the monotone binary at /usr/bin/mtn
     (-m may be repeated arbitrarily many times to benchmark multiple
     binaries, e.g. for before/after testing.  -m takes a "name=path"
     argument; the "name" is anything we want, it's just so we can
     distinguish the results for different binaries.)
  -- we want to run our InitialPull benchmark.  -b also lets you
     specify an arbitrary meaningful name, but after the equals sign,
     instead of a simple path, it takes arbitrary python code,
     which is evaluated in a special environment that has lots of
     handy stuff in it.  In this case, we are constructing an
     InitialPull object, and for that required 'repo' argument, we're
     passing a 'SimpleRandomRepo(num_files=100, num_revisions=100)'
     object.  This generates a random tree of 100 files with some
     vaguely plausible size distribution and directory layout, checks
     it into a monotone repo, makes some random edits, commits those,
     and then does that another 99 times.  There are
     lots more parameters to tweak; see mtn_benchmark/repo.py and
     mtn_benchmark/random_content.py for details.

     Of course, if we wanted to be traditional and benchmark a pull of
     the monotone repo, we could have just said
       -b nvm-pull='InitialPull(ExistingRepo("path/to/my/nvm.mtn"))'
  -- The --cache option specifies a directory to cache large
     files/directory trees in across runs.  Generating a large random
     repository takes quite some time; this lets that work be
     re-used.
  -- Finally, we specify a scratch directory and a results directory.
     The scratch directory will be deleted as we go; use "-d" if you
     want to keep it around to look at.  The results directory will
     contain one subdirectory for each monotone binary / benchmark /
     "instrumenter" triple.  Since we haven't specified anything else,
     the above command will use the "TimingInstrumenter", which uses
     /usr/bin/time to gather time statistics.  These statistics will
     be in comma-separated-value files in the subdirectories under
     your results directory.

So, going back to the 'ls unknown' case, there are three parts to our
benchmark: initialization, setup, and run.  Let's look at them in that
order.

Initialization turns out to be very easy, because benchmarks.py
already contains this handy class, also used by benchmarks for initial
"add" and "commit":

class InitialImportBase(object):
    def __init__(self, tree_changer):
        repo = mtn_benchmark.repo.EmptyRepo()
        self.workspace = mtn_benchmark.workspace.InitialImportWorkspace(
            repo, tree_changer)
        self.requirements = [self.workspace]

Like the InitialPull class above, this defines an '__init__' method,
but in this case it takes a 'tree_changer', instead of a 'repo'.  A
'tree_changer' is an object with a 'change' method, and when that
method is called, it creates a tree from scratch.  (See
mtn_benchmark/change.py.)  We don't really have to worry about that,
though, because there's a class called InitialImportWorkspace that
knows how to use a tree_changer to create an initial workspace.
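
Just to make the idea concrete, a hand-rolled tree_changer might look
something like the sketch below.  This is based only on the description
above; I'm assuming 'change' takes no arguments and works in the
current directory, so check mtn_benchmark/change.py for the real
signature before copying it:

import os

class TinyFixedTree(object):       # hypothetical, for illustration only
    def change(self):
        # create a small fixed tree from scratch in the current directory
        os.mkdir("src")
        readme = open("README", "w")
        readme.write("hello\n")
        readme.close()
        main = open(os.path.join("src", "main.cc"), "w")
        main.write("int main() { return 0; }\n")
        main.close()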

So, we just inherit from this base class, and our initialization is
taken care of:

class InitialImportLsUnknown(InitialImportBase):

Next, 'setup'.  This is even easier -- we just need the workspace to
be set up, but because we put the workspace object on our
'requirements' list, its setup method will be called automatically,
and there's nothing left for us to do!  So we just don't define a
setup method at all.

Finally, the run method, where we actually run 'ls unknown'.  There's
a small problem, because until now, mtn_benchmark/mtn.py didn't expose
a method to call 'ls unknown'.  So we go add one, which looks like:

    def ls_unknown(self):
        self.instrumenter.run("ls_unknown",
                              self.rundir,
                              self.ep("mtn") + ["ls", "unknown"])

Like most such methods, it is only one line long.  'self.instrumenter'
is an object that knows how to run a process under whatever
instrumentation is currently appropriate.  We want to run this command
synchronously, so we call its 'run' method (don't confuse this with
benchmarks' 'run' methods!).  This method takes 3 arguments.  The
first is the name of the operation being performed, as a string; this
is useful if a single benchmark runs multiple commands: e.g., the
"pull" method runs both a server and a client, and because they pass
different strings to the instrumenter, we can keep the measurements
for each separate.  In this case, though, we just stick in
"ls_unknown".  Then, we pass 'self.rundir' (maybe I'll explain that
one later)... and finally,
the command we want to run.  'self.ep("mtn")' is a convenience method
that adds various options we always want, and then we append to that
basic command line the actual command we want to run.
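
For comparison, the 'ls ignored' benchmark suggested earlier would need
an almost identical method; a sketch (as far as I know mtn.py doesn't
have this one yet):

    def ls_ignored(self):
        self.instrumenter.run("ls_ignored",
                              self.rundir,
                              self.ep("mtn") + ["ls", "ignored"])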

Now, with that added, we can finish writing our benchmark:

class InitialImportLsUnknown(InitialImportBase):
    def run(self, vcs):
        self.workspace.in_workspace(vcs.ls_unknown)

'in_workspace' is a convenience method that chdir's to the workspace
directory, runs the function it was passed, and then chdir's back out
again.
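
In other words, in_workspace presumably boils down to something like
this (a sketch; 'workspace_dir' is a made-up attribute name, and the
real method lives in mtn_benchmark/workspace.py):

    def in_workspace(self, func):
        # uses os.getcwd/os.chdir from the standard library
        old_cwd = os.getcwd()
        os.chdir(self.workspace_dir)   # made-up attribute name
        try:
            return func()
        finally:
            os.chdir(old_cwd)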

Now we can run our benchmark:

  $ python2.4 benchmark.py \
    -m mtn=/usr/bin/mtn \
    -b ls_unknown="InitialImportLsUnknown(SimpleRandomTree(num_files=100))" \
    --cache mycache \
    myscratch myresults

We might also want to specify "-f" here, to force it to clobber the
scratch and/or results directories from before.

So, if we look at myresults/ls_unknown-mtn-time/stats.csv, we'll see some
numbers for user/system/real time.  (The three entries in each row are
from running the benchmark 3 times.)  That's handy for some things,
but the bug report complained about excessive memory usage, and this
isn't very helpful for that!  One thing we can do is try using the
MemTimingInstrumenter, which uses a program called 'memtime' that you
should have built above when you ran 'make'.  It measures the same
things time does, but also uses some tricks to look at memory usage.
To do this, we run the same command, but now with "-i" to specify an
instrumenter (don't forget the parens on "MemTimingInstrumenter()"):

  $ python2.4 benchmark.py \
    -m mtn=/usr/bin/mtn \
    -b ls_unknown="InitialImportLsUnknown(SimpleRandomTree(num_files=100))" \
    --cache mycache \
    -i memtime="MemTimingInstrumenter()" \
    myscratch myresults

And now if we look in myresults/ls_unknown-mtn-memtime/stats.csv we'll
see rather more entries.  They still don't help us track down the
problem, though.  What'd be really nice is if we could run our
benchmark under a memory profiler like massif:
  http://valgrind.org/docs/manual/ms-manual.html
  http://developer.gnome.org/doc/guides/optimisation/Massif.html#fig:before

The way to do that in mtn_benchmark is to define a new Instrumenter.
Well, actually, to define a new InstrumenterObj, since for various
reasons Instrumenters are actually factories for InstrumenterObjs,
which do the real work.  This is nice and easy (...once you figure out
the links(1) bug that stymies you for a few hours, see IRC logs...).
We open up mtn_benchmark/instrumenters.py and start writing a new
class:

class MassifInstrumenterObj(instrumenter.RecordingInstrumenterObj):

The constructor is pretty boilerplate; like every instrumenter, it
takes the 'record_dir', and passes that on to its base class to worry
about, and also takes an optional argument "depth", because the most
interesting tunable parameter that massif has is called,
coincidentally, --depth.  (It controls how deep to make the stack
traces it gathers.)  Because the massif default of 3 is a little low
for heavily library-using code like monotone, we make it default to 5:

    def __init__(self, record_dir, depth=5):
        super(MassifInstrumenterObj, self).__init__(record_dir)
        self.depth = depth

Then, we do the main thing you do in an instrumenter: provide an
override for 'run_bg'.  This is the method that every process spawn
goes through, and it's where we get to sneak in and add our
instrumentation.  The first part of the function simply takes the
command we were given, and munges it into a new command line that will
run the given process under massif with options we control:

    # extra functions that we want massif to treat like 'malloc' (i.e., leave
    # them out of the graph, simply crediting any heap bytes allocated to
    # their caller)
    alloc_fns = ["__gnu_cxx::new_allocator<char>::allocate(unsigned, void 
const*)",
                 "Botan::(anonymous namespace)::do_malloc(unsigned, bool)",
                 "Botan::Malloc_Allocator::alloc_block(unsigned) const",
                 "Botan::Pooling_Allocator::get_block(unsigned) const",
                 "Botan::Pooling_Allocator::allocate(unsigned) const",
                 "sqlite3GenericMalloc",
                 "sqlite3MallocRaw",
                 "sqlite3Malloc",
                 ]

    def run_bg(self, name, scratchdir, cmd):
        alloc_fns_args = ["--alloc-fn=%s" % s for s in self.alloc_fns]
        my_cmd = ["valgrind",
                  "--log-file=%s" % os.path.join(scratchdir, 
"valgrind-chatter"),
                  "--tool=massif", "--format=html",
                  "--depth=%s" % self.depth,] \
                  + alloc_fns_args \
                  + cmd

Of course, we could just pass this off now to the super-class's
implementation of run_bg, but there's a minor issue -- after the
program finishes running, we want to capture massif's output!
So we need to also define a callback for when this process finishes.
We use python's block-scoping to do this as a simple function inside
the current function:

        def hook(pid):
            self.record_file(name + "-valgrind-chatter.txt",
                             os.path.join(scratchdir,
                                          "valgrind-chatter.%s" % pid))
            self.record_file(name + "-massif.ps",
                             "massif.%s.ps" % pid)
            self.record_file(name + "-massif.html",
                             "massif.%s.html" % pid)

This simply gathers up each file output by massif, and uses
RecordingInstrumenterObj's handy methods to make sure that they end up
with useful names in the appropriate results directory.
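
I'd guess record_file amounts to little more than the following (an
assumption; the real implementation belongs to the framework's
RecordingInstrumenterObj, and 'record_dir' is the attribute I'm
inferring from the constructor description above):

    def record_file(self, name, path):
        # copy 'path' into the results directory under the name 'name';
        # uses os and shutil from the standard library
        shutil.copy(path, os.path.join(self.record_dir, name))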

Finally, we call our super-class's run_bg to actually spawn the
process, register our hook, and return:

        process = super(MassifInstrumenterObj, self).run_bg(name, scratchdir,
                                                            my_cmd)
        process.hook(hook)
        return process

So, that's it, we have all the code we need to run massif on any
benchmark we have -- except we're missing the factory class that will
let a user actually request this on the command line.  This is also
very simple, again relying heavily on already defined classes:

class MassifInstrumenter(instrumenter.Instrumenter):
    repeats = 1
    objclass = MassifInstrumenterObj

'repeats' gives the default number of times that a benchmark should be
run under a given instrumenter; this is useful for things like the
timing instrumenters, where there is noise in their measurements and
you want to run them multiple times to account for that.  That's not
useful for a profiling instrumenter like this, so we set it to 1.  You
can pass "repeats=<n>" to any instrumenter on the command line to
override this value, though; the framework driver will make <n> passes
through the vcs's under test, in a random order on each pass, to
minimize interference from other load on the machine.
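
In miniature, that pass structure looks something like this (a sketch
of the idea only, not the actual driver code):

import random

def run_passes(vcses, benchmark, repeats):
    for i in range(repeats):
        order = list(vcses)
        random.shuffle(order)          # fresh random order on each pass
        for vcs in order:
            benchmark.run(vcs)         # each run is instrumented as above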

Okay, now let's look at that bug.  I ran:

  $ python2.4 benchmark.py \
    -m mtn=../unopt/mtn \
    -b ls_unknown="InitialImportLsUnknown(SimpleRandomTree(num_files=1000))" \
    --cache mycache \
    -i massif="MassifInstrumenter()" \
    myscratch myresults

And this produced a file
myresults/ls_unknown-mtn-massif/ls_unknown-massif.ps (attached, try
"seascape" mode in gv).  There's lots of interesting stuff in here --
we can see the sawtooth pattern made by the lua VM as it periodically
runs its garbage collector (it's working hard running the ignore
hook), and the solid red band of the ~300kB that Botan allocates at
startup (don't ask me...), but the most interesting parts are the angry
orange spikes at the top of the graph attributed to opendir.  These
are the cause of the peak memory usage, and we might ask ourselves,
"why do we have 800kB of opendir-allocated memory live at once?!?".

Switching to myresults/ls_unknown-mtn-massif/ls_unknown-massif.html,
we see that opendir was called by some boost stuff, which was called
by the monotone function walk_tree_recursive... which was called by
walk_tree_recursive?  Interesting -- if opendir uses so much memory,
then why are we leaving one directory open while we recurse into its
children?  We'll end up using (tree depth * memory for one dir) heap
at once, for no good reason!
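
The obvious restructuring is to finish reading each directory before
descending into its children.  Here's a Python sketch of the shape of
that change (the real code is the C++ walk_tree_recursive in
file_io.cc, so this is only an illustration of the idea):

import os

def walk_tree_recursive(path, visit):
    subdirs = []
    for name in os.listdir(path):      # directory read and closed up front
        full = os.path.join(path, name)
        visit(full)
        if os.path.isdir(full):
            subdirs.append(full)
    # no directory handle is held open across the recursion below
    for d in subdirs:
        walk_tree_recursive(d, visit)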

So I flipped over to file_io.cc, and taught it to handle all the files
in a directory, release the directory iterator, and _then_ recurse
into the subdirectories.  Then I re-ran my benchmarks, using both the old
binary and my new binary, and with both massif (to verify the spikes
went away) and memtime (to see how big the change was):

  $ python2.4 benchmark.py \
    -m before=../unopt/mtn-before \
    -m after=../unopt/mtn \
    -b ls_unknown="InitialImportLsUnknown(SimpleRandomTree(num_files=1000))" \
    --cache mycache \
    -i memtime="MemTimingInstrumenter()" \
    -i massif="MassifInstrumenter()" \
    myscratch myresults

This gave me the same graph as before in ls_unknown-before-massif/,
and a new graph in ls_unknown-after-massif/ls_unknown-massif.ps
(attached).  The new graph looks much better; the peak memory usage is
about 400kB lower, and the spikes are gone, replaced by a gradual
slope (probably simply caused by accumulating the list of files in
memory before outputting them).  If we wanted to be really fancy, we
could easily run this a few times, with different num_files
parameters, to see how we scaled with tree size.
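
A quick driver for that kind of scaling run might look like this (a
sketch, assuming the command-line interface shown above; the benchmark
and directory names are just examples):

import subprocess

for n in [100, 300, 1000, 3000]:
    bench = ("ls_unknown_%d=InitialImportLsUnknown("
             "SimpleRandomTree(num_files=%d))" % (n, n))
    subprocess.call(["python2.4", "benchmark.py",
                     "-m", "mtn=/usr/bin/mtn",
                     "-b", bench,
                     "--cache", "mycache",
                     "-i", "memtime=MemTimingInstrumenter()",
                     "scratch_%d" % n, "results_%d" % n])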

Weirdly, the memtime data doesn't really reflect this.  For "before"
I get:

  ls_unknown-avg-resident-MiB,4.35963439941,4.33722305298,4.40716743469
  ls_unknown-avg-size-MiB,21.4379463196,21.345328331,21.6831884384
  ls_unknown-max-resident-MiB,4.703125,4.69921875,4.70703125
  ls_unknown-max-size-MiB,22.34375,22.34765625,22.34765625
  ls_unknown-num-samples,78,65,75
  ls_unknown-system-time,0.051,0.037,0.044
  ls_unknown-user-time,0.478,0.470,0.477
  ls_unknown-wall-time,0.549,0.519,0.529

And for "after":

  ls_unknown-avg-resident-MiB,3.36142158508,3.18865299225,3.06260299683
  ls_unknown-avg-size-MiB,19.809679985,19.9870185852,16.5047159195
  ls_unknown-max-resident-MiB,4.69140625,4.69140625,4.69140625
  ls_unknown-max-size-MiB,21.83203125,21.83203125,21.83203125
  ls_unknown-num-samples,299,208,93
  ls_unknown-system-time,0.047,0.048,0.036
  ls_unknown-user-time,0.458,0.477,0.461
  ls_unknown-wall-time,2.092,1.467,0.743

If anything, this would suggest that memory usage had _increased_?
That doesn't make much sense to me, and massif seems like the more
trustworthy party here, with its fine-grained deterministic approach.
The numbers also seem weirdly large -- maybe we're measuring how much
of the binary has gotten swapped into memory, for instance?  Eric, any
thoughts?

----

Anyway, though, let's review.  Despite the almost 3000 words it took
to get here, starting from the bug report we:
  -- wrote a 5 line benchmark (including the 2 lines in mtn.py)
  -- wrote a 40 line instrumenter
Running the benchmark on a largish random data set immediately
pointed to the recursion in walk_tree_recursive as a possible culprit,
and after changing that, we could quickly verify that our patch
did help.

The benchmark can now be run automatically at every release, etc., to
watch for regressions; the instrumenter, now that it's written, can be
used to get massif data on any other benchmark we write.

There are a lot of other possibilities here as well -- want a random
repo, but seeded from a real project?  Try
  SimpleRandomRepo(tree_change=TarTree("../my-package.tgz"), num_revisions=100)
Or check out SimpleRandomTree's options for lots of tweakable
parameters affecting average file size, files per directory, directory
nesting depth, ...

The code is still in flux, and everyone's encouraged to hack at it as
they go and discover ways it could be made simpler/better.  I think at
this point it's already a very useful tool, though!

Now that the core is starting to stabilize, there are a lot of useful
things that could be added on.  Some ideas:
  -- a better memtime, maybe using a malloc hook of some kind?
  -- support for oprofile, dtrace, callgrind...
  -- more benchmarks!  what do you want to be faster?
  -- analysis tools -- ways to quickly compare two runs without
     squinting at csv files, or ways to graph data over time...
  -- for that matter, can we run these automatically at every release,
     or every commit, and graph the results over time?

Kinda fun, I think :-).


-- Nathaniel

-- 
"But in Middle-earth, the distinct accusative case disappeared from
the speech of the Noldor (such things happen when you are busy
fighting Orcs, Balrogs, and Dragons)."

This email may be read aloud.



