[Chicken-hackers] [PATCH] Add a statistical profiler

From: Peter Bex
Subject: [Chicken-hackers] [PATCH] Add a statistical profiler
Date: Sun, 3 Jan 2016 20:20:19 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi all,

For a while now I've been meaning to improve our profiler.  I finally
got around to writing a new profiler a few weeks ago and I've been
testing it on and off since then.  I think now it's ready to start
integrating it.  I've successfully made some performance improvements in
uri-generic and intarweb.  The irregex performance improvement I sent to
chicken-hackers was also a result from running it with the new profiler.

I believe the profiler needs improving because of these limitations:

* Instrumentation adds a lot of overhead and prevents various
   optimizations, because it requires dynamic-wind and turns all calls
   to instrumented procedures into CPS calls.  This skews the results,
   sometimes so extremely that the profile output becomes useless.
   If you want to see an example of this, try profiling the "nbody"
   benchmark from Mario's chicken-benchmarks repo.  It goes from
   1s to 17s on my laptop.  Other examples include "knucleotide",
   "primes" and "ray2".
* Only toplevel procedures are instrumented.  This means that programs
   with large procedures or programs which rely heavily on closures will
   be hard to profile.  It's possible to work around this by tweaking
   the source code, but this is very annoying to do.
* Checking performance of a program requires recompilation with
   -profile.  This is only a minor annoyance, but sometimes you
   want to look "deeper" inside the library calls that your program
   performs, and that means you'll have to recompile those libraries
   with -profile as well (and possibly libraries they use, and so on),
   which gets annoying pretty quickly.  Also, you can easily forget you
   did that for a library, and this will slow down all programs using
   this library.

The attached patches add a *second* profiler to CHICKEN, which uses
statistical sampling to figure out where a program spends most of its
time.  It registers a timer with setitimer().  This will trigger a
signal to be delivered every 10ms (this is customisable via "-:P").

When a signal arrives, we look at the top of the trace buffer to
find the currently running procedure.  This means it needs *no*
additional instrumentation or even recompilation: libraries or
programs which are compiled without -no-trace or -d0 (i.e., most
of them) are instantly profileable!

This statistical sampling method is basically what gprof(1) and profil(3)
also do, from what I understand of their documentation.

To see it in action, just compile the target program as normal, and then
when running it, pass it the "-:p" switch.  This is recognised by the
runtime system and therefore any program can be passed this switch.
On exit, it will write out a "PROFILE.1234" file, where 1234 is the pid,
just like under the instrumentation-based profiler.  This can only be
read with the new version of chicken-profile, which I've tweaked a tiny
bit to understand the new profile format, which has a header now.
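Concretely, a session might look like this (a sketch assuming a CHICKEN installation; `program.scm` and the pid `1234` are placeholders):

```sh
csc program.scm               # compile as usual -- no -profile needed
./program -:p                 # run with statistical sampling enabled
chicken-profile PROFILE.1234  # read the pid-named profile written on exit
```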

The tweak to chicken-profile is necessary because the statistical
profiler has very little context to work with: it only sees the top
of the trace buffer, so it knows what procedure is currently running.
Because it takes a sample every 10 ms, it can count how much time
is spent in that procedure, but it doesn't really know how it got there,
as it can't carefully track procedure entry/exit events like the
instrumentation does under the old profiler.

So, since we only know how much time is spent in each procedure, we can
only calculate, per procedure, the percentage of the entire running time
spent in it.  This means the percentages add up to 100, which is different
from the original profiler: there, the percentage will be 100 for the
"main" procedure, and less for everything called by it.  Because of this,
both profilers now write a "header" symbol to the start of the profile,
which chicken-profile recognises and will change its behaviour on.

The new profiler has none of the disadvantages of the old profiler, but
comes with its own set of caveats:

* If a program blocks signals, profiling is unreliable.  I got a report
   from Kooda that SDL's vsync() seems to do this, so this is not always
   directly under the programmer's control.
* The profile results are very different: they show how much time is
   spent in the procedure that was called last, making them very low-level.
   This can sometimes result in a harder-to-read profile, with lots of
   individual call-site entries.  The old profiler would simply state
   "procedure X is slow", while the
   new will say "call (A) on line 1 is slowest followed by call (B) on
   line 1507 and call (C) on line 5", even though lines 1 and 5 are both
   part of procedure X.  Ideally you would want some way to sum the
   calls belonging to the same procedure.  On the other hand, it can
   also be useful to know what each *part* of a procedure contributes
   to the profile.
* Due to the statistical approach, the call counts can be *completely*
   wrong.  If you have very fast procedures which are called extremely
   often, the number of calls will be way too low because it has missed
   several calls in between two samples.

Finally, I should mention two more caveats, but I don't think these are
huge problems:

* Due to the statistical approach, the time spent can be over- or under-
   allocated.  We simply allocate the 10ms to the currently running
   procedure at the time we take the sample.  This should in general not
   be a problem because we take so many samples that it will average out:
   In the long run, small, fast procedures should not be running too often
   while a sample is taken.
* The code is all in C; this is required to reduce the impact on
   performance as much as possible, and to avoid having to recompile.
   Also, code running with interrupts disabled would not receive the
   signal if we processed the signal in Scheme.  Finally, the profiler
   might always see itself running when taking a sample if we did it
   in Scheme.

Currently, profiling only works for compiled programs, but I think with
some more or less invasive changes it may be possible to even profile
programs running under the interpreter.  This could not be made to work
with the old profiler, as far as I can see.

Given the above laundry list of caveats and the elegant approach of
the old profiler (all in Scheme, very little code, precise and more
high-level results), I think we should not be removing that one just
yet.  In fact, I think we could even merge parts of the two profilers'
code, which should also remove some of the overhead of the old profiler,
reducing some of its problems.

I'm sure the above text sounds very rambling, so please feel free to
ask if you have questions.  I'm also preparing a blog post that
describes the new profiler in a more structured and easy-going way.

There's a lot more we could potentially do to improve either profiler,
but let's get started by applying this first.


Attachment: 0001-Add-simple-statistical-profiler-to-runtime-library.master.patch
Description: Text Data

Attachment: 0002-Support-profiling-on-Windows-with-native-timers.master.patch
Description: Text Data

Attachment: 0003-Fix-statistical-percentage-chicken-profile-output.master.patch
Description: Text Data

Attachment: 0001-Add-simple-statistical-profiler-to-runtime-library.chicken-5.patch
Description: Text Data

Attachment: 0002-Support-profiling-on-Windows-with-native-timers.chicken-5.patch
Description: Text Data

Attachment: 0003-Fix-statistical-percentage-chicken-profile-output.chicken-5.patch
Description: Text Data

Attachment: signature.asc
Description: Digital signature
