Re: [Chicken-hackers] [PATCH] Add a statistical profiler

From: Andy Bennett
Subject: Re: [Chicken-hackers] [PATCH] Add a statistical profiler
Date: Mon, 04 Jan 2016 17:03:36 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.8.0


This is excellent and just what I need.

Thanks Peter!

> For a while now I've been meaning to improve our profiler.  I finally
> got around to writing a new profiler a few weeks ago and I've been
> testing it on and off since then.  I think now it's ready to start
> integrating it.  I've successfully made some performance improvements in
> uri-generic and intarweb.  The irregex performance improvement I sent to
> chicken-hackers was also a result from running it with the new profiler.
> I believe the profiler needs improving because of these limitations:
> * Instrumentation adds a whole lot of overhead and prevents various
>    optimizations due to requiring dynamic-wind and turning all calls
>    to such procedures into a CPS call.  This skews the results,
>    sometimes so extremely as to make the profile output useless.
>    If you want to see an example of this, try profiling the "nbody"
>    benchmark from Mario's chicken-benchmarks repo.  It goes from
>    1s to 17s on my laptop.  Other examples include "knucleotide",
>    "primes" and "ray2".
> * Only toplevel procedures are instrumented.  This means that programs
>    with large procedures or programs which rely heavily on closures will
>    be hard to profile.  It's possible to work around this by tweaking
>    the source code, but this is very annoying to do.
> * Checking performance of a program requires recompilation with
>    -profile.  This is only a minor annoyance, but sometimes you
>    want to look "deeper" inside the library calls that your program
>    performs, and that means you'll have to recompile those libraries
>    with -profile as well (and possibly libraries they use, and so on),
>    which gets annoying pretty quickly.  Also, you can easily forget you
>    did that for a library, and this will slow down all programs using
>    this library.
>
> The attached patches add a *second* profiler to CHICKEN, which uses
> statistical sampling to figure out where a program spends most of its
> time.  It registers a timer with setitimer().  This will trigger a
> signal to be delivered every 10ms (this is customisable via "-:P").
> When a signal arrives, we look at the top of the trace buffer to
> find the currently running procedure.  This means it needs *no*
> additional instrumentation or even recompilation: libraries or
> programs which are compiled without -no-trace or -d0 (i.e., most
> of them) are instantly profileable!
>
> This statistical sampling method is basically what gprof(1) and profil(3)
> also do, from what I understand of their documentation.
>
> To see it in action, just compile the target program as normal, and then
> when running it, pass it the "-:p" switch.  This is recognised by the
> runtime system and therefore any program can be passed this switch.
> On exit, it will write out a "PROFILE.1234" file, where 1234 is the pid,
> just like under the instrumentation-based profiler.  This can only be
> read with the new version of chicken-profile, which I've tweaked a tiny
> bit to understand the new profile format, which has a header now.
>
> The tweak to chicken-profile is necessary because the statistical
> profiler has very little context to work with: it only sees the top
> of the trace buffer, so it knows what procedure is currently running.
> Because it takes a sample every 10 ms, it can count how much time
> is spent in that procedure, but it doesn't really know how it got there,
> as it can't carefully track procedure entry/exit events like the
> instrumentation does under the old profiler.
>
> So, while we know how much time is spent in a procedure, we can only
> calculate the percentage of the entire running time spent, individually
> per procedure.  This means the percentages add up to 100, which is different
> from the original profiler.  There, the percentage will be 100 for the
> "main" procedure, and less for everything called by it.  Because of this,
> both profilers now write a "header" symbol to the start of the profile,
> which chicken-profile recognises and adjusts its behaviour accordingly.
>
> The new profiler has none of the disadvantages of the old profiler, but
> comes with its own set of caveats:
> * If a program blocks signals, profiling is unreliable.  I got a report
>    from Kooda that SDL's vsync() seems to do this, so this is not always
>    directly under the programmer's control.
> * The profile results are very different: it shows how much time is
>    spent in the procedure that was called last, making it very low-level.
>    This can sometimes result in a harder-to-read profile, with lots of
>    details.
>    The old profiler would simply state "procedure X is slow", while the
>    new one will say "call (A) on line 1 is slowest, followed by call (B) on
>    line 1507 and call (C) on line 5", even though lines 1 and 5 are both
>    part of procedure X.  Ideally you would want some way to sum the
>    calls belonging to the same procedure.  On the other hand, it can
>    also be useful to know what each *part* of a procedure contributes
>    to the profile.
> * Due to the statistical approach, the call counts can be *completely*
>    wrong.  If you have very fast procedures which are called extremely
>    often, the number of calls will be way too low because it has missed
>    several calls in between two samples.
>
> Finally, I should mention two more caveats, but I don't think these are
> huge problems:
> * Due to the statistical approach, the time spent can be over- or under-
>    allocated.  We simply allocate the 10ms to the currently running
>    procedure at the time we take the sample.  This should in general not
>    be a problem because we take so many samples that it will average out:
>    In the long run, small, fast procedures should not be running too often
>    while a sample is taken.
> * The code is all in C; this is required to reduce the impact on
>    performance as much as possible, and to avoid having to recompile.
>    Also, code running with interrupts disabled would not receive the
>    signal if we processed the signal in Scheme.  Finally, the profiler
>    might always see itself running when taking a sample if we did it in
>    Scheme!
>
> Currently, profiling only works for compiled programs, but I think with
> some more or less invasive changes it may be possible to even profile
> programs running under the interpreter.  This could not be made to work
> with the old profiler, as far as I can see.
>
> Given the above laundry list of caveats and the elegant approach of
> the old profiler (all in Scheme, very little code, precise and more
> high-level results), I think we should not be removing that one just
> yet.  In fact, I think we could even merge some of these parts of the
> code, which should also remove some of the overhead of the old profiler,
> reducing some of its problems.
>
> I'm sure the above text sounds very rambling, so please feel free to
> ask if you have questions.  I'm also preparing a blog post that
> describes the new profiler in a more structured and easy-going way.
>
> There's a lot more we could potentially do to improve either profiler,
> but let's get started by applying this first.
>
> Cheers,
> Peter
> _______________________________________________
> Chicken-hackers mailing list
> address@hidden


0290 DA75 E982 7D99 A51F  E46A 387A 7695 7EBA 75FF

