qemu-block

Re: [RFC 0/8] Introduce an extensible static analyzer


From: Alberto Faria
Subject: Re: [RFC 0/8] Introduce an extensible static analyzer
Date: Mon, 4 Jul 2022 20:30:08 +0100

On Mon, Jul 4, 2022 at 5:28 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> Have you done any measurement to see how much of the overhead is from
> the checks you implemented, vs how much is inherently forced on us
> by libclang? I.e., what does it look like if you just load the libclang
> framework and run it across all source files, without doing any
> checks in Python?

Running the script with all checks disabled, i.e., doing nothing after
TranslationUnit.from_source():

    $ time ./static-analyzer.py build
    [...]
    Analyzed 8775 translation units in 274.0 seconds.

    real    4m34.537s
    user    49m32.555s
    sys     1m18.731s

    $ time ./static-analyzer.py build block util
    Analyzed 162 translation units in 4.2 seconds.

    real    0m4.804s
    user    0m40.389s
    sys     0m1.690s

This is still with 12 threads on a laptop with 12 hardware threads, but
scalability is near perfect. (The time reported by the script doesn't
include loading and inspection of the compilation database.)
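
As a rough cross-check of the scalability claim, the ratio
(user + sys) / real from the `time` output approximates how many hardware
threads were busy on average. The sketch below only does that arithmetic
on the figures quoted above; the helper names are made up for
illustration, and the ratio is a crude proxy, not a proper scalability
measurement.

```python
# Rough estimate of effective parallelism from `time` output.
# Assumption: (user + sys) / real approximates the average number of
# busy hardware threads. Helper names are hypothetical.


def parse_time(stamp: str) -> float:
    """Convert a `time`-style stamp such as '4m34.537s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def effective_parallelism(real: str, user: str, system: str) -> float:
    """Return CPU time divided by wall-clock time."""
    return (parse_time(user) + parse_time(system)) / parse_time(real)


# Figures copied from the two checks-disabled runs above.
full_tree = effective_parallelism("4m34.537s", "49m32.555s", "1m18.731s")
subset = effective_parallelism("0m4.804s", "0m40.389s", "0m1.690s")

print(f"full tree:  {full_tree:.1f} of 12 threads busy on average")
print(f"block+util: {subset:.1f} of 12 threads busy on average")
```

This gives roughly 11.1 for the full tree and 8.8 for the block+util
run. The lower figure for the short run is consistent with the
compilation database loading, which the script's own timer excludes,
taking up a larger share of a 4.8-second wall-clock total.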

So, not great. What's more, TranslationUnit.from_source() delegates
all work to clang_parseTranslationUnit(), so I suspect C libclang
wouldn't do much better.

And with all checks enabled:

    $ time ./static-analyzer.py build block util
    [...]
    Analyzed 162 translation units in 86.4 seconds.

    real    1m26.999s
    user    14m51.163s
    sys     0m2.205s

Yikes. Also not great at all, although the current implementation does
many inefficient things, like redundant AST traversals. Cutting
through some of clang.cindex's abstractions should also help, e.g.,
using libclang's visitor API properly instead of calling
clang_visitChildren() for every get_children().
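
To illustrate the "redundant AST traversals" part, here is a minimal
sketch of running every check during one shared walk instead of walking
the tree once per check. It is not the script's actual code: walk_once()
and FakeNode are hypothetical names, and FakeNode merely stands in for
clang.cindex.Cursor, of which only get_children() is used. It does not
address the separate cost of each get_children() call going through
clang_visitChildren().

```python
# Single shared traversal: each node's get_children() is called exactly
# once, no matter how many checks are registered.
# walk_once() and FakeNode are hypothetical, for illustration only.
from typing import Any, Callable, Iterable, List


def walk_once(root: Any, checks: Iterable[Callable[[Any], None]]) -> None:
    """Visit every node in pre-order, running all checks on each node."""
    checks = list(checks)
    stack: List[Any] = [root]
    while stack:
        node = stack.pop()
        for check in checks:
            check(node)
        # Push children reversed so they are popped left to right.
        stack.extend(reversed(list(node.get_children())))


class FakeNode:
    """Stand-in for a cursor; counts how often children are requested."""

    calls = 0

    def __init__(self, name: str, children: Iterable["FakeNode"] = ()):
        self.name = name
        self.children = list(children)

    def get_children(self):
        FakeNode.calls += 1
        return iter(self.children)


tree = FakeNode("tu", [FakeNode("f", [FakeNode("call")]), FakeNode("g")])
seen_by_a: List[str] = []
seen_by_b: List[str] = []
walk_once(tree, [lambda n: seen_by_a.append(n.name),
                 lambda n: seen_by_b.append(n.name)])
print(seen_by_a, FakeNode.calls)
```

With a real translation unit the root would be the TranslationUnit's
cursor attribute. Checks that need context (e.g., the enclosing
function) would have to carry that state themselves, which this sketch
leaves out.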

Perhaps we should set a target for how slow we can tolerate this thing
to be, as a percentage of total build time, and determine if the
libclang approach is viable. I'm thinking maybe 10%?
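
For a sense of scale, a 10% target can be turned around: given an
analysis time, how long must the build take for the analysis to stay
within budget? The sketch below applies that to the 274.0 seconds of the
checks-disabled full-tree run. The function names are hypothetical, and
it assumes analysis and build are both measured as wall-clock time on
the same machine.

```python
# Arithmetic for a "percentage of total build time" budget.
# Function names are hypothetical; 274.0 s is the checks-disabled
# full-tree figure reported above.


def min_build_seconds(analysis_seconds: float,
                      budget_percent: float = 10.0) -> float:
    """Shortest build for which the analysis stays within the budget."""
    return analysis_seconds * 100.0 / budget_percent


def within_budget(analysis_seconds: float, build_seconds: float,
                  budget_percent: float = 10.0) -> bool:
    """True if the analysis takes at most budget_percent of the build."""
    return analysis_seconds * 100.0 <= budget_percent * build_seconds


parse_only_full_tree = 274.0  # seconds, parsing only, no checks
needed = min_build_seconds(parse_only_full_tree)
print(f"{needed:.0f} s")  # 2740 s
```

So parsing alone would fit a 10% budget only if a full build took at
least 2740 seconds, about 45 minutes 40 seconds, of wall-clock time on
the same machine.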

> If I run 'clang-tidy' across the entire source tree, it takes 3 minutes
> on my machine, but there's overhead of repeatedly starting the process
> in there.

Is that parallelized in some way? It seems strange that clang-tidy
would be so much faster than libclang.

> I think anything that actually fully parses the source files is going
> to have a decent sized unavoidable overhead, when run across the whole
> source tree.
>
> Still, having a properly parsed abstract source tree is an inherently
> useful thing for certain types of static analysis check. Some things
> get unreliable real fast if you try to analyse using regex matches and
> similar approaches that are the common alternative. So libclang is
> understandably appealing in this respect.
>
> > The script takes a path to the build directory, and any number of paths
> > to directories or files to analyze. Example run on a 12-thread laptop:
> >
> >     $ time ./static-analyzer.py build block
> >     block/commit.c:525:15: non-coroutine_fn function calls coroutine_fn
> >     block/nbd.c:206:5: non-coroutine_fn function calls coroutine_fn
> >     [...]
> >     block/ssh.c:1167:13: non-coroutine_fn function calls coroutine_fn
> >     block/nfs.c:229:27: non-coroutine_fn function calls coroutine_fn
> >     Analyzed 79 translation units.
> >
> >     real    0m45.277s
> >     user    7m55.496s
> >     sys     0m1.445s
>
> Ouch, that is incredibly slow :-(

Yup :-(

Alberto



