Re: [RFC 0/8] Introduce an extensible static analyzer

qemu-block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 0/8] Introduce an extensible static analyzer

From:	Alberto Faria
Subject:	Re: [RFC 0/8] Introduce an extensible static analyzer
Date:	Wed, 6 Jul 2022 10:54:51 +0100

On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> > On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé <berrange@redhat.com> 
> > wrote:
> > >      for i in `git ls-tree --name-only -r HEAD:`
> > >      do
> > >         clang-tidy $i 1>/dev/null 2>&1
> > >      done
> >
> > All of those invocations are probably failing quickly due to missing
> > includes and other problems, since the location of the compilation
> > database and some other arguments haven't been specified.
>
> Opps yes, I was waaaay too minimalist in testing that.
>
> >
> > Accounting for those problems (and enabling just one random C++ check):
> >
> >     $ time clang-tidy -p build \
> >         --extra-arg-before=-Wno-unknown-warning-option \
> >         --extra-arg='-isystem [...]' \
> >         --checks='-*,clang-analyzer-cplusplus.Move' \
> >         $( find block -name '*.c' )
> >     [...]
> >
> >     real    3m0.260s
> >     user    2m58.041s
> >     sys     0m1.467s
>
> Only analysing the block tree, but if we consider a static analysis
> framework is desirable to use for whole of qemu.git, lets see the
> numbers for everything.
>
> What follows was done on  my P1 Gen2 thinkpad with 6 core / 12 threads,
> where I use 'make -j 12' normally.
>
> First as a benchmark, lets see a full compile of whole of QEMU (with
> GCC) on Fedora 36 x86_64
>
>     => 14 minutes
>
>
> I find this waaaaay too slow though, so I typically configure QEMU with
> --target-list=x86_64-softmmu since that suffices 90% of the time.
>
>    => 2 minutes
>
>
> A 'make check' on this x86_64-only build takes another 2 minutes.
>
>
> Now, a static analysis baseline across the whole tree with default
> tests enabled
>
>  $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep 
> '\.c$')
>
>   => 45 minutes
>
> wow, wasn't expecting it to be that slow !
>
> Lets restrict to just the block/ dir
>
>  $ clang-tidy --quiet -p build $(find block -name '*.c')
>
>   => 4 minutes
>
> And further restrict to just 1 simple check
>
>  $ clang-tidy --quiet   --checks='-*,clang-analyzer-cplusplus.Move'  -p build 
> $(find block -name '*.c')
>   => 2 minutes 30
>
>
> So extrapolated just that single check would probably work out
> at 30 mins for the whole tree.
>
> Overall this isn't cheap, and in the same order of magnitude
> as a full compile. I guess this shouldn't be that surprising
> really.
>
>
>
> > Single-threaded static-analyzer.py without any checks:
> >
> >     $ time ./static-analyzer.py build block -j 1
> >     Analyzed 79 translation units in 16.0 seconds.
> >
> >     real    0m16.665s
> >     user    0m15.967s
> >     sys     0m0.604s
> >
> > And with just the 'return-value-never-used' check enabled for a
> > somewhat fairer comparison:
> >
> >     $ time ./static-analyzer.py build block -j 1 \
> >         -c return-value-never-used
> >     Analyzed 79 translation units in 61.5 seconds.
> >
> >     real    1m2.080s
> >     user    1m1.372s
> >     sys     0m0.513s
> >
> > Which is good news!

(Well, good news for the Python libclang approach vs others like
clang-tidy plugins; bad news in absolute terms.)

>
> On my machine, a whole tree analysis allowing parallel execution
> (iiuc no -j arg means use all cores):
>
>   ./static-analyzer.py build  $(git ls-tree -r --name-only HEAD: | grep '\.c$'
>
>    => 13 minutes
>
> Or just the block layer
>
>   ./static-analyzer.py build  $(find block -name '*.c')
>
>    => 45 seconds
>
>
> So your script is faster than clang-tidy, which suggests python probably
> isn't the major dominating factor in speed, at least at this point in
> time.
>
>
> Still, a full tree analysis time of 13 minutes, compared to  my normal
> 'make' time of 2 minutes is an order of magnitude.

There goes my 10% overhead target...

>
>
> One thing that I noticed when doing this is that we can only really
> do static analysis of files that we can actually compile on the host.
> IOW, if on a Linux host, we don't get static analysis of code that
> is targetted at FreeBSD / Windows platforms. Obvious really, since
> libclang has to do a full parse and this will lack header files
> for those platforms. That's just the tradeoff you have to accept
> when using a compiler for static analysis, vs a tool that does
> "dumb" string based regex matching to detect mistakes. Both kinds
> of tools likely have their place for different tasks.

Right, I don't think there's anything reasonable we can do about this
limitation.

>
>
> Overall I think a libclang based analysis tool will be useful, but
> I can't see us enabling it as a standard part of 'make check'
> given the time penalty.
>
>
> Feels like something that'll have to be opt-in to a large degree
> for regular contributors. In terms of gating CI though, it is less
> of an issue, since we massively parallelize jobs. As long as we
> have a dedicated build job just for running this static analysis
> check in isolation, and NOT as 'make check' in all existing jobs,
> it can happen in parallel with all the other build jobs, and we
> won't notice the speed.
>
> In summary, I think this approach is viable despite the speed
> penalty provided we dont wire it into 'make check' by default.

Agreed. Thanks for gathering these numbers.

Making the script use build dependency information, to avoid
re-analyzing translation units that weren't modified since the last
analysis, should make it fast enough to be usable iteratively during
development. Header precompilation could also be worth looking into.
Doing that + running a full analysis in CI should be good enough.

Alberto

>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [RFC 5/8] static-analyzer: Enforce coroutine_fn restrictions on function pointers, (continued)
- [RFC 6/8] Fix some coroutine_fn indirect calls and pointer assignments, Alberto Faria, 2022/07/02
- [RFC 7/8] block: Add no_coroutine_fn marker, Alberto Faria, 2022/07/02
- [RFC 8/8] Avoid calls from coroutine_fn to no_coroutine_fn, Alberto Faria, 2022/07/02
- Re: [RFC 0/8] Introduce an extensible static analyzer, Paolo Bonzini, 2022/07/02
- Re: [RFC 0/8] Introduce an extensible static analyzer, Daniel P . Berrangé, 2022/07/04
  - Re: [RFC 0/8] Introduce an extensible static analyzer, Alberto Faria, 2022/07/04
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Daniel P . Berrangé, 2022/07/05
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Alberto Faria, 2022/07/05
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Daniel P . Berrangé, 2022/07/05
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Alberto Faria <=
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Daniel P . Berrangé, 2022/07/06
    - Re: [RFC 0/8] Introduce an extensible static analyzer, Alberto Faria, 2022/07/08
- Re: [RFC 0/8] Introduce an extensible static analyzer, Daniel P . Berrangé, 2022/07/05
  - Re: [RFC 0/8] Introduce an extensible static analyzer, Alberto Faria, 2022/07/06

Prev by Date: Re: [PATCH v8 06/20] job.h: define functions called without job lock held
Next by Date: [PATCH v2 0/3] Fix some coverity issues on VDUSE
Previous by thread: Re: [RFC 0/8] Introduce an extensible static analyzer
Next by thread: Re: [RFC 0/8] Introduce an extensible static analyzer
Index(es):
- Date
- Thread