qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 00/29] Tame a few "touch this, recompile the


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v3 00/29] Tame a few "touch this, recompile the world" headers
Date: Sat, 10 Aug 2019 19:01:41 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

Philippe Mathieu-Daudé <address@hidden> writes:

> Hi Markus,
>
> On 8/9/19 8:46 AM, Markus Armbruster wrote:
>> We have quite a few "touch this, recompile the world" headers.  My
>> "build everything" tree has some 6600 objects (not counting tests and
>> objects that don't depend on qemu/osdep.h).  Touching any of 54
>> headers triggers a recompile of more than half of them.
>> 
>> This series reduces them to 46.
[...]
>> Observed patterns of #include misuse:
>> 
>> * Copy pasta
>> 
>>   I found and deleted quite a few #include that were almost certainly
>>   never needed.  The most likely explanation is lazy copying from a
>>   "similar" file.  My deletions produced only minor improvements,
>>   though.
>> 
>> * "Convenience" headers
>> 
>>   We sometimes have a header include a bunch of other headers the
>>   header itself doesn't need, so the header's users don't have to.  An
>>   extreme case is hw/hw.h: it pulls in more than 40 other headers,
>>   then declares just hw_error().  Most of its users need only a
>>   fraction of it.  PATCH 08-09,12-18 fix that, trading the very
>>   occasional convenience of not having to type a few #include
>>   directives for build speed.
>> 
>> * "Fat" headers
>> 
>>   Some headers provide many things to many customers.  Bad when the
>>   customers generally need only parts.  Worse when such a "fat" header
>>   pulls in loads more.  This series grapples with three instances:
>>   qapi/qapi-types-common.h (PATCH 03), hw/boards.h, which pulls in
>>   almost 70 headers (PATCH 19-23), and sysemu/sysemu.h, which pulls in
>>   more than 20 (PATCH 23-28).
>> 
>> * Design erosion
>> 
>>   Off-the-cuff additions to headers can erode design.  For instance,
>>   the generated trace.h were carefully designed for minimal
>>   dependencies.  We let them balloon when we added the per-vCPU
>>   tracing feature a few years later.  PATCH 07 grapples with that.
>
> What can prevent us from these misuse patterns?

Excellent question.

> Will you redo this analysis/series after each releases?

Perhaps I should first explain my analysis, i.e. where my numbers come
from, then my cleanups.

Counting #include directives is not useful by itself.  For instance,
#include "qemu/typedefs.h" occurs just once, yet almost all .o depend on
it.

Better: count the actual make dependencies.  Compiling FOO.c generates
FOO.d in addition to FOO.o.  These .d list the .h the .o depend on.  I
wrote a stupid Python program to count how many .o depend on each .h.
This identifies widely included headers.

Also useful are the .d for the .c generated by "make check-headers":
they tell us how many headers each header pulls in.  Widely included
headers that pull in lots more are particularly promising candidates for
#include cleanup.

The actual cleanup is manual work.  Getting rid of copy pasta and
"convenience" headers is pretty straightforward.  "Fat" headers and
design erosion can require more thought.  Another difficult part is
identifying how to order the cleanups for reviewability.  Testing is
straightforward, but sloooow; much world-recompiling for many, many
worlds.

I started looking into this back in 2016[1], and updated the analysis
recently[2].  I contributed a few cleanups myself, and at least Paolo
also chipped in.

I can update the analysis more frequently if that helps.  However, ...

> How to automate misuse checks?
>
> Can we report some metrics and warn if there a considerable discrepancy?

... detecting we've made things worse weeks or months after we did is
clearly suboptimal.

The actual numbers depend on build configuration.  For reproducible
numbers, we need to hold configuration constant for all developers.  I
figure our containerized builds could do that for us.

To detect metrics going south, we need a baseline, and a definition of
"south".

If we store the baseline in git and fail the test when the actual
numbers are too "south" of the baseline, patches making things a lot
worse will have to update the baseline.  This should ensure proper
scrutiny.

However, a slow slide south over multiple independent patches will 
arbitrary blame the last one, while the others get off scot free.

I'm not sure how to best handle new headers.  Having to update the
baseline whenever you add a new header will likely add too much
friction.

Needs thought.

Perhaps we can start with fully automating the measurement part, then
passively gather data for a while to learn how the numbers change during
development.



[1] https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03271.html
[2] https://lists.nongnu.org/archive/html/qemu-devel/2019-05/msg06291.html



reply via email to

[Prev in Thread] Current Thread [Next in Thread]