
Re: build system rules & algorithms


From: Mike Shal
Subject: Re: build system rules & algorithms
Date: Wed, 10 Jun 2009 22:37:20 -0400

On 6/10/09, Paul Smith <address@hidden> wrote:
>  My concern about this is that it sort of glosses over the most critical
>  aspect.  The basic information required by any build system is and has
>  always been, "what's been changed?"  Without a highly accurate
>  accounting of what's been changed since the last successful build, even
>  the most advanced build system in the world will not work properly.
>
>  When make was created 30+ years ago, the only way to determine what's
>  been changed with any accuracy and reliability was modification times,
>  because these are maintained by the operating system and are unlikely to
>  be changed "arbitrarily" by hand... at any rate, requiring that the user
>  not do so is not an onerous burden since they rarely want to anyway.
>  Once this is decided, the current algorithms and behavior of make
>  follow: there is no way to know what's been changed without going out
>  and looking at each file.  Further, make gains a lot of simplicity (but
>  at the cost of some performance as you point out) by avoiding the need
>  to maintain a separate state: make is completely stateless or, at least,
>  it uses no state other than what is automatically maintained for it by
>  the operating system.
>
>  Going to a Beta build system, on the other hand, requires that some
>  entity maintain a highly, if not completely, accurate accounting of
>  all the files that have changed.  That is NOT a simple problem to
>  solve.
>  Delegating that problem to IDEs and similar simply ignores the reality
>  of development (at least in my world).  People don't tend to use IDEs
>  much, and when they do they certainly don't do every operation using
>  them.  People remove files by hand, they rename files by hand, they edit
>  files using vi "just for a quick fix" rather than boot up the IDE, etc.
>  Further, the act of updating the workspace via external tools such as
>  source control operations needs to be integrated.  I'm not aware of any
>  existing IDE or other userspace tool that has that level of integration
>  across the variety of tools and editors available: invariably they
>  relegate support for "other" tools to some kind of scripting engine or
>  "shell out" facility, which certainly means they cannot track "what's
>  changed".

I myself don't use an IDE (I prefer vim and the shell). For tup I use
a recursive inotify file monitor to watch the project directory, so
the file-change list can be obtained from the kernel at little cost.
Since it is likely that a user (specifically, me :) will forget to
run the file monitor, it also keeps track of the last-known file
modification times for comparison when it starts up. The monitor does
an initial scan at startup, comparing the saved graph against what
the filesystem is now reporting. In this way, files that have been
changed, added, or deleted while the monitor wasn't looking are still
caught, so no information is lost. It is somewhat like running make
once when you start development, and then getting all future makes
almost for free. I'm not claiming this is the best solution, but it
was the easiest for me to implement, and it seems to cover most cases
in my development (like removing a file at the command line, using
vi, etc.). Of course, using inotify makes that part Linux-specific; I
don't know of a generic way to achieve this across platforms.
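
Just to make the idea concrete, here is roughly what the core of such
a monitor looks like. This is a from-scratch sketch, not tup's actual
source; a real monitor would also map watch descriptors back to full
paths, add watches for directories created after the initial walk, and
record mtimes for the startup scan:

#define _XOPEN_SOURCE 700   /* for nftw() */
#include <stdio.h>
#include <unistd.h>
#include <ftw.h>
#include <sys/inotify.h>

static int ifd;  /* inotify instance shared with the nftw() callback */

/* Watch every directory under the project root. */
static int add_watch(const char *path, const struct stat *sb,
                     int typeflag, struct FTW *ftwbuf)
{
    (void)sb; (void)ftwbuf;
    if (typeflag == FTW_D &&
        inotify_add_watch(ifd, path, IN_CREATE | IN_DELETE |
                          IN_MODIFY | IN_MOVED_FROM | IN_MOVED_TO) < 0)
        perror(path);
    return 0;
}

int main(int argc, char **argv)
{
    char buf[4096];
    const char *root = argc > 1 ? argv[1] : ".";

    ifd = inotify_init();
    if (ifd < 0) {
        perror("inotify_init");
        return 1;
    }
    if (nftw(root, add_watch, 32, FTW_PHYS) < 0) {
        perror("nftw");
        return 1;
    }
    /* Each event read here is one entry for the file-change list. */
    for (;;) {
        ssize_t len = read(ifd, buf, sizeof(buf));
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len;) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len > 0)
                printf("changed: %s (mask %08x)\n",
                       ev->name, (unsigned)ev->mask);
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
    return 0;
}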

Still, your point that the file-change list needs to be accurate is
correct. If you prefer to do the full-directory scan in order to get
the file-change list, that is still a possible improvement for a
build system. My goal with the paper was to get updates in less than
O(n) time, where n is the total number of files in the project, so
both algorithms needed to change. For some hard numbers: my machine
takes 17s to run make in an already-built Linux kernel tree, while
just doing a 'find . | xargs stat > /dev/null' takes only 2.6s, so I
would guess the remaining ~14s go mostly to parsing Makefiles and
processing the DAG. It may be possible to use the full-directory scan
initially, and from that information build only the partial DAG that
is necessary to perform the update. However, I think you would run
into the problem of determining which Makefiles to load, which I
mentioned in my response to John.
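
To illustrate what I mean by a partial DAG, here is a toy sketch
(again, hypothetical data structures, not tup's real ones). Starting
from the changed files reported by the monitor, you follow only
reverse dependency edges, so the work is proportional to the affected
subgraph rather than to the whole project:

#include <stdio.h>

#define MAX_NODES 5

struct node {
    const char *name;
    int ndependents;            /* edges to nodes that depend on us */
    int dependents[MAX_NODES];
    int marked;                 /* already queued for rebuild? */
};

/* foo.c -> foo.o -> prog; bar.c -> bar.o -> prog */
static struct node g[MAX_NODES] = {
    { "foo.c", 1, {1}, 0 },
    { "foo.o", 1, {4}, 0 },
    { "bar.c", 1, {3}, 0 },
    { "bar.o", 1, {4}, 0 },
    { "prog",  0, {0}, 0 },
};

/* The changed file seeds the list; its dependents follow. */
static void mark_dependents(int id)
{
    if (g[id].marked)
        return;
    g[id].marked = 1;
    printf("in update list: %s\n", g[id].name);
    for (int i = 0; i < g[id].ndependents; i++)
        mark_dependents(g[id].dependents[i]);
}

int main(void)
{
    /* Pretend the monitor reported that only foo.c changed; bar.o
     * and bar.c are never visited at all. */
    mark_dependents(0);
    return 0;
}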

>
>  Any discrepancy in the list of modified files results in incorrect
>  results of a Beta build, and possibly in very subtle and hard to
>  discover ways.
>
>  Although writing reliable and correct makefiles is arcane and much
>  harder than we'd like it to be, it can be done once by a central
>  designer and that complexity can be contained and not exposed to the
>  development community at large.  In contrast, the requirement for a
>  complete and accurate list of "what's changed" must, by necessity, rely
>  on every developer doing everything "the right way" all the time, or the
>  list becomes inaccurate.  Then you're back to some kind of "reset"
>  operation similar to "make clean" to get back to a known state.

I don't think relying on the developer to do things correctly is the
right solution, which is why I favor an automatically generated list
(either via something like the file monitor I use, or via the
operating system support that you mention below).

>
>  Of course, the solution to this is to get the operating system involved
>  with tracking changes.  Certainly there are lots of reasons that
>  userspace tools want to know "what's changed": not just build systems
>  but also indexers, etc. all want to know.  And Linux, at least, has
>  evolved some relatively nice utilities to provide this information.
>  However, there is certainly no general, portable method for the
>  operating system to track this.  And, we've not even touched the
>  complexities involved with things like sources built from NFS-mounted
>  partitions, etc.

I agree that getting the operating system involved would be nice. It
seems to me that it would be ideal for a filesystem to have a simple
changelog built in, where each interested program keeps a pointer
into the log marking where it last checked. That way it could easily
get the list of changes from its last check through the most recent
change (and this would be useful for indexers and such, as you point
out).
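
Something like this, say. The interface is purely hypothetical (no
filesystem I know of exposes it), and it is simulated here entirely
in userspace: the log hands each consumer every record after that
consumer's private cursor:

#include <stdio.h>

#define LOG_CAP 128

struct change {
    unsigned long seq;
    const char *path;
};

static struct change log_buf[LOG_CAP];
static unsigned long next_seq = 1;

static void log_append(const char *path)
{
    struct change *c = &log_buf[(next_seq - 1) % LOG_CAP];
    c->seq = next_seq++;
    c->path = path;
}

/* Hand the consumer every change after its cursor; return the new
 * cursor. A real log would also have to handle wrap-around, i.e.
 * consumers that fall more than LOG_CAP records behind. */
static unsigned long log_read_from(unsigned long cursor)
{
    for (unsigned long s = cursor + 1; s < next_seq; s++) {
        struct change *c = &log_buf[(s - 1) % LOG_CAP];
        printf("seq %lu: %s changed\n", c->seq, c->path);
    }
    return next_seq - 1;
}

int main(void)
{
    unsigned long build_cursor = 0;  /* build system's private pointer */

    log_append("src/foo.c");
    log_append("src/bar.c");
    build_cursor = log_read_from(build_cursor);  /* sees both changes */

    log_append("src/foo.c");
    build_cursor = log_read_from(build_cursor);  /* sees only the new one */
    return 0;
}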

I haven't looked into NFS at all; it sounds like it would be a
headache. All I use it for is streaming video around my house :).

>
>
>  I really do like the ideas in your paper and I think it's a worthwhile
>  exercise to skip over the very hard initial problem (knowing "what's
>  changed") to see what we could do if we knew that: whether it's even
>  worthwhile to try to solve it.  I think your paper shows very nicely
>  that we could do a lot better if we can solve that problem, so it's
>  worth thinking about those solutions.

Thanks! As the Linux kernel example above shows, the initial problem
(getting the file-change list) is probably where the smallest gains
are anyway. I think there are more gains to be had by dealing with a
smaller graph, and by only parsing the build files that are
necessary.

-Mike



