[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH] Windows jobserver updates for GNU make 4.1
From: |
Troy Runkel |
Subject: |
RE: [PATCH] Windows jobserver updates for GNU make 4.1 |
Date: |
Tue, 1 Mar 2016 12:21:38 +0000 |
Hi Eli,
The changes for this patch were created by another developer where I work.
I've forwarded your questions to him and included his answers below. Thanks!
--Troy Runkel
-----Original Message-----
From: Eli Zaretskii [mailto:address@hidden
Sent: Saturday, February 27, 2016 4:01 AM
To: Troy Runkel <address@hidden>
Cc: address@hidden
Subject: Re: [PATCH] Windows jobserver updates for GNU make 4.1
> From: Troy Runkel <address@hidden>
> Date: Wed, 24 Feb 2016 17:27:57 +0000
> Cc: Troy Runkel <address@hidden>
>
> This patch expands the maximum number of simultaneous jobs on Windows
> from 63 (limit dictated by Windows MAXIMUM_WAIT_OBJECTS) to 4095.
>
> It also fixes a situation where GNU make on Windows could deadlock
> and/or suffer memory corruption issues if –j or –j63 was used. The problem
> was due to the way that $(shell …) commands are handled.
Thanks.
I wonder how hard would it be to remove the limitation altogether?
If we are lifting the limitation, why not lift it entirely?
ANSWER: Simplicity and reliability. To our knowledge, no one has
complained about the
previous limit of 63 until we bumped into it while using IncrediBuild
to distribute jobs across
multiple machines. A larger number could certainly be used or a
dynamic value, but
neither seemed appropriate in our engineering judgement.
Another comment is about the busy-wait loop (with 10-msec sleep) that this
change uses -- it looks like, when there are 4095 jobs running, we will be
re-checking each process once every 640 msec, is that right?
ANSWER: No, the check is constant time. The 10ms sleep only occurs
when no processes
are ready and gmake would have otherwise been blocked (note: the sleep
is outside the
inner for-loop. It looks like the file was formatted with tabs so
brace alignment might be
confusing depending on what diff tool you are using.
That might be too large a delay, I think. Did you consider a design that uses
a separate thread for watching up to 64 processes?
I think such a design might scale up better; in particular, the response time
for checking the status of each job will not decrease as the number of jobs
goes up.
ANSWER: Yes, that was our first approach, but the cost of creating and
managing the threads vastly
exceeded the execution time of the current approach. More
specifically, when waiting on
threads, one is typically waiting for them to die and when one dies,
you have all the others
to worry about rounding up (i.e., they need to be killed or stopped).
It becomes hairy
quite fast, and has all sorts of subtleties like race conditions, etc.
Also, note that we were unable to measure the overhead of the current
approach and it has been
working reliably in a large production environment for 8+ months.
> +DWORD GmakeWaitForMultipleObjects(
> + DWORD nCount,
> + const HANDLE *lpHandles,
> + BOOL bWaitAll,
> + DWORD dwMilliseconds
> +)
> +{
> + assert(nCount <= GMAKE_MAXIMUM_WAIT_OBJECTS);
> +
> + if (nCount <= MAXIMUM_WAIT_OBJECTS) {
> + DWORD retVal = WaitForMultipleObjects(nCount, lpHandles, bWaitAll,
> dwMilliseconds);
> + return (retVal == WAIT_TIMEOUT) ? GMAKE_WAIT_TIMEOUT : retVal;
This GMAKE_WAIT_TIMEOUT and GMAKE_WAIT_ABANDONED_0 stuff looks like a kludge to
me; can we get rid of it?
ANSWER: No. These are to compensate for a short-sighted design
decision of Microsoft's part to
use values for error conditions that are now in the middle of our
legitimate return range, thus
we need to move the error conditions outside this now legal region.
Keeping the error conditions
the same and mapping the legitimate values around them, while probably
possible, would
likely require more code and greater chance of confusion and errors.
> + switch (retVal) {
> + case WAIT_TIMEOUT:
> + retVal = GMAKE_WAIT_TIMEOUT;
> + continue;
> + break;
> + case WAIT_FAILED:
> + fprintf(stderr,"WaitForMultipleOjbects failed waiting with error
> %d\n", GetLastError());
> + break;
I guess this fprintf should go away, or be converted to a DB call?
ANSWER: We have never seen this error in practice (that we are aware
of), but we wanted to
cover all our bases. Converting it to an alternate form is probably
fine. Thanks.
Thank you for working on this.
- RE: [PATCH] Windows jobserver updates for GNU make 4.1,
Troy Runkel <=
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Eli Zaretskii, 2016/03/01
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Paul Smith, 2016/03/07
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Eli Zaretskii, 2016/03/07
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Paul Smith, 2016/03/08
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Eli Zaretskii, 2016/03/08
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Paul Smith, 2016/03/08
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Eli Zaretskii, 2016/03/08
- RE: [PATCH] Windows jobserver updates for GNU make 4.1, Troy Runkel, 2016/03/08
- Re: [PATCH] Windows jobserver updates for GNU make 4.1, Eli Zaretskii, 2016/03/08