make returning spurious 4522468 errors
From: tom honermann
Subject: make returning spurious 4522468 errors
Date: Thu, 07 Jan 2010 15:08:12 -0800
User-agent: Thunderbird 2.0.0.23 (Windows/20090812)

We are in the process of migrating from our current Microsoft nmake
based build system to a platform independent build system based on GNU
make. We are encountering a problem in which GNU make sometimes reports
an error of 4522468 for some jobs only on Windows. We aren't able to
reproduce the error on demand; it currently happens somewhere between 1
and 15 times for every 10000 or so jobs. I think we've only seen this
when invoking make with the '-j' option. Higher occurrences of this
seem to correspond to higher values for the '-j' option (i.e., '-j 2'
produces fewer occurrences than '-j 8'). This suggests a race condition
somewhere in GNU make. We have not identified any correlation between
the error and the specific job that was run. We have clear
examples of make reporting this error even for jobs that completed
successfully (i.e., where we are confident that the job returned an exit
value of 0). In some cases, this error has been emitted for a recursive
make invocation (in which the sub-make returned 0 or 2).

We were originally using the gmake.exe 3.81 binary distributed with
GNUWin32 (http://gnuwin32.sourceforge.net), but have now reproduced this
with a binary we built ourselves (also 3.81, using the MS VS 2008
compiler). Has anyone else seen this?

Based on a brief glance at the code, I'm guessing that the value
returned by one of the calls to 'GetExitCodeProcess' in
'w32/subproc/sub_proc.c' is somehow getting lost or corrupted. There
are two calls to 'GetExitCodeProcess', both of which look very similar
to this:

    DWORD ierr;
    GetExitCodeResult = GetExitCodeProcess(childhand, &ierr);
    if (ierr == CONTROL_C_EXIT) {
        pproc->signal = SIGINT;
    } else {
        pproc->exit_code = ierr;
    }
    if (GetExitCodeResult == FALSE) {
        pproc->last_err = GetLastError();
        pproc->lerrno = E_SCALL;
    }

Two things stand out to me here:

1: 'pproc->exit_code' is assigned the value of 'ierr' regardless of
   whether the call to 'GetExitCodeProcess' succeeds. If
   'GetExitCodeProcess' fails, it may not assign to 'ierr' at all, which
   could leave 'pproc->exit_code' holding an uninitialized value.

2: There is no check for 'ierr' being set to 'STILL_ACTIVE'. It may
   be that other parts of the code ensure that 'GetExitCodeProcess' is
   never called for an uncompleted process; I haven't checked.
   Regardless, this is unlikely to be related to the problem that we
   are seeing.
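
To make the two observations concrete, here is a minimal sketch of the
ordering I would expect: check the result of the call first, and only
then interpret 'ierr'. It is portable C, not a patch against
'sub_proc.c'. The names 'proc_sketch', 'record_exit_status' and the
'SKETCH_*' constants are hypothetical stand-ins so that it compiles
without <windows.h>; the value used for 'SKETCH_E_SCALL' is a
placeholder, not necessarily the one make uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Win32 constants and make's own codes. */
#define SKETCH_CONTROL_C_EXIT 0xC000013AUL /* Win32 CONTROL_C_EXIT */
#define SKETCH_STILL_ACTIVE   259UL        /* Win32 STILL_ACTIVE */
#define SKETCH_SIGINT         2
#define SKETCH_E_SCALL        101          /* placeholder value */

/* Hypothetical subset of the fields the quoted snippet touches. */
typedef struct {
    int signal;
    unsigned long exit_code;
    unsigned long last_err;
    int lerrno;
} proc_sketch;

/*
 * call_ok  : result of the (assumed) GetExitCodeProcess call
 * ierr     : exit code written by that call; meaningful only if call_ok
 * os_error : what GetLastError() reported; meaningful only if !call_ok
 *
 * Returns 0 if an exit status was recorded, 1 if the process is still
 * running, and -1 if the system call itself failed.
 */
int record_exit_status(proc_sketch *p, int call_ok,
                       unsigned long ierr, unsigned long os_error)
{
    assert(p != NULL);

    if (!call_ok) {
        /* Point 1: never copy ierr when the call failed. */
        p->last_err = os_error;
        p->lerrno = SKETCH_E_SCALL;
        return -1;
    }
    if (ierr == SKETCH_STILL_ACTIVE) {
        /* Point 2: no exit status yet; the caller should wait again. */
        return 1;
    }
    if (ierr == SKETCH_CONTROL_C_EXIT) {
        p->signal = SKETCH_SIGINT;
    } else {
        p->exit_code = ierr;
    }
    return 0;
}
```

A real caller would presumably also declare 'DWORD ierr = 0;' so that
nothing uninitialized is ever passed along. Note that a child can
legitimately exit with the value 259, so the 'STILL_ACTIVE' test is only
a heuristic unless the process handle has already been waited on.
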
Tom.