Re: [bug #33138] .PARLLELSYNC enhancement with patch

Since you asked basic questions I'm going to start this at a basic level. Apologies if it covers some stuff you already know or if I misinterpreted the questions. Note that I haven't actually looked at the patch that went in so this is generally wrt the original.

The first thing is to get the word "lock" out of your mind because we aren't really locking anything. Yes, that API is in use but it's only to create a semaphore or baton. Nobody is ever prevented from doing anything. It just happens that on Unix the most portable (i.e. oldest) way of implementing a semaphore is with the advisory locking API. All cooperating processes agree not to proceed unless and until they are able to acquire the exclusive lock on a shared file descriptor, but it's not necessary to ever actually write anything to that descriptor.
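For illustration, here is a minimal sketch of using the advisory locking API as a baton. It is not the code from the patch: the function names acquire_baton and release_baton are hypothetical, and the only real APIs assumed are POSIX fcntl with F_SETLKW / F_SETLK and struct flock.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: block until this process holds an exclusive
   advisory lock covering the whole file behind fd.
   Returns 0 on success, -1 on error (errno is set by fcntl). */
int acquire_baton(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);   /* the W variant waits */
}

/* Hypothetical helper: hand the baton to the next waiter. */
int release_baton(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

Nothing is written to or read from fd here; the descriptor only serves as a name that all cooperating processes agree to queue on. Note that F_WRLCK requires the descriptor to be open for writing, and that these locks are advisory, so a process that never calls acquire_baton is not held up at all.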

Second, the original implementation (not sure if I ever sent that one in though) actually created a temp file to use as the semaphore fd. But then I discovered that stdout can be locked in the same way, which is simpler. But applying the lock to stdout is just a frill; it could be a temp file, especially if some platform turned out to need it that way. I just figured that stdout is always available, or at least if it's closed you don't have to worry about synchronizing output.

Third, yes, nothing is locked while the child runs. If a shared resource were locked during child runs it would have the effect of re-serializing the build as each supposedly parallel child waited on the lock. So what happens here is really very simple: each child (aka recipe) runs asynchronously, assuming -j of course, and dumps its output to one or two temp files. Only when the child has finished and wants to report results does it enter the queue waiting for the baton. When it gets it, it holds it just long enough to copy its output from the temp files to stdout/stderr and then lets the next guy have his turn. Thus, assuming the average job runs for a significant amount of time (multiples of a write() system call anyway) there will not be much contention on the semaphore and it won't be a bottleneck.
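The sequence described above (capture to a temp file, take the baton only once the job is done, copy, release) can be sketched roughly as follows. This is an assumption-laden illustration, not the code in job.c: baton_lock and flush_job_output are hypothetical names, the semaphore descriptor is passed in explicitly rather than assumed to be stdout, and EINTR handling is omitted for brevity.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: apply a whole-file advisory lock operation.
   type is F_WRLCK or F_UNLCK; cmd is F_SETLKW (wait) or F_SETLK. */
int baton_lock(int fd, short type, int cmd)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                      /* whole file */
    return fcntl(fd, cmd, &fl);
}

/* Hypothetical helper: called after a job has finished. Copies the
   captured output in from_fd to to_fd while holding the baton on
   sem_fd. Returns the number of bytes copied, or -1 on error. */
long flush_job_output(int from_fd, int to_fd, int sem_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    /* Rewind the capture file; the job left the offset at the end. */
    if (lseek(from_fd, 0, SEEK_SET) == (off_t)-1)
        return -1;

    /* Wait for our turn. The job itself ran without holding this. */
    if (baton_lock(sem_fd, F_WRLCK, F_SETLKW) == -1)
        return -1;

    while ((n = read(from_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {              /* handle short writes */
            ssize_t w = write(to_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                total = -1;
                goto done;
            }
            off += w;
        }
        total += n;
    }
    if (n < 0)
        total = -1;

done:
    /* Release promptly so the next finished job can report. */
    baton_lock(sem_fd, F_UNLCK, F_SETLK);
    return total;
}
```

The baton is held only for the duration of the copy loop, which is why contention stays low as long as jobs run much longer than it takes to write out their output.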

You're right that simply writing to temp files and dumping everything at once when the job finished would be likely to reduce the incidence of garbling even without the semaphore, but not to zero.

It may be that the locking of stdout is only useful on Unix due to the fact that it's inherited into child processes. I don't know what Paul or Frank is thinking, and as mentioned I haven't looked at the current version, but my thinking originally was that Windows could easily handle this using its own far richer set of semaphore/locking APIs. I'd actually expect this to be easier and more natural on Windows than Unix. All that's required is to choose a semaphore to synchronize on, dump output to temp files, and copy it to stdout/stderr only after acquiring the semaphore. And remove the temp files of course.

-David Boyce

On Tue, Apr 23, 2013 at 10:50 AM, Eli Zaretskii <address@hidden> wrote:

> Date: Fri, 19 Apr 2013 11:54:05 +0200
> Cc: address@hidden, address@hidden
> From: Frank Heckenbach <address@hidden>
>

> Eli Zaretskii wrote:
>
> > Initial investigation indicates that tmpfile should do the job just
> > fine: the file is deleted only when the last descriptor for it is
> > closed. That includes any duplicated descriptors.
>
> Great.
>
> > As for fcntl, F_SETLKW, and F_GETFD, they will need to be emulated.
> > In particular, it looks like LockFileEx with LOCKFILE_EXCLUSIVE_LOCK
> > flag set and LOCKFILE_FAIL_IMMEDIATELY flag cleared should do the
> > job. I will need to see how it works in reality, though.
>
> OK.

Upon a second look, I'm not sure I understand how this feature works,
exactly, and why you-all thought making it work on Windows is a matter
of a few functions. I sincerely hope I'm missing something, please
bear with me.

First, most of the meat of OUTPUT_SYNC code, which sets up the stage
when running child jobs, is in a branch that isn't compiled on Windows
("#if !defined(__MSDOS__) && !defined(_AMIGA) && !defined(WINDOWS32)"
on line 1482 of job.c). So currently that part is not even run on
Windows. Please tell me that nothing in this feature relies on
'fork', with its copying of handles and other data structures.
Because if it does, we have no hope of making it work on Windows, at
least not using the same algorithms as on Unix.

More importantly, how exactly locking the (redirected) stdout/stderr
of the child is supposed to cause synchronization, and why do we need
it at all? Isn't synchronization already achieved by redirecting
child's output to a file, and only dumping it to screen when the child
exits? What does lock add to this? Who else will be writing what to
where, that we want to prevent by holding the lock/semaphore?

In an old thread, Paul explained something similar:

> David, can you explain why you needed to lock the files? Also, what
> region(s) of the file you are locking? fcntl with F_WRLCK won't work
> on Windows, so the question is how to emulate it.

David wants to interlock between ALL instances of make printing output,
so that even during recursive makes no matter how many you have running
concurrently, only one will print its output at a time.

There is no specific region of the file that's locked: the lockfile is
basically a file-based, system-wide semaphore. The entire file is
"locked"; it's empty and has no content.

Assuming this all is still basically true, I guess I still don't
understand what exactly is being locked and why. E.g., why do we only
want to interlock instances of Make, but not the programs they run?
Also, acquire_semaphore is used only in sync_output, which is called
only when a child exits. IOW, nothing is locked while the child
runs, only when its output is ready.

In addition, we are locking stdout. But doesn't each instance of Make
have, or can have, its own stdout? If so, how will the interlock
work?

What am I missing? Probably a lot.

TIA


From:	David Boyce
Subject:	Re: [bug #33138] .PARLLELSYNC enhancement with patch
Date:	Tue, 23 Apr 2013 11:29:35 -0700