bug-parallel

From: Ido Tal
Subject: Re: GNU Parallel Bug Reports [parallel] parallel-20110205 hangs in "make check" step
Date: Thu, 10 Feb 2011 16:19:38 -0800

Hi,

I thought the following might be relevant to the discussion, since it presents another use case in which the problem can occur (if I've understood correctly).

I've recently recommended GNU parallel to our supercomputer center (but have not heard back). The supercomputer works as follows:
* A user submits many jobs to a queue.
* The queuing system distributes the jobs among the compute nodes that make up the supercomputer.
* Each node has its own CPU, but the file system is shared between the nodes.

Now, if the submitted jobs used parallel with the --semaphore option (sem), it seems the bug would occur, since jobs on different nodes would all write to the same files in the shared home directory. In this case, I think it would make sense for each host to have its own file.

Hope this helps,

Ido


On Thu, Feb 10, 2011 at 3:16 PM, Ole Tange <address@hidden> wrote:
On Thu, Feb 10, 2011 at 6:25 PM, Nelson H. F. Beebe <address@hidden> wrote:
>>> ...
>>> > [... sem hangs ...]
>>>
>>> sem (== parallel --semaphore) writes to ~/.parallel/. If ~/.parallel
>>> is shared between the different machines (maybe using NFS), then that
>>> may cause unexpected blocking.
>>> ...
>
> Ah hah!  That is the problem!  When I install packages, the build-all
> script (see Chapter 9 of Classic Shell Scripting) does the builds in
> parallel across all targets.  At my site, about half of them share the
> $HOME tree via NFS, and the other half are virtual machines or
> standalone systems with independent $HOME trees.  The systems on which
> the hangs occur are all ones with the shared $HOME.

For the install, the issue can be solved by adding the hostname to the
--id in the makefile. However, I am not sure what the general solution
should be:

* Should the lock be host-wide (i.e. processes on other hosts can take
the same lock independently, even when they share $HOME)?
* Should the lock be universal (i.e. processes on other hosts have to
wait until the lock is released when they share $HOME)?

Currently the lock has only been tested with an unshared $HOME.

> What about using one of these directories instead:
>
>        ~/.parallel/semaphores/<processnumber>
>        ~/.parallel/semaphores/<randomnumber>
>        ~/.parallel/semaphores/`hostname`

The first two would defeat the purpose of a lock. The third would
make the lock local to this machine, and that can be done by simply
adding `hostname` to --id.

> The last is the easiest, but what if the user is employing GNU
> parallel on the same system for several different tasks?

You will normally use a different --id for different tasks. Also
remember this ONLY happens when using --semaphore.

>  Our biggest
> servers are Sun SPARC Enterprise T5240 systems, with two 8-core
> Niagara T-3 CPUs and 128 hardware threads, so that scenario could well
> occur.

... but only if all of these requirements are met:

* The --semaphore option is used (or GNU Parallel is called as 'sem')
* $HOME is shared
* other servers sharing $HOME use the same --id
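
The second requirement can be checked roughly from a shell. The following is a heuristic sketch only, assuming a POSIX `df` and `awk`; the helper name `looks_remote` is hypothetical and not part of GNU Parallel. It relies on the convention that NFS-style mounts report their source as `server:/path`, so it may miss other kinds of shared file systems.

```shell
#!/bin/sh
# Heuristic only: network file systems such as NFS usually appear in
# `df` output with a source of the form "server:/export/path".
# looks_remote is a hypothetical helper, not part of GNU Parallel.
looks_remote() {
    case "$1" in
        *:/*) return 0 ;;   # e.g. fileserver:/export/home
        *)    return 1 ;;   # e.g. /dev/sda1, tmpfs, overlay
    esac
}

# Source of the file system holding $HOME: second line, first column
# of the POSIX-format df output.
home_source=$(df -P "${HOME:-/}" | awk 'NR == 2 { print $1 }')

if looks_remote "$home_source"; then
    echo "\$HOME looks network-mounted; consider a per-host --id"
else
    echo "\$HOME looks local"
fi
```

A "local" result does not prove that $HOME is unshared; it only means the mount source does not look like an NFS export.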

Since you can simply append `hostname` to your --id to make it valid
for one host only, that should not pose a problem for normal use.

This would probably make sense for the 'make install'.
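
A minimal sketch of the per-host --id workaround, assuming a POSIX shell. The helper name `host_sem_id` and the task name `install` are hypothetical; only `sem` and its `--id` and `--wait` options belong to GNU Parallel. The `sem` calls are left as comments so the snippet runs without GNU Parallel installed.

```shell
#!/bin/sh
# Build a semaphore id that is unique per host, so machines sharing
# $HOME (e.g. over NFS) do not contend for the same lock.
# host_sem_id is a hypothetical helper, not part of GNU Parallel.
host_sem_id() {
    # Fall back to `uname -n` where the hostname command is missing.
    host=$(hostname 2>/dev/null || uname -n)
    printf '%s-%s\n' "$1" "$host"
}

# Print the id that would be passed to sem for a task named "install".
host_sem_id install

# Intended use (requires GNU Parallel; some_command is a placeholder):
#   sem --id "$(host_sem_id install)" some_command
#   sem --id "$(host_sem_id install)" --wait
```

With this naming, processes on the same host still serialize on one lock, while hosts that share $HOME each get their own.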

/Ole


