From: Matthews, Gregory A. (ARC-TN)[InuTeq, LLC]
Subject: GNU Parallel Bug Reports user-specified action on idle remote hosts with no more jobs to run
Date: Tue, 15 Jan 2019 07:04:31 +0000
Hi,
NASA Ames makes GNU parallel available to users of its clusters
(https://www.nas.nasa.gov/hecc/support/kb/using-gnu-parallel-to-package-multiple-jobs-in-a-single-pbs-job_303.html),
and we've been considering how GNU parallel could better integrate with our
resource manager/batch scheduler (PBS) to minimize wasted compute cycles. A
simple use case is that a user requests some number of hosts for a PBS session
where GNU parallel is used to distribute jobs to the assigned hosts. Once all
jobs have been distributed (but not yet completed), it would be nice if GNU
parallel had a mechanism to take a user-specified action on those hosts that
are no longer running any jobs. In our case we would want to run a particular
PBS command on the local host to remove the hosts that are no longer running
jobs from the PBS session, returning them to the pool of hosts that other
users can use in their PBS sessions.
I'd be happy to give a more concrete explanation of the above use case if that
would help. Having looked over the tutorial, man page, mailing lists and the
Perl code itself, I see how the management of remote hosts has become more
sophisticated over time, and this seems to me another step toward more dynamic
management of remote hosts. GNU parallel knows* when a remote host will remain
idle because there are no more jobs left to distribute. A user-specified action
at that point opens up a number of possibilities for marking, shedding, or
otherwise handling those hosts.
* I know there are wrinkles, such as --retries, where it might be best to hang
on to some number of currently-idle remote hosts to satisfy --retries.
Our current focus is on the SSHLogin structure: obtaining a mechanism to call
the user-specified action on each SSHLogin that transitions to permanently idle
(again, noting the asterisked caveat above). From what I can tell,
drain_job_queue() would be the place to scan all SSHLogin objects to find
permanently-idle instances (I'm not entirely clear how SSHLogin and
--sqlmaster/--sqlworker intersect here). Then reaper() would allow detection of
SSHLogin instances that become permanently idle later.
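To make the idea a little more concrete, here is a rough sketch of the
detection logic as a small Python simulation. It is not GNU parallel code and
does not reflect how drain_job_queue() or reaper() are actually written; the
names Host, find_permanently_idle, run, on_idle and keep_for_retries are all
hypothetical, and the "hold back a few idle hosts" rule is only one assumed
way of handling the --retries wrinkle.

```python
from collections import deque


class Host:
    """Hypothetical stand-in for one SSHLogin: a name and a running-job count."""

    def __init__(self, name):
        self.name = name
        self.running = 0
        self.released = False


def find_permanently_idle(hosts, queue, keep_for_retries=0):
    """Return the hosts that could be released right now.

    A host qualifies when the job queue is empty, it is running nothing,
    and it has not been released already. The last `keep_for_retries`
    idle hosts are held back (an assumed policy for retried jobs).
    """
    if queue:
        return []
    idle = [h for h in hosts if h.running == 0 and not h.released]
    if keep_for_retries:
        idle = idle[:-keep_for_retries]
    return idle


def run(jobs, host_names, on_idle, keep_for_retries=0):
    """Simulate one job slot per host; each job is (label, ticks), ticks >= 1."""
    hosts = [Host(n) for n in host_names]
    queue = deque(jobs)
    active = []  # entries are [remaining_ticks, host]

    def release():
        for h in find_permanently_idle(hosts, queue, keep_for_retries):
            h.released = True
            on_idle(h.name)  # the user-specified action

    while queue or active:
        # Hand out queued jobs to free hosts.
        for h in hosts:
            if queue and h.running == 0 and not h.released:
                _, ticks = queue.popleft()
                h.running += 1
                active.append([ticks, h])
        release()  # scan once the queue may have drained

        # Advance one tick and reap finished jobs.
        for entry in active:
            entry[0] -= 1
        finished = [e for e in active if e[0] == 0]
        active = [e for e in active if e[0] > 0]
        for _, h in finished:
            h.running -= 1
        release()  # re-check after each reap


if __name__ == "__main__":
    released = []
    run([("a", 3), ("b", 1), ("c", 1)], ["n1", "n2", "n3", "n4"], released.append)
    print(released)  # prints ['n4', 'n2', 'n3', 'n1']
```

In this sketch n4 is released as soon as the queue drains, since it never
received a job, and the other hosts are released as their last jobs are
reaped. With keep_for_retries=1 one idle host is always held back, so n4 is
never released.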
-Greg Matthews