Re: GNU Parallel Bug Reports Adding sqlite job queue
From: Ole Tange
Subject: Re: GNU Parallel Bug Reports Adding sqlite job queue
Date: Sun, 31 Aug 2014 14:04:11 +0200
On Fri, Aug 29, 2014 at 2:19 AM, Stephen Fralich <address@hidden> wrote:
> I administer the university-wide shared HPC Cluster at the University
> of Washington. I installed parallel several years ago and several
> groups use it heavily.
That sounds great. I look forward to the research using GNU Parallel
resulting in published articles. Use 'parallel --bibtex' to generate
the reference entry.
> A useful enhancement I've envisioned is to have parallel read tasks
> out of an SQLite database rather than from the command line and then
> update the database when the tasks have started and then again when
> they've finished with return codes and so on. This would make it
> simpler for users to keep track of tens of thousands of individual
> tasks and decrease the number of jobs users submit as well (which
> helps me).
>
> I'd like to modify parallel to have this capability, but I thought I'd
> seek advice before getting into it.
> I've been writing Perl code for quite a while (> 15 years) and I'd
> consider myself moderately adequate. What do you think? Do you have
> any advice as to how I'd go about implementing it? Any help is
> appreciated (even vague pointers). Thanks a lot,
First you should look at --joblog, --results, --resume, --resume-failed.
This is a Python module for reading output from --results:
http://git.savannah.gnu.org/cgit/parallel.git/tree/src/optional/python/README?id=2852939aa586d0a4c850481f84c7516fa6a9f379
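For reference, --joblog writes one tab-separated record per finished job, after a header line (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command). A minimal Python sketch for reading such a log and picking out failed jobs might look like this (the column names are taken from GNU Parallel's joblog header; the helper names are my own):

```python
def parse_joblog(text):
    """Parse GNU Parallel --joblog output (tab-separated, header line
    first) into a list of dicts, one per completed job.  The Command
    field may itself contain tabs, so only split on the first N-1."""
    lines = text.splitlines()
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t", len(header) - 1)))
            for line in lines[1:] if line]

def failed_jobs(records):
    """Return the jobs whose exit value or signal was non-zero."""
    return [r for r in records
            if r["Exitval"] != "0" or r["Signal"] != "0"]
```

A user tracking tens of thousands of tasks could run this over the joblog to find which commands to rerun (or simply use --resume-failed, which does this internally).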
If you still feel you need SQLite, a way would be to make two wrappers:
  cat_sql jobs.sql | parallel --joblog jl &
  tail -n+0 -f jl | update_sql
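cat_sql and update_sql here are hypothetical wrappers, not existing tools. As a sketch of the update_sql side, the following Python reads joblog lines from an iterable and records each job's exit status in SQLite; the table schema and column names are assumptions, not anything GNU Parallel defines:

```python
import sqlite3

# Hypothetical schema for the job queue; not part of GNU Parallel.
SCHEMA = """CREATE TABLE IF NOT EXISTS jobs (
    seq      INTEGER PRIMARY KEY,
    command  TEXT,
    runtime  REAL,
    exitval  INTEGER,
    signal   INTEGER
)"""

def update_sql(db_path, joblog_lines):
    """Read --joblog lines (tab-separated, header first) and upsert
    each finished job's status into the SQLite database."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    header = None
    for line in joblog_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if header is None:
            header = line.split("\t")
            continue
        # Command may contain tabs, so limit the split.
        row = dict(zip(header, line.split("\t", len(header) - 1)))
        conn.execute(
            "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)",
            (int(row["Seq"]), row["Command"], float(row["JobRuntime"]),
             int(row["Exitval"]), int(row["Signal"])))
        conn.commit()  # commit per line so other readers see progress
    conn.close()
```

Fed from `tail -n+0 -f jl` via stdin, this would keep the database current while parallel runs.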
If they work well, it may make sense to include them in the distribution.
/Ole