Re: [Help-bash] parallel processing in bash (to replace for loop)
From: Peng Yu
Subject: Re: [Help-bash] parallel processing in bash (to replace for loop)
Date: Sat, 4 Feb 2012 16:57:02 -0600
> for i in *.log ; do
> echo "$i"
> [...do other needed stuff...]
> sem -j10 gzip $i ";" echo done
> done
> sem --wait
This solution and the others (including GNU parallel) share one
problem: they do not understand bash functions.
~$ hello=xxx
~$ myfun() {
> echo $hello$1
> }
~$ myfun 1
xxx1
~$ sem -j10 myfun 1
^C
~$ sem -j10 myfun 1 ";"
~$ /bin/bash: myfun: command not found
~$ export -f myfun
~$ sem -j10 myfun 1 ";"
~$ 1
~$ set -a
~$ export -f myfun
~$ sem -j10 myfun 1 ";"
~$ 1
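The bare `1` above (instead of `xxx1`) is consistent with how bash passes things to child processes: `export -f` makes the function visible to a child bash, but `hello` is still an unexported variable, and `set -a` only marks variables that are assigned after it is turned on. A minimal sketch of that behaviour, using a plain `bash -c` child as a stand-in for whatever shell `sem` starts (that substitution is an assumption; `sem` itself is not used here):

```shell
#!/usr/bin/env bash
hello=xxx
myfun() { echo "$hello$1"; }

# Export only the function: the child can call it, but $hello is empty there.
export -f myfun
without_var=$(bash -c 'myfun 1')

# Export the variable as well: the child now sees both.
export hello
with_var=$(bash -c 'myfun 1')

echo "function only: $without_var"
echo "function and variable: $with_var"
```

So every variable a function depends on has to be exported along with the function, which gets tedious for anything larger than a toy example.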
The best solution I have found is the following. It uses a file to
represent each running job. When a job starts, the associated file is
created; when the job finishes, the associated file is deleted. By
limiting the number of files, you limit the number of jobs that run
in parallel.
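The idea can be sketched in a few lines before looking at the full script. This is a simplified illustration, not the script itself: the names `slot_dir`, `max_jobs`, `out_file` and `run_limited` are made up for the sketch, and it skips the locking because only one loop submits jobs here.

```shell
#!/usr/bin/env bash
# Sketch only: slot_dir, max_jobs, out_file and run_limited are illustrative names.
slot_dir=$(mktemp -d)      # holds one file per running job
out_file=$(mktemp)         # collects job output so it can be inspected
max_jobs=2

hello=xxx
myfun() {
    sleep 0.2
    echo "$hello$1" >> "$out_file"
}

run_limited() {
    # Wait until fewer than max_jobs slot files exist.
    while [ "$(ls "$slot_dir" | wc -l)" -ge "$max_jobs" ]; do
        sleep 0.1
    done
    local slot
    slot=$(mktemp "$slot_dir/job.XXXXXX")   # claim a slot
    ( "$@"; rm -f "$slot" ) &               # release it when the job ends
}

for i in 1 2 3 4; do
    run_limited myfun "$i"
done
wait

sorted_output=$(sort "$out_file" | paste -s -d ' ' -)
remaining_slots=$(ls "$slot_dir" | wc -l)
echo "output: $sorted_output"
echo "slot files left: $remaining_slots"
rm -rf "$slot_dir" "$out_file"
```

Because each job runs in a backgrounded subshell of the current shell, `myfun` and `hello` are available without any `export`.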
I hope that this may be helpful to others. I'd like to hear comments
if there is anything that can be improved.
~/linux/bin/xplat/src/util/holdcup$ cat holdcup.sh
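#!/usr/bin/env bash
# Block until a job slot ("cup") is free, then print the path of a new
# handle file. The caller deletes that file when its job finishes.

script_name=$(basename "$0" .sh)

# Parse the options with GNU getopt.
TEMP=$(getopt -o hn:d: --long help,number_of_cups:,holdcup_dir: \
    -n "${script_name}.sh" -- "$@")
if [ $? != 0 ] ; then printf "Terminating...\n" >&2 ; exit 1 ; fi
eval set -- "$TEMP"

abspath_script=$(readlink -f -e "$0")
script_absdir=$(dirname "$abspath_script")

# Defaults: the count reported by ncpu.sh (not shown here) and one
# directory per host.
number_of_cups=$(ncpu.sh)
holdcup_dir=~/.holdcup/"$HOSTNAME"

while true ; do
    case "$1" in
        -h|--help)
            cat "$script_absdir/${script_name}_help.txt"
            exit
            ;;
        -n|--number_of_cups)
            number_of_cups="$2"
            shift 2
            ;;
        -d|--holdcup_dir)
            holdcup_dir="$2"
            shift 2
            ;;
        --)
            shift
            break
            ;;
        *)
            printf "Internal error!\n" >&2
            exit 1
            ;;
    esac
done

mkdir -p "$holdcup_dir"

# Serialize callers so that two of them cannot claim the same free slot.
# The lock file name starts with a dot, so the ls counts below skip it.
lockfile="$holdcup_dir/.lockfile"
lockfile -1 "$lockfile"

# Each non-hidden file in the directory stands for one running job.
used_cups=$(ls "$holdcup_dir" | wc -l)
while [ "$used_cups" -ge "$number_of_cups" ]
do
    #echo waiting ...
    sleep 1
    used_cups=$(ls "$holdcup_dir" | wc -l)
done

# Claim a slot, release the lock, and hand the slot file to the caller.
cup_handle=$(mktemp --tmpdir="$holdcup_dir")
rm -f "$lockfile"
echo "$cup_handle"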
~/linux/bin/xplat/src/util/holdcup$ cat holdcup_help.txt
Description:
On a multicore machine, when you want to submit multiple jobs and run
them in parallel, it is important to know how many jobs have been
submitted and how many have finished. You need to avoid submitting
too many jobs simultaneously. To solve this problem, this script
checks the number of available CPUs in the machine. It gives you a
file handle representing an available CPU. The script will wait until
a CPU is available if all CPUs are in use. Once a job is finished, you
need to delete the file so that the next job can run.
Usage:
holdcup.sh [Options]
Options:
-h|--help             Help message.
-n|--number_of_cups   Number of CPUs. Default: the number of cores
                      in the machine.
-d|--holdcup_dir      Working directory of 'holdcup.sh'.
                      Default: ~/.holdcup/"$HOSTNAME".
Example:
rm -rf holdcup_dir
hello=hello
function myfun {
    echo $hello$1
}
for i in `seq 6`
do
    cup_handle=`../holdcup.sh -d holdcup_dir`
    (
        sleep 4; myfun $i; rm "$cup_handle"
    ) &
done
wait
Author:
Peng Yu <address@hidden>
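One possible refinement to the example, offered as a sketch rather than as part of the script: if the job leaves the subshell early (for instance through an explicit `exit`), the trailing `rm "$cup_handle"` never runs and the slot stays occupied. Setting an `EXIT` trap inside the subshell releases the handle on any normal exit path. Here `mktemp` stands in for the path that `holdcup.sh` would print, which is an assumption made to keep the snippet self-contained.

```shell
#!/usr/bin/env bash
# Stand-in for the path printed by holdcup.sh (assumption for illustration).
cup_handle=$(mktemp)

(
    trap 'rm -f "$cup_handle"' EXIT   # release the slot when the subshell exits
    false                             # a job step that fails
    exit 3                            # early exit; a trailing rm would be skipped
) &

job_status=0
wait "$!" || job_status=$?

if [ -e "$cup_handle" ]; then handle_state=leaked; else handle_state=released; fi
echo "job exit status: $job_status, handle: $handle_state"
```

The trap does not change the exit status of the subshell, so the caller can still tell from `wait` whether the job failed.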
--
Regards,
Peng