help-bash

Re: [Help-bash] parallel processing in bash (to replace for loop)


From: Peng Yu
Subject: Re: [Help-bash] parallel processing in bash (to replace for loop)
Date: Sat, 4 Feb 2012 16:57:02 -0600

> for i in *.log ; do
>  echo "$i"
>  [...do other needed stuff...]
>  sem -j10 gzip $i ";" echo done
> done
> sem --wait

This solution and other solutions (including GNU parallel) suffer from
the problem of not being able to understand bash functions.

~$ hello=xxx
~$ myfun() {
> echo $hello$1
> }
~$ myfun 1
xxx1
~$ sem -j10 myfun 1
^C
~$ sem -j10 myfun 1 ";"
~$ /bin/bash: myfun: command not found

~$ export -f myfun
~$ sem -j10 myfun 1 ";"
~$ 1

~$ set -a
~$ export -f myfun
~$ sem -j10 myfun 1 ";"
~$ 1
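
The last two runs print 1 rather than xxx1 because a child bash process only sees functions and variables that carry the export attribute. `export -f` exports myfun. However, hello was assigned before `set -a`, and `set -a` only marks variables assigned after it is turned on, so hello stays unexported. The sketch below reproduces this. It uses `bash -c` as a stand-in for the shell that sem starts, which is an assumption; the "/bin/bash:" error above suggests sem runs the command through bash.

```shell
#!/usr/bin/env bash
# A child bash only sees functions and variables that are exported.
# Here `bash -c` stands in for the shell that sem starts (assumption).

hello=xxx                       # assigned without export, as in the transcript
myfun() { echo "$hello$1"; }

export -f myfun
bash -c 'myfun 1'               # myfun is visible, $hello is not: prints 1

export hello
bash -c 'myfun 1'               # both exported: prints xxx1
```

Exporting hello as well should therefore make `sem -j10 myfun 1 ";"` print xxx1, assuming sem passes the environment through to that shell.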


The best solution I have found is the following. It essentially uses a
file to represent a running job: when a job starts, the associated file
is created, and when the job finishes, the file is deleted. By limiting
the number of files, you can control the number of jobs that run in
parallel.
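
Below is a stripped-down sketch of the same slot-file idea for a single launcher loop inside one script. Because only this loop creates slot files, no lock is needed. The sketch assumes GNU mktemp (for --tmpdir), and run_with_slots is a hypothetical name. Each job runs in a forked subshell `( ... ) &`, so it sees myfun and $hello without any export.

```shell
#!/usr/bin/env bash
# Sketch of the slot-file idea for a single launcher loop (no lock needed,
# since only this loop creates slot files). Assumes GNU mktemp for
# --tmpdir; run_with_slots is a hypothetical name.

# run_with_slots MAX FUNC ARG... : run FUNC once per ARG in the background,
# with at most MAX jobs (one slot file each) alive at any time.
run_with_slots() {
  local max_jobs=$1 func=$2
  shift 2
  local slot_dir arg slot
  slot_dir=$(mktemp -d)                   # holds one file per running job
  for arg in "$@"; do
    # While every slot is taken, poll until a job deletes its file.
    while [ "$(ls "$slot_dir" | wc -l)" -ge "$max_jobs" ]; do
      sleep 1
    done
    slot=$(mktemp --tmpdir="$slot_dir")   # claim a slot
    ( "$func" "$arg"; rm -f "$slot" ) &   # release it when the job ends
  done
  wait                                    # wait for the remaining jobs
  rmdir "$slot_dir"
}

hello=xxx
myfun() { echo "$hello$1"; }   # runs in a forked subshell, so no export -f
run_with_slots 3 myfun 1 2 3 4 5 6
```

The output lines may appear in any order, since the jobs run concurrently.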

I hope this may be helpful to others. I'd like to hear comments on
anything that can be improved.


~/linux/bin/xplat/src/util/holdcup$ cat holdcup.sh
#!/usr/bin/env bash

script_name=$(basename "$0" .sh)

TEMP=$(getopt -o hn:d: --long help,number_of_cups:,holdcup_dir: \
  -n "${script_name}.sh" -- "$@")

if [ $? != 0 ] ; then printf "Terminating...\n" >&2 ; exit 1 ; fi

eval set -- "$TEMP"

abspath_script=$(readlink -f -e "$0")
script_absdir=$(dirname "$abspath_script")

# Default number of slots: the number of cores, from the helper ncpu.sh (not shown)
number_of_cups=$(ncpu.sh)

holdcup_dir=~/.holdcup/"$HOSTNAME"

while true ; do
  case "$1" in
    -h|--help)
      cat "$script_absdir/${script_name}_help.txt"
      exit
      ;;
    -n|--number_of_cups)
      number_of_cups="$2"
      shift 2
      ;;
    -d|--holdcup_dir)
      holdcup_dir="$2"
      shift 2
      ;;
    --)
      shift
      break
      ;;
    *)
      printf "Internal error!\n">&2
      exit 1
      ;;
  esac
done

mkdir -p "$holdcup_dir"

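# Serialize concurrent holdcup.sh calls while a slot is being claimed.
lockfile="$holdcup_dir/.lockfile"
lockfile -1 "$lockfile"    # procmail's lockfile(1), retrying every 1 second

# Each non-hidden file in $holdcup_dir is one running job ("cup");
# .lockfile is hidden, so ls does not count it.
used_cups=$(ls "$holdcup_dir" | wc -l)
while [ "$used_cups" -ge "$number_of_cups" ]
do
  #echo waiting ...
  sleep 1
  used_cups=$(ls "$holdcup_dir" | wc -l)
done
cup_handle=$(mktemp --tmpdir="$holdcup_dir")

rm -f "$lockfile"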

echo "$cup_handle"
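
One possible improvement, offered as a sketch rather than a tested replacement, concerns the lock. procmail's lockfile leaves .lockfile behind if holdcup.sh is killed while waiting (for example with Ctrl-C), and later callers can then keep waiting on that stale lock. flock(1) from util-linux instead holds the lock through an open file descriptor, and the kernel releases it when the descriptor is closed, even if the process dies. The function name acquire_slot and the do_work placeholder are hypothetical, and GNU mktemp is assumed as in the script.

```shell
#!/usr/bin/env bash
# Sketch: the acquire step of holdcup.sh using flock(1) (util-linux)
# instead of procmail's lockfile. Assumes GNU mktemp; acquire_slot is a
# hypothetical name.

# acquire_slot DIR MAX: wait until DIR holds fewer than MAX slot files,
# then create one and print its path.
acquire_slot() {
  local dir=$1 max=$2
  mkdir -p "$dir"
  (
    # Take an exclusive lock on fd 9. The kernel drops it when fd 9 is
    # closed, even if this subshell is killed, so no stale lock remains.
    flock 9 || exit 1
    # Hidden files such as .lockfile are not listed by ls, so only
    # slot files are counted.
    while [ "$(ls "$dir" | wc -l)" -ge "$max" ]; do
      sleep 1
    done
    mktemp --tmpdir="$dir"      # print the new slot file's path
  ) 9>"$dir/.lockfile"
}

# Usage, mirroring holdcup.sh (do_work is a hypothetical job):
#   cup_handle=$(acquire_slot ~/.holdcup/"$HOSTNAME" 4)
#   ( do_work; rm "$cup_handle" ) &
```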

~/linux/bin/xplat/src/util/holdcup$ cat holdcup_help.txt
Description:
  On a multicore machine, when you want to submit multiple jobs and run
  them in parallel, it is important to know how many jobs have been
  submitted and how many have finished. You need to avoid submitting
  too many jobs simultaneously. To solve this problem, this script
  checks the number of available CPUs in the machine. It gives you a
  file handle indicating that a CPU is available, and it waits until a
  CPU becomes available if all CPUs are in use. Once a job has
  finished, you need to delete the file so that the next job can run.

Usage:
  holdcup.sh [Options]

Options:
  -h|--help                           Help message.
  -n|--number_of_cups N               Number of CPUs. Default: the
                                      number of cores in the machine.
  -d|--holdcup_dir DIR                Working directory of
                                      'holdcup.sh'. Default:
                                      ~/.holdcup/"$HOSTNAME".

Example:
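  rm -rf holdcup_dir

  hello=hello
  function myfun {
    echo "$hello$1"
  }

  for i in $(seq 6)
  do
    # Reserve a slot; this blocks while all slots are taken.
    cup_handle=$(../holdcup.sh -d holdcup_dir)
    # The subshell inherits myfun and $hello, so nothing needs exporting.
    (
      sleep 4; myfun "$i"; rm "$cup_handle"
    ) &
  done
  wait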

Author:
  Peng Yu <address@hidden>
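
For readers on newer bash, there is a different technique from the slot files. Bash 4.3, released after this post, added `wait -n`, which waits for any one background job to exit. The loop can count the jobs it has started and wait for one whenever the limit is reached. Because the jobs are forked subshells of the current shell, functions and variables need no export. The sketch below uses run_throttled as a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Alternative technique (not the slot-file method): throttle with bash's
# own background jobs. Requires bash >= 4.3 for `wait -n`.
# run_throttled is a hypothetical helper name.

# run_throttled MAX FUNC ARG... : run FUNC once per ARG in the background,
# keeping at most MAX of those jobs running at the same time.
run_throttled() {
  local max_jobs=$1 func=$2
  shift 2
  local arg running=0
  for arg in "$@"; do
    if [ "$running" -ge "$max_jobs" ]; then
      wait -n                    # block until any one background job exits
      running=$((running - 1))
    fi
    "$func" "$arg" &             # runs in a forked subshell, so a plain
    running=$((running + 1))     # function works without export -f
  done
  wait                           # collect the remaining jobs
}

hello=xxx
myfun() { echo "$hello$1"; }
run_throttled 3 myfun $(seq 6)
```

`wait -n` reacts to any background job of the shell, so this sketch assumes the loop's jobs are the only ones running. Unlike the slot files in holdcup.sh, the limit only applies within one script, not across independent scripts on the same host.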


-- 
Regards,
Peng


