From: Sabuj Pattanayek
Subject: [rdiff-backup-users] parallelized rdiff-backup running on multiple hosts connected to shared source and destination filesystems and reporting scripts
Date: Thu, 12 Jun 2008 18:08:05 -0500

Hi,

I don't know if this has been suggested or if it already exists, but
has anyone written wrapper scripts that can launch multiple
rdiff-backup jobs on multiple hosts connected to the same (shared)
source filesystem (e.g. a huge NAS) and destination filesystem (a
huge SAN running a shared fibre channel/iSCSI filesystem)? If not,
I'm in the process of writing prdiff-backup - parallel rdiff-backup
(in python). The wrapper processes will read a shared file listing
all the directories to back up. Each process will open a file called
rdiff-backup-data/prdiff-backup.running (under each destination dir),
lock it using advisory locking (so it doesn't collide with other
rdiff-backup jobs trying to back up to the same destination
directory), and write into it the output of `hostname -s`, indicating
which host is currently backing up into that directory. Since some
backup jobs won't finish within a day and cron will launch the
wrapper script every day, each wrapper will limit itself to at most N
running jobs per backup host. The script will also automatically
remove old increments using --remove-older-than, and at the end of
each job it will log how long the job took to a shared log file. A
rough sketch of the per-destination locking is below.
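
Here's a minimal sketch of what each wrapper would do per destination
dir. The lock file name and layout are as described above; the
function name, the "4W" retention window, and the shared log path are
just placeholders, not actual prdiff-backup code:

#!/usr/bin/env python
# Sketch: lock a destination dir, run the backup, expire old
# increments, and append the elapsed time to a shared log.
import fcntl, os, socket, subprocess, time

def backup_one(source_dir, dest_dir, keep="4W", log="/shared/prdiff.log"):
    lock_dir = os.path.join(dest_dir, "rdiff-backup-data")
    if not os.path.isdir(lock_dir):
        # Note: the very first backup into a fresh dest dir may need
        # to be seeded differently, since rdiff-backup normally
        # creates rdiff-backup-data itself.
        os.makedirs(lock_dir)
    lock = open(os.path.join(lock_dir, "prdiff-backup.running"), "a+")
    try:
        # Advisory lock: if another wrapper (possibly on another host
        # sharing the SAN) already holds it, skip this directory.
        # Whether flock() propagates between hosts depends on the
        # shared filesystem, so this needs testing there.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        lock.close()
        return False
    # Record which host is backing up into this directory,
    # equivalent to `hostname -s`.
    lock.seek(0)
    lock.truncate()
    lock.write(socket.gethostname().split(".")[0] + "\n")
    lock.flush()
    start = time.time()
    subprocess.call(["rdiff-backup", source_dir, dest_dir])
    # Expire old increments; --force allows removing more than one
    # increment in a single run.
    subprocess.call(["rdiff-backup", "--force", "--remove-older-than",
                     keep, dest_dir])
    with open(log, "a") as f:
        f.write("%s %s %d\n" % (socket.gethostname().split(".")[0],
                                dest_dir, time.time() - start))
    lock.close()  # closing the file releases the lock
    return True

The daily cron job would then walk the shared directory list, call
this for each entry, move on whenever the lock is already held, and
stop dispatching once N jobs are running on the local host.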

I had actually written something similar in perl, using tar for
initial backups and rsync for incrementals. It was similar to dirvish
but parallelized as described above. Eventually I decided to switch
to rdiff-backup because generating millions of hardlinks started to
really slow everything down with all the stat() calls, etc.

Another side project will be a perl script that looks through all the
backup destination dirs for prdiff-backup.running files, runs
rdiff-backup -l, and reads the logs from the prdiff-backup wrapper in
order to generate stats: which backups are currently running, how
many increments are available for each backup destination (e.g. home
dirs), and how long each backup job takes on average. The script
could even dump this info to a sqlite db for faster retrieval by CGI
scripts, which would render the data in a browser as HTML with nice
tables and color coding.
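
Roughly what I have in mind (sketched in python here for consistency
with the wrapper; the paths, the db schema, and the way the -l output
is counted are all placeholders):

#!/usr/bin/env python
# Sketch: find held locks, count increments, stash stats in sqlite.
import fcntl, glob, os, sqlite3, subprocess

def is_running(lock_path):
    # If we can take the advisory lock, no wrapper currently holds it.
    # (Briefly taking it could race with a wrapper starting up; a
    # gentler probe would be nicer in the real script.)
    f = open(lock_path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held = False
    except IOError:
        held = True
    f.close()
    return held

def collect(dest_root="/backups", db_path="/shared/prdiff-stats.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS stats "
               "(dest TEXT, running INTEGER, increments INTEGER)")
    pattern = os.path.join(dest_root, "*", "rdiff-backup-data",
                           "prdiff-backup.running")
    for lock_path in glob.glob(pattern):
        dest = os.path.dirname(os.path.dirname(lock_path))
        out = subprocess.check_output(["rdiff-backup", "-l", dest])
        # Count the indented "increments.*" lines that rdiff-backup -l
        # prints; the exact output format may differ between versions.
        n = sum(1 for ln in out.decode().splitlines()
                if ln.lstrip().startswith("increments."))
        db.execute("INSERT INTO stats VALUES (?, ?, ?)",
                   (dest, int(is_running(lock_path)), n))
    db.commit()

Average job times would come straight out of the shared log the
wrapper appends to, and the CGI scripts could then just SELECT from
the db instead of re-running rdiff-backup -l on every page load.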

Does anyone have similar scripts for rdiff-backup that they might
share? If not, I hope to have mine finished in a few days.

Thanks,
Sabuj Pattanayek



