Re: [rdiff-backup-users] Activity
From: address@hidden
Subject: Re: [rdiff-backup-users] Activity
Date: Mon, 01 Aug 2011 14:02:10 +0100
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20110624 Thunderbird/5.0
David,
On 01/08/2011 10:14, address@hidden wrote:
Felix,
I'm wondering whether anyone is developing rdiff-backup at the moment.
To my mind, you are asking the crucial question about rdiff-backup.
There has not been much development activity on rdiff-backup recently,
and in addition there are some fatal bugs in rdiff-backup that cause
corrupted repositories, especially when backing up to Windows-hosted
targets.
Some time ago I mailed the current maintainer (I believe it was Andrew
Ferguson) about the maintenance state of rdiff-backup, but received no
answer.
Yes, I think rdiff-backup is currently unmaintained. Anyone who wants to
take it forward (and has the skills to do so, which unfortunately I do
not) might need to make a fork (which in due course could become
rdiff-backup2?).
I'm asking because I'm thinking of using it at larger scale and want to
know whether there is any activity going on.
I would also like to use it at larger scale, as it is, to the best of my
knowledge, the only free and flexible 4D backup solution. However, in my
experience there are currently some caveats, which I want to state here
(along with some thoughts on what would make rdiff-backup more useful).
* The repository format. When recovering older files, rdiff-backup
really needs every single reverse delta, which is not only slow but
also extremely fragile (if just one of those files is corrupted,
recovery will fail). A solution might be some additional,
larger-granularity reverse deltas that would help speed up recovery as
well as preserve the integrity of "most of the timeline" even if some
deltas are corrupted.
Although using multiple delta files is slow if you are regressing back
through many previous backup runs (which is very rare in practice,
though of course very valuable when you need it), I don't see how
creating larger-granularity reverse deltas would really make the
archive more robust; it would just make it bigger. Under normal
circumstances I would expect a reverse-delta file covering 10 backups
to be not much less than 10x the size of each separate reverse-delta
file. (It is different if files have been backed up accidentally once
and then removed from the archive; accidents like this can certainly
bloat an rdiff-backup repository.) Although a single damaged
reverse-delta file will 'break' backup recovery, this only applies to
backups earlier than the date of the damaged file.
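The chain behaviour being discussed can be illustrated with a small sketch. This is a simplified model for illustration only, not rdiff-backup's actual on-disk format or algorithm: recovering a version N backups old means applying N reverse deltas starting from the current mirror, and one corrupted delta file makes every version older than it unrecoverable, while newer versions stay intact.

```python
# Simplified model of reverse-delta recovery (illustration only; this is
# NOT rdiff-backup's actual repository format or algorithm).

def recover(mirror, deltas, steps_back):
    """Walk back `steps_back` reverse deltas from the current mirror.

    Each delta is a callable mapping the newer content to the older
    content, or None to represent a corrupted/unreadable delta file.
    """
    content = mirror
    for delta in deltas[:steps_back]:
        if delta is None:  # one bad delta file breaks the whole chain
            raise IOError("corrupted reverse delta: cannot recover")
        content = delta(content)
    return content

# Mirror holds version 3; the deltas lead back to versions 2, 1 and 0.
mirror = "v3"
deltas = [lambda c: "v2", lambda c: "v1", lambda c: "v0"]
print(recover(mirror, deltas, 2))  # "v1"

# Corrupt the delta for v2 -> v1: v2 is still recoverable, but every
# version older than the corrupted delta is lost.
deltas[1] = None
print(recover(mirror, deltas, 1))  # "v2" -- still fine
try:
    recover(mirror, deltas, 2)
except IOError as e:
    print("error:", e)
```

This also shows why the damage is one-sided: backups newer than the damaged delta never touch it during recovery.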
* Missing operators on an existing repository. For use of rdiff-backup
at larger scale it should be possible to, e.g.:
- merge time steps
- delete time steps and correct the deltas accordingly
- remove subtrees (sometimes one backs up large data sets by accident)
- lots more.
Yes, these would be helpful, especially to correct backup mistakes,
which can permanently bloat a repository.
* Some bugs, especially concerning operating-system independence. For
example, even though the issue was investigated at some point, it is
still difficult to use Windows machines as targets due to the "write
only attribute on folders" problem. Multiple users report mismatching
hashes, and so on.
Yes, the best advice regarding a Windows target seems to be: don't. I
think you can reliably use rdiff-backup.exe to back up Windows data to a
Linux target, though.
* Maybe a dedicated network protocol would be nice (inspired by rsync),
but I think this is less important.
and I would add:
* ability to run a thorough verification of an rdiff-backup archive.
The current verification process is flawed, as has been discussed in
earlier threads here. The best strategy at the moment is to run a
verification for a date at or earlier than the earliest backup run
date, and then to run one or two verifications for dates between the
earliest date and the current date. Although this provides 'high
confidence' about the integrity of the overall archive, it does not,
at least from a theoretical point of view, guarantee that the full
history of all files, whether currently present or deleted, can be
recovered. The only way to get that at present is to run a separate
verification for every previous backup run, which is not realistic
for a long-standing repository.
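The exhaustive approach could at least be scripted. A sketch: rdiff-backup 1.x does provide a `--verify-at-time` switch, but the repository path and the list of increment dates below are invented for illustration; in practice you would read the dates from the repository's increments.

```python
# Sketch: generate one verification command per previous backup run.
# rdiff-backup 1.x provides --verify-at-time; the repository path and
# the increment dates below are hypothetical examples.

def verify_commands(repo, increment_dates):
    """Return one rdiff-backup invocation per backup run to verify."""
    return ["rdiff-backup --verify-at-time %s %s" % (d, repo)
            for d in increment_dates]

dates = ["2011-06-01", "2011-07-01", "2011-08-01"]  # hypothetical runs
for cmd in verify_commands("/backup/repo", dates):
    print(cmd)
```

For a repository with hundreds of increments this is exactly the "not realistic" workload described above, which is why a single thorough archive check would be so valuable.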
* add a switch to enable 'forced' regression of an archive. At present
rdiff-backup will only regress an archive that it considers to be
broken. (However you can work around this limitation.)
Overall, I am unsure whether it is more appropriate
- to learn from the experience the great rdiff-backup project gave us
and use the base operators from rdiff-backup to perhaps write a whole
new thing with the above issues fixed (especially with a less fragile
repository)
or
- to continue fixing bugs in small steps on a project that seems
unmaintained (unfortunately, I lack the professional-grade Python
skills to do it right).
I hope to start a discussion here on these thoughts; please contribute :)
There was a discussion here a while ago, and there was a strong view
that the existing project should be fixed rather than a new one started,
I suppose because rdiff-backup as it stands is 99.5% perfect, and any
new project, even if it fixed the 0.5%, would be likely to introduce new
bugs and failings. But in either case it needs someone to take on the
responsibility and workload. I think Daniel Miller began some work on a
replacement for rdiff-backup, but I don't know where his project stands.
AFAIK the only other open-source project like rdiff-backup is duplicity.
It has slightly different objectives, uses forward deltas and has
different maintainers; maybe it is more actively maintained? But I value
the reverse-diff approach of rdiff-backup because it means the most
recent data is the most reliable and fastest to retrieve, and you can
continue to build up data history (even for years) without having to
start over at regular intervals. I would feel nervous if I had a 3-year
backup history but needed to use an original dataset and then 1000 daily
forward-diff files in order to get the latest backup of a file (which is
usually what you need). With rdiff-backup, if you do start to run out of
space, you can easily delete the older data without endangering more
recent backups.
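The asymmetry between the two schemes can be put in rough numbers. This is a simplified count of delta applications, assuming one backup run per day and no intermediate full backups (duplicity in practice mitigates the forward-delta cost with periodic full backups):

```python
# Simplified count of delta files needed to restore a file, assuming one
# backup run per day and no intermediate full backups (duplicity really
# mitigates this with periodic fulls, so this is a worst-case model).

def deltas_needed(history_days, age_of_wanted_backup, scheme):
    """Number of delta applications to reach the wanted backup.

    scheme: 'reverse' (rdiff-backup style, mirror holds the newest data)
            or 'forward' (full backup is oldest, deltas roll forward).
    """
    if scheme == "reverse":
        return age_of_wanted_backup               # walk back from the mirror
    elif scheme == "forward":
        return history_days - age_of_wanted_backup  # roll forward from the full
    raise ValueError("unknown scheme: %s" % scheme)

# The common case: restore the latest backup from a ~1000-day history.
print(deltas_needed(1000, 0, "reverse"))  # 0 -- newest data is immediate
print(deltas_needed(1000, 0, "forward"))  # 1000 deltas to apply
```

The model also shows the flip side: restoring the very oldest version is cheapest under forward deltas and most expensive under reverse deltas, which matches the trade-off described above.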
Two other possibilities (neither of which I have tried) are:
* use rsync (or scripts based on it, such as rsnapshot) but store the
backup datasets on a deduplicating file system such as lessfs.
* put the filesystem on top of LVM and just take and keep regular LVM
snapshots; these can then be the backups. Recent Linux kernels allow
you to revert a volume to an earlier snapshot if required. I don't
think this was an intended use of LVM snapshots, but it should work
and be quick'n'easy too, though I don't think it could or should be
used over a prolonged period because of space issues (and perhaps
speed). Of course the backups remain in the same volume as the
original data; they can be copied to another location, but then they
will each take up the full space of the data.
Dominic
http://www.timedicer.co.uk