
Re: [rdiff-backup-users] Activity


From: address@hidden
Subject: Re: [rdiff-backup-users] Activity
Date: Mon, 01 Aug 2011 14:02:10 +0100
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20110624 Thunderbird/5.0

David,

On 01/08/2011 10:14, address@hidden wrote:
> Felix,
>
>> I'm wondering if there is anyone developing on rdiff-backup at the moment.
>
> As for me, you are asking the crucial question concerning rdiff-backup.

> There has not been a lot of development activity on rdiff-backup recently,
> and in addition there are some fatal bugs in rdiff-backup, causing corrupted
> repositories, especially when backing up to Windows-hosted targets.
>
> Some time ago I mailed the current maintainer -- I guess it was Andrew
> Ferguson -- about the maintenance state of rdiff-backup, but I got no
> answer.

Yes, I think rdiff-backup is currently unmaintained. Anyone who wants to take it forward (and has the skills to do so, which unfortunately I do not) might need to make a fork (which in due course could become rdiff-backup2?).

>> I'm asking because I'm thinking of using it on a larger scale and want to
>> know if there is some activity going on.
>
> I would also like to use it on a larger scale, as it is - to the best of my
> knowledge - the only free and flexible 4D backup solution. However, I have
> found that there are currently some caveats that I want to state here
> (and propose some thoughts I had on what would make rdiff-backup more
> useful):
>  * The repository format. When recovering older files, rdiff-backup
>    needs every single reverse delta along the way, which is not only slow
>    but also extremely fragile (if just one of those files is corrupted,
>    recovery will fail). A solution might be some additional,
>    larger-granularity reverse deltas that help speed up recovery as well
>    as preserve the integrity of "most of the timeline" even if some
>    deltas are corrupted.

Although using multiple delta files is slow if you are regressing back through many previous backup runs (which is rare in practice, though of course very valuable when you need it), I don't see how creating larger-granularity reverse deltas would really make the archive more robust; it would just make it bigger. Under normal circumstances I would expect a reverse-delta file covering 10 backups to be not much less than 10x the size of each separate reverse-delta file. (It is different if files have been backed up accidentally once and then removed from the archive; accidents like this can certainly bloat an rdiff-backup repository.) And although a single damaged reverse-delta file will 'break' recovery, that only applies to backups earlier than the date of the damaged file.
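
To make the slowness/fragility point concrete, here is a minimal Python sketch of the chain-walking that any reverse-delta recovery has to do (the delta representation and names are invented for illustration; this is not rdiff-backup's actual code):

    # Toy model: a repository is the current mirror plus a list of reverse
    # deltas, newest first.  Each delta turns version N into version N-1.
    # rdiff-backup's real increments are rdiff/librsync binary deltas; this
    # only shows the recovery chain and its failure mode.

    def restore(mirror, reverse_deltas, steps_back):
        """Recover the contents as they were `steps_back` backups ago."""
        data = mirror
        for i, delta in enumerate(reverse_deltas[:steps_back]):
            if delta is None:   # stand-in for a corrupted/unreadable increment
                raise IOError("delta %d is unreadable; this version and all "
                              "older ones cannot be recovered" % i)
            data = delta(data)  # apply one reverse delta: version N -> N-1
        return data

    # Example: three backups of a file; the second-newest delta is damaged.
    mirror = "v3"
    deltas = [lambda d: "v2", None, lambda d: "v0"]  # newest first
    print(restore(mirror, deltas, 1))  # "v2" - recent history is still fine
    # restore(mirror, deltas, 2)       # raises: v1 and everything older is lost

Recovering a version k backups back means applying k deltas in order, so restore time grows with history depth, and one bad increment cuts off that date and everything before it, while later dates stay recoverable - which matches what I said above.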

>  * Missing operators on an existing repository. For use of rdiff-backup
>    on a larger scale it should be possible to, e.g.,
>      - merge time steps
>      - delete time steps and correct deltas accordingly
>      - remove subtrees (sometimes one backs up large data sets by accident)
>      - and more.

Yes, these would be helpful, especially to correct backup mistakes, which can permanently bloat a repository (the only related operation available today is sketched below).
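
For reference, the only pruning operator rdiff-backup offers today is --remove-older-than, which can only drop the oldest increments (it cannot remove an arbitrary time step or a subtree). A rough sketch of wrapping it from Python, with a made-up repository path:

    import subprocess

    # Example path only.  --force is required when more than one increment
    # would be removed in a single run.
    REPO = "/srv/backups/myhost"

    subprocess.check_call([
        "rdiff-backup",
        "--remove-older-than", "1Y",  # time specs like 30D or 10B (sessions) also work
        "--force",
        REPO,
    ])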

>  * Some bugs, especially around operating system independence. For
>    example, even though the issue was investigated at some point, it is
>    still difficult to use Windows machines as a target due to the
>    "write only attribute on folders" problem. Multiple users report
>    mismatching hashes, and so on.

Yes, the best advice regarding a Windows target seems to be: don't. I think you can reliably use rdiff-backup.exe to back up Windows data to a Linux target, though.

>  * Maybe a dedicated network protocol would be nice (inspired by rsync),
>    but I think this is less important.

and I would add:

 * The ability to run a thorough verification of an rdiff-backup archive.
   The current verification process is flawed, as has been discussed in
   earlier threads here. The best strategy at the moment is to run a
   verification for a date at or earlier than the earliest backup run
   date, and then to run one or two more verifications for dates between
   the earliest date and the current date (see the sketch after this
   list). Although this provides 'high confidence' in the integrity of
   the overall archive, it does not, at least from a theoretical point
   of view, guarantee that the full history of all files, whether
   currently present or deleted, can be recovered. The only way to get
   that at present is to run a separate verification for every previous
   backup run, which is not realistic for a long-standing repository.
 * A switch to enable 'forced' regression of an archive. At present
   rdiff-backup will only regress an archive that it considers to be
   broken. (However, you can work around this limitation.)
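
As a concrete illustration of the 'high confidence' strategy above, one could script something like this (the repository path and dates are made-up examples; --verify-at-time checks the stored hashes for the files as they existed at the given time):

    import subprocess

    # Example path and dates only.
    REPO = "/srv/backups/myhost"

    # One date at or before the earliest backup, a couple of intermediate
    # dates, and "now".  Good - but not complete - coverage, as noted above.
    for when in ["2009-01-01", "2010-06-01", "2011-03-01", "now"]:
        subprocess.check_call(["rdiff-backup", "--verify-at-time", when, REPO])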


> Overall, I am unsure whether it is more appropriate
>    - to learn from the experience the great rdiff-backup project gave us
>      and use its base operators to rewrite a whole new thing with the
>      above issues fixed (especially with a less fragile repository), or
>    - to continue fixing bugs in small steps in a project that seems
>      unmaintained (unfortunately, I lack the professional-grade Python
>      skills to do it right).

> I hope to start a discussion here on these thoughts -- please contribute :)

There was a discussion here a while ago, and there was a strong view that the existing project should be fixed rather than a new one started, I suppose because rdiff-backup as it stands is 99.5% perfect and any new project, even if it fixed the 0.5%, would be likely to introduce new bugs and failings. But in either case it needs someone to take on the responsibility and the workload. I think Daniel Miller began some work on a replacement for rdiff-backup, but I don't know where his project stands.

AFAIK the only other open source project like rdiff-backup is duplicity. It has slightly different objectives, uses forward deltas, and has different maintainers; maybe it is more actively maintained? But I value the reverse-diff approach of rdiff-backup because it means the most recent data is the most reliable and fastest to retrieve, and you can continue to build up data history (for years even) without having to start over at regular intervals. I would feel nervous if I had a 3-year backup history but needed to use an original dataset and then 1000 daily forward-diff files in order to get the latest backup of a file (which is usually what you need). With rdiff-backup, if you do start to run out of space, you can easily delete the older data without endangering more recent backups.

Two other possibilities (neither of which I have tried) are:

 * use rsync (or scripts based on it such as rsnapshot) but store the
   backup datasets on a deduplication file system such as lessfs.
 * put the filesystem on top of LVM and just take and keep regular LVM
   snapshots; these can then be the backups (see the sketch below).
   Recent Linux kernels allow you to revert a volume to an earlier
   snapshot if required. I don't think this was an intended use of LVM
   snapshots, but it should work and be quick'n'easy too, though I don't
   think it could or should be used over a prolonged period because of
   space issues (and perhaps speed). Of course the backups remain in the
   same volume as the original data; they can be copied to another
   location, but then they will each take up the full space of the data.
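
For the second idea, a rough sketch of the LVM commands involved, driven from Python (the volume group, volume name and snapshot size are made-up examples; this is plain LVM usage, nothing rdiff-backup-specific):

    import subprocess
    from datetime import date

    # Example names only: volume group "vg0", logical volume "data".
    ORIGIN = "/dev/vg0/data"
    SNAP = "backup-" + date.today().isoformat()

    # Take a snapshot; --size is how much change it can absorb before filling.
    subprocess.check_call(["lvcreate", "--snapshot", "--size", "5G",
                           "--name", SNAP, ORIGIN])

    # Reverting to a snapshot later is done by merging it back into the
    # origin (recent kernel/LVM needed; the merge takes effect when the
    # origin is not in use, or on next activation):
    # subprocess.check_call(["lvconvert", "--merge", "/dev/vg0/" + SNAP])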


Dominic
http://www.timedicer.co.uk




