rdiff-backup-users

Re: [rdiff-backup-users] About backups and increments


From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] About backups and increments
Date: Mon, 22 Aug 2011 21:26:10 +0200 (CEST)


On Mon, 22 Aug 2011, Robert Nichols wrote:

About space requirements: I assume the space required for the backup is:

- the space of the source files themselves
- the space of all the increments
- extra space required to compute the increment?
  * Is this space used on the source or the destination drive?
  * This should be the size of the file currently being processed plus its increments, right? So should I assume that to back up the *second* increment of some data of size X (where X can possibly be just one huge file) I need at least X * 2 space for the backup, just for temporary files?
  * This brings me back to my first question: what happens when the destination is full?

I'm not aware of any extra space needed for computing the increment, but the
increment itself, of course, does need to be stored on the destination drive.
If the destination drive runs out of space, the rdiff-backup session will
fail.

If it is detected that a file has changed (based on file attributes), a new file in the destination directory is created using a "temp name", and it is synced to its new contents, using the old version to speed up the rsync process. After that, an increment is created, and only then will the old version be removed. This process is followed sequentially for all files, so the total space needed would be the space for the increments that are created during this session, plus the size of the largest file in the repository. Of course, you usually don't know in advance how large the increments will be...
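
To put a rough number on the estimate above, here is a minimal Python sketch. It is not part of rdiff-backup; the function names are made up for illustration, and the increment sizes have to be guessed, since you usually don't know them in advance.

```python
import os

def largest_file_size(root):
    """Return the size in bytes of the largest regular file under root (0 if none)."""
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                largest = max(largest, os.path.getsize(path))
    return largest

def estimate_session_space(increment_sizes, largest_file):
    """Space needed during one session, following the reasoning above:
    all increments written in this session, plus one temporary copy of
    the largest file (files are processed one at a time)."""
    return sum(increment_sizes) + largest_file
```

For example, estimate_session_space([100, 50], largest_file_size("/some/repo")) adds two guessed increment sizes to the size of the largest file found under the (hypothetical) path /some/repo.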

I don't really understand what you mean by 'the second increment'.
Worst case would be that you'd need the current size of the source, plus the total size of your last backup including all increments (if everything in the tree is replaced by something else), plus a small metadata overhead. If you repeat for a second increment and again all data has been replaced by other data, you would again need the current source size plus the total size of the backup tree. If, however, the data you back up changes only slightly or is mostly 'append-only' data like log files, the space used by increments would be quite limited each time.

It all depends on your data set...


About backup speed. rdiff-backup doesn't seem to support both backing up *and* pruning the increments at the same time (yes, I've read the man page). Though this sounds like a very sensible thing to do: knowing that you will prune several old increments, you can avoid calculating the reverse diffs. Has this been considered?

There's not much point in combining those two, totally independent actions.
Computing the reverse diffs for session N vs. session N-1 is totally
independent of the existence (or lack thereof) of earlier sessions in the
archive.

Adding to that:
One will always have to calculate a reverse diff to go from the newly synced (N) version to the previous (N-1) version. If someone wants to avoid calculating reverse diffs for a file, that is the same as having no history at all. Better use rsync then, instead of rdiff-backup... If you don't calculate a reverse-diff for a file, you won't be able to regress a backup run that failed half-way through... leaving you with a useless backup.

But!
Maybe I now know what I didn't understand in your line of questioning. With rdiff-backup, increments are for individual files, and only when these individual files have been changed. So, there are no reverse diffs if a file has not been changed. For a data set of 1000 files with only 10 files changing since the previous run, the increments dir would only contain 10 reverse diff files for this run. Likewise, if a file hasn't been changed for 3 months and it is changed today, but I only want to keep 1 month of history, I can NOT simply ditch the 3-month-old version. Maybe it wasn't changed for all these months, but it is still yesterday's version and has to be kept in history for the coming month minus 1 day...
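
The retention rule in the last example can be sketched in a few lines of Python. This only illustrates the reasoning and is not how rdiff-backup is implemented; the function name and the data layout (one list of change times per file) are assumptions made for the example.

```python
from datetime import datetime

def versions_to_keep(change_times, cutoff):
    """Return the versions of ONE file that must be kept.

    change_times: datetimes at which a new version of the file was backed up.
    cutoff: start of the history window (e.g. one month ago).

    A version is "in force" from its own time until the next change, so its
    age alone does not decide anything. It must be kept if it is the current
    version, or if it was superseded only after the cutoff.
    """
    times = sorted(change_times)
    keep = []
    for i, t in enumerate(times):
        is_current = (i == len(times) - 1)
        if is_current or times[i + 1] > cutoff:
            keep.append(t)
    return keep
```

With change times of 22 May and 22 Aug 2011 and a cutoff of 22 Jul 2011, both versions are kept: the May version is three months old, but it was the valid version until 22 Aug, which lies inside the one-month window.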


--keep-increments N (where N is the number of most recent increments to keep, regardless of time).
[snip]
Let's say I always want to keep at least 2 increments (or 2 months, if that matters). I have no way to do that directly (I could list the increments and calculate the time myself, but that's ugly).

So.. let's assume you make weekly backups. (Hoping it will be more often, but just as an example.)
You want to keep history of 2 months. That's about 8 or 9 weeks.
But sometimes you make an extra backup halfway through a week, and sometimes you go on a vacation and don't run any backup. So, in these cases, you might want to keep history for 2 months, but also at least 5 increments, even if that means it will be more than 2 months? Would it really be useful to.. eh.. keep increments from 4 months ago if you forgot to run backups for the last 2 months? This sounds just like "oh, I didn't make backups over the last two months, but I do happen to have some historic versions from 3 months ago containing your PhD thesis you've been working on... for the last 3 months....."

Let's just say that I don't think having such an option would be a really nice thing to have ;-)
And creating a small script would indeed be far easier ;-)
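
Such a script could look roughly like the Python sketch below. It only computes a cutoff date from a list of session times; how you obtain that list (for instance by parsing the output of rdiff-backup --list-increments) is left out, and the function name and parameters are invented for this example.

```python
from datetime import datetime, timedelta

def removal_cutoff(session_times, now, max_age, min_keep):
    """Return a datetime; sessions strictly older than it may be removed.

    Combines two rules: keep everything younger than max_age, and keep at
    least the min_keep most recent sessions even if they are older.
    """
    age_cutoff = now - max_age
    newest_first = sorted(session_times, reverse=True)
    if not newest_first:
        return age_cutoff
    if len(newest_first) >= min_keep:
        # Time of the min_keep-th newest session: it and everything
        # newer than it must survive.
        count_cutoff = newest_first[min_keep - 1]
    else:
        # Fewer sessions than min_keep: nothing may be removed.
        count_cutoff = newest_first[-1]
    return min(age_cutoff, count_cutoff)

def removable(session_times, cutoff):
    """Sessions that fall strictly before the cutoff, oldest first."""
    return sorted(t for t in session_times if t < cutoff)
```

In the vacation scenario above (weekly sessions up to early May, nothing since, max_age of 60 days, min_keep of 5) the cutoff ends up at the fifth-newest session instead of at "60 days ago", so only sessions older than that one are candidates for removal.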

Side note: I never automate the removal of old increments. I always do that by hand, first without --force to check which increment dates it announces for removal, then with --force if it looks OK. The only thing that's automated wrt increment removal is a cron job reminding me of the task. I could even modify it to remind me daily if increment removal is due and wasn't done yet, but for now, I keep these reminders in my inbox until the removal is done.
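
For reference, the two-step check could be wrapped like this in Python. The helper only builds the command line; the repository path and the "2M" (two months) time spec are placeholders, and actually running the command is left to you. One caveat, as far as I understand the man page: without --force, rdiff-backup only refuses when more than one increment would be removed, so a single matching increment is deleted right away.

```python
def removal_command(repo, older_than, force=False):
    """Build the argument list for an rdiff-backup increment removal.

    repo: path to the backup repository (placeholder in the usage below).
    older_than: a time spec accepted by --remove-older-than, e.g. "2M".
    force: add --force, which is needed to remove several increments at once.
    """
    cmd = ["rdiff-backup", "--remove-older-than", older_than]
    if force:
        cmd.append("--force")
    cmd.append(repo)
    return cmd

# Step 1: run without --force and read which increments it announces.
check_cmd = removal_command("/backup/repo", "2M")
# Step 2: only if the announced dates look OK, repeat with --force.
force_cmd = removal_command("/backup/repo", "2M", force=True)
```

Either list can be passed to subprocess.run() once you are happy with it.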


--
Maarten


