Re[11]: [rdiff-backup-users] Verify times increasing


From: listserv . traffic
Subject: Re[11]: [rdiff-backup-users] Verify times increasing
Date: Tue, 8 Dec 2009 11:05:58 -0800

I know this discussion ended, in practical terms, a while ago, but I
started some thoughts back then and didn't have time to finish them.
So, I did so today, and thought they might be helpful in a conceptual
way.

One way to cut down on verify times would be to limit the number of deltas in 
the system.

(That's why you can verify a tape backup in only 2x the non-verify
times - you don't have to "compute" anything for each file.)

But you'll need more space to do that, which will make things more expensive.

But I think you're up against the "cheap, quick, well-done, pick any two" rule.

I'd guess it will still be cheaper to break the system down by
quarters or months than to go to tape. (You're trading compute time
for space [or data/delta size]. If the rdiff compute time becomes
excessive, you can trade back in space: reduce the number of
rdiffs in the repository. This should allow you to run a recursive
verify on the whole thing in a reasonable time.)
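
If you'd rather prune in place than rotate whole repositories, rdiff-backup's
--remove-older-than will cap the number of increments. A rough sketch - the
repository path is made up:

  # Prune increments older than three months; --force is needed when more
  # than one increment would be removed in a single run.
  rdiff-backup --force --remove-older-than 3M /backup/repo

  # Or cap by backup count instead of age: keep only the last 20 sessions.
  rdiff-backup --force --remove-older-than 20B /backup/repo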

Use a repository for, say, a quarter and put all the diffs into that same 
repository.

Use a recursive script to verify the whole repository - as often as needed. 
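
Something like this should do it - an untested sketch with a made-up
repository path (and the awk field may need adjusting for your version's
--parsable-output format):

  #!/bin/sh
  REPO=/backup/repo
  # --parsable-output makes --list-increments print one epoch timestamp
  # per line; feed each one back into --verify-at-time.
  for t in $(rdiff-backup --parsable-output --list-increments "$REPO" | awk '{print $1}'); do
      echo "Verifying increment at epoch $t ..."
      rdiff-backup --verify-at-time "$t" "$REPO" || echo "VERIFY FAILED at $t" >&2
  done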

Obviously the longer the time-frame of the run, the longer a comprehensive full 
verify will take at the end of the time-period.

Once you reach the end of the period, you simply back up that repository to a 
couple of off-site media and you're good. You keep that archive repository for 
the length of your retention period.

The time window for a single repository will simply be limited by how long you 
can tolerate a comprehensive --verify taking.

If you wanted, say, a year's worth of file history, and you could have a month 
in each repository before it grows un-verify-able in the time you have to do 
so, here are the requirements as I see them.

You'd need 12/24 disks for each month's repository to go off-site (12 or 24 
depending on whether you want redundant copies of each repository). We'll call 
these os1-os12 [offsite 1] and osr1-osr12 [offsite redundant 1].

You'd need 3 disks for the current set and its "remote" backup. We'll call 
these ol1 [online 1], rem1 [remote 1], and remr1 [remote redundant 1].

So, you back up to ol1 each night (...or whatever your backup period is - it 
could be every hour if you want, or just once a week...).

Between backups, you'll sync the repository growing on ol1 to rem1. You'll also 
sync between rem1 and remr1 so you have three copies of the current repository.

On a weekend, you'll do a full recursive --verify of the ol1 repository.

At the end of the month, you'll sync remr1 to os1 and osr1. (Take the whole 
month repository and make a copy of it to your monthly off-site set of disks.)

You'll kill the current repository on ol1. (Erase it, and start over.)

Then you start the process over - building a repository with a month's worth of 
data in it.

Then sync to the next set of off-site [and redundant] disks.

When you get to the end of the year, obviously you'll start recycling the 
oldest os and osr disks.
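
Strung together, the monthly cycle looks something like this - a sketch only,
with made-up mount points (/ol1, /rem1, /remr1, /os1, /osr1) and a made-up
source path:

  # Nightly (or hourly, or weekly - whatever your backup period is):
  rdiff-backup /data /ol1/repo
  rsync -a --delete /ol1/repo/ /rem1/repo/
  rsync -a --delete /rem1/repo/ /remr1/repo/

  # Weekend: full recursive --verify of /ol1/repo (the loop sketched earlier).

  # End of month: copy the finished repository to this month's off-site
  # disks, then erase ol1 and start the next month from scratch.
  rsync -a --delete /remr1/repo/ /os1/repo/
  rsync -a --delete /remr1/repo/ /osr1/repo/
  rm -rf /ol1/repo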

If you have to do a restore from more than a month ago, you'll grab the 
appropriate set of os or osr disks and restore the file.

If the restore is in the last 30 days, you'll just do it out of the current 
repository.

This method is still a lot cheaper than going to tape for large volumes of data 
that only have moderate changes in them. [i.e. for data-sets that are handled 
well by rdiff-backup.]

If the data set isn't handled well, then you won't gain much from trading 
compute cycles for space, and rdiff-backup isn't for you.

---

Hope that's helpful.

-Greg



> [Inline]

>> On Nov 25, 2009, at 12:25 PM, address@hidden wrote:
>>> <snip explanation of how rdiff-backup works>

>> Sounds good.

>>> So, a --verify isn't needed to "verify" the current files. The very
>>> nature of RDB is that they're exact. (provided you trust the RDB
>>> protocol...which we assume.)

>> OK, I can accept this (and this makes my backup time shorter, nice).

>>> A --verify IS needed when you want to check an "older" version to be
>>> sure that something hasn't borked your repository for an older delta
>>> set. [But the "current" files are already verified, IMO]

>> When and why would I ever use this? If I need to restore an old backup
>> it might be nice to know that I have access to good data, but I'll  
>> take whatever I can get at that point. --verify doesn't seem to be  
>> very useful to do a general repository health check (bummer).

> Well, the "repository" is the "current" files and then the meta-data
> and rdiffs to get to previous versions of the files.

> It does check the repository. When a backup is done, it stores a SHA1
> hash of the "source" file.

> So a --verify that completes successfully does the following:
> Takes a "current" file, applies all the relevant rdiffs as the
> meta-data says they should be applied. Once done, it calcs a SHA1 hash
> for the "restored" file and compares it to the stored SHA1 hash of
> that file when it was backed up on the relevant date.

> If the two match, we know the system worked properly.
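
To put it another way, a successful --verify is roughly what you'd get by
doing this by hand for every file in that increment - conceptual only, the
paths and the 10D age below are made up:

  # Restore one file as it was ten days ago, then hash the result;
  # rdiff-backup compares that hash against the SHA1 it recorded at
  # backup time.
  rdiff-backup --restore-as-of 10D /backup/repo/some/file /tmp/some.file
  sha1sum /tmp/some.file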

> So, a --verify back to the oldest backup does do a fairly
> comprehensive check - just not exhaustive. It does verify that the
> meta-data/rdiffs for a lot of the system do work, and aren't
> corrupt.

> Again, it's not deterministic, which I'd like - but it's not half bad
> either.

> A totality/deterministic check would certainly be nice, but I think it's 
> do-able
> the way it is now.

>>> So, your most important data, the current data, is verified.
>>> [IMO] Progressively older delta sets are each less certain, as they
>>> all get layered on top of each other in reverse order to get to
>>> "older" sets. [But in general, I consider each "older" set to be
>>> progressively less important - at least in general.]

>> I half agree here. I certainly agree that the most important data is  
>> the most current data. However, I would like to keep (at least) one  
>> year's worth of backup history, and I need to know that my history is  
>> good.

>>> So, I see your problem as the following.
>>>
>>> 1) Verify that the current backup completed properly.
>>> (I do this via logs and exit codes. I don't "double" check the
>>> current backup by doing a --verify on the current backup set. I
>>> implicitly trust that RDB does its job properly and that at the end
>>> the hashes will match properly and that the current "remote" files do
>>> equal the current "local" files. (i.e. the files that were the
>>> source of the backup equal the backup files)

>> That's very trusting of you. I guess I'm a little more paranoid since
>> my job depends on it :)

> Well, RDB creates SHA1 hashes of both files and then compares them; if
> they're different, it does all the work to be sure they end up the same.

> Doing another SHA1 hash compare at the end seems redundant.

> Either you trust that the RDB protocol does what it says it does, or
> you don't. If you don't, then don't use the tool. [I'm being a bit
> bombastic, but I think you get the point...]

> And doing a --verify won't get you there, since it's just "verifying"
> the file (reconstructed or not) with the SHA1 hash generated by RDB
> at the backup date/time. [If you don't trust RDB, then you shouldn't trust
> its stored SHA1 hash or its verify either, K? :)]
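
That said, if you really do want a belt-and-suspenders check and your
rdiff-backup is new enough to have the compare options, something along these
lines checks the source tree against the repository's stored hashes without
doing a restore (made-up paths; check your man page, I'm going from memory):

  # Compare the live source tree against the hashes stored for the
  # latest backup in the repository.
  rdiff-backup --compare-hash /data /backup/repo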

>>> 2) Verify that your older deltas are as intact as possible. That all
>>> the meta-data, deltas and current files can be merged and rolled-back
>>> to whatever desired end-point you want.
>>>
>>> (This is where I use --verify - it's not perfect because there's not
>>> a way to check every delta-set for every single file in the
>>> repository - at least not easily. [A recursive loop checking every
>>> version would do that, but as you say, it's going to be very resource
>>> expensive.])

>> Agreed. This is where I'd like to see a new feature in rdiff-backup.  
>> I'm willing to write code if I ever get time and no one else does first.

> Agreed - a deterministic, full-repository check would be excellent!

>>> 3) Verify that the data is exact from your FW800 drive to the USB
>>> drive on the mac-mini.
>>>
>>> (I wouldn't use a --verify for this. As long as the files are equal
>>> from the FW drive to the USB drive, if you can --verify on the FW  
>>> drive
>>> [source] you should be able to --verify on the USB drive too. So I'd
>>> either "trust" rsync to be sure they're equal - or do something like
>>> you are doing - checking that the FW files are exactly equal to the
>>> USB files.
>>>
>>> I'd do a verify on the fastest drive on the most powerful system.
>>> Plus you don't need to do this all the time, say once a week - over a
>>> weekend probably works. [And perhaps a full recursive loop through
>>> all the diffs would be possible. If you write a bash script to do
>>> that, I'd love to have it!])
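
For that FW-to-USB check, by the way, a checksum-only dry run of rsync will
list anything that differs between the two copies without changing a thing
(the mount points below are made up):

  # -n = dry run, -i = itemize differences, --checksum = compare contents,
  # not just size and mtime.  No output means the two copies are identical;
  # any output is worth investigating.
  rsync -a -n -i --checksum /Volumes/FW800/repo/ /Volumes/USB/repo/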

>> The bash script would be hugely inefficient. I'd much rather spend the
>> time modifying rdiff-backup to support an internal consistency check.

>> The problem with doing it once a week is that it only ever hits one of
>> the drives that is normally in secure storage. It would be a matter of
>> weeks or possibly months to make sure that all drives have been  
>> verified (e.g. each time a particular drive is in use on a Friday).

>>> To recap:
>>> ** Trust RDB does the backup properly and that source = destination
>>> without additional checks.
>>>
>>> ** --verify the backup repository on the FW drive, and as much as
>>> possible that all the older deltas and meta-data are intact and
>>> functioning properly.
>>>
>>> ** check that the FW drive does copy exactly to the off-site USB
>>> drive - but don't use --verify to accomplish this task. Just make
>>> sure that the "off-site" repository is exactly equal to the "on-site"
>>> FW drive.

>> I never do a direct compare between the two drives. I just use rsync  
>> to copy from the FW to the USB drive. Here are my concerns: without some
>> type of regularly executed integrity check of the data on the drive  
>> (FW or USB), how would I detect that a drive is failing before it is  
>> catastrophic and the bad data has propagated to all of the redundant  
>> USB drives? Will rdiff-backup and/or rsync tell me if the drive is  
>> failing when they do a backup/copy? (I don't think so.) The only way to  
>> know that the data is good in my setup is to run some type of  
>> consistency check on the USB drive each day after the rsync is  
>> complete. If that fails then I know I have a problem somewhere. BTW it
>> looks like yafic won't work for me now either. There seems to be a bug
>> that causes it to stop half-way through the check  :(

> IMO, the key piece is the FW drive and the main repository. There's
> nothing on the USB drives that isn't on the FW drive. [i.e. the FW
> drive is a superset of everything on the "off-site" USB drives,
> right?]

> If you can successfully verify the FW drive and keep its verify time period
> (periodicity) shorter than the time to cycle through all the
> "off-site" drives, you're golden. [So, if you have a failed FW
> repository you'll know before you overwrite all the "off-site"
> drives.]
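
And for catching a quietly failing disk before the damage propagates, one
cheap trick is to keep a checksum manifest next to each copy of the
repository and re-check it after every rsync. A sketch with made-up paths
(GNU sha1sum here; adapt for shasum on OS X):

  # Build the manifest from the FW copy right after the backup...
  (cd /Volumes/FW800/repo && find . -type f -print0 | xargs -0 sha1sum) > /tmp/repo.sha1
  # ...then verify the USB copy against it after the rsync.  Any mismatch
  # means one of the drives (or the copy) corrupted something.
  (cd /Volumes/USB/repo && sha1sum -c --quiet /tmp/repo.sha1)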

>> So back to the drawing board (or google) to find a different utility  
>> to do the integrity check.

>> Thanks a lot for your input and generously patient explanations, Greg.
>> I do value your input.

>> ~ Daniel











