
Re: [rdiff-backup-users] Q. on max-file-size behavior


From: Maarten Bezemer
Subject: Re: [rdiff-backup-users] Q. on max-file-size behavior
Date: Sun, 14 Mar 2010 15:31:13 +0100 (CET)


On Sat, 13 Mar 2010, Whit Blauvelt wrote:

> On Sat, Mar 13, 2010 at 11:58:42PM +0100, Jernej Simončič wrote:

>> I'd say this is expected behaviour - the destination saw the file on
>> previous run, but didn't see it on current run (because the source
>> likely doesn't inform it about files it skips), so it treats the file
>> as deleted on source.

> Probably so. A corner case then. Even though it would be easy for the source
> to inform it about files skipped and avoid this, it's probably not worth the
> coding effort.

I don't think this is even a corner case. If you want to exclude large files, then a file that is larger than the limit you specify (something you explicitly and deliberately do!) should not be in the backup. Also, it should not _remain_ in the 'current' backup tree, because it would no longer match the original in the source tree. Since rdiff-backup keeps a history of the backups, there is no other way than to treat it as 'deleted from the source'. That's the only way to keep the history intact AND have a proper 'current' backup tree.
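To illustrate the point, here is a rough sketch in Python of why a skipped file ends up looking like a deleted one. This is not rdiff-backup's actual code, and the size limit is made up; it only mimics the selection logic as I understand it:

import os

MAX_FILE_SIZE = 100 * 1024 * 1024  # hypothetical limit, think --max-file-size

def select_files(source_dir, max_size):
    # Walk the source and keep only files at or below the limit,
    # just like the source side only offers the files it selects.
    selected = set()
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) <= max_size:
                selected.add(os.path.relpath(path, source_dir))
    return selected

def files_that_look_deleted(previous_mirror, current_selection):
    # The destination never hears about skipped files, so anything in the
    # old mirror that is missing from the current selection is
    # indistinguishable from a file that was really deleted.
    return previous_mirror - current_selection

Whether the file grew past the limit or was really removed, the destination sees exactly the same thing.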


> Another question comes up though. If gzip'ing a huge file can cause a
> reasonably fast machine to tie up considerable resources for more than 30
> minutes because its logic tells it it's time to gzip a 16GB file, it would
> be good if there's a way to ask it not to do that.

Why would it?
If you want to remove a file from the backup (including the history), feel free to add wishlist items for patches or external tools to accomplish that. Aside from that, you could also run rdiff-backup with nice and/or ionice so it wouldn't "tie up" resources. (BTW, I don't think spending 30 minutes on a 16GB file is all that strange; even md5sum-ing a 4.7GB ISO image can take a few minutes on a busy system with lots of disk I/O.)
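Something along these lines would keep the backup at the lowest CPU and I/O priority (the paths are made up, adjust to your setup; the same command works just as well typed straight into a shell):

import subprocess

# Run rdiff-backup at the lowest CPU priority and in the idle I/O class,
# so a long compression run doesn't get in the way of everything else.
cmd = [
    "nice", "-n", "19",
    "ionice", "-c", "3",
    "rdiff-backup", "/home", "/backup/home",   # made-up paths
]
subprocess.check_call(cmd)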

> I see that compression can be
> turned off for all files, but not how to turn compression off just for the
> largest files. Is there some trick that would accomplish that? Basically,
> compression on smaller files is always good; compression on the very largest
> files is almost always bad; and somewhere in between - depending on system
> resources - it gets iffy. It would be useful to have a flag to set a
> file-size threshold, so that only files below it get compressed.

These are quite strong claims without any proof or supporting theory.
Compressing a 7KB file might indeed make it considerably smaller; suppose it comes out at 4.1KB when gzipped. But on file systems with 4KB blocks, that would not even save one block. And filesystems supporting multiple 16GB files tend to have larger block sizes anyway... Larger files, on the other hand, can often be compressed with much larger space savings. As always, it all depends on the type of data in the files, so YMMV.
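A quick back-of-the-envelope check of that small-file example, assuming a 4KB block size:

import math

def blocks_used(size_bytes, block_size=4096):
    # A file occupies whole filesystem blocks, rounded up.
    return math.ceil(size_bytes / block_size)

print(blocks_used(7 * 1024))          # 7KB original   -> 2 blocks
print(blocks_used(int(4.1 * 1024)))   # ~4.1KB gzipped -> still 2 blocks

So the compressed copy occupies exactly as many blocks on disk as the original.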

Contrary to what you suggest, I can think of two wishlist items that would make more sense. I'm not even posting them as wishlist items, though, as I don't think they would be worth implementing; a rough sketch of both follows the list.
1) limit the (cpu) time spent on compressing a file, and leave the file
   uncompressed when it takes too long. Heck, maybe even make it a
   user-configurable duration.
2) if compressing is taking longer than X seconds/minutes, check if
   compression is doing any good (check compression ratio for the part of
   the file that has already been processed) and leave the file
   uncompressed when the ratio suggests it wouldn't be worth continuing
   the compression process.
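A very rough sketch of what I mean. The thresholds and chunk size are invented, and it leans on idea 2; idea 1 would simply give up as soon as the time budget is exceeded:

import time
import zlib

MAX_SECONDS = 60            # idea 1: (user-configurable) time budget
MIN_SAVINGS_RATIO = 0.9     # idea 2: want output/input below this so far
CHUNK = 1024 * 1024

def try_compress(path):
    # Returns the compressed bytes, or None meaning "store it uncompressed".
    comp = zlib.compressobj()
    pieces = []
    bytes_in = bytes_out = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            bytes_in += len(chunk)
            piece = comp.compress(chunk)
            bytes_out += len(piece)
            pieces.append(piece)
            if time.monotonic() - start > MAX_SECONDS:
                # Time budget exceeded: look at the ratio so far.
                # (compressobj buffers data internally, so this is only a
                # rough estimate of the real ratio.)
                if bytes_out / bytes_in > MIN_SAVINGS_RATIO:
                    return None
    pieces.append(comp.flush())
    return b"".join(pieces)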

Neither of these would help me with the disk image files I have here, though; those tend to have their big space savings at the end of the file. But then again, I wouldn't use rdiff-backup on them anyway.


Just my 2 cents.

Maarten
