Colin Ryan wrote:
In thinking about the uses of duplicity, I'm always concerned about the
trade-off between having to do the occasional large full backup versus
"infinite incrementals". I see the "infinite incremental" approach as
the dirty little secret of the hosted storage business where no-one
seems to acknowledge the sensitivity of this technique to the corruption
of even a single bit in a single file.
That is the main risk of almost any backup system, including duplicity.
That's the reason we recommend regular full backups, plus local and
remote copies of each.
My concern about the latter is that, should even one incremental in the
chain become corrupt, everything from that point on (I assume) is
unrecoverable. So I was wondering if there is any technique that could
be used to "periodically" roll up the incrementals on the remote
repository side into a full to create a "new single full" which
contains all the incrementals, but that would allow duplicity to simply
continue on with incremental backups on the client end. This would
simply - for what it's worth - reduce the number of files that must be
100% intact but would allow one to always run duplicity in just
incremental mode while periodically generating a full.
Such a rollup would be possible, but it would require a lot of network
bandwidth, equivalent to restoring all of the changed files and their
increments, then writing them back to the host as a single incremental
backup, verifying, then deleting the intermediate incrementals.
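
To make the rollup steps concrete, here is a rough sketch in Python. It is a
toy model only, not duplicity code: each backup is a plain dict of whole
files, whereas real duplicity volumes hold rdiff deltas inside tar archives.
The names restore() and rollup() are made up for illustration.

```python
# Toy model only: each backup is a dict mapping path -> content,
# with None marking a deleted file. This is NOT how duplicity stores
# data; the function names here are hypothetical.

def restore(full, incrementals):
    """Replay every incremental, in order, on top of the full backup."""
    state = dict(full)
    for inc in incrementals:
        for path, content in inc.items():
            if content is None:
                state.pop(path, None)   # file was deleted in this increment
            else:
                state[path] = content   # file was added or changed
    return state


def rollup(incrementals):
    """Collapse a run of incrementals into one equivalent incremental."""
    merged = {}
    for inc in incrementals:
        merged.update(inc)  # a later change overrides an earlier one
    return merged


full = {"a.txt": "v1", "b.txt": "v1"}
chain = [
    {"a.txt": "v2"},
    {"b.txt": None, "c.txt": "v1"},
    {"a.txt": "v3"},
]

single = rollup(chain)

# Verify before deleting the intermediates: the rolled-up incremental
# must restore to exactly the same state as the original chain.
assert restore(full, [single]) == restore(full, chain)
```

In this model the merge is cheap because every increment is already in hand.
The bandwidth cost mentioned above comes from having to fetch each increment
from the remote host before it can be merged and written back.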
As a side note, has anyone put any thought into using Par2 parity files on
the tar files that duplicity generates? Yes, this would increase the
back-end storage, but it would allow for recoverability of the file provided
data corruption was 5-10-20% of the file.
Yes, par2 has been studied. It's in the plans, but down the list a bit.
Of course, finding 5-10% of a file corrupted should alert you to some
serious hardware and/or network problems.