Re: [Duplicity-talk] Multilevel backup
From: edgar . soldin
Subject: Re: [Duplicity-talk] Multilevel backup
Date: Sat, 26 Feb 2022 11:33:09 +0100
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.6.1
hey Peter,
thanks for the detailed explanation! answers inline below
On 26.02.2022 02:00, zga9uhnq4g--- via Duplicity-talk wrote:
I'm not an expert on rdiff-backup either, but let me try to describe my understanding of what it does.
While rdiff-backup can operate in a bandwidth efficient manner over a pipe (e.g. an ssh tunnel) to back up
to or from a remote location, let me use the simplest case, described here
<https://rdiff-backup.net/docs/examples.html#backup>, of backing up one local directory
"foo" to another local directory "bar".
After running "rdiff-backup foo bar" at time T0, bar will end up a copy of the contents of
foo at time T0, but will also contain the extra directory "rdiff-backup-data" for storing
metadata. Next, after making some changes in the directory foo, and running the same command
at time T1, bar will now end up a copy of the contents of foo at time T1, but the
rdiff-backup-data directory will contain "reverse diffs" to recreate the contents of foo at
time T0 from the contents of foo at time T1. If we continue making changes in the directory
foo, and run the same rdiff-backup command at times T2 and T3, we end up with bar being a
copy of the contents of foo at time T3 and the rdiff-backup-data directory containing 3 sets
of reverse diffs, one to recreate the contents of foo at time T2 from the contents of foo at
time T3, one to recreate the contents of foo at time T1 from the contents of foo at time T2,
and one to recreate the contents of foo at time T0 from the contents of foo at time T1.
As you can see, the bar directory is always a mirror of the foo directory at the time of the
latest backup. That means that for this local backup we can use the "cp" command to
restore from the latest snapshot, as described here
<https://rdiff-backup.net/docs/examples.html#restore>. Of course we can also use the
rdiff-backup command for restoring too (and we have to do that to restore to or from a remote
location or for older snapshots), but it can be as efficient as cp when restoring content from
the latest snapshot.
When, sometime after T3, we ask rdiff-backup to restore some content as it was at time T0
by specifying "--restore-as-of T0", rdiff-backup will start with the latest
(T3) snapshot, convert it to the T2 snapshot using the reverse diffs from T3 to T2, then
convert that to the T1 snapshot using the reverse diffs from T2 to T1, then convert that
to the T0 snapshot using the reverse diffs from T1 to T0.
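The restore walk described above can be sketched as a toy model. This is only an illustration under loud assumptions: snapshots are plain dicts of path to content, and the helper names (make_reverse_diff, apply_reverse_diff, restore) are hypothetical, not rdiff-backup's code or on-disk format.

```python
# Toy model of a reverse-diff chain; NOT rdiff-backup's real format or code.
# A snapshot is a dict of path -> content. A reverse diff maps every path that
# differs to its OLDER content, or to None if the path did not exist back then.

def make_reverse_diff(newer, older):
    """Return the changes needed to turn `newer` back into `older`."""
    diff = {}
    for path in set(newer) | set(older):
        if newer.get(path) != older.get(path):
            diff[path] = older.get(path)  # None means "remove when going back"
    return diff

def apply_reverse_diff(snapshot, diff):
    """Step one snapshot back in time without modifying the input."""
    result = dict(snapshot)
    for path, content in diff.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result

def restore(mirror, reverse_diffs, steps_back):
    """Start at the mirror (latest) and apply the newest `steps_back` diffs."""
    state = mirror
    for diff in reverse_diffs[:steps_back]:
        state = apply_reverse_diff(state, diff)
    return state

# Four backups at T0..T3 (hypothetical file names and contents).
t0 = {"a.txt": "v0"}
t1 = {"a.txt": "v1", "b.txt": "new"}
t2 = {"a.txt": "v1", "b.txt": "edited"}
t3 = {"b.txt": "edited", "c.txt": "added"}

mirror = t3  # the backup directory always mirrors the latest state
reverse_diffs = [            # ordered newest first: T3->T2, T2->T1, T1->T0
    make_reverse_diff(t3, t2),
    make_reverse_diff(t2, t1),
    make_reverse_diff(t1, t0),
]

restored_t0 = restore(mirror, reverse_diffs, 3)
# Dropping the oldest diff (T1->T0) leaves T1..T3 restorable.
restored_t1 = restore(mirror, reverse_diffs[:-1], 2)
```

In this model the cost of a restore grows with the number of diffs applied, and removing the last list entry only loses T0, which matches the description of the chain given above.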
The benefits I see to the rdiff-backup approach are
1. It combines the best features of a mirror and an incremental backup,
and it supports unlimited incremental backups without the need for space
consuming regular full backups.
still, with every incremental there is a chance of corruption. did you ever try
to corrupt some bits and restore a point before the corruption? the danger is
the long chain regardless of which end the full is located.
2. Manipulating the latest snapshot is the most efficient, while manipulating
older snapshots gets more costly the farther back in time the snapshot is from the latest.
handy indeed
3. (most interesting for this thread) manipulating the contents of a snapshot
at time T requires access to the snapshots newer than time T, but never
requires access to snapshots older than time T. This means that I can always
delete the oldest snapshots without affecting newer snapshots because old
snapshots depend on newer snapshots, but new snapshots never depend on older
snapshots (which seems to be the opposite of duplicity).
well no, to back up you need full access to the one previous backup. that's where
duplicity uses the signatures, which are much smaller than the original data,
to make it more bandwidth efficient.
4. It is bandwidth efficient because, like rsync, only differences are
transmitted.
not correct either. at least not in terms of duplicity. duplicity deals with
dumb storage backends. all it expects are methods to put, get, remove, list. if
duplicity is supposed to write a new full copy to the backend it will need to
do it in full. there is no way to just patch/change files on the backend.
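to illustrate the point about dumb backends, here is a rough sketch. the class DumbBackend, its byte counter and the object name "full.vol1" are made up for this example and are not duplicity's actual backend API; the only thing taken from the above is the set of four operations.

```python
# In-memory stand-in for a "dumb" storage backend; NOT duplicity's real API.
# It offers only put/get/remove/list, so stored objects cannot be patched.

class DumbBackend:
    def __init__(self):
        self._blobs = {}
        self.bytes_uploaded = 0  # hypothetical counter to make the cost visible

    def put(self, name, data):
        # The whole object travels to the backend on every put.
        self._blobs[name] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, name):
        return self._blobs[name]

    def remove(self, name):
        del self._blobs[name]

    def list(self):
        return sorted(self._blobs)

backend = DumbBackend()
full = b"x" * 1000
backend.put("full.vol1", full)

# Change a single byte. With no patch operation available, the only way to
# update the stored object is to upload all of it again.
changed = b"y" + full[1:]
backend.put("full.vol1", changed)

names = backend.list()
total = backend.bytes_uploaded
```

a one byte change costs a second full upload here, which is why rewriting the stored full in place, rsync style, is not an option on such a backend.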
The only downside of rdiff-backup (for me), which is also the big benefit of
duplicity, is that it doesn't support encryption, which (for me) is a
requirement for off-site backups to cloud storage.
an absolute necessity, agreed. seeing users running duplicity unencrypted to
cloud services always makes me wonder why they bother to set it up at all.
I make no guarantees that my description of what rdiff-backup does is accurate
(I haven't read the code), but hopefully this helps.
it did. thanks again! ..ede/duply.net