Re: [Duplicity-talk] Multilevel backup

duplicity-talk



From:	edgar . soldin
Subject:	Re: [Duplicity-talk] Multilevel backup
Date:	Sat, 26 Feb 2022 11:33:09 +0100
User-agent:	Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.6.1

hey Peter,

thanks for the detailed explanation! answers inline below

On 26.02.2022 02:00, zga9uhnq4g--- via Duplicity-talk wrote:

I'm not an expert on rdiff-backup either, but let me try to describe my understanding of what it does.  
While rdiff-backup can operate in a bandwidth efficient manner over a pipe (e.g. an ssh tunnel) to backup 
to or from a remote location, let me use the simplest case, described here 
<https://rdiff-backup.net/docs/examples.html#backup>, of backing up one local directory 
"foo" to another local directory "bar".
After running "rdiff-backup foo bar" at time T0, bar will end up as a copy of the contents of foo at time T0, but will also contain the extra directory "rdiff-backup-data" for storing metadata. Next, after making some changes in the directory foo and running the same command at time T1, bar will now end up as a copy of the contents of foo at time T1, and the rdiff-backup-data directory will now also contain "reverse diffs" to recreate the contents of foo at time T0 from the contents of foo at time T1. If we continue making changes in the directory foo, and run the same rdiff-backup command at times T2 and T3, we end up with bar being a copy of the contents of foo at time T3 and the rdiff-backup-data directory containing 3 sets of reverse diffs: one to recreate the contents of foo at time T2 from the contents of foo at time T3, one to recreate the contents of foo at time T1 from the contents of foo at time T2, and one to recreate the contents of foo at time T0 from the contents of foo at time T1.
As you can see, the bar directory is always a mirror of the foo directory at the time of the 
latest backup.  That means that for this local backup we can use the "cp" command to 
restore from the latest snapshot, as described here 
<https://rdiff-backup.net/docs/examples.html#restore>. Of course, we can also use the 
rdiff-backup command for restoring (and we have to when restoring to or from a remote 
location, or when restoring older snapshots); for content from the latest snapshot it 
can be as efficient as cp.
When, sometime after T3, we ask rdiff-backup to restore some content as it was at time T0 
by specifying "--restore-as-of T0", rdiff-backup will start with the latest 
(T3) snapshot, convert it to the T2 snapshot using the reverse diffs from T3 to T2, then 
convert that to the T1 snapshot using the reverse diffs from T2 to T1, then convert that 
to the T0 snapshot using the reverse diffs from T1 to T0.
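To make the reverse-diff walk concrete, here is a toy Python model of "newest mirror plus reverse diffs". It is a sketch under simplifying assumptions, not rdiff-backup's real on-disk format or code: a snapshot is just a dict of path -> content, and a "reverse diff" records the previous content of each changed path (None meaning the file did not exist one backup earlier). The class ReverseIncrementRepo and its methods are hypothetical names.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]                # path -> file content (toy model)
ReverseDiff = Dict[str, Optional[str]]   # path -> content one backup earlier, None = absent then


class ReverseIncrementRepo:
    """Hypothetical toy model: newest mirror plus reverse diffs (not rdiff-backup's real format)."""

    def __init__(self) -> None:
        self.mirror: Snapshot = {}
        self.times: List[str] = []                  # backup times, oldest first
        self.reverse_diffs: List[ReverseDiff] = []  # reverse_diffs[i] turns snapshot i+1 into snapshot i

    def backup(self, time: str, source: Snapshot) -> None:
        if self.times:
            # record how to get from the new state back to the current mirror
            diff: ReverseDiff = {}
            for path in set(self.mirror) | set(source):
                if self.mirror.get(path) != source.get(path):
                    diff[path] = self.mirror.get(path)
            self.reverse_diffs.append(diff)
        self.mirror = dict(source)  # the mirror always equals the latest backup
        self.times.append(time)

    def restore_as_of(self, time: str) -> Snapshot:
        target = self.times.index(time)
        state = dict(self.mirror)  # start from the newest snapshot
        # step back one backup at a time, e.g. T3 -> T2 -> T1 -> T0
        for i in range(len(self.times) - 2, target - 1, -1):
            for path, old in self.reverse_diffs[i].items():
                if old is None:
                    state.pop(path, None)
                else:
                    state[path] = old
        return state


repo = ReverseIncrementRepo()
repo.backup("T0", {"a.txt": "v0"})
repo.backup("T1", {"a.txt": "v1", "b.txt": "new"})
repo.backup("T2", {"a.txt": "v2", "b.txt": "new"})
repo.backup("T3", {"a.txt": "v3"})
print(repo.restore_as_of("T0"))  # {'a.txt': 'v0'}
print(repo.restore_as_of("T3"))  # {'a.txt': 'v3'}, i.e. the mirror itself
```

In this model, restoring the latest snapshot is just a copy of the mirror, and every step further back in time costs one more reverse diff.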

The benefits I see to the rdiff-backup approach are:

 1. It combines the best features of a mirror and an incremental backup, 
and it supports unlimited incremental backups without the need for space 
consuming regular full backups.


still, with every incremental there is a chance of corruption. did you ever try 
to corrupt some bits and restore to a point before the corruption? the danger is 
the long chain, regardless of which end the full is located on.

 2. Manipulating the latest snapshot is the most efficient, while manipulating 
older snapshots gets more costly the farther back in time the snapshot is.


handy indeed

 3. (most interesting for this thread) manipulating the contents of a snapshot 
at time T requires access to the snapshots newer than time T, but never 
requires access to snapshots older than time T.  This means that I can always 
delete the oldest snapshots without affecting newer snapshots because old 
snapshots depend on newer snapshots, but new snapshots never depend on older 
snapshots (which seems to be the opposite of duplicity).


well no, to back up you need full access to the one previous backup. that's where 
duplicity uses the signatures, which are much smaller than the original data, 
to make it more bandwidth efficient.

 4. It is bandwidth efficient because, like rsync, only differences are 
transmitted.


not correct either, at least not in terms of duplicity. duplicity deals with 
dumb storage backends. all it expects are methods to put, get, remove and list. if 
duplicity is supposed to write a new full copy to the backend, it has to upload it 
in full. there is no way to just patch/change files on the backend.
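The disagreement about chains can be made concrete by asking, for each stored piece, which snapshots stop being restorable if that piece is deleted or corrupted. This is a toy model, not code from either tool: items named full/inc1/inc2/... stand for a forward chain (full first, incrementals after, as in duplicity), and mirror/rev0/rev1/... stand for a reverse chain (newest mirror plus reverse diffs, as described above for rdiff-backup). All function names are hypothetical.

```python
from typing import Callable, List, Set

Needs = Callable[[int, int], Set[str]]


def forward_chain_needs(n: int, i: int) -> Set[str]:
    """Items needed to restore snapshot i of n: one full plus incrementals inc1..inc{i}."""
    return {"full"} | {f"inc{k}" for k in range(1, i + 1)}


def reverse_chain_needs(n: int, i: int) -> Set[str]:
    """Items needed to restore snapshot i of n: the newest mirror plus reverse diffs rev{i}..rev{n-2}."""
    return {"mirror"} | {f"rev{k}" for k in range(i, n - 1)}


def broken_by(lost: str, n: int, needs: Needs) -> List[int]:
    """Snapshots that can no longer be restored once the stored item `lost` is deleted or corrupted."""
    return [i for i in range(n) if lost in needs(n, i)]


n = 4  # snapshots T0..T3
print(broken_by("rev0", n, reverse_chain_needs))  # [0]: dropping the oldest reverse diff only loses T0
print(broken_by("full", n, forward_chain_needs))  # [0, 1, 2, 3]: every incremental builds on the full
print(broken_by("rev2", n, reverse_chain_needs))  # [0, 1, 2]: a bad link breaks everything behind it
print(broken_by("inc1", n, forward_chain_needs))  # [1, 2, 3]: same problem, other direction
```

In this model both points hold at once: pruning the oldest end of a reverse chain is cheap, yet a corrupted link in the middle of either chain still takes out every snapshot that has to be rebuilt through it.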
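A rough sketch of why signatures still save bandwidth with a dumb backend: the previous state is summarized as a table of block hashes, the new data is compared against that table locally, and the result is uploaded as a brand-new object with a plain put. The DumbBackend/MemoryBackend classes, the 4-byte block size and the repr-based serialization are assumptions for illustration only; this is not duplicity's backend API or file format. Unlike rsync/librsync, it matches only block-aligned data (no rolling checksum), so inserted bytes that shift the rest of a file would defeat it.

```python
import hashlib
from typing import Dict, List, Protocol, Tuple, Union

BLOCK = 4  # tiny block size to keep the example readable; real tools use kilobyte-sized blocks

Op = Union[Tuple[str, int], Tuple[str, bytes]]  # ("copy", old block index) or ("data", literal bytes)


class DumbBackend(Protocol):
    """Hypothetical interface: whole-object put/get/remove/list only, no in-place patching."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...
    def remove(self, name: str) -> None: ...
    def list(self) -> List[str]: ...


class MemoryBackend:
    """In-memory stand-in for remote storage."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        return self.objects[name]

    def remove(self, name: str) -> None:
        del self.objects[name]

    def list(self) -> List[str]:
        return sorted(self.objects)


def signature(data: bytes) -> Dict[str, int]:
    """Map the hash of each fixed-size block to its block index."""
    sig: Dict[str, int] = {}
    for pos in range(0, len(data), BLOCK):
        sig.setdefault(hashlib.sha256(data[pos:pos + BLOCK]).hexdigest(), pos // BLOCK)
    return sig


def delta(sig: Dict[str, int], new: bytes) -> List[Op]:
    """Describe `new` as copies of known old blocks plus literal bytes, using only the signature."""
    ops: List[Op] = []
    for pos in range(0, len(new), BLOCK):
        block = new[pos:pos + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        ops.append(("copy", sig[digest]) if digest in sig else ("data", block))
    return ops


def patch(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new data from the old data plus a delta (what a restore would do)."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)


def upload_incremental(backend: DumbBackend, name: str, sig: Dict[str, int], new: bytes) -> List[Op]:
    """Compute a delta locally and store it as a brand-new object with a plain put."""
    ops = delta(sig, new)
    backend.put(name, repr(ops).encode())  # toy serialization
    return ops


backend = MemoryBackend()
old = b"AAAABBBBCCCC"
backend.put("full.vol1", old)  # the full backup has to be uploaded whole
sig = signature(old)           # enough to compute the next delta without downloading the full
new = b"AAAAXXXXCCCC"
ops = upload_incremental(backend, "inc1.delta", sig, new)
print(ops)                     # [('copy', 0), ('data', b'XXXX'), ('copy', 2)]
print(backend.list())          # ['full.vol1', 'inc1.delta']
print(patch(old, ops) == new)  # True
```

Only the changed block travels as literal data, but it still arrives as a new object; nothing already stored on the backend is modified.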

The only downside of rdiff-backup (for me), which is also the big benefit of 
duplicity, is that it doesn't support encryption, which (for me) is a 
requirement for off-site backups to cloud storage.


an absolute necessity, agreed. seeing users run duplicity unencrypted to 
cloud services always makes me wonder why they bother to set it up at all.

I make no guarantees that my description of what rdiff-backup does is accurate 
(I haven't read the code), but hopefully this helps.


it did. thanks again! ..ede/duply.net


Prev by Date: Re: [Duplicity-talk] Multilevel backup
Next by Date: Re: [Duplicity-talk] How to backup stuff>5GB
Previous by thread: Re: [Duplicity-talk] Multilevel backup
Next by thread: [Duplicity-talk] duplicity man page on gitlab site
Index(es):
- Date
- Thread