duplicity-talk
Re: [Duplicity-talk] Incremental backup when data changes but timestamp


From: edgar . soldin
Subject: Re: [Duplicity-talk] Incremental backup when data changes but timestamp does not
Date: Sun, 14 May 2023 16:55:49 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 14.05.2023 07:31, Nate Eldredge via Duplicity-talk wrote:
Returning to a thread from many years ago 
(https://lists.gnu.org/archive/html/duplicity-talk/2013-07/msg00015.html), I am 
looking for a way to do an incremental backup involving files whose data has 
changed but the timestamp, permissions and size stayed the same.

hi Nate :)

this sounds like a corner case. so firstly i'd really like to see examples of 
those files. could you provide those? maybe restore them from these backups of 
yours?

This actually comes up in a real-life situation, not via some sort of 
deliberate timestamp abuse.  They're files from the same package in two 
different versions of Ubuntu.  I assume the packages were built simultaneously 
from the same source, but using different compiler versions, and the files for 
each one happened to be created within the same second.  So if you upgrade from 
one package to the other, the new version of the file is different, but has the 
same mtime and permissions and possibly even the same size.  Then `duplicity 
incremental` doesn't notice the change, and your backup stays with the old 
version.
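
A minimal sketch of the situation described above, assuming a change detector that compares only size, whole-second mtime, and permission bits. The helper names (`metadata_key`, `make_file`) and the file contents are hypothetical and are not taken from duplicity's code:

```python
import os
import tempfile

def metadata_key(path):
    """Return (size, mtime truncated to whole seconds, permission bits)."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime), st.st_mode & 0o7777)

def make_file(directory, name, data, mtime):
    """Write data to a file and force its mode and timestamps."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    os.chmod(path, 0o644)
    os.utime(path, (mtime, mtime))  # sets atime and mtime
    return path

with tempfile.TemporaryDirectory() as d:
    # Two builds of the "same" file: equal length, equal mtime, different bytes.
    old = make_file(d, "old.bin", b"built with compiler A", 1_600_000_000)
    new = make_file(d, "new.bin", b"built with compiler B", 1_600_000_000)

    same_metadata = metadata_key(old) == metadata_key(new)
    with open(old, "rb") as f1, open(new, "rb") as f2:
        same_content = f1.read() == f2.read()

print("metadata equal:", same_metadata)
print("content equal:", same_content)
```

A comparison based on this metadata alone reports the two files as identical even though their contents differ.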

rsync provides a `--checksum` parameter for that. but that of course is 
io-heavy as the file would have to be read in full to decide if there are 
changes. not sure if we already keep per-file-checksums in the meta-data.
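
As a rough illustration of what a checksum-based comparison costs, here is a sketch that hashes a file in chunks and compares the result to a previously recorded digest. It assumes a digest was stored at the last backup; whether duplicity's metadata contains one is exactly the open question above. `file_digest` and `content_changed` are hypothetical names:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Hash the whole file in fixed-size chunks; every byte has to be read."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def content_changed(path, recorded_digest):
    """True if the file's current digest differs from the recorded one."""
    return file_digest(path) != recorded_digest
```

The full read in `file_digest` is the io cost mentioned above: it happens for every file, changed or not.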

At one time I worked around this by hacking in a command-line option which causes 
ROPath.__eq__ (https://gitlab.com/duplicity/duplicity/-/blob/main/duplicity/path.py#L331) 
to always return 0.  Then every file is treated as "changed", and so the 
changes in question are picked up.  For those files that haven't actually changed, the 
rdiff is trivial, and so the only practical impact is that the backup takes a long time 
and you get a big new-signatures file, which I can live with.  For me it usually only 
happens when I upgrade OS versions, so I would run an incremental with this option at 
those times. (Or, I would bite the storage-space bullet and run a full backup even if I 
didn't otherwise need one.)

enforcing to treat every file as changed sounds more reasonable compared to 
`--checksum`. it will read each file once too, but in the same run it will 
already come up with the changes. all the code would need to do is verify whether 
there were changes and skip adding the result to a volume if there were none. not 
sure how intelligent the code already is in this regard.

It'd be nice to have something more efficient and robust, though.  One thought 
would be to check whether the ctime is newer than the date of the previous 
backup.
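
A sketch of that ctime check, assuming the time of the previous backup is available as a Unix epoch timestamp. `ctime_newer_than` is a hypothetical helper, not a duplicity function. On Unix `st_ctime` is the inode change time; on Windows it has historically been the creation time, so this only makes sense on Unix-like systems:

```python
import os

def ctime_newer_than(path, last_backup_epoch):
    """True if the inode changed after the previous backup.

    A True result only marks the file as a candidate for a content
    comparison; a chmod or rename also bumps ctime on Unix.
    """
    return os.stat(path).st_ctime > last_backup_epoch
```

Only files for which this returns True would need the more expensive full comparison.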

i wonder why rsync does not use ctime by default though. there may be a reason 
for that. fs-standard of course mandates mod-time changes only when the file is 
changed. c-time is supposed to be fixed.

We could also check the birth time on filesystems that support it.  We would 
get false positives in cases like replacing a disk and `cp -a`ing over all the 
files (which normally would preserve mtime but not ctime), but it could still 
be useful as an option.

that sounds fishy. i don't see how a containing filesystem change should 
trigger a recompare by default.


I'm curious if anyone has other suggestions, or tips on how / where to 
implement them.

i'm still curious which files you come up with that are equal in
- file name
- size
- mod time
but have different content. not saying they do not exist, just saying it is a 
very rare phenomenon.

in summary, the easiest way around would be a forced check, similar to how you 
hacked it. not sure how much effort it'd be to implement though.

sunny regards.. ede/duply.net



