Re: [Duplicity-talk] Incremental backup when data changes but timestamp
From: edgar . soldin
Subject: Re: [Duplicity-talk] Incremental backup when data changes but timestamp does not
Date: Sun, 14 May 2023 16:55:49 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 14.05.2023 07:31, Nate Eldredge via Duplicity-talk wrote:
Returning to a thread from many years ago
(https://lists.gnu.org/archive/html/duplicity-talk/2013-07/msg00015.html), I am
looking for a way to do an incremental backup involving files whose data has
changed but the timestamp, permissions and size stayed the same.
hi Nate :)
this sounds like a corner case. so firstly i'd really like to see examples of
those files. could you provide those? maybe restore them from these backups of
yours?
This actually comes up in a real-life situation, not via some sort of
deliberate timestamp abuse. They're files from the same package in two
different versions of Ubuntu. I assume the packages were built simultaneously
from the same source, but using different compiler versions, and the files for
each one happened to be created within the same second. So if you upgrade from
one package to the other, the new version of the file is different, but has the
same mtime and permissions and possibly even the same size. Then `duplicity
incremental` doesn't notice the change, and your backup stays with the old
version.
rsync provides a `--checksum` parameter for that. but that of course is
io-heavy as the file would have to be read in full to decide if there are
changes. not sure if we already keep per-file-checksums in the meta-data.
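To make the trade-off concrete, here is a minimal Python sketch of a checksum-style comparison. It is not duplicity or rsync code; the helper names (`file_digest`, `stat_key`, `make_file`) and the sample file contents are made up for illustration. It builds two files with the same size and mtime but different data, which a stat-based comparison treats as identical and a full-content hash tells apart.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stat_key(path):
    """Metadata a timestamp-based comparison would look at (hypothetical)."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime), st.st_mode)


def make_file(directory, name, data, mtime):
    """Write data to a file and force its atime/mtime to a fixed value."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (mtime, mtime))
    return path


tmp = tempfile.mkdtemp()
# Same length, same mtime, different bytes (made-up contents).
old = make_file(tmp, "old.bin", b"built with gcc A", 1_600_000_000)
new = make_file(tmp, "new.bin", b"built with gcc B", 1_600_000_000)

print(stat_key(old) == stat_key(new))        # metadata cannot tell them apart
print(file_digest(old) == file_digest(new))  # content hash can
```

The cost is visible in `file_digest`: every file has to be read in full just to decide whether it changed.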
At one time I worked around this by hacking in a command-line option which causes
ROPath.__eq__ (https://gitlab.com/duplicity/duplicity/-/blob/main/duplicity/path.py#L331)
to always return 0. Then every file is treated as "changed", and so the
changes in question are picked up. For those files that haven't actually changed, the
rdiff is trivial, and so the only practical impact is that the backup takes a long time
and you get a big new-signatures file, which I can live with. For me it usually only
happens when I upgrade OS versions, so I would run an incremental with this option at
those times. (Or, I would bite the storage-space bullet and run a full backup even if I
didn't otherwise need one.)
forcing every file to be treated as changed sounds more reasonable compared to
`--checksum`. it will read the file once too, but in this run will come up with
the changes already. all the code would need to do is verify whether there were
changes and skip adding the result to a volume if there were none. not sure how
intelligent the code already is in this regard.
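A toy Python sketch of that idea, under loud assumptions: `Entry`, `metadata_equal` and `select_changed` are hypothetical names, not duplicity's classes, and the previous file contents are kept in memory here where a real tool would compare against stored signatures. With `force=True` the metadata shortcut is bypassed, every file is read, and only files whose data really differs are selected for the volume.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a backed-up file: metadata plus contents."""
    size: int
    mtime: int
    mode: int
    data: bytes


def metadata_equal(a, b):
    """The cheap comparison: size, mtime and permissions only."""
    return (a.size, a.mtime, a.mode) == (b.size, b.mtime, b.mode)


def select_changed(old, new, force=False):
    """Return the names whose content should go into the next volume."""
    changed = []
    for name, entry in new.items():
        prev = old.get(name)
        if prev is not None and not force and metadata_equal(prev, entry):
            continue  # trusted as unchanged without looking at the data
        if prev is not None and prev.data == entry.data:
            continue  # data was read, but there is no real change: skip it
        changed.append(name)
    return changed


old = {
    "a": Entry(3, 100, 0o644, b"abc"),
    "b": Entry(3, 100, 0o644, b"xyz"),
}
new = {
    "a": Entry(3, 100, 0o644, b"abd"),  # same metadata, different data
    "b": Entry(3, 100, 0o644, b"xyz"),  # genuinely unchanged
}

print(select_changed(old, new))              # normal run misses "a"
print(select_changed(old, new, force=True))  # forced run picks up only "a"
```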
It'd be nice to have something more efficient and robust, though. One thought
would be to check whether the ctime is newer than the date of the previous
backup.
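A minimal Python sketch of such a ctime test, assuming POSIX semantics where `st_ctime` is the inode change time (on Windows, `st_ctime` is the creation time instead). The function name `ctime_newer_than` and the timestamps are illustrative and not part of duplicity.

```python
import os
import tempfile
import time


def ctime_newer_than(path, last_backup_time):
    """True if the inode changed after the previous backup (hypothetical check)."""
    return os.stat(path).st_ctime > last_backup_time


# Create a file and push its mtime far into the past, as a package
# manager restoring the packaged timestamp would do.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"new contents")
os.utime(path, (1_600_000_000, 1_600_000_000))

last_backup = time.time() - 3600  # pretend the last backup ran an hour ago

st = os.stat(path)
print(st.st_mtime < last_backup)             # mtime alone looks old
print(ctime_newer_than(path, last_backup))   # ctime still reveals the change
```

Since `os.utime` can set mtime but not ctime, the ctime still reflects when the file was actually written.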
i wonder why rsync does not use ctime by default though. there may be a reason
for that. fs-standard of course mandates mod-time changes only when the file is
changed. c-time is supposed to be fixed.
We could also check the birth time on filesystems that support it. We would
get false positives in cases like replacing a disk and `cp -a`ing over all the
files (which normally would preserve mtime but not ctime), but it could still
be useful as an option.
that sounds fishy. i don't see how a containing filesystem change should
trigger a recompare by default.
I'm curious if anyone has other suggestions, or tips on how / where to
implement them.
i'm still curious which files you come up with that equal in
- file name
- size
- mod time
but have a different content. not saying they do not exist, just saying it is a
very rare phenomenon.
in summary, the easiest way around would be a forced check, similar to the way
you hacked it. not sure how much effort it'd be to implement though.
sunny regards.. ede/duply.net