
Re: [rdiff-backup-users] Tar replacement - format proposal


From: Kevin Spicer
Subject: Re: [rdiff-backup-users] Tar replacement - format proposal
Date: 27 Sep 2003 00:34:26 +0100

Some thoughts on the subject of inodes, stemming from posts by Will Dyson
and John Goerzen, as well as Ben's webpage...

It seems to me there are two (semi-separate) issues related to inodes...

1) The original inode numbers (from the source filesystem)
2) A system of inode numbers for the new file format so that it could be
accessed as a filesystem, given the appropriate kernel support.

Of these, 2) is probably the more adventurous [although I think it should
either be designed in, or the format should be extensible enough to
allow it to be added later] and 1) is the more important.

Starting with 1)

There are a number of reasons why the original inode number may be
needed or useful...

a) For spotting hard links
b) For tracking files that change name only (there would have to be some
other means to confirm this, by checking contents, as inodes are reused
- so probably a busy filesystem could reuse an inode during a single
backup interval)
c) For spotting files which have an identical name, but probably are a
different file (i.e. the reverse of b) - this probably isn't a good idea
though, given the common practice of writing an altered copy of a file
then renaming it over the original, thus changing the inode.
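
As a rough illustration of point b), here is a minimal sketch of how a
rename might be confirmed by content as well as by inode.  The function
name, the snapshot layout (path -> (device, inode, digest)) and the use
of a content digest are all my own assumptions for the example, not
part of any proposed format.

```python
def find_renames(old, new):
    """Return (old_path, new_path) pairs that look like pure renames.

    old and new are hypothetical snapshots mapping
    path -> (device, inode, content_digest).
    """
    # Index the old snapshot by (device, inode); hard links mean one
    # identifier can map to several paths.
    by_id = {}
    for path, (dev, ino, digest) in old.items():
        by_id.setdefault((dev, ino), []).append((path, digest))

    renames = []
    for path, (dev, ino, digest) in sorted(new.items()):
        if path in old:
            continue  # the name already existed, so not a rename
        for old_path, old_digest in by_id.get((dev, ino), []):
            # Inodes are reused, so the inode match alone proves nothing;
            # require the old name to be gone and the content to match.
            if old_path not in new and old_digest == digest:
                renames.append((old_path, path))
    return renames
```

An inode match with a different digest is treated as an unrelated file,
which is the inode reuse case mentioned in b).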

A couple of problems spring to mind.  Unlike some other backup systems,
AFAIK Duplicity does not restrict itself to the contents of a single
filesystem.  Say for example we back up / (hda1) which also has /usr
(hda2) and /var (hda3) mounted: we could have three files with the same
inode number, one on each of hda1,2,3.  The obvious solution to that is
to also encode the partition name, or some other identifier.  It's also
worth noting that the inode number can change during the life of a file
for reasons touched on above.

Dealing with hard links also presents a few issues.  Presumably it would
be best to back up the file the first time it is encountered, and then
record further occurrences of the same inode as hard links.  Given that
the archive may be accessed in a random order, these inode numbers need
to be recorded somewhere (maybe in the central index) so that the 'file'
can be found when a restore of one of the 'links' is requested.  There
may also be some benefit to using an internal inode numbering system as
part of this indexing process.

There are plenty of devils in the details here, methinks.  One that
springs to mind is what happens when someone requests a restore of one
of the hard linked files, then later on someone restores another of the
links to the same file: voila, two separate copies of the same file,
rather than one file with two entries.  This gets even more complicated
when the first restored file gets altered before the subsequent restore
of another link to it.
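
To make the hard link handling above a little more concrete, here is a
small sketch of a scan that stores a file the first time its inode is
seen and records later occurrences as links.  It keys on the pair
(st_dev, st_ino) so that equal inode numbers on different filesystems
are not confused.  The function name and the shape of the returned
entries are invented for illustration only.

```python
import os
import stat


def index_hard_links(root):
    """Walk root and return a list of (kind, path, target) entries.

    kind is "file" for data that would be stored, or "link" for a later
    name of an already stored inode, in which case target is the first
    path seen for that inode.
    """
    first_seen = {}  # (st_dev, st_ino) -> first path encountered
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a predictable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files matter for this sketch
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in first_seen:
                entries.append(("link", path, first_seen[key]))
            else:
                if st.st_nlink > 1:
                    first_seen[key] = path
                entries.append(("file", path, None))
    return entries
```

The first_seen table plays the part of the central index here: a
restore of any 'link' entry can follow its target back to the stored
file.  It does nothing about the restore ordering problem described
above.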

Now 2) inode numbers for treating the archive as a filesystem. 

Because of the issues stated above with backing up multiple filesystems
it would be necessary to maintain an internal inode numbering system.
There is a theoretical risk that a large backup of multiple filesystems
could exceed the number of available inodes in the archive structure.
Also, some systems may not support a comparably large range of inode
numbers.
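
A minimal sketch of what such an internal numbering might look like,
assuming each source file is identified by a filesystem identifier plus
its original inode number.  The class name, the 32-bit default limit
and the choice of exception are assumptions made for the example.

```python
class InodeAllocator:
    """Hand out archive-internal inode numbers, starting at 1."""

    def __init__(self, max_inodes=2**32 - 1):
        self.max_inodes = max_inodes  # assumed size of the archive's inode field
        self._table = {}              # (fs_id, source_ino) -> internal number
        self._next = 1

    def assign(self, fs_id, source_ino):
        """Return the internal number for a source file, allocating if new."""
        key = (fs_id, source_ino)
        if key in self._table:
            # Same source inode seen again (e.g. a hard link): reuse it.
            return self._table[key]
        if self._next > self.max_inodes:
            # The theoretical risk noted above: no numbers left.
            raise OverflowError("archive inode space exhausted")
        number = self._next
        self._table[key] = number
        self._next += 1
        return number
```

With this, inode 42 on hda1 and inode 42 on hda2 receive different
internal numbers, while a second link to the same source inode gets the
number already assigned.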






