
Re: [Duplicity-talk] Tar replacement - format proposal


From: Will Dyson
Subject: Re: [Duplicity-talk] Tar replacement - format proposal
Date: Fri, 26 Sep 2003 16:35:40 -0400

On Fri, 2003-09-26 at 01:14, Ben Escoto wrote:
> Based on some useful suggestions from the rdiff-backup mailing list,
> I've updated the page at
> 
> http://www.nongnu.org/duplicity/new_format.html

In general, I think it looks good.

Some quick comments on the questions at the end. 

Filesystem-ability:
I've written a filesystem driver for the Linux kernel (befs), although I
did not design the filesystem itself. As far as the Linux VFS layer is
concerned, an inode number is simply a unique identifier for a file.
Individual filesystems may treat it as encoding some information, but
the kernel does not. The kernel simply relies on it being unique and
providing some way for the filesystem to find the file's data and
metadata, given only the inode number. Therefore, the inner offset of a
file's index entry would make a fine inode number (on systems that
support 64-bit inode numbers). So would the index of the file's index
entry (which avoids the 64-bit issue).

In light of that, adding a list of filenames and their associated inode
numbers to each directory would be very useful. In the absence of that,
the filesystem would need to do a linear scan of the file index for each
name lookup operation (although Linux fortunately caches the results of
name lookups in the VFS layer).
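
To make the difference concrete, here is a minimal sketch of the two
lookup strategies. It assumes the inode number is simply the position of
a file's entry in the file index; the dictionaries and function names
are hypothetical and are not taken from the format proposal.

```python
# Hypothetical in-memory model of the file index. The position of an
# entry in this list doubles as its inode number (an assumption).
file_index = [
    {"path": "/", "type": "dir"},
    {"path": "/etc", "type": "dir"},
    {"path": "/etc/passwd", "type": "file"},
]


def lookup_by_scan(file_index, path):
    """Without per-directory name tables: scan every index entry."""
    for inode, entry in enumerate(file_index):
        if entry["path"] == path:
            return inode
    return None


# Hypothetical per-directory tables: directory inode -> {name: inode}.
directories = {
    0: {"etc": 1},
    1: {"passwd": 2},
}


def lookup_by_dir(directories, path, root=0):
    """With name tables: one dictionary lookup per path component."""
    inode = root
    for name in path.strip("/").split("/"):
        if not name:
            continue  # the path was "/" itself
        table = directories.get(inode)
        if table is None or name not in table:
            return None
        inode = table[name]
    return inode
```

Both functions return the same inode number for "/etc/passwd"; the
second one only touches the directories along the path rather than the
whole index.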

Block index entries:
I can see how extensibility for the block index entries could be nice.
However, for the specific extension idea you mention (hash of each
block), that extra information could go in the archive header. The block
index table is a data structure that should be as simple as possible.

On the other hand, putting the block index entries in XML makes it
logical to store the offsets as strings rather than binary numbers. This
avoids the question of the endianness of the numbers, but adds bloat to
the archive.
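
A small illustration of that trade-off, using Python's struct module.
The 8-byte field width and the sample offset are assumptions chosen for
the example, not values from the proposal.

```python
import struct

offset = 2 ** 40  # a sample offset just past 1 TiB (assumed value)

# Binary encoding: always 8 bytes, but writer and reader must agree on
# the byte order.
little = struct.pack("<Q", offset)
big = struct.pack(">Q", offset)

# A reader that assumes the wrong byte order silently gets a different
# number instead of an error.
misread = struct.unpack(">Q", little)[0]

# Textual encoding: no byte-order question, but the size varies with
# the value, and any surrounding markup adds to it.
text = str(offset).encode("ascii")
roundtrip = int(text.decode("ascii"))
```

For this particular value the decimal string takes 13 bytes against 8
for the binary form, before counting any XML tags around it.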

Putting the archive header at the end:
Since the file index entries correspond so well to inodes, I think it is
natural to start viewing the archive header as the archive superblock.
Taking this view, it makes some sense to put the superblock at the end,
and merge the two offsets currently at the end into fields of the
superblock (although we'd then need the outer offset of the start of the
superblock at the very end). If we decide to make the block index
textual, then I'd consider merging that into the superblock as well.
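
As a rough sketch of that layout: data first, then the superblock, then
a fixed-size trailer holding the superblock's starting offset. The
JSON encoding, the field names and the 8-byte big-endian trailer are
all assumptions made for illustration; the proposal does not specify
them.

```python
import io
import json
import struct

# Assumed trailer: one unsigned 64-bit big-endian offset.
TRAILER = struct.Struct(">Q")


def write_archive(f, data, superblock):
    """Write data, then the superblock, then the trailer."""
    f.write(data)
    sb_offset = f.tell()
    # JSON stands in for whatever encoding the format would really use.
    f.write(json.dumps(superblock).encode("utf-8"))
    f.write(TRAILER.pack(sb_offset))


def read_superblock(f):
    """Find the superblock starting from the end of the archive."""
    f.seek(-TRAILER.size, io.SEEK_END)
    trailer_pos = f.tell()
    (sb_offset,) = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(sb_offset)
    raw = f.read(trailer_pos - sb_offset)
    return json.loads(raw.decode("utf-8"))


# Usage: the two offsets that used to sit at the end of the archive
# become ordinary superblock fields (hypothetical names).
buf = io.BytesIO()
write_archive(
    buf,
    b"x" * 100,
    {"file_index_offset": 10, "block_index_offset": 60},
)
superblock = read_superblock(buf)
```

A reader only needs to know the trailer size in advance; everything
else is found by following offsets.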

Order of enclosed files:
I'd go with no predefined order. Let the creator of the archive decide
which case to optimise for. All read accesses of files will be going
through the inode table anyway.

Growing the archive:
As John Goerzen noted in another email, for simple appends it would work
to write a new superblock at the new end of the archive, one that
provides a link back to the old superblock. However, that does
make the read case much hairier (need to consult every superblock and
inode table to list the contents of the archive). After many appends,
the chain could grow quite long.
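
A toy model of the read side under the linked approach. The offsets,
the "prev" field and the file names are made up for the example.

```python
# Hypothetical archive state after three sessions. Each superblock
# lists only the files added in its session and points back to the
# previous superblock (None for the first one).
superblocks = {
    100: {"files": ["a", "b"], "prev": None},
    250: {"files": ["c"], "prev": 100},
    400: {"files": ["d", "e"], "prev": 250},
}


def list_contents(superblocks, last_offset):
    """Walk the chain from the newest superblock back to the oldest."""
    names = []
    hops = 0
    offset = last_offset
    while offset is not None:
        sb = superblocks[offset]
        names = sb["files"] + names  # keep oldest files first
        offset = sb["prev"]
        hops += 1
    return names, hops
```

Listing the archive costs one superblock read per append session, which
is the growth in the read path described above.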

If we have the superblock at the end of the archive, I would suggest
instead that the old superblock simply be overwritten with new data
blocks, and a new superblock be written at the end. This could be
extended to the block index table and perhaps the inode table as well
(if the inode table is always in a separate data block). 
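
A minimal sketch of the overwrite approach on a seekable file. It
assumes a JSON superblock followed by an 8-byte big-endian offset
trailer, and a "files" field mapping names to [offset, length]; none of
these details come from the proposal.

```python
import io
import json
import struct

TRAILER = struct.Struct(">Q")  # assumed fixed-size offset trailer


def write_superblock(f, sb):
    """Write superblock and trailer at the current position."""
    sb_offset = f.tell()
    f.write(json.dumps(sb).encode("utf-8"))
    f.write(TRAILER.pack(sb_offset))
    f.truncate()  # drop any leftover bytes of the old superblock


def read_superblock(f):
    """Return (superblock, offset where it starts)."""
    f.seek(-TRAILER.size, io.SEEK_END)
    trailer_pos = f.tell()
    (sb_offset,) = TRAILER.unpack(f.read(TRAILER.size))
    f.seek(sb_offset)
    raw = f.read(trailer_pos - sb_offset)
    return json.loads(raw.decode("utf-8")), sb_offset


def append_file(f, name, data):
    """Overwrite the old superblock with new data, then rewrite it."""
    sb, sb_offset = read_superblock(f)
    f.seek(sb_offset)  # new data starts where the old superblock was
    f.write(data)
    sb["files"][name] = [sb_offset, len(data)]
    write_superblock(f, sb)


# Usage: start from an empty archive and append twice.
buf = io.BytesIO()
write_superblock(buf, {"files": {}})
append_file(buf, "a", b"hello")
append_file(buf, "b", b"world!")
final_sb, final_offset = read_superblock(buf)
```

After any number of appends there is still exactly one superblock to
read. The cost is that the whole superblock is rewritten on each
append, and the archive must live on rewritable storage.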

The linked list approach does have the advantage that it would be easy
to append to an archive written onto write-once media. This is how
multisession ISO 9660 filesystems work.

-- 
Will Dyson
"Back off man, I'm a scientist!" -Dr. Peter Venkman




