


From: John Goerzen
Subject: [Duplicity-talk] Re: [rdiff-backup-users] Tar replacement - format proposal
Date: Fri, 26 Sep 2003 08:16:42 -0500
User-agent: Mutt/1.4i

On Thu, Sep 25, 2003 at 10:14:13PM -0700, Ben Escoto wrote:
> Based on some useful suggestions from the rdiff-backup mailing list,
> I've updated the page at
> 
> http://www.nongnu.org/duplicity/new_format.html
> 
> to sketch out a possible format.  Again comments and suggestions are
> greatly appreciated.  It would be bad if we start discussing details
> (or worse, start implementing!) when the whole scheme is flawed in
> some way, so let me know what you think.

Took a look.  Some of the good things about it are indexing and compression
on the inside (similar to ZIP).  You should also look into incremental
backup support and multi-volume archives a la tar (so that this format could
be used for backups).  If memory serves, GNU tar does incremental backups by
maintaining an index file on disk and noting in the generated archive when a
file that was in the list no longer exists, so a restore can properly delete
that file.  You could perhaps extend it to store rdiffs from a previous
version in some fashion.

A couple of things you may be overlooking:
  1. inode numbers (or rather, something like them).  Needed to record hard
     links.
  2. An ability to record sparse files.
  3. ZIP has a format similar to yours and yet supports writing to an
     existing archive without too much difficulty.  One thing you may
     consider, at least for simple appends, is that the end-of-file index
     could contain a pointer telling a reading program to "chain" to the
     previous index after reading this one.
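To make point 3 a bit more concrete, here is a rough Python sketch of a
chained end-of-file index.  The layout is purely hypothetical (it is neither
ZIP's nor anything from the proposal page): each append writes a small index
followed by a fixed-size trailer of three 64-bit integers, and the last of
those points back at the previous trailer.  The names TRAILER, NO_PREV,
append_batch and read_all_names are made up for illustration.

```python
import io
import struct

# Hypothetical trailer: index offset, index length, previous trailer offset.
TRAILER = struct.Struct(">QQQ")
NO_PREV = 2**64 - 1  # sentinel meaning "no earlier index to chain to"

def append_batch(f, names, prev_trailer):
    """Append an index of names plus a trailer; return the trailer offset."""
    f.seek(0, io.SEEK_END)
    # (The file data for this batch would be written here.)
    index_offset = f.tell()
    index = b"".join(n.encode("utf-8") + b"\x00" for n in names)
    f.write(index)
    trailer_offset = f.tell()
    f.write(TRAILER.pack(index_offset, len(index), prev_trailer))
    return trailer_offset

def read_all_names(f):
    """Start at the trailer at end of file and follow the chain backwards."""
    end = f.seek(0, io.SEEK_END)
    if end < TRAILER.size:
        return []
    pos = end - TRAILER.size
    names = []
    while pos != NO_PREV:
        f.seek(pos)
        index_offset, index_len, prev = TRAILER.unpack(f.read(TRAILER.size))
        f.seek(index_offset)
        blob = f.read(index_len)
        batch = [n.decode("utf-8") for n in blob.split(b"\x00")[:-1]]
        names = batch + names  # older batches go in front
        pos = prev
    return names

buf = io.BytesIO()
first = append_batch(buf, ["a", "b"], NO_PREV)
append_batch(buf, ["c"], first)
print(read_all_names(buf))
```

An append never rewrites the older index; the cost is that a reader has to
make one extra seek per append to see the whole archive.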

One problem I see is XML.  Yes, it is versatile, but it is also overkill for
this and it is complex.  An archive should be as simple as possible, and it
should be able to be restored with as few tools as possible.  XML is not
simple, and using it will generally require a libxml on the system.  This
can right away put your format out of the running for things like
installers, critical system backups, and anything else that is extremely
space-conscious for program footprint and required shared libraries.

Moreover, you don't need XML for what you are trying to achieve.  All you
need is something "more versatile than tar".  It shouldn't be hard to arrive
at a key/value system.  For instance, for files, you could have:

  NAME\0/foo/bar/baz\0
  MTIME\0514341312\0
  CTIME\03413413214\0

Of course, you could write the MTIME and CTIME in binary, and you could
abbreviate those names to "N", "M", and "C" to save time.  What's more, this
format is quite extensible, almost to the same degree as XML, and you save
space in the archive and space in memory and library requirements, not to
mention ease of processing.
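As a minimal sketch of how little code this takes, here is the text form of
the record above in Python.  It assumes the record boundaries are already
known (say, from the index) and that neither keys nor values contain a NUL
byte; encode_record and decode_record are names invented for this example.

```python
def encode_record(fields):
    """Serialize (key, value) pairs as KEY NUL VALUE NUL ..."""
    out = bytearray()
    for key, value in fields:
        out += key.encode("ascii") + b"\x00"
        out += value.encode("utf-8") + b"\x00"
    return bytes(out)

def decode_record(data):
    """Split a record back into (key, value) pairs, keeping their order."""
    parts = data.split(b"\x00")
    if parts[-1] != b"":
        raise ValueError("record is not NUL-terminated")
    parts = parts[:-1]  # drop the empty piece after the final NUL
    if len(parts) % 2:
        raise ValueError("key without a value")
    return [
        (parts[i].decode("ascii"), parts[i + 1].decode("utf-8"))
        for i in range(0, len(parts), 2)
    ]

record = encode_record([
    ("NAME", "/foo/bar/baz"),
    ("MTIME", "514341312"),
    ("CTIME", "3413413214"),
])
for key, value in decode_record(record):
    print(key, value)
```

A reader that does not recognize a key can simply skip that pair, which is
where the extensibility comes from.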

Null-terminated strings are easy for anyone to parse without having to load
a separate library.  (You could also use Pascal-style "leading length byte"
strings, which are also easy to parse.)
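For comparison, a sketch of the length-byte variant in Python.  It assumes a
single unsigned length byte, so each string is limited to 255 bytes; the
function names are invented for this example.  Unlike the NUL-terminated
form, the payload may contain any byte value, including NUL.

```python
def pack_pstr(s):
    """Prefix a byte string with a one-byte length (at most 255 bytes)."""
    if len(s) > 255:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(s)]) + s

def unpack_pstrs(data):
    """Read consecutive length-prefixed strings until the data runs out."""
    out = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        end = pos + 1 + length
        if end > len(data):
            raise ValueError("truncated string")
        out.append(data[pos + 1:end])
        pos = end
    return out

blob = pack_pstr(b"N") + pack_pstr(b"/foo/bar/baz")
print(unpack_pstrs(blob))
```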

-- John
