
Re: [rdiff-backup-users] Tar replacement - format proposal


From: Kevin Spicer
Subject: Re: [rdiff-backup-users] Tar replacement - format proposal
Date: 26 Sep 2003 10:33:01 +0100

On Fri, 2003-09-26 at 06:14, Ben Escoto wrote:
> Based on some useful suggestions from the rdiff-backup mailing list,
> I've updated the page at
> 
> http://www.nongnu.org/duplicity/new_format.html
> 
> to sketch out a possible format.  Again comments and suggestions are
> greatly appreciated.  

Interesting ideas.  You seem very focused on backup to disk; have you
considered what happens if someone wants to back up to tape?  Since the
index is at the end of the file, they would need to read the entire file
to get the index, then rewind and read (possibly) the entire file again
to extract what they want.  I appreciate your reasons for not putting the
index at the start; however, this could be a serious issue for some.  One
possible solution (in the implementation) is to allow the user to
specify that the archive index should also be written to a second file,
so that you can skip past the first file on the tape (the archive) to
read the index, then rewind and read the archive.  Thinking on this, the
first file should still contain the index at the end as normal, so that
every duplicity archive is self-contained and you never end up with the
archive but not the index; the option would simply store an extra copy
of the index as well.  You could even allow copies of the index to be
written to alternative locations.  For example, this would allow the
bulky archives to be kept in offline storage and the smaller indexes to
be kept on disk, so files could be located before retrieving the
appropriate tape from storage.
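A minimal sketch of the "index at the end, plus an optional separate copy" idea, assuming a hypothetical layout (data blocks, then a JSON index, then a fixed footer holding the index offset and a made-up magic tag `IDX1`). None of these names come from duplicity; they only illustrate the layout.

```python
import io
import json
import struct

# Hypothetical footer: big-endian offset of the index, then a 4-byte magic tag.
FOOTER = struct.Struct(">Q4s")
MAGIC = b"IDX1"


def write_archive(archive, blocks, index_copy=None):
    """Write data blocks, then a JSON index, then a footer pointing at the index.

    The archive always carries its own index; if index_copy is given, the
    same index bytes are written there as well (e.g. a small file on disk).
    """
    index = []
    for name, data in blocks:
        index.append({"name": name, "offset": archive.tell(), "length": len(data)})
        archive.write(data)
    index_bytes = json.dumps(index).encode("utf-8")
    index_offset = archive.tell()
    archive.write(index_bytes)
    archive.write(FOOTER.pack(index_offset, MAGIC))
    if index_copy is not None:
        index_copy.write(index_bytes)
    return index


def read_index(archive):
    """Jump to the footer, then to the index; this needs a seekable archive."""
    archive.seek(-FOOTER.size, io.SEEK_END)
    index_offset, magic = FOOTER.unpack(archive.read(FOOTER.size))
    if magic != MAGIC:
        raise ValueError("no trailing index found")
    archive.seek(index_offset)
    return json.loads(archive.read()[:-FOOTER.size])
```

On disk `read_index` is cheap, but on tape the seek to the footer means reading the whole archive, which is exactly why the separate `index_copy` is useful there.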

There would be some benefit in including information in the header to
indicate whether the file is a full archive or an index, plus a unique
identifier so that indexes and archives can be matched up.
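One way such a header could look, as a sketch only: the magic bytes `DUPL`, the version byte and the kind codes below are all hypothetical. An archive and its index carry the same 16-byte UUID, so a stray index can be paired with the right archive.

```python
import struct
import uuid

# Hypothetical fixed-size header: magic, format version, file kind, archive id.
HEADER = struct.Struct(">4sBB16s")
KIND_ARCHIVE = 1
KIND_INDEX = 2


def make_header(kind, archive_id):
    """Pack a header; an archive and its index share the same archive_id."""
    return HEADER.pack(b"DUPL", 1, kind, archive_id.bytes)


def parse_header(raw):
    """Return (version, kind, archive_id) from the start of a file."""
    magic, version, kind, id_bytes = HEADER.unpack(raw[:HEADER.size])
    if magic != b"DUPL":
        raise ValueError("not a recognised archive or index file")
    return version, kind, uuid.UUID(bytes=id_bytes)
```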

You talk about whether header entries can be a fixed size or whether you
should use XML.  I think it would be a good idea to use XML whenever you
can, to permit extensibility.  You're running into tar's limitations now
precisely because of decisions like that.
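To illustrate the extensibility argument, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element names and the `KNOWN_FIELDS` set are assumptions for illustration; the point is that an older reader can skip elements added by a newer writer instead of failing, which a fixed-size field layout cannot do.

```python
import xml.etree.ElementTree as ET

# Fields this (hypothetical) reader knows about.
KNOWN_FIELDS = {"name", "size", "mtime"}


def build_entry(fields):
    """Serialise a dict of header fields as a small XML element."""
    entry = ET.Element("entry")
    for tag, value in fields.items():
        ET.SubElement(entry, tag).text = str(value)
    return ET.tostring(entry, encoding="unicode")


def parse_entry(xml_text):
    """Return the known fields; unknown elements from newer writers are skipped."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in KNOWN_FIELDS}
```

For example, an entry written with an extra `acl` element still parses, and the reader just ignores `acl`.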

Some other, random unstructured thoughts... [disclaimer: I've not
actually used duplicity; although I have read the docs, I may have
missed or misunderstood some of its existing features]
Presumably you'll be compressing before encrypting?  IIRC you get far
better compression that way, since encrypted output looks essentially
random.  Will you be supporting other compression schemes (like bzip2)
and alternative encryption algorithms?  What about signed but
unencrypted archives (for those who are only worried about making sure
the backup has not been changed or corrupted)?  Will individual blocks
be signed, or just the full archive?  This could be important, since
per-block signatures may permit undamaged portions of a damaged archive
to be restored; on the other hand, they would add to the size.
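A sketch of per-block "compress, then sign" with a choice of compressor. HMAC-SHA256 from the standard library stands in for a real public-key signature here (an assumption, not a proposal for the actual scheme), and encryption, which would come after compression, is left out.

```python
import bz2
import hashlib
import hmac
import zlib

COMPRESSORS = {"zlib": zlib.compress, "bz2": bz2.compress}


def seal_block(data, key, compressor="zlib"):
    """Compress one block, then compute a tag over the compressed bytes.

    HMAC-SHA256 stands in for a real signature in this sketch. Encryption,
    if used, would go after compression and is omitted here.
    """
    packed = COMPRESSORS[compressor](data)
    tag = hmac.new(key, packed, hashlib.sha256).digest()
    return packed, tag


def verify_block(packed, tag, key):
    """Check one block independently of all the others."""
    expected = hmac.new(key, packed, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Because each block is verified on its own, flipping a bit in one block fails only that block; its neighbours still verify and could still be restored. The cost is one 32-byte tag per block.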

Still thinking on my feet, it's not clear from your page (not to me,
anyway) whether the metadata is stored at the block level or in the
index at the end.  I would suggest that the block level is better.  In
fact, I think the index should contain nothing that is absolutely
necessary to restore the whole file (although obviously it's useful for
selecting individual files) because...
  * If a file becomes truncated (maybe the device ran out of space, or
whatever), the undamaged blocks could still be recovered.
  * It would then be possible to read/write an archive from/to a stream
(as tar, gzip and bzip2 do).

[I've just reread the page and now I think this is what you are
proposing, but I'm not 100% sure]
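A minimal sketch of self-describing blocks, assuming a hypothetical framing (name length, data length and CRC32, followed by the name and data). Since each block carries its own metadata, a reader can walk the archive sequentially from a pipe or tape with no index, and keeps everything it recovered before a truncation point.

```python
import io
import struct
import zlib

# Hypothetical block framing: name length, data length, CRC32 of the data.
BLOCK_HEADER = struct.Struct(">HII")


def write_block(out, name, data):
    encoded = name.encode("utf-8")
    out.write(BLOCK_HEADER.pack(len(encoded), len(data), zlib.crc32(data)))
    out.write(encoded)
    out.write(data)


def read_blocks(stream):
    """Yield (name, data) for each complete, intact block, read sequentially.

    Needs no index and no seeking, so it works on a pipe or a tape.
    """
    while True:
        header = stream.read(BLOCK_HEADER.size)
        if len(header) < BLOCK_HEADER.size:
            return  # clean end of stream, or truncated inside a header
        name_len, data_len, crc = BLOCK_HEADER.unpack(header)
        name = stream.read(name_len)
        data = stream.read(data_len)
        if len(name) < name_len or len(data) < data_len:
            return  # truncated: keep what was already recovered
        if zlib.crc32(data) != crc:
            continue  # damaged data (header assumed intact): skip this block
        yield name.decode("utf-8"), data
```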

Depending on which stage you implement compression at, you may like to
think about having a customisable block size.  gzip and bzip2, and
different settings of each, use different block sizes (bzip2's are much
bigger IIRC), so if your intention is to 1) build the block, 2) compress
the block, 3) encrypt/sign the block, then you might think about
matching your inner block size to that of the compression algorithm in
use, to optimise the compression you get.  I don't know about the impact
of block size on encryption; anyone care to enlighten me?  If output
files may be written to tape, there's also an implication for the block
size of the output (i.e. not the inner block size).
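A sketch of matching the inner block size to the compressor. bzip2 compresses in blocks of roughly `level * 100,000` bytes, so one option is to make each inner block about one bzip2 block; deflate (zlib/gzip) only looks back 32 KiB, so the 64 KiB used below for zlib is an arbitrary assumption. The function names are hypothetical.

```python
import bz2
import zlib


def inner_block_size(compressor, level=9):
    """Choose an inner block size to suit the compressor (an assumed policy).

    bzip2 compresses in blocks of roughly level * 100,000 bytes, so this
    makes each inner block about one bzip2 block. Deflate (zlib/gzip) only
    looks back 32 KiB, so 64 KiB here is an arbitrary, modest choice.
    """
    if compressor == "bz2":
        return level * 100_000
    if compressor == "zlib":
        return 64 * 1024
    raise ValueError(f"unknown compressor: {compressor!r}")


def compress_in_blocks(data, compressor="bz2", level=9):
    """Split data at the chosen size and compress each inner block separately."""
    size = inner_block_size(compressor, level)
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if compressor == "bz2":
        return [bz2.compress(block, level) for block in blocks]
    return [zlib.compress(block, level) for block in blocks]
```

Blocks much smaller than the compressor's natural unit lose ratio, because each one restarts the compressor with no shared history; much larger blocks gain little and make damage cost more data.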

Final, off-the-wall thought: recent Windows filesystems (just NTFS?)
can associate multiple streams with a single filename (although AFAIK
nobody is using this very much yet).  I'm not sure how you would go
about handling these, but in case you didn't already know about them,
they are there.  Just thought I'd mention it, in case it's something
that needs addressing in the file format rather than just the
implementation.
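If the format did need to cover this, one possibility is an entry that holds a set of named streams rather than a single byte string. On NTFS an alternate stream can be opened on Windows as `file.txt:streamname`. The `FileEntry` class below is purely hypothetical and only shows the data structure, not how streams would be read from disk.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Hypothetical archive entry that keeps every stream of one file.

    The unnamed (default) stream is stored under "" and each NTFS
    alternate data stream under its own name.
    """
    path: str
    streams: dict = field(default_factory=dict)

    def add_stream(self, data, name=""):
        self.streams[name] = data

    def manifest(self):
        """List stream names and sizes, e.g. for writing into an index."""
        return [{"stream": name, "length": len(data)}
                for name, data in sorted(self.streams.items())]
```

On filesystems without alternate streams, every entry would simply have the single "" stream, so the format costs nothing extra there.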




BMRB International 
http://www.bmrb.co.uk
+44 (0)20 8566 5000