rdiff-backup-users

Re: [rdiff-backup-users] Tar replacement - format proposal


From: Ben Escoto
Date: Fri, 26 Sep 2003 18:10:35 -0700

On 26 Sep 2003 10:33:01 +0100 Kevin Spicer <address@hidden> wrote:
> Interesting ideas.  You seem very focused on backup to disk, have you
> considered what happens if someone wants to backup to tape.  Since the
> index is at the end of the file they would need to read the entire file
> to get the index, then rewind and read (possibly) the entire file again
> to extract what they want.

I think this got cleared up later in your discussion with John
Goerzen: the entire file wouldn't need to be read to find the index.
I don't have much personal experience with tapes (and none with
industrial tape usage), but from reading the thread I get the
impression that the format would be friendly to tape writing, in that
it could be written to tape in a single pass, and not too unfriendly
when it comes to reading.

Tar will be faster for restoring a whole archive from tape, since no
winding will be required.  But this format might be quicker for
restoring part of an archive, even from tape, because the tape can
wind directly to the right position instead of reading past every
preceding file, as tar typically requires.
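A minimal sketch of why the index need not be found by reading the whole archive, assuming a hypothetical layout that is not taken from the proposal: file data first, then the index, then a fixed-size footer holding the index's offset. A reader jumps to the footer, then to the index, then directly to one member. On disk this is a seek; on tape the same steps would correspond to winding. The names `write_archive`, `read_index`, and `extract` are illustrative only.

```python
import io
import struct

# Hypothetical layout (an assumption for this sketch):
#   [file data ...][index entries][8-byte footer = offset of the index]
FOOTER = struct.Struct("<Q")       # little-endian unsigned 64-bit offset
ENTRY = struct.Struct("<HQQ")      # name length, data offset, data length


def write_archive(out, files):
    """Write members sequentially in one pass, then the index and footer."""
    index = {}
    for name, data in files.items():
        index[name] = (out.tell(), len(data))
        out.write(data)
    index_offset = out.tell()
    for name, (offset, length) in index.items():
        encoded = name.encode("utf-8")
        out.write(ENTRY.pack(len(encoded), offset, length))
        out.write(encoded)
    out.write(FOOTER.pack(index_offset))


def read_index(f):
    """Read the footer, then the index; member data is never read."""
    f.seek(-FOOTER.size, io.SEEK_END)
    footer_pos = f.tell()
    (index_offset,) = FOOTER.unpack(f.read(FOOTER.size))
    f.seek(index_offset)
    index = {}
    while f.tell() < footer_pos:
        name_len, offset, length = ENTRY.unpack(f.read(ENTRY.size))
        index[f.read(name_len).decode("utf-8")] = (offset, length)
    return index


def extract(f, name):
    """Jump straight to one member instead of scanning the ones before it."""
    offset, length = read_index(f)[name]
    f.seek(offset)
    return f.read(length)
```

For example, after `write_archive(buf, {"a.txt": b"hello", "b.txt": b"world!"})` on an `io.BytesIO`, `extract(buf, "b.txt")` returns `b"world!"` without touching the bytes of `a.txt`.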

> For example this would allow the bulky archives to be stored in
> offline storage and the smaller indexes to be kept on disk
> - which would allow files to be located before retrieving the
> appropriate tape from storage.

Yep, if a tape archiver using this format were ever made, this could
be a good idea.  But the issue of whether the index can also be stored
separately seems independent of deciding on the file format.

> Will you be supporting other compression schemes (like bzip2) &
> alternative encryption algorithms? What about signed but unencrypted
> archives (for those who are only worried about making sure the
> backup has not been changed/corrupted)?

The compression/encryption method could be specified in the archive
header.  For signatures, perhaps there needs to be an archive footer
also, which gets written after the block index?
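A minimal sketch of both ideas, assuming a made-up layout: the header carries a magic string and a one-byte compression method ID, and a footer appended after everything else carries a MAC over all preceding bytes. HMAC-SHA256 from Python's standard library stands in for a real signature here; an actual implementation would more likely use a public-key signature (for example via GnuPG) so that verification does not need the secret. Nothing is encrypted, so this also covers the signed-but-unencrypted case. The magic bytes, method IDs, and function names are hypothetical.

```python
import bz2
import hashlib
import hmac
import zlib

MAGIC = b"RDBA"  # hypothetical magic bytes for this sketch
# Hypothetical method IDs recorded in the header: (compress, decompress)
COMPRESSORS = {
    0: (lambda b: b, lambda b: b),        # stored, no compression
    1: (zlib.compress, zlib.decompress),  # zlib (deflate)
    2: (bz2.compress, bz2.decompress),    # bzip2
}
TAG_SIZE = hashlib.sha256().digest_size   # 32 bytes


def pack(payload, method, key):
    """Header (magic + method ID), compressed body, then a MAC footer."""
    compress, _ = COMPRESSORS[method]
    body = MAGIC + bytes([method]) + compress(payload)
    # Stand-in for a signature: a MAC over every byte before the footer.
    return body + hmac.new(key, body, hashlib.sha256).digest()


def unpack(archive, key):
    """Verify the footer first, then decompress using the header's method."""
    body, tag = archive[:-TAG_SIZE], archive[-TAG_SIZE:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("footer MAC mismatch: archive changed or corrupted")
    if body[:4] != MAGIC:
        raise ValueError("missing archive magic")
    _, decompress = COMPRESSORS[body[4]]
    return decompress(body[5:])
```

Putting the method ID in the header lets a reader pick the decompressor before reading any data, while a footer can only be written once everything before it is final, which suits single-pass writing.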

> Will individual blocks be signed, or just the full archive - this
> could be important since it may permit undamaged portions of a
> damaged archive to be restored, on the other hand this would add to
> size.
    ...
>   * Should a file become truncated (maybe out of space on device or
> whatever) undamaged blocks could still be recovered.

Yes, I would like more input on this.  What kind of error correction
should be included?  For instance, each block could have a CRC.  Tar
doesn't seem to do any of that, though, so it seems an error (like an
overrun) in a tar volume could corrupt the entire rest of the
archive.  Does this happen in practice?
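A minimal sketch of the per-block CRC idea, assuming a hypothetical framing of `[4-byte length][4-byte CRC32][data]` per block (not the proposal's actual block format). A damaged block is skipped and a truncated final block is dropped, while every intact block is still recovered. One limitation: if a length field itself is corrupted, this reader loses its place; the outer offsets in a block index would let it resynchronize.

```python
import struct
import zlib

# Hypothetical block framing for this sketch: [length][CRC32][data]
BLOCK_HEADER = struct.Struct("<II")


def write_blocks(chunks):
    """Frame each chunk with its length and CRC32."""
    out = bytearray()
    for chunk in chunks:
        out += BLOCK_HEADER.pack(len(chunk), zlib.crc32(chunk))
        out += chunk
    return bytes(out)


def recover_blocks(data):
    """Return (block_number, chunk) for every block whose CRC checks out.

    A block with a bad CRC is skipped, using its length field to find
    the next block. A block cut short by truncation is dropped.
    """
    good, pos, i = [], 0, 0
    while pos + BLOCK_HEADER.size <= len(data):
        length, crc = BLOCK_HEADER.unpack_from(data, pos)
        start = pos + BLOCK_HEADER.size
        chunk = data[start:start + length]
        if len(chunk) == length and zlib.crc32(chunk) == crc:
            good.append((i, chunk))
        pos, i = start + length, i + 1
    return good
```

With three blocks `b"aaa"`, `b"bbbb"`, `b"cc"`, flipping a byte inside the first block still yields blocks 1 and 2, and chopping the last byte off the archive still yields blocks 0 and 1.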

> In fact I think the index should require no information that is
> absolutely necessary to restore the whole file (although obviously
> it's useful in selecting individual files) because...
>   * It would then be possible to read/write an archive from/to a stream
> (like tar, gzip, bzip2 do).

Right now the file cannot be unpacked from a stream, although it can
be written as a stream.  To get both, the metadata would have to be
interspersed with the file contents.  This would make searching the
archive take longer, and compression also wouldn't be as effective,
because the metadata wouldn't all be together.

But it would be an option.  It would be nice to hear whether it would
be worth it.
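A minimal sketch of the interspersed alternative, assuming a hypothetical per-member layout of `[2-byte name length][8-byte data length][name][data]`, roughly what tar does with its per-file headers. The reader only ever reads forward, so it could consume a pipe. The cost mentioned above is visible here: listing member names still requires passing over all the data, and the metadata is scattered between file contents rather than compressed together.

```python
import io
import struct

# Hypothetical member header for this sketch: name length, data length
MEMBER = struct.Struct("<HQ")


def write_stream(out, files):
    """Write (name, data) pairs, each preceded by its own metadata."""
    for name, data in files:
        encoded = name.encode("utf-8")
        out.write(MEMBER.pack(len(encoded), len(data)))
        out.write(encoded)
        out.write(data)


def read_exact(stream, n):
    """Read exactly n bytes or fail; a short read means truncation."""
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("stream ended in the middle of a member")
    return buf


def unpack_stream(stream):
    """Yield (name, data) pairs using only sequential reads, no seeks."""
    while True:
        header = stream.read(MEMBER.size)
        if not header:
            return  # clean end of stream
        if len(header) != MEMBER.size:
            raise EOFError("stream ended in the middle of a header")
        name_len, data_len = MEMBER.unpack(header)
        name = read_exact(stream, name_len).decode("utf-8")
        yield name, read_exact(stream, data_len)
```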

> Depending on which stage you implement compression at you may like to
> think about having a customisable block size

Well, the block index includes both inner and outer offsets, so the
block size can vary from block to block.
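A minimal sketch of why storing both offsets allows variable block sizes, assuming (as an interpretation, not something the message spells out) that the outer offset is where a block starts in the archive file and the inner offset is where its uncompressed data starts in the logical stream. A binary search over sorted inner offsets finds the block holding any logical position, whatever each block's size. `BlockEntry` and `find_block` are hypothetical names.

```python
import bisect
from typing import NamedTuple


class BlockEntry(NamedTuple):
    outer_offset: int  # assumed: start of the stored block in the archive file
    inner_offset: int  # assumed: start of its data in the uncompressed stream


def find_block(index, inner_pos):
    """Return the entry for the block containing logical byte inner_pos.

    `index` must be sorted by inner_offset; block sizes may all differ.
    """
    inner_starts = [entry.inner_offset for entry in index]
    i = bisect.bisect_right(inner_starts, inner_pos) - 1
    if i < 0:
        raise ValueError("position precedes the first block")
    return index[i]
```

With entries `(0, 0)`, `(900, 65536)`, and `(1500, 100000)`, logical byte 70000 lives in the block stored at outer offset 900; after reading and decompressing that block, a reader would skip 70000 - 65536 = 4464 bytes into it.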

> Final, off-the-wall thought: recent Windows filesystems (just NTFS?)
> have the capability of having multiple streams associated with a single
> filename (although this isn't being used by anybody very much yet AFAIK).
> I'm not sure how you would go about handling these, but if you didn't
> already know about them, they are there.  Just thought I'd mention it (in
> case it's something that needs addressing in the file format, rather than
> just the implementation).

Yep, MacOS has resource forks, and Will Dyson mentions that extended
attributes may be too long to place in the index.  Nothing in the
current format prohibits multiple "file contents" from being
associated with one "file index entry".
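A minimal sketch of one index entry pointing at several "file contents", assuming hypothetical field names: the main data uses an empty stream name, while a resource fork, an NTFS alternate stream, or an extended attribute too long for the index gets its own named reference into the archive. The class and field names are illustrative, not part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentRef:
    stream_name: str  # "" for the main data; otherwise a fork/stream/xattr name
    offset: int       # where this content's data starts in the archive
    length: int       # length of this content's data in bytes


@dataclass
class IndexEntry:
    path: str
    contents: list[ContentRef] = field(default_factory=list)

    def stream(self, name: str = "") -> ContentRef:
        """Look up one content stream of this file by name."""
        for ref in self.contents:
            if ref.stream_name == name:
                return ref
        raise KeyError(name)
```

For example, `IndexEntry("photo.jpg", [ContentRef("", 0, 1000), ContentRef("Zone.Identifier", 1000, 26)])` describes a file whose main data and one alternate stream are stored back to back; a restorer that does not understand named streams can still restore `stream()` alone.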


So, to summarize unanswered questions from above:
1)  Allow for archive footer?  If so, at outer level or inner?
2)  What kind of error correction should be included?
3)  How important is it to be able to do stream unpacking?


-- 
Ben Escoto
