duplicity-talk

Re: [Duplicity-talk] What is the process for creating signature files ?


From: Cyril Russo
Subject: Re: [Duplicity-talk] What is the process for creating signature files ? [DOC-2]
Date: Wed, 11 Mar 2009 11:39:55 +0100
User-agent: Thunderbird 2.0.0.19 (Windows/20081209)

Kenneth Loafman wrote:
Cyril Russo wrote:
  
Cyril Russo wrote:
    
Hi,

 If you have a bit of time, can you explain in a few lines how (and
where in the code) the signature files are created?

I'm trying to split the signatures to a specified volume size, but I
don't want to break anything, and grepping the code for "signature"
returns far too many matches.
Sincerely,
Cyril

      
*Organization of a backup archive (TAR format)*
    Backup archives (currently) use the well-known GNU tar format.
    When files are scanned on the filesystem for backup (using the
rsync algorithm to compute the smallest difference), they are cut into
smaller parts, or blocks, that are then saved in the backup archive.
The processing applied to each file (encrypting / diffing / comparing)
will be explained in more detail in the next part.
    The blocks to be stored come either from a file (in which case we
call them /fileblock/) or from a signature (in which case we call them
/sigblock/).

The work of reading blocks from an existing tar archive is currently
done by the file diffdir.py.
This file declares the following objects:

/DirSig/  (used in rdiffdir)
A simple class used to iterate over the sigblocks.

/DirFull, DirFull_WriteSig/ (used in rdiffdir and duplicity main)
A simple class to store the files' content in tar blocks.
Because it's easier to have common code used everywhere, the process
computes the difference between the files found and a virtual empty
file (producing a difference equal to the file itself). A similar
process is used when the file already exists: the virtual empty file is
replaced by the previous version of the file.
The WriteSig version also computes the signature and writes it to the
given output file pointer.

/DirDelta/ (used in rdiffdir and duplicity main, it's the default
implementation of DirFull)
This is the actual code computing the difference between the files
under the given path and the given reference (either nothing, or a
previous backup archive).
The process computes both the difference in each file's content and the
difference in each file's information (has the file been added,
deleted, modified, or left unmodified?).
The file's content goes to the backup archive, while the file's
information goes to the signatures.
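
To illustrate why a full backup can share code with an incremental one,
here is a minimal sketch of a delta against a reference signature, where
an empty reference yields the whole file. It is not the diffdir.py code:
the names block_signature and block_delta are hypothetical, SHA-1 over
fixed-offset blocks stands in for the rsync rolling checksum, and the
block size is deliberately tiny.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


def block_signature(data, block_size=BLOCK_SIZE):
    """Return the set of digests of the fixed-size blocks of `data`."""
    return {
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def block_delta(reference_sig, new_data, block_size=BLOCK_SIZE):
    """Describe `new_data` relative to a reference signature.

    Each entry is ("known", digest) when the block already exists in the
    reference, or ("literal", bytes) when the block has to be stored.
    """
    delta = []
    for i in range(0, len(new_data), block_size):
        block = new_data[i:i + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if digest in reference_sig:
            delta.append(("known", digest))
        else:
            delta.append(("literal", block))
    return delta


# Full backup: the reference is the "virtual empty file".
full = block_delta(block_signature(b""), b"abcdefgh")

# Incremental backup: the reference is the previous version's signature.
incremental = block_delta(block_signature(b"abcdefgh"), b"abcdXXXX")
```

With an empty reference every block comes out as a literal, so the delta
is the file itself; with the previous version as reference only the
changed block is stored.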

/FileWithReadCounter/ (private)
The name says it all. It keeps track of the amount read.

/FileWithSignature/ (private)
A read-only file class that computes the signature (using the rsync
algorithm) while the file is being read.
The computed signature for each block produces a simple code (depending
on the block state: added, modified, deleted, etc.)
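
The two file wrappers above follow a common pattern that can be sketched
as below. This is an illustration under stated assumptions, not the
diffdir.py implementation: the class names CountingReader and
DigestingReader are hypothetical, and a plain SHA-1 digest stands in for
the rsync signature that the real class computes.

```python
import hashlib
import io


class CountingReader:
    """Wrap a binary file object and count the bytes read through it."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self.bytes_read += len(data)
        return data


class DigestingReader(CountingReader):
    """Additionally feed every chunk into a digest as it is read."""

    def __init__(self, fileobj):
        super().__init__(fileobj)
        self._digest = hashlib.sha1()  # stand-in for an rsync signature

    def read(self, size=-1):
        data = super().read(size)
        self._digest.update(data)
        return data

    def signature(self):
        """Digest of everything read so far."""
        return self._digest.hexdigest()


reader = DigestingReader(io.BytesIO(b"hello world"))
first_chunk = reader.read(5)   # the caller reads as usual
rest = reader.read()           # the signature builds up as a side effect
```

The point of the design is that the file is only read once: whoever
consumes the data for the archive gets the signature for free.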

/TarBlock/ (private)

/TarBlockIter/ (abstract, private)
This class takes a (file) iterator as input and returns the matching
tar'ed block of the given size while iterating.
The behaviour depends on the following child classes:
/DummyTarBlockIter/
    Doesn't read the files, but instead counts the files passed in.
/SigTarBlockIter/
    This one returns the tar block from a signature archive file.
/DeltaTarBlockIter/
    This one returns the tar block for the files archive.
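
As a rough sketch of the iterator pattern described above (an abstract
base class plus child classes that decide what each block contains), the
following uses the standard tarfile module. The class DataTarBlockIter,
the make_block method and the (name, payload) input format are
assumptions made for this example and do not mirror the diffdir.py
classes.

```python
import io
import tarfile


class TarBlockIter:
    """Base class: turn (name, payload) pairs into raw tar blocks."""

    def __init__(self, entries):
        self._entries = iter(entries)

    def __iter__(self):
        return self

    def __next__(self):
        name, payload = next(self._entries)
        return self.make_block(name, payload)

    def make_block(self, name, payload):
        raise NotImplementedError


class DataTarBlockIter(TarBlockIter):
    """Emit a GNU tar header plus the payload, padded to 512 bytes."""

    def make_block(self, name, payload):
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        header = info.tobuf(format=tarfile.GNU_FORMAT)
        padding = b"\0" * (-len(payload) % tarfile.BLOCKSIZE)
        return header + payload + padding


class DummyTarBlockIter(TarBlockIter):
    """Build no blocks at all; only count the entries passed in."""

    def __init__(self, entries):
        super().__init__(entries)
        self.count = 0

    def make_block(self, name, payload):
        self.count += 1
        return b""


blocks = list(DataTarBlockIter([("a.txt", b"hello")]))
# Two zero-filled 512-byte blocks mark the end of a tar archive.
archive = b"".join(blocks) + b"\0" * (2 * tarfile.BLOCKSIZE)
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    restored = tar.extractfile("a.txt").read()

dummy = DummyTarBlockIter([("a.txt", b""), ("b.txt", b"")])
list(dummy)  # exhaust the iterator so that every entry is counted
```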


That's all for this email; again, please point out any errors.
This one doesn't explain anything about splitting the signature files
but, I hope, makes the backup process easier to understand.
I'll continue with explaining the backup algorithm in the next email (if
I understand it correctly).

For now, from what I've understood, we could hack the Collection stuff
to accept both "signature.gpg" and "sig000.gpg" as valid signature
files and, in the latter case, start returning the signature archive
collection. I still haven't found how to split the signatures during
creation, but I hope it'll appear in the next email.
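
A sketch of what accepting both names could look like, assuming the two
shorthand names used above and a three-digit volume number. The pattern
and the function parse_signature_name are hypothetical; they are not
taken from the Collection code and do not reflect duplicity's real
on-disk file names.

```python
import re

# Hypothetical pattern: the single-file name, or a numbered volume.
SIG_NAME = re.compile(r"^(?:signature|sig(?P<volume>\d{3}))\.gpg$")


def parse_signature_name(filename):
    """Return None for a non-signature file, else (is_split, volume)."""
    match = SIG_NAME.match(filename)
    if match is None:
        return None
    volume = match.group("volume")
    if volume is None:
        return (False, None)      # old style: one signature file
    return (True, int(volume))    # split style: numbered volume


def split_volumes(filenames):
    """Return the split signature volumes, sorted by volume number."""
    parsed = [(parse_signature_name(name), name) for name in filenames]
    split = [(info[1], name) for info, name in parsed if info and info[0]]
    return [name for _, name in sorted(split)]


found = split_volumes(["sig001.gpg", "difftar.gpg", "sig000.gpg"])
```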
    

Cyril,

Thanks for all the docs you're writing.  This has been sorely needed.

I'm starting the design of Checkpoint/Restart and we may need to
collaborate with you more on this.  It appears that if we can cleanly
synchronize the creation of difftar and sigtar files in parallel, then
Checkpoint is merely the last full volume of each.  A crash during the
creation of a volume means that Restart would clean that up, and proceed
from that point and complete.

A subgoal is that a crash during the Nth volume would leave a fully
restorable, and restartable, set of N-1 volumes.  That may mean I'll
have to address the manifest file as well.

Thoughts and suggestions from anyone are welcome.

  
Hi,

I was just thinking about this, and in fact it would be even better if all tar-block-like outputs were serialized (à la Java).

difftar, sigtar and partar would be created (in parallel or not), but as soon as one of them reaches a target limit, all of them are serialized to the backend. The limit would be the volume size for one, but we could set up a time limit too, so that in the future duplicity could maybe work 24/7 in the background and create an archive every hour.
This means that, if the limit is the volume size, only one of them will be exactly at the volume size limit (the others might not be).
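
A minimal sketch of that "flush everything when any one stream hits the limit" idea, assuming in-memory buffers and a size limit only. The class VolumeSet and its flush callback are hypothetical and exist only for this illustration.

```python
class VolumeSet:
    """Hypothetical group of output streams that are flushed together."""

    def __init__(self, names, limit, flush):
        self._buffers = {name: bytearray() for name in names}
        self._limit = limit
        self._flush = flush  # callable(volume_index, {name: bytes})
        self._volume_index = 0

    def write(self, name, data):
        self._buffers[name] += data
        # As soon as ONE stream reaches the limit, serialize ALL of them.
        if len(self._buffers[name]) >= self._limit:
            self.checkpoint()

    def checkpoint(self):
        snapshot = {n: bytes(b) for n, b in self._buffers.items()}
        self._flush(self._volume_index, snapshot)
        self._volume_index += 1
        for buf in self._buffers.values():
            buf.clear()


flushed = []
volumes = VolumeSet(["difftar", "sigtar"], limit=10,
                    flush=lambda index, data: flushed.append((index, data)))
volumes.write("sigtar", b"ab")            # below the limit, nothing sent
volumes.write("difftar", b"0123456789")   # limit reached, both are sent
```

After the second write, both streams go out together as volume 0, even though sigtar holds only two bytes. A time limit could call checkpoint() from a timer in the same way.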

Upon failure, we can restart from where we left off by opening the last, most up-to-date signature file, re-iterating over the path, and continuing from the first item that differs (either a new file not in the signature, or the first modified file/dir).
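
The restart step could be sketched as follows, assuming the saved signatures can be loaded as a path-to-signature mapping and that the path is iterated in the same order as in the interrupted run. The function first_unsaved_index and its arguments are hypothetical.

```python
def first_unsaved_index(paths_in_order, saved_signatures, current_signature):
    """Index of the first path that is new or modified.

    saved_signatures: mapping path -> signature from the last volume set.
    current_signature: callable returning the signature of a path now.
    """
    for index, path in enumerate(paths_in_order):
        if saved_signatures.get(path) != current_signature(path):
            return index
    return len(paths_in_order)  # everything was already backed up


paths = ["a", "b", "c", "d"]
on_disk = {"a": "1", "b": "2", "c": "3", "d": "4"}

# The crash happened after "a" and "b" were saved: resume at index 2.
resume_at = first_unsaved_index(paths, {"a": "1", "b": "2"}, on_disk.get)
```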

This would be seen like an iterative step by the code.

Couldn't we even merge all of those into a single random-access, tar-like file, so it's even easier?

This would break the current backup code, but it's not that bad: the current code still has to do a full backup from time to time, so the next full backup will catch up with the new format.

I'm preparing the DOC-3 part with the archive description if I understand it.
Cheers,
Cyril

