
Re: [Duplicity-talk] Speeding up duplicity


From: edgar.soldin
Subject: Re: [Duplicity-talk] Speeding up duplicity
Date: Mon, 11 Feb 2013 14:53:43 +0100
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130107 Thunderbird/17.0.2

On 07.02.2013 15:30, Yuri D'Elia wrote:
> Hi everyone.
> 
> I was trying to use duplicity to perform a full system backup of our system 
> here (~7TB of data to be backed up) on our EMC ATMOS "local cloud" storage.
> 
> The first problem I encountered is that duplicity performs poorly on large 
> files (>2GB), due to the small block size.
> 
> I updated this bug report:
> 
>   https://bugs.launchpad.net/duplicity/+bug/897423

looks healthy to me

> 
> but I wanted to know if you see anything against this approach beyond 
> increased diff size (that is, is the backup compatible with deltas that have 
> a different block size, etc?), and if there's anything I can do to get this 
> patch integrated.

not sure about different block sizes. Ken or Mike maybe can comment on that.
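
just to illustrate why the block size matters so much for big files, a quick 
back-of-the-envelope (not duplicity code; the ~12 bytes per signature entry is 
an assumption, roughly what a librsync entry costs):

  def blocks_and_sig_size(file_size, block_size, entry_bytes=12):
      blocks = -(-file_size // block_size)          # ceiling division
      return blocks, blocks * entry_bytes

  for block_size in (2048, 64 * 1024, 1024 * 1024):
      blocks, sig = blocks_and_sig_size(50 * 1024 ** 3, block_size)  # 50GiB file
      print("block size %8d: %10d blocks, ~%.1f MiB signature"
            % (block_size, blocks, sig / 1024.0 ** 2))

with 2048-byte blocks a single 50GiB file already means ~26 million blocks to 
hash and match, which is where the CPU goes.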

posting here is the second-best approach to get changes merged. registering 
with launchpad and adding a branch there is the best way.
 
> With this patch I can now increase the block size and run duplicity on files 
> up to ~50g (tested so far) without peaking CPU usage.

very good

> The second problem is that the local metadata is still quite big. I end up 
> with nearly 80GB of metadata (~1% of the original), which is ok, but I would 
> like to reduce it to less than 1GB (if possible) for practicality. For this I 
> was also thinking about introducing a --min-blocksize option.

makes sense
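
a rough sketch of what such a --min-blocksize (only a proposal at this point) 
could buy you; the file sizes are made up, and 2048 just stands in for whatever 
block size duplicity would pick by default:

  def sig_bytes(file_size, block_size, entry_bytes=12):
      return -(-file_size // block_size) * entry_bytes

  files = [4 * 1024 ** 3] * 1750        # pretend the ~7TB are 1750 files of 4GiB
  for min_block in (2048, 64 * 1024, 1024 * 1024):
      total = sum(sig_bytes(size, max(min_block, 2048)) for size in files)
      print("min block %8d: ~%.2f GiB of signatures"
            % (min_block, total / 1024.0 ** 3))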

> Network bandwidth, though, is still not optimally utilized. I'm using the 
> --asynchronous-upload option, but only one upload at a time is performed, 
> which limits the upload speed to that of a single node of the ATMOS storage.
 
true

> Since we're still talking about weeks of backup, I really need to be able to 
> create concurrent upload requests, so that the requests get handled by 
> potentially different nodes of the storage. Currently it takes ~10 days to 
> perform a full backup, which is unacceptable, as I would like to compute 
> daily deltas and perform a full backup at least once a month.

the current workaround is to back up to file:// and to upload later. it 
finishes faster and you can optimize the upload with the software you like. 
provided you have the spare local space, of course.
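
something along these lines, for example; upload_one is just a placeholder, 
plug in whatever client actually talks to your ATMOS nodes, and the staging 
path is made up:

  import glob
  from concurrent.futures import ThreadPoolExecutor

  def upload_one(path):
      # placeholder: replace with the real upload (atmos-python, curl, ...)
      print("uploading %s" % path)

  volumes = sorted(glob.glob("/backup/staging/duplicity-*"))
  with ThreadPoolExecutor(max_workers=5) as pool:   # 5 concurrent uploads
      list(pool.map(upload_one, volumes))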

> I would like pointers from developers here, if you think that this approach 
> is feasible with the current code (I was perusing the async scheduler before, 
> but thought to ask before going forward). I would simply like to allow the 
> producer to create volumes up to the requested number (say, --async-requests 
> 5), so that uploads can happen concurrently.

sorry.. cannot help you there
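
in general terms though, what you describe is the usual bounded 
producer/consumer pattern. a rough sketch (explicitly not duplicity's 
scheduler, and --async-requests does not exist yet):

  import queue
  import threading

  ASYNC_REQUESTS = 5
  pending = queue.Queue(maxsize=ASYNC_REQUESTS)   # producer blocks at N pending

  def uploader():
      while True:
          vol = pending.get()
          if vol is None:
              break
          print("uploading %s" % vol)             # placeholder for backend put()
          pending.task_done()

  workers = [threading.Thread(target=uploader) for _ in range(ASYNC_REQUESTS)]
  for w in workers:
      w.start()
  for i in range(20):                             # pretend to create 20 volumes
      pending.put("duplicity-full.vol%d.difftar.gpg" % (i + 1))
  for _ in workers:
      pending.put(None)                           # tell the uploaders to exit
  for w in workers:
      w.join()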

> Another issue I have is that duplicity is not threaded, and thus sometimes 
> stalls on CPU even when I have >32 CPUs available. I can read at ~200MB/s 
> from my local storage, but not with duplicity, due to CPU contention. I would 
> like to separate the compression/encryption stage here (which is currently 
> performed by gpg) into a simple pipeline instead, so that I can arbitrarily 
> choose the compressor and get better system CPU usage through separate 
> processes. Again, this sounds simple enough, but would such a patch be 
> accepted? Any major problems?

this would make the volumes backward incompatible, so it would have to be 
accompanied by a major version change. maybe creating volumes in parallel, 
keeping the gpg workflow as it is, could be another option.
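
for illustration, the kind of external pipeline you describe could look like 
the sketch below; zstd, the key id and the file names are placeholders, and as 
said, current duplicity would not be able to read volumes built this way:

  import subprocess

  def write_volume(src, dst, compressor=("zstd", "-T0"), key="BACKUP-KEY-ID"):
      with open(src, "rb") as fin, open(dst, "wb") as fout:
          comp = subprocess.Popen(compressor, stdin=fin,
                                  stdout=subprocess.PIPE)
          gpg = subprocess.Popen(["gpg", "--batch", "--encrypt",
                                  "--recipient", key,
                                  "--compress-algo", "none",  # already compressed
                                  "-o", "-"],
                                 stdin=comp.stdout, stdout=fout)
          comp.stdout.close()   # compressor gets a broken pipe if gpg dies
          gpg.wait()
          comp.wait()

  write_volume("volume1.tar", "volume1.tar.zst.gpg")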

> 
> I will also be releasing the ATMOS backend, which uses the atmos-python 
> (although modified) code from http://code.google.com/p/atmos-python/ as soon 
> as it is battle-tested enough. If anybody is interested in testing, please 
> don't hesitate to ask.
> 
> Also, any pointers on using duplicity for large-scale backups would be 
> appreciated. I was using it for some small systems, but since we got the 
> ability to use the EMC ATMOS storage from another facility, it suddenly 
> became quite useful for this larger task.

duplicity didn't keep up with the explosion of storage space, so there are 
lots of issues when trying to back up tens of gigabytes or more (e.g. manifest 
files are currently not split by volume size). with the help of the community 
(people like you) these will hopefully be squashed bit by bit over time.

..ede/duply.net


