[Duplicity-talk] Speeding up duplicity


From: Yuri D'Elia
Subject: [Duplicity-talk] Speeding up duplicity
Date: Thu, 07 Feb 2013 15:30:37 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.12) Gecko/20130116 Icedove/10.0.12

Hi everyone.

I was trying to use duplicity to perform a full system backup of our system here (~7TB of data to be backed up) on our EMC ATMOS "local cloud" storage.

The first problem I encountered is that duplicity performs poorly on large files (>2 GB), due to the small block size.

I updated this bug report:

  https://bugs.launchpad.net/duplicity/+bug/897423

but I wanted to know if you see anything against this approach beyond the increased diff size (that is, is the backup compatible with deltas that have a different block size, etc.?), and whether there's anything I can do to get this patch integrated.

With this patch I can now increase the block size and run duplicity on files up to ~50 GB (tested so far) without maxing out the CPU.
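To make the idea concrete, here is a rough sketch of the kind of scaling I have in mind (this is not the patch itself; the square-root heuristic and the --max-blocksize name are just placeholders):

  # Sketch only: scale the signature block size with the file size so that
  # very large files don't end up with millions of tiny blocks.
  import math

  def pick_blocksize(file_size, min_blocksize=512, max_blocksize=2 * 1024 * 1024):
      """Grow roughly with the square root of the file size, clamped."""
      size = int(math.sqrt(file_size))
      size = max(min_blocksize, min(size, max_blocksize))
      return size - (size % 512)        # keep blocks 512-byte aligned

  for s in (1 << 20, 2 << 30, 50 << 30):    # 1 MB, 2 GB, 50 GB
      print(s, pick_blocksize(s))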

The second problem is that the local metadata is still quite big. I end up with nearly 80 GB of metadata (~1% of the original data), which is OK, but I would like to reduce it to less than 1 GB (if possible) for practicality. For this I was also thinking about introducing a --min-blocksize option.
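The reasoning is just back-of-the-envelope arithmetic: signature size is roughly (data size / block size) times a per-block overhead, so a block-size floor directly caps the metadata. A quick sketch (the 12 bytes/block figure is only my assumption about the per-block signature overhead):

  # Rough estimate of signature metadata as a function of block size.
  def signature_size(total_data, blocksize, per_block=12):
      return (total_data // blocksize) * per_block

  TB = 1 << 40
  for bs in (512, 64 * 1024, 1 << 20):
      est = signature_size(7 * TB, bs)
      print("blocksize %8d: ~%.1f GiB of signatures" % (bs, est / float(1 << 30)))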

Network bandwidth, though, is still not optimally utilized. I'm using the --asynchronous-upload option, but only one upload at a time is performed, which limits the upload speed to that of a single node of the ATMOS storage.

Since we're still talking about weeks of backup, I really need to be able to issue concurrent upload requests, so that each request is potentially handled by a different node of the storage. Currently it takes ~10 days to perform a full backup, which is unacceptable, as I would like to compute daily deltas and perform a full backup at least once a month.

I would like pointers from the developers here on whether you think this approach is feasible with the current code (I was perusing the async scheduler before, but thought I'd ask before going forward). I would simply like to allow the producer to create volumes up to a requested number (say, --async-requests 5), so that uploads can happen concurrently.
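Something with this producer/consumer shape is what I mean (only a sketch; upload_volume() and the volume names are placeholders, not the actual async scheduler API):

  # Sketch: keep up to --async-requests uploads in flight at once, so that
  # different requests can be served by different ATMOS nodes.
  from concurrent.futures import ThreadPoolExecutor

  def upload_volume(path):
      # placeholder for the backend's actual upload call
      print("uploading %s" % path)

  def backup(volume_paths, async_requests=5):
      with ThreadPoolExecutor(max_workers=async_requests) as pool:
          futures = [pool.submit(upload_volume, p) for p in volume_paths]
          for f in futures:
              f.result()    # re-raise any upload error

  backup(["duplicity-full.vol%d.difftar.gpg" % i for i in range(1, 11)])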

Another issue I have is that duplicity is not threaded, and thus it sometimes stalls on CPU even when I have >32 CPUs available. I can read at ~200 MB/s from my local storage, but not with duplicity, due to CPU contention. I would like to separate the compression/encryption stage here (which is currently performed by gpg) into a simple pipeline instead, so that I can arbitrarily choose the compressor and get better CPU usage through separate processes. Again, this sounds simple enough, but would such a patch be accepted? Any major problems?
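To be clear about what I mean by a pipeline, here is the rough shape (a sketch only; pigz is just an example compressor, and the symmetric/passphrase-file gpg setup is one possibility, not what duplicity does today):

  # Sketch: compression and encryption as separate processes, so they can run
  # on different CPUs while duplicity keeps producing data.
  import subprocess

  def compress_and_encrypt(volume_path, out_path, passphrase_file):
      with open(volume_path, "rb") as src, open(out_path, "wb") as dst:
          gz = subprocess.Popen(["pigz", "-c"], stdin=src, stdout=subprocess.PIPE)
          gpg = subprocess.Popen(
              ["gpg", "--batch", "--symmetric",
               "--passphrase-file", passphrase_file, "-o", "-"],
              stdin=gz.stdout, stdout=dst)
          gz.stdout.close()     # let gpg see EOF when pigz finishes
          if gpg.wait() != 0 or gz.wait() != 0:
              raise RuntimeError("compression/encryption pipeline failed")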

I will also be releasing the ATMOS backend, which uses a (modified) version of the atmos-python code from http://code.google.com/p/atmos-python/, as soon as it is battle-tested enough. If anybody is interested in testing, please don't hesitate to ask.

Also, any pointers on using duplicity for large-scale backups would be appreciated. I was using it for some small systems, but since we gained access to the EMC ATMOS storage at another facility, it suddenly became quite useful for this larger task.

Best regards.


