
Re: [Duplicity-talk] How to backup stuff>5GB


From: Tashrif
Subject: Re: [Duplicity-talk] How to backup stuff>5GB
Date: Sat, 26 Feb 2022 12:50:39 -0500

Thank you, Edgar. I am using Python 3, so I am guessing --s3-use-multiprocessing wouldn't be an option for me at all.

Meanwhile, using the boto3+s3 protocol, my 1 TB of data got uploaded in half the time it took with the s3+http protocol. But here's the problem: if I add up the sizes from the AWS S3 console (screenshot below):

[Screenshot: sizes.JPG, object sizes from the AWS S3 console]

the total is way bigger than 1 TB. What is weirder is that I see the number of objects increasing even after the duply backup task finished at the backend:

--------------[ Backup Statistics ]--------------
StartTime 1645828858.98 (Fri Feb 25 17:40:58 2022)
EndTime 1645895278.36 (Sat Feb 26 12:07:58 2022)
ElapsedTime 66419.38 (18 hours 26 minutes 59.38 seconds)
SourceFiles 307805
SourceFileSize 1135560226054 (1.03 TB)
NewFiles 307805
NewFileSize 1135560226054 (1.03 TB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 307805
RawDeltaSize 1135493620993 (1.03 TB)
TotalDestinationSizeChange 1069767354158 (996 GB)
Errors 0
-------------------------------------------------


But the object count (511) plus total size (1.03 TB) was accurate for the boto protocol. Do you know anything about this discrepancy? Is the multipart upload algorithm causing it?

Best,
Tashrif
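
One possibility worth checking here, purely as a guess rather than anything the thread confirms: parts of in-progress or aborted multipart uploads remain stored in the bucket (and count toward its size) until the upload is completed or explicitly aborted, and the S3 console metrics also update with some lag. A minimal boto3 sketch to look for leftover multipart uploads, assuming credentials are already configured and using "my_bucket" as a placeholder:

```
# Sketch only: check whether leftover multipart upload parts are inflating
# the bucket size. "my_bucket" is a placeholder; credentials are assumed
# to be configured in the usual boto3 ways.
import boto3

s3 = boto3.client("s3")

resp = s3.list_multipart_uploads(Bucket="my_bucket")
for up in resp.get("Uploads", []):
    print(up["Key"], up["UploadId"], up["Initiated"])
    # Uncomment to discard an abandoned upload and reclaim its storage:
    # s3.abort_multipart_upload(Bucket="my_bucket",
    #                           Key=up["Key"],
    #                           UploadId=up["UploadId"])
```

A bucket lifecycle rule can also be configured to abort incomplete multipart uploads automatically after a set number of days.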

On Sat, Feb 26, 2022 at 5:49 AM edgar.soldin--- via Duplicity-talk <duplicity-talk@nongnu.org> wrote:
Tashrif,

no "discovery" or "subconscious fixes" there. what you found out is documented on the man page (read second paragraph).
https://duplicity.gitlab.io/duplicity-web/vers8/duplicity.1.html
"
--s3-use-multiprocessing
Allow multipart volume uploads to S3 through multiprocessing. This option requires Python 2.6 and can be used to make uploads to S3 more efficient. If enabled, files duplicity uploads to S3 will be split into chunks and uploaded in parallel. Useful if you want to saturate your bandwidth or if large files are failing during upload.

This has no effect when using the newer boto3 backend. Boto3 always attempts to use multiprocessing when it is believed it will be more efficient.

See also A NOTE ON AMAZON S3 below.
"

huge signature files are still a problem with other backends that have fixed file size limits!

regards ..ede
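
For illustration, a minimal standalone boto3 sketch of the managed multipart transfer the man page excerpt above refers to: files above a threshold are split into parts and the parts are uploaded in parallel. The bucket name, key, and sizes below are placeholders, not duplicity's actual settings.

```
# Standalone illustration of boto3's managed multipart transfer.
# Bucket, key, and sizes are placeholder values, not duplicity's defaults.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=25 * 1024 * 1024,  # objects above 25 MiB go multipart
    multipart_chunksize=25 * 1024 * 1024,  # size of each uploaded part
    max_concurrency=4,                     # parts uploaded in parallel
)

s3 = boto3.client("s3")
s3.upload_file(
    "duplicity-full-signatures.20220222T150726Z.sigtar.gz",  # local file
    "my_bucket",                                             # placeholder bucket
    "duplicity-full-signatures.20220222T150726Z.sigtar.gz",  # object key
    Config=config,
)
```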

On 25.02.2022 23:19, Tashrif via Duplicity-talk wrote:
> Hi Ken,
>
> I want to update you about my progress--I used TARGET="boto3+s3://my_bucket" instead of s3+http:// and that seems to have fixed the upload issue. Could it be the case that you have already fixed it in s3_boto3_backend subconsciously? To reiterate, my code discovery shows the multipart algorithm is used in the s3_boto3 backend only.
>
> Regardless, I am running another full backup to substantiate my success.
>
> Best,
> Tashrif
>
> On Fri, Feb 25, 2022 at 3:14 PM Tashrif <tashrifbillah@gmail.com> wrote:
>
>     Thank you for the reference, Ken. In the meantime, I want to hack around it. Which line in duplicity performs the put request? Can I add an if condition for the sigtar before that line so that it is not uploaded, and then upload that sigtar myself with the `aws s3 cp` command?
>
>     Best,
>     Tashrif
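
Purely as a hypothetical sketch of the idea in the message above, not duplicity code (the helper below is made up): gate the normal put on file size and shell out to `aws s3 cp`, which performs multipart uploads on its own, for anything over 5 GB.

```
# Hypothetical illustration only -- this is NOT duplicity code.
# It sketches the idea of bypassing the normal single PUT for files over
# 5 GB and handing them to `aws s3 cp`, which does multipart by itself.
import os
import subprocess

FIVE_GB = 5 * 1024 ** 3

def upload(path, bucket, put_func):
    """put_func stands in for whatever normally performs the PUT."""
    if os.path.getsize(path) > FIVE_GB:
        subprocess.run(["aws", "s3", "cp", path, f"s3://{bucket}/"], check=True)
    else:
        put_func(path, bucket)
```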
>
>     On Fri, Feb 25, 2022 at 12:40 PM Kenneth Loafman <kenneth@loafman.com> wrote:
>
>         No, it's due to this bug: https://bugs.launchpad.net/duplicity/+bug/385495
>
>         I am working on the next major revision to duplicity, 0.9.x, which will fix this and some others.  It's going slowly.
>
>         ...Thanks,
>         ...Ken
>
>
>         On Fri, Feb 25, 2022 at 10:49 AM Tashrif <tashrifbillah@gmail.com> wrote:
>
>             Hi Kenneth,
>
>             No, aws did not split the file in the bucket. I have been doing quite a bit of research on it. I see the following code segment in s3_boto3_backend only:
>
>             duplicity/backends/s3_boto3_backend.py:141:        transfer_config = TransferConfig(multipart_chunksize=config.s3_multipart_chunk_size,
>             duplicity/backends/s3_boto3_backend.py:142:                                         multipart_threshold=config.s3_multipart_chunk_size)
>
>             But I have used TARGET="s3+http://my_bucket", which should be the old boto backend. Do you think the latter has anything to do with this error?
>
>             Best,
>             Tashrif
>
>             On Fri, Feb 25, 2022 at 11:43 AM Kenneth Loafman <kenneth@loafman.com> wrote:
>
>                 Hi Tashrif,
>
>                 The sigtar size problem has been around forever.  For now I suggest splitting the backup into smaller portions.
>
>                 I am surprised the aws command completes properly.  Did it split the file in the bucket?
>
>                 ...Ken
>
>
>                 On Thu, Feb 24, 2022 at 10:57 PM Tashrif via Duplicity-talk <duplicity-talk@nongnu.org> wrote:
>
>                     During a backup task, duplicity created a 7.5 GB file at the very end: duplicity-full-signatures.20220222T150726Z.sigtar.gz. However, its upload fails with the following traceback:
>
>                     ```
>                        File "min3-duply/lib/python3.9/site-packages/boto/s3/key.py", line 760, in send_file
>                          self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
>                        File "min3-duply/lib/python3.9/site-packages/boto/s3/key.py", line 957, in _send_file_internal
>                          resp = self.bucket.connection.make_request(
>                        File "min3-duply/lib/python3.9/site-packages/boto/s3/connection.py", line 667, in make_request
>                          return super(S3Connection, self).make_request(
>                        File "min3-duply/lib/python3.9/site-packages/boto/connection.py", line 1077, in make_request
>                          return self._mexe(http_request, sender, override_num_retries,
>                        File "min3-duply/lib/python3.9/site-packages/boto/connection.py", line 946, in _mexe
>                          response = sender(connection, request.method, request.path,
>                        File "min3-duply/lib/python3.9/site-packages/boto/s3/key.py", line 895, in sender
>                          raise provider.storage_response_error(
>                       boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request
>                     <?xml version="1.0" encoding="UTF-8"?>
>                     <Error><Code>EntityTooLarge</Code><Message>Your proposed upload exceeds the maximum allowed size</Message><ProposedSize>7422574715</ProposedSize><MaxSizeAllowed>5368709120</MaxSizeAllowed><RequestId>HJD8DQ49S18RBFWQ</RequestId><HostId>7t7enU1YX/HY7ho7qA74knGEIzerBk/hDogp=</HostId></Error>
>
>                     Attempt of move Nr. 1 failed. S3ResponseError: Bad Request
>                     ```
>
>                     Meanwhile, `aws s3 cp duplicity-full-signatures.20220222T150726Z.sigtar.gz s3://my_bucket/` succeeds gracefully. That said, how do I enable duply/duplicity to upload files larger than 5GB?
>
>                     Thank you,
>                     Tashrif
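
For context, a sketch of what a >5 GB upload would need with the legacy boto 2 library used by the s3+http backend: a multipart upload instead of the single PUT that fails with EntityTooLarge above. The bucket name is a placeholder, and this assumes a boto 2 release whose upload_part_from_file accepts a size argument.

```
# Sketch only: multipart upload of a large file with legacy boto 2.
# "my_bucket" is a placeholder; credentials are assumed to be configured.
import math
import os

import boto

path = "duplicity-full-signatures.20220222T150726Z.sigtar.gz"
chunk = 100 * 1024 * 1024                   # 100 MiB parts (S3 part limit is 5 GiB)
size = os.path.getsize(path)

bucket = boto.connect_s3().get_bucket("my_bucket")
mp = bucket.initiate_multipart_upload(os.path.basename(path))
with open(path, "rb") as fp:
    for part in range(int(math.ceil(size / float(chunk)))):
        fp.seek(part * chunk)
        mp.upload_part_from_file(fp, part_num=part + 1,
                                 size=min(chunk, size - part * chunk))
mp.complete_upload()
```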
>
>


_______________________________________________
Duplicity-talk mailing list
Duplicity-talk@nongnu.org
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
