Re: [Bug-wget] Help: Why wget wall clock time much higher than download time?


From: David Bodin
Subject: Re: [Bug-wget] Help: Why wget wall clock time much higher than download time?
Date: Thu, 20 Jun 2019 16:36:03 -0700

Tim,

Genuine thanks for your response--and especially for your contribution of
wget2. I ran into an issue setting it up (on my AWS AMI) and can't find any
resources online that address it.

I followed the build instructions you provided
<https://gitlab.com/gnuwget/wget2/blob/master/README.md>. After building,
I ran "wget [url]" and first hit "Failed to connect: Wget has been built
without TLS support". I found the solution
<https://github.com/rockdaboot/wget2/issues/201> and fixed it with
"sudo yum -y install gnutls-devel". I confirmed the fix by running
"./configure" and checking for "SSL/TLS support:    yes", then rebuilt.
Running "wget [url]" again produced:

TLS False Start requested but Wget built with insufficient GnuTLS version

WARNING: OCSP is not available in this version of GnuTLS.

ERROR: The certificate is not trusted.

ERROR: The certificate doesn't have a known issuer.

Failed to connect: Certificate error

But when I try to install/update "gnutls", I'm informed:

Package gnutls-2.12.23-21.18.amzn1.x86_64 already installed and latest version

so I'm not sure how to proceed, as it shows the most up-to-date package.
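
A minimal sketch of why "latest version" and "insufficient GnuTLS version" can both be true: the package manager only reports the newest build in its own repository, which may still predate the feature being requested. The snippet parses the package string quoted above and compares it with a threshold. The 3.5.0 minimum for TLS False Start is an assumption that is not stated anywhere in this thread and should be checked against the GnuTLS documentation; the helper name is hypothetical.

```python
import re

# Assumption (not from this thread): TLS False Start first appeared in
# GnuTLS 3.5.0. Verify this threshold against the GnuTLS documentation.
ASSUMED_FALSE_START_MIN = (3, 5, 0)


def parse_gnutls_version(package_name: str) -> tuple:
    """Extract the upstream version from a yum package string such as
    'gnutls-2.12.23-21.18.amzn1.x86_64' (hypothetical helper)."""
    match = re.match(r"gnutls-(\d+(?:\.\d+)*)-", package_name)
    if match is None:
        raise ValueError(f"unrecognized package name: {package_name!r}")
    return tuple(int(part) for part in match.group(1).split("."))


# Package string copied from the yum message above.
installed = parse_gnutls_version("gnutls-2.12.23-21.18.amzn1.x86_64")

# Tuples compare element by element, so (2, 12, 23) < (3, 5, 0).
supports_false_start = installed >= ASSUMED_FALSE_START_MIN
print(installed, supports_false_start)
```

Under that assumption, the repository's newest GnuTLS is a 2.x release, so reinstalling the same package would not change the first message. The certificate errors may have a separate cause that this sketch does not address.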

Thanks in advance for any help you can provide.

Sincerely,
Dave

P.S.
I'm going to use wget2, but wanted to briefly follow up on my original
question with wget to hopefully learn a little more.

1.) Thanks for the note on not using "--random-wait" in the future, but it
made no difference whether I ran my command with or without this flag. Even
with the flag, the download completed in 35s. With 248 files, if the wait
were only .5s each (the best-case scenario), the waits alone should have
taken around 124s (248 x .5s).

2.) If I ran my wget command with "--no-clobber", it would correctly
download all files the first time, and the second time I ran the same
command, it would acknowledge that it had already downloaded all the files
and finish almost immediately. I tried to parallelize the downloads by
running multiple instances of the program (wget --no-clobber [url] & wget
--no-clobber [url]), but it didn't download multiple files at the same time.
I expected the first instance to start downloading a file, the second
instance to see that and skip to the next file that needed to be downloaded,
and the two to move in parallel through all the files. Do you know why this
behavior happened instead of what I expected?
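
One plausible explanation, which this thread does not confirm, is that an "already downloaded" test based only on whether the file exists cannot coordinate two processes that start at the same time and visit the same URLs in the same order. The sketch below is a hypothetical model of that race and is not wget's actual code; the function names are invented for illustration. It contrasts a plain existence check with an atomic claim that uses `O_CREAT | O_EXCL`.

```python
import os
import tempfile


def should_skip_check_only(path: str) -> bool:
    """Check-then-download: skip only if the file already exists."""
    return os.path.exists(path)


def try_claim(path: str) -> bool:
    """Atomically create the file; return True only for the first caller."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "page.html")

    # Two workers check before either has written anything:
    # both see "missing", so both would download the same file.
    worker_a_skips = should_skip_check_only(target)
    worker_b_skips = should_skip_check_only(target)

    # With an atomic claim, exactly one worker wins the file.
    worker_a_claims = try_claim(target)
    worker_b_claims = try_claim(target)

print(worker_a_skips, worker_b_skips)
print(worker_a_claims, worker_b_claims)
```

In this model neither worker skips under the existence check, while the atomic claim lets only the first worker proceed. Splitting the work that way needs a downloader that coordinates its workers internally, rather than two independent processes sharing a directory.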

Many thanks.


On Thu, Jun 20, 2019 at 12:55 AM Tim Rühsen <address@hidden> wrote:

> On 6/17/19 10:32 PM, David Bodin wrote:
> > wget --page-requisites --span-hosts --convert-links --adjust-extension
> > --execute robots=off --user-agent Mozilla --random-wait
> > https://www.invisionapp.com/inside-design/essential-steps-designing-empathy/
> >
> > This command above provides the following stats:
> >
> > Total wall clock time: 35s
> >
> > Downloaded: 248 files, 39M in 4.2s (9.36 MB/s)
> >
> > This website takes about 5 seconds to download and display all files on a
> > hard refresh in the browser.
> >
> > Why is the wall clock time *significantly longer* than the download time
> > and is there a way to make it faster?
>
> First of all, --random-wait waits 0.5 to 1.5 seconds after each
> downloaded page. Don't use it - there have been times when web servers
> blocked fast clients, but that shouldn't be the case today.
>
> Wget uses just one connection for downloading, no compression by
> default, no http/2.
>
> You can try Wget2, which uses as many parallel connections as you like,
> uses compression by default, and uses http/2 if possible. Depending on the
> HTTP server, Wget2 is often 10x faster than Wget with just its default
> settings.
>
> You can find the latest Wget2 tarball at
> https://gnuwget.gitlab.io/wget2/wget2-latest.tar.gz.
>
> Build instructions are at
> https://gitlab.com/gnuwget/wget2/blob/master/README.md
>
> Regards, Tim
>
>

