Re: [Bug-wget] Miscellaneous thoughts & concerns


From: Jeffrey Fetterman
Subject: Re: [Bug-wget] Miscellaneous thoughts & concerns
Date: Sat, 7 Apr 2018 21:52:37 -0500

Yes! Multiplexing was indeed partially the culprit; I've changed it
to --http2-request-window=5.
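
A sketch of the sort of invocation this describes. The target URL is a placeholder, --mirror is only there to make the example complete, and the DRY_RUN switch is a convenience of this example rather than a wget2 feature; the only change the message actually reports is --http2-request-window=5.

```shell
#!/bin/bash
# Sketch only: https://example.com/ is a placeholder target.
# DRY_RUN is a convenience of this example, not a wget2 option.
DRY_RUN="${DRY_RUN:-1}"

# Lower the number of requests multiplexed per HTTP/2 connection to 5.
set -- --mirror --http2-request-window=5 https://example.com/

cmd="wget2 $*"
if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd"      # print the command instead of running it
else
    wget2 "$@"
fi
```

Setting DRY_RUN=0 starts the transfer instead of printing the command.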

However, the download queue (AKA 'Todo') still gets enormous. That's why I
wanted to use non-verbose mode in the first place: screens and screens of
'Adding url:'. There should really be a limit on how many URLs it adds!

Darshit, as it stands it doesn't look like --force-progress does anything
because --progress=bar forces the same non-verbose mode, and
--force-progress is meant to be something used in non-verbose mode.

However, the progress bar is still really... not useful. See here:
https://i.imgur.com/KvbGmKe.png

It's a single bar displaying a nonsense percentage, and it sounds like with
multiplexing there's supposed to be, by default, 30 transfers going
concurrently.

> Both reduce RTT by 1, but they can't be combined.

I was using TLS Resume because, well, for a 300+GB download it just seemed
to make sense, so it wouldn't have to check over 100GB of files before
getting back to where I left off.

> You use TLS Resume, but you don't explicitly need to specify a file. By
> default it will use ~/.wget-session.

I figure a 300GB+ transfer should have its own session file just in case I
do something smaller between resumes that might overwrite .wget-session,
plus you've got to remember I'm on WSL and I'd rather have relevant files
kept within my normal folders rather than my WSL filesystem.
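
A minimal sketch of what a per-job session file could look like. The directory is a hypothetical Windows folder as seen from WSL, the URL is a placeholder, and DRY_RUN is a convenience of this example; only --tls-resume and --tls-session-file come from the command discussed in this thread.

```shell
#!/bin/bash
# Sketch only: the directory below is a hypothetical Windows folder as seen
# from WSL (Windows drives are mounted under /mnt), not the real path.
MIRROR_DIR="${MIRROR_DIR:-/mnt/c/Users/me/mirror}"
SESSION_FILE="$MIRROR_DIR/tls.session"
DRY_RUN="${DRY_RUN:-1}"   # example-only switch: 1 prints, 0 runs

# A dedicated session file for this job, so that smaller downloads made in
# between cannot overwrite the default ~/.wget-session.
set -- --tls-resume "--tls-session-file=$SESSION_FILE" --mirror https://example.com/

cmd="wget2 $*"
if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd"      # print the command instead of running it
else
    wget2 "$@"
fi
```

Keeping the path in one variable makes it easy to give each large job its own file.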

On Sat, Apr 7, 2018 at 3:04 AM, Darshit Shah <address@hidden> wrote:

> Hi Jeffrey,
>
> Thanks a lot for your feedback. This is what helps us improve.
>
> * Tim Rühsen <address@hidden> [180407 00:01]:
> >
> > On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > > Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> > > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> > > how fast it's downloading a ton of files at once, it seems like it must've
> > > been only a small gain.
> > >
> TCP Fast Open will not save you a lot in your particular scenario. It simply
> saves one round trip when opening a new connection. So, if you're using Wget2
> to download a lot of files, you are probably only opening ~5 connections at
> the beginning and reusing them all. It depends on your RTT to the server, but
> 1 RTT when downloading several megabytes is already an insignificant amount
> of time.
>
> > >
> > > I've come across a few annoyances however.
> > >
> > > 1. There doesn't seem to be any way to control the size of the download
> > > queue, which I dislike because I want to download a lot of large files at
> > > once and I wish it'd just focus on a few at a time, rather than over a
> > > dozen.
> > The number of parallel downloads? --max-threads=n
>
> I don't think he meant --max-threads. Given that he is using HTTP/2, there's
> a chance what he's seeing is HTTP stream multiplexing. There is also
> `--http2-request-window`, which you can try.
> >
> > > 3. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> > > pipe)' error to be thrown; it seems to be related to how certificate
> > > verification is handled upon resume, but I was worried at first that the
> > > WSL problems were rearing their ugly head again.
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> > considered 'insecure',
> > thus we have it disabled by default. There still is TLS False Start
> > enabled by default.
> >
> >
> > > 3. --no-check-certificate causes significantly more errors about how the
> > > certificate issuer isn't trusted to be thrown (even though it's not
> > > supposed to be doing anything related to certificates).
> > Maybe a bit too verbose - these should be warnings, not errors.
>
> @Tim: I think with `--no-check-certificate` these should be neither warnings
> nor errors. The user explicitly stated that they don't care about the
> validity of the certificate. Why add any information there at all? Maybe we
> keep it only in debug mode.
> >
> > > 4. --force-progress doesn't seem to do anything despite being recognized
> > > as a valid parameter; using it in conjunction with -nv is no longer
> > > beneficial.
> > You likely want to use --progress=bar. --force-progress is to enable the
> > progress bar even when redirecting (e.g. to a log file).
> > @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> I think the progress bar options are sometimes a little off since we don't
> have tests for those and I am the only one using them.
>
> When exactly did you try to use --force-progress? I will change the
> documentation today to reflect its actual use case. --force-progress is
> useful only in --quiet mode, which, TBH, doesn't make much sense to me, since
> simply --progress=bar will essentially put you in the same mode. AFAIR, this
> comes from trying to bring in option compatibility from Wget 1.x.
>
> @Tim: Adjusting behaviour to the same as Wget 1.x doesn't make a lot of sense
> for the progress bar. In Wget 1.x, the default mode is: progress bar +
> verbose. Whereas, in Wget2, progress-bar will effectively enable the
> non-verbose mode where only warnings and errors are printed. I am noting this
> down for now. When I have a little time, I will think about all the progress
> and verbosity options in Wget 1.x and make sure that they do something
> similar in Wget2. Though, they won't have the exact same behaviour.
> >
> > > 5. The documentation is unclear as to how to disable things that are
> > > enabled by default. Am I to assume that --robots=off is equivalent to
> > > -e robots=off?
> >
> > -e robots=off should still work. We also allow --robots=off or --no-robots.
> >
> > > 6. The documentation doesn't document being able to use 'M' for
> > > chunk-size, e.g. --chunk-size=2M
> >
> > The wget2 documentation has to be brushed up - one of the blockers for
> > the first release.
> >
> > >
> > > 7. The documentation's instructions regarding --progress are all wrong.
> > I'll take a look in the next few days.
>
> Thanks for the heads up. Will look into it when I look at the rest of the
> progress options.
> >
> > >
> > > 8. The http/https proxy options return as unknown options despite being
> > > in the documentation.
> > Yeah, the docs... see above. Also, proxy support is currently limited.
> >
> >
> > > Lastly I'd like someone to look at the command I've come up with and
> > > offer me critiques (and perhaps help me address some of the remarks above
> > > if possible).
> >
> > No need for --continue.
> > Think about using TLS Session Resumption.
> > --domains is not needed in your example.
> >
>
> You use TLS Resume, but you don't explicitly need to specify a file. By
> default it will use ~/.wget-session.
>
> > Did you build with HTTP/2 and compression support?
> >
> > Regards, Tim
> > > #!/bin/bash
> > >
> > > wget2 \
> > >       `#WSL compatibility` \
> > >       --restrict-file-names=windows --no-tcp-fastopen \
> > >       \
> > >       `#No certificate checking` \
> > >       --no-check-certificate \
> > >       \
> > >       `#Scrape the whole site` \
> > >       --continue --mirror --adjust-extension \
> > >       \
> > >       `#Local viewing` \
> > >       --convert-links --backup-converted \
> > >       \
> > >       `#Efficient resuming` \
> > >       --tls-resume --tls-session-file=.\tls.session \
> > >       \
> > >       `#Chunk-based downloading` \
> > >       --chunk-size=2M \
> > >       \
> > >       `#Swiper no swiping` \
> > >       --robots=off --random-wait \
> > >       \
> > >       `#Target` \
> > >       --domains=example.com example.com
> > >
> >
> >
> >
>
> --
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
>

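Folding the critiques quoted above back into the script, a revised version might look roughly like the sketch below. It drops --continue and --domains as Tim suggests, and writes the session file as ./tls.session, since bash strips the unquoted backslash in `.\tls.session` and would use `.tls.session` instead. All other options are carried over unchanged. The DRY_RUN switch and the step-by-step `set --` construction are conveniences of this example, and whether TLS resume is worth keeping is left open given the concerns raised in the thread.

```shell
#!/bin/bash
DRY_RUN="${DRY_RUN:-1}"   # example-only switch: 1 prints the command, 0 runs it

set --                                                        # start with no arguments
set -- "$@" --restrict-file-names=windows --no-tcp-fastopen   # WSL compatibility
set -- "$@" --no-check-certificate                            # no certificate checking
set -- "$@" --mirror --adjust-extension                       # whole site (--continue dropped)
set -- "$@" --convert-links --backup-converted                # local viewing
set -- "$@" --tls-resume --tls-session-file=./tls.session     # TLS session resumption
set -- "$@" --chunk-size=2M                                   # chunk-based downloading
set -- "$@" --robots=off --random-wait                        # ignore robots.txt, random waits
set -- "$@" example.com                                       # target (--domains dropped)

cmd="wget2 $*"
if [ "$DRY_RUN" = 1 ]; then
    echo "$cmd"      # print the command instead of running it
else
    wget2 "$@"
fi
```

Building the argument list one group at a time keeps the per-group comments without relying on the backtick-comment trick used in the original script.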
