Re: [Bug-wget] Miscellaneous thoughts & concerns


From: Jeffrey Fetterman
Subject: Re: [Bug-wget] Miscellaneous thoughts & concerns
Date: Fri, 6 Apr 2018 21:31:56 -0500

> The number of parallel downloads ? --max-threads=n

Okay, well, when I was running it earlier, I was noticing an entire
directory of pdfs slowly getting larger every time I refreshed the
directory, and there were something like 30 in there. It wasn't just five.
I was very confused and I'm not sure what's going on there, and I really
would like it to not do that.


> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure', thus we have it disabled by default. There still is
> TLS False Start enabled by default.

Are you implying TLS False Start will perform the same function as TLS
Resume?


> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file). @Darshit, we should
> adjust the behavior to be the same as in Wget1.x.

That does work, but it's very buggy. Only one progress bar shows at a time,
and it doesn't always show the file that is actually downloading. For
example, it will appear to be downloading a txt file when it's really
downloading several larger files in the background.


> Did you build with http/2 and compression support ?

Yes, why?


P.S. I'm willing to help out with your documentation if you push some stuff
that makes my life on WSL a little less painful, haha. I'd run this in a VM
in an instant but I feel like that would be a bottleneck on what's supposed
to be a high performance program. Speaking of high performance, just how
much am I missing out on by not being able to take advantage of tcp fast
open?


On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <address@hidden> wrote:

> Hi Jeffrey,
>
>
> thanks for your feedback !
>
>
> On 06.04.2018 23:30, Jeffrey Fetterman wrote:
> > Thanks to the fix that Tim posted on gitlab, I've got wget2 running just
> > fine in WSL. Unfortunately it means I don't have TCP Fast Open, but given
> > how fast it's downloading a ton of files at once, it seems like it must've
> > been only a small gain.
> >
> >
> > I've come across a few annoyances however.
> >
> > 1. There doesn't seem to be any way to control the size of the download
> > queue, which I dislike because I want to download a lot of large files at
> > once and I wish it'd just focus on a few at a time, rather than over a
> > dozen.
> The number of parallel downloads ? --max-threads=n
>
> > 2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32: Broken
> > pipe)' error to be thrown, seems to be related to how certificate
> > verification is handled upon resume, but I was worried at first that the
> > WSL problems were rearing their ugly head again.
> Likely the WSL issue is also affecting the TLS layer. TLS resume is
> considered 'insecure',
> thus we have it disabled by default. There still is TLS False Start
> enabled by default.
>
>
> > 3. --no-check-certificate causes significantly more errors about how the
> > certificate issuer isn't trusted to be thrown (even though it's not
> > supposed to be doing anything related to certificates).
> Maybe a bit too verbose - these should be warnings, not errors.
>
> > 4. --force-progress doesn't seem to do anything despite being recognized as
> > a valid parameter; using it in conjunction with -nv is no longer beneficial.
> You likely want to use --progress=bar. --force-progress is to enable the
> progress bar even when redirecting (e.g. to a log file).
> @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> > 5. The documentation is unclear as to how to disable things that are
> > enabled by default. Am I to assume that --robots=off is equivalent to -e
> > robots=off?
>
> -e robots=off should still work. We also allow --robots=off or --no-robots.
>
> > 6. The documentation doesn't document being able to use 'M' for chunk-size,
> > e.g. --chunk-size=2M
>
> The wget2 documentation has to be brushed up - one of the blockers for
> the first release.
>
> >
> > 7. The documentation's instructions regarding --progress are all wrong.
> I'll take a look in the next few days.
>
> >
> > 8. The http/https proxy options return as unknown options despite being in
> > the documentation.
> Yeah, the docs... see above. Also, proxy support is currently limited.
>
>
> > Lastly I'd like someone to look at the command I've come up with and offer
> > me critiques (and perhaps help me address some of the remarks above if
> > possible).
>
> No need for --continue.
> Think about using TLS Session Resumption.
> --domains is not needed in your example.
>
> Did you build with http/2 and compression support ?
>
> Regards, Tim
> > #!/bin/bash
> >
> > wget2 \
> >       `#WSL compatibility` \
> >       --restrict-file-names=windows --no-tcp-fastopen \
> >       \
> >       `#No certificate checking` \
> >       --no-check-certificate \
> >       \
> >       `#Scrape the whole site` \
> >       --continue --mirror --adjust-extension \
> >       \
> >       `#Local viewing` \
> >       --convert-links --backup-converted \
> >       \
> >       `#Efficient resuming` \
> >       --tls-resume --tls-session-file=.\tls.session \
> >       \
> >       `#Chunk-based downloading` \
> >       --chunk-size=2M \
> >       \
> >       `#Swiper no swiping` \
> >       --robots=off --random-wait \
> >       \
> >       `#Target` \
> >       --domains=example.com example.com
> >
>
>
>
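
For reference, the suggestions in this thread could be folded back into the
quoted command roughly as follows. This is only a sketch, not a command either
poster actually ran: it drops --continue and --domains as Tim suggests, and
adds --max-threads and --progress=bar from his replies. The value 5 for
--max-threads, the ./tls.session path (assumed to be what .\tls.session was
meant to be) and the example.com target are placeholders. By default the
script only prints the command; setting RUN=1 is a hypothetical switch added
here to actually invoke wget2.

```sh
#!/bin/bash
# Sketch: rebuild the argument list step by step so each group stays commented.
set --
# WSL compatibility
set -- "$@" --restrict-file-names=windows --no-tcp-fastopen
# No certificate checking
set -- "$@" --no-check-certificate
# Scrape the whole site (--continue dropped, per Tim's reply)
set -- "$@" --mirror --adjust-extension
# Local viewing
set -- "$@" --convert-links --backup-converted
# TLS session resumption (path is an assumption, see the note above)
set -- "$@" --tls-resume --tls-session-file=./tls.session
# Chunk-based downloading
set -- "$@" --chunk-size=2M
# Cap parallel downloads and show a progress bar (5 is a placeholder)
set -- "$@" --max-threads=5 --progress=bar
# Robots and wait options, unchanged from the original command
set -- "$@" --robots=off --random-wait
# Target (--domains dropped, per Tim's reply)
set -- "$@" example.com

if [ "${RUN:-0}" = "1" ]; then
    # Hypothetical opt-in: actually run wget2 with the assembled options
    wget2 "$@"
else
    # Default: just print the command that would be run
    printf 'wget2'
    printf ' %s' "$@"
    printf '\n'
fi
```

Building the list incrementally keeps the per-group comments without relying
on the backtick-comment trick used in the original script.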

