Re: [Bug-wget] Miscellaneous thoughts & concerns


From: Tim Rühsen
Subject: Re: [Bug-wget] Miscellaneous thoughts & concerns
Date: Sat, 7 Apr 2018 09:53:22 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 07.04.2018 04:31, Jeffrey Fetterman wrote:
> > The number of parallel downloads? --max-threads=n
>
> Okay, well, when I was running it earlier, I was noticing an entire
> directory of pdfs slowly getting larger every time I refreshed the
> directory, and there were something like 30 in there. It wasn't just
> five. I was very confused and I'm not sure what's going on there, and
> I really would like it to not do that.
>
It's unclear to me what exactly you mean. Maybe you have an example?
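For reference, here is a minimal sketch of capping the number of parallel downloads with the --max-threads option mentioned above. The thread count of 3 and the example.com URL are placeholders. By default the script only prints the command (DRY_RUN=1), so it can be inspected before anything is downloaded.

```shell
#!/bin/bash
# Minimal sketch: limit wget2 to a few parallel downloads.
# Assumptions: 3 threads and the example.com URL are placeholders.
# By default the command is only printed; run with DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

args=(
  --max-threads=3        # at most 3 downloads in parallel
  https://example.com/   # placeholder target
)

if [ "$DRY_RUN" = "1" ]; then
  echo wget2 "${args[@]}"
else
  wget2 "${args[@]}"
fi
```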
>
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> > considered 'insecure', thus we have it disabled by default. There
> > still is TLS False Start enabled by default.
>
> Are you implying TLS False Start will perform the same function as TLS
> Resume?
>

Both reduce latency by one RTT, but they can't be combined.

>
> > You likely want to use --progress=bar. --force-progress is to enable
> > the progress bar even when redirecting (e.g. to a log file).
> > @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> That does work but it's very buggy. Only one shows at a time and it
> doesn't even always show the file that is downloading. Like it'll seem
> to be downloading a txt file when it's really downloading several
> larger files in the background.
>
>
> > Did you build with http/2 and compression support ?
>
> Yes, why?
>
Only because it could increase download speed. Note that HTTP/2 only
works with TLS, though.
>
> P.S. I'm willing to help out with your documentation if you push some
> stuff that makes my life on WSL a little less painful, haha. I'd run
> this in a VM in an instant but I feel like that would be a bottleneck
> on what's supposed to be a high performance program. Speaking of high
> performance, just how much am I missing out on by not being able to
> take advantage of tcp fast open?
>
With a VM you can at least test whether a problem (e.g. progress bar) is
WSL related or not.

TFO reduces latency by one RTT (on 'hot' connections only), so it only
helps under certain conditions, e.g. when often closing and reopening
connections to the same IP.
It combines with TLS False Start, so that you can drop connection
latency from 3 RTT to 1 RTT. 0-RTT is possible with TLS 1.3, which is
coming soon (GnuTLS already supports the current draft 26, but we
haven't implemented or tested it yet).

> On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <address@hidden> wrote:
>
>     Hi Jeffrey,
>
>
>     thanks for your feedback !
>
>
>     On 06.04.2018 23:30, Jeffrey Fetterman wrote:
>     > Thanks to the fix that Tim posted on gitlab, I've got wget2 running
>     > just fine in WSL. Unfortunately it means I don't have TCP Fast Open,
>     > but given how fast it's downloading a ton of files at once, it seems
>     > like it must've been only a small gain.
>     >
>     >
>     > I've come across a few annoyances however.
>     >
>     > 1. There doesn't seem to be any way to control the size of the
>     > download queue, which I dislike because I want to download a lot of
>     > large files at once and I wish it'd just focus on a few at a time,
>     > rather than over a dozen.
>     The number of parallel downloads? --max-threads=n
>
>     > 2. Doing a TLS resume will cause a 'Failed to write 305 bytes (32:
>     > Broken pipe)' error to be thrown. It seems to be related to how
>     > certificate verification is handled upon resume, but I was worried at
>     > first that the WSL problems were rearing their ugly head again.
>     Likely the WSL issue is also affecting the TLS layer. TLS resume is
>     considered 'insecure', thus we have it disabled by default. There
>     still is TLS False Start enabled by default.
>
>
>     > 3. --no-check-certificate causes significantly more errors about how
>     > the certificate issuer isn't trusted to be thrown (even though it's
>     > not supposed to be doing anything related to certificates).
>     Maybe a bit too verbose - these should be warnings, not errors.
>
>     > 4. --force-progress doesn't seem to do anything despite being
>     > recognized as a valid parameter; using it in conjunction with -nv is
>     > no longer beneficial.
>     You likely want to use --progress=bar. --force-progress is to enable
>     the progress bar even when redirecting (e.g. to a log file).
>     @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
>     > 5. The documentation is unclear as to how to disable things that are
>     > enabled by default. Am I to assume that --robots=off is equivalent
>     > to -e robots=off?
>
>     -e robots=off should still work. We also allow --robots=off or
>     --no-robots.
>
>     > 6. The documentation doesn't document being able to use 'M' for
>     > chunk-size, e.g. --chunk-size=2M
>
>     The wget2 documentation has to be brushed up - one of the blockers for
>     the first release.
>
>     >
>     > 7. The documentation's instructions regarding --progress are all
>     > wrong.
>     I'll take a look in the next few days.
>
>     >
>     > 8. The http/https proxy options return as unknown options despite
>     > being in the documentation.
>     Yeah, the docs... see above. Also, proxy support is currently limited.
>
>
>     > Lastly I'd like someone to look at the command I've come up with and
>     > offer me critiques (and perhaps help me address some of the remarks
>     > above if possible).
>
>     No need for --continue.
>     Think about using TLS Session Resumption.
>     --domains is not needed in your example.
>
>     Did you build with http/2 and compression support ?
>
>     Regards, Tim
>     > #!/bin/bash
>     >
>     > wget2 \
>     >       `#WSL compatibility` \
>     >       --restrict-file-names=windows --no-tcp-fastopen \
>     >       \
>     >       `#No certificate checking` \
>     >       --no-check-certificate \
>     >       \
>     >       `#Scrape the whole site` \
>     >       --continue --mirror --adjust-extension \
>     >       \
>     >       `#Local viewing` \
>     >       --convert-links --backup-converted \
>     >       \
>     >       `#Efficient resuming` \
>     >       --tls-resume --tls-session-file=./tls.session \
>     >       \
>     >       `#Chunk-based downloading` \
>     >       --chunk-size=2M \
>     >       \
>     >       `#Swiper no swiping` \
>     >       --robots=off --random-wait \
>     >       \
>     >       `#Target` \
>     >       --domains=example.com example.com
>     >
>
>
>
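Folding the review comments into the quoted script gives roughly the following sketch. --continue and --domains are dropped as suggested, and TLS session resumption is kept (bearing in mind that it is disabled by default, considered 'insecure', and was reported to fail with a broken-pipe error under WSL). The session file path uses a forward slash, since bash silently drops an unquoted backslash. example.com remains a placeholder, and the script only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch of the mirroring script with the review comments applied.
# Assumption: example.com is a placeholder target.
# By default the command is only printed; run with DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

args=(
  # WSL compatibility
  --restrict-file-names=windows --no-tcp-fastopen
  # No certificate checking
  --no-check-certificate
  # Scrape the whole site (--continue dropped: not needed)
  --mirror --adjust-extension
  # Local viewing
  --convert-links --backup-converted
  # TLS session resumption (off by default, considered 'insecure')
  --tls-resume --tls-session-file=./tls.session
  # Chunk-based downloading
  --chunk-size=2M
  # Ignore robots.txt, randomize the wait between requests
  --robots=off --random-wait
  # Target (--domains dropped: not needed for a single start host)
  example.com
)

if [ "$DRY_RUN" = "1" ]; then
  echo wget2 "${args[@]}"
else
  wget2 "${args[@]}"
fi
```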




