
Re: [Bug-wget] Miscellaneous thoughts & concerns


From: Tim Rühsen
Subject: Re: [Bug-wget] Miscellaneous thoughts & concerns
Date: Sat, 7 Apr 2018 19:09:11 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

WSL fix for TLS:


Search libwget/ssl_gnutls.c for EINPROGRESS and extend the code to also
check errno for 22 (EINVAL) and 32 (EPIPE).

There are just two places in _ssl_writev().
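
For illustration, here is a minimal, self-contained sketch of that kind of
check (hypothetical helper name, not the actual wget2 source): on WSL the
errno tests in _ssl_writev() need to accept EINVAL (22) and EPIPE (32) in
addition to EINPROGRESS.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Decide whether a failed TLS write should be retried instead of being
 * reported as fatal. EINVAL (22) and EPIPE (32) show up under WSL even
 * though the write can be retried successfully. */
static bool retryable_tls_errno(int err)
{
        return err == EINPROGRESS || err == EINVAL || err == EPIPE;
}

int main(void)
{
        printf("EPIPE retryable: %d\n", retryable_tls_errno(EPIPE));
        return 0;
}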


After these changes TLS works for me including --tls-resume.
But you still have to use --no-tcp-fastopen.

Regards, Tim


On 07.04.2018 04:31, Jeffrey Fetterman wrote:
> > The number of parallel downloads ? --max-threads=n
>
> Okay, well, when I was running it earlier, I was noticing an entire
> directory of pdfs slowly getting larger every time I refreshed the
> directory, and there were something like 30 in there. It wasn't just
> five. I was very confused and I'm not sure what's going on there, and
> I really would like it to not do that.
>
>
> > Likely the WSL issue is also affecting the TLS layer. TLS resume is
> > considered 'insecure', thus we have it disabled by default. There
> > still is TLS False Start enabled by default.
>
> Are you implying TLS False Start will perform the same function as TLS
> Resume?
>
>
> > You likely want to use --progress=bar. --force-progress is to enable
> > the progress bar even when redirecting (e.g. to a log file).
> > @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
> That does work but it's very buggy. Only one shows at a time and it
> doesn't even always show the file that is downloading. Like it'll seem
> to be downloading a txt file when it's really downloading several
> larger files in the background.
>
>
> > Did you build with http/2 and compression support ?
>
> Yes, why?
>
>
> P.S. I'm willing to help out with your documentation if you push some
> stuff that makes my life on WSL a little less painful, haha. I'd run
> this in a VM in an instant but I feel like that would be a bottleneck
> on what's supposed to be a high performance program. Speaking of high
> performance, just how much am I missing out on by not being able to
> take advantage of tcp fast open?
>
>
> On Fri, Apr 6, 2018 at 5:01 PM, Tim Rühsen <address@hidden> wrote:
>
>     Hi Jeffrey,
>
>
>     thanks for your feedback !
>
>
>     On 06.04.2018 23:30, Jeffrey Fetterman wrote:
>     > Thanks to the fix that Tim posted on gitlab, I've got wget2 running
>     > just fine in WSL. Unfortunately it means I don't have TCP Fast Open,
>     > but given how fast it's downloading a ton of files at once, it seems
>     > like it must've been only a small gain.
>     >
>     >
>     > I've come across a few annoyances however.
>     >
>     > 1. There doesn't seem to be any way to control the size of the
>     > download queue, which I dislike because I want to download a lot of
>     > large files at once and I wish it'd just focus on a few at a time,
>     > rather than over a dozen.
>     The number of parallel downloads ? --max-threads=n
>
>     > 2. Doing a TLS resume will cause a 'Failed to write 305 bytes
>     > (32: Broken pipe)' error to be thrown; it seems to be related to how
>     > certificate verification is handled upon resume, but I was worried
>     > at first that the WSL problems were rearing their ugly head again.
>     Likely the WSL issue is also affecting the TLS layer. TLS resume is
>     considered 'insecure',
>     thus we have it disabled by default. There still is TLS False Start
>     enabled by default.
>
>
>     > 3. --no-check-certificate causes significantly more errors about how
>     > the certificate issuer isn't trusted to be thrown (even though it's
>     > not supposed to be doing anything related to certificates).
>     Maybe a bit too verbose - these should be warnings, not errors.
>
>     > 4. --force-progress doesn't seem to do anything despite being
>     > recognized as a valid parameter, so using it in conjunction with -nv
>     > is no longer beneficial.
>     You likely want to use --progress=bar. --force-progress is to enable
>     the progress bar even when redirecting (e.g. to a log file).
>     @Darshit, we should adjust the behavior to be the same as in Wget1.x.
>
>     > 5. The documentation is unclear as to how to disable things that are
>     > enabled by default. Am I to assume that --robots=off is equivalent
>     > to -e robots=off?
>
>     -e robots=off should still work. We also allow --robots=off or
>     --no-robots.
>
>     > 6. The documentation doesn't document being able to use 'M' for
>     > chunk-size, e.g. --chunk-size=2M
>
>     The wget2 documentation has to be brushed up - one of the blockers for
>     the first release.
>
>     >
>     > 7. The documentation's instructions regarding --progress are all
>     > wrong.
>     I'll take a look in the next few days.
>
>     >
>     > 8. The http/https proxy options return as unknown options despite
>     > being in the documentation.
>     Yeah, the docs... see above. Also, proxy support is currently limited.
>
>
>     > Lastly I'd like someone to look at the command I've come up with and
>     > offer me critiques (and perhaps help me address some of the remarks
>     > above if possible).
>
>     No need for --continue.
>     Think about using TLS Session Resumption.
>     --domains is not needed in your example.
>
>     Did you build with http/2 and compression support ?
>
>     Regards, Tim
>     > #!/bin/bash
>     >
>     > wget2 \
>     >       `#WSL compatibility` \
>     >       --restrict-file-names=windows --no-tcp-fastopen \
>     >       \
>     >       `#No certificate checking` \
>     >       --no-check-certificate \
>     >       \
>     >       `#Scrape the whole site` \
>     >       --continue --mirror --adjust-extension \
>     >       \
>     >       `#Local viewing` \
>     >       --convert-links --backup-converted \
>     >       \
>     >       `#Efficient resuming` \
>     >       --tls-resume --tls-session-file=./tls.session \
>     >       \
>     >       `#Chunk-based downloading` \
>     >       --chunk-size=2M \
>     >       \
>     >       `#Swiper no swiping` \
>     >       --robots=off --random-wait \
>     >       \
>     >       `#Target` \
>     >       --domains=example.com example.com
>     >
>
>
>




