bug-wget

Re: [Bug-wget] GSoC project proposals, speed up


From: Darshit Shah
Subject: Re: [Bug-wget] GSoC project proposals, speed up
Date: Wed, 11 Mar 2015 01:05:38 +0530

Hi Laura,

>
> It is great to see that so many of us students are taking an interest in
> Wget. When I went through the list of proposed projects, yours really
> caught my eye: C, protocols, unambiguous and certainly useful, oh yes! :)

Thanks a lot for those kind words. Trust me when I say, we're
overwhelmed with the response we've seen for GSoC this year. I can
only hope the same level of enthusiasm remains throughout the period
and we can convert a few of these students into regular Wget
developers.

>
> I have been exploring the speed-up ideas, namely the If-Modified-Since
> header and the TCP Fast Open implementation, and I would like to make sure
> I am heading in the right direction, both with the approach and the
> assumptions.
>
> *if-modified-since*
>
> The idea is to reduce the number of requests needed to fetch modified
> documents, replacing the current three steps (HEAD, Last-Modified check
> and GET) with a single GET carrying the conditional If-Modified-Since
> header. This should also include better handling of the possible
> responses, such as HTTP_STATUS_NOT_MODIFIED (304), which seems to be
> defined but not handled. Plus a new argument (e.g. --if-modified), config
> option, tests...
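For illustration only, a minimal C sketch of how such a conditional header could be built, assuming the local copy's modification time serves as the validator (Wget normally sets it from the server's Last-Modified). The helpers `format_http_date` and `build_ims_header` are hypothetical, not existing Wget functions. HTTP dates must be in GMT with English day and month names, so the sketch avoids the locale-dependent `%a`/`%b` of `strftime()`.

```c
#define _POSIX_C_SOURCE 200809L  /* gmtime_r(), stat() */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: format T as an HTTP date (IMF-fixdate),
   e.g. "Sun, 06 Nov 1994 08:49:37 GMT".  Returns 0 on success. */
static int format_http_date(time_t t, char *buf, size_t len)
{
  /* Fixed English names: %a and %b in strftime() follow the locale. */
  static const char *const days[] = {
    "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
  };
  static const char *const months[] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
  };
  struct tm tm;

  if (gmtime_r(&t, &tm) == NULL)
    return -1;
  int n = snprintf(buf, len, "%s, %02d %s %04d %02d:%02d:%02d GMT",
                   days[tm.tm_wday], tm.tm_mday, months[tm.tm_mon],
                   tm.tm_year + 1900, tm.tm_hour, tm.tm_min, tm.tm_sec);
  return (n > 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper: build the "If-Modified-Since" request line for the
   local copy at PATH.  Returns -1 if there is no local copy (or on error);
   an ordinary unconditional GET would then be sent instead. */
static int build_ims_header(const char *path, char *buf, size_t len)
{
  struct stat st;
  char date[64];

  if (stat(path, &st) != 0
      || format_http_date(st.st_mtime, date, sizeof date) != 0)
    return -1;
  int n = snprintf(buf, len, "If-Modified-Since: %s\r\n", date);
  return (n > 0 && (size_t) n < len) ? 0 : -1;
}
```

The resulting line would be appended to the GET request; the server then answers either 304 Not Modified with no body, or 200 with the new content.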

You're mostly on track with this one. I'm not sure we want an
--if-modified switch. Instead, the if-modified-since header should be
used in all cases where it is relevant. However, this does not mean
that the old timestamp check goes away. A lot of websites do not
support if-modified-since, and we do not wish to change the default
behavior of Wget. Hence, both checks should exist, with
if-modified-since taking priority.
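One possible arrangement of that priority, sketched as a small decision function. This is hypothetical C, not Wget code: the enum and function names are made up, and the old Last-Modified comparison is reduced to a plain timestamp check. A 304 answer ends the exchange; a server that ignores If-Modified-Since replies 200 anyway, and the timestamp check then still decides whether the local file is replaced.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcomes; not Wget's real types. */
enum ims_action
{
  KEEP_LOCAL,    /* local copy is current; do not overwrite it */
  SAVE_BODY,     /* store the response body as the new copy */
  OTHER_STATUS   /* leave to the normal error/redirect handling */
};

/* Decide what to do with the response to a GET that carried
   If-Modified-Since.
   status:        HTTP status code of the response
   has_remote_lm: whether the response had a Last-Modified header
   remote_lm:     parsed Last-Modified value, seconds since the epoch
   local_mtime:   modification time of the local copy  */
static enum ims_action
decide_after_conditional_get(int status, bool has_remote_lm,
                             time_t remote_lm, time_t local_mtime)
{
  /* Preferred path: the server honoured If-Modified-Since. */
  if (status == 304)
    return KEEP_LOCAL;

  if (status != 200)
    return OTHER_STATUS;

  /* Fallback: a server that ignores the header sends 200 anyway,
     so the old timestamp comparison still applies. */
  if (has_remote_lm && remote_lm <= local_mtime)
    return KEEP_LOCAL;

  return SAVE_BODY;
}
```

In the fallback case the body has already been transferred, so only the 304 path actually saves bandwidth; the timestamp check there merely avoids overwriting an up-to-date file.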

>
> It is a nice improvement for particular applications, e.g. efficient
> updates for caches or time-saving crawlers, and for an overall bandwidth
> reduction.
>
> *TCP Fast Open*
>
> On the other hand, TFO has a wider application, though of course it
> lives at a lower level. TFO lets the client carry request data in the
> SYN, so the server can start sending its response right after the
> SYN-ACK, without waiting for the final ACK of the three-way handshake. It
> relies on a cookie (security token) the server issues during the first
> connection, and saves one RTT on each subsequent connection.
>
> For HTTP requests in particular, which are typically short flows, the
> overall impact can be very high (the RFC estimates up to 40%, which seems
> to stretch the best case a bit). Taking some measurements after the
> implementation to verify this would complement the project nicely.
>
> Linux contains a full implementation of TFO and, since 3.13, it is
> enabled by default. The other common operating systems (Windows, Mac OS)
> don't support it, though some are considering it (FreeBSD), maybe by
> summer...
>
> For this task, as I understand it, I should pass the MSG_FASTOPEN flag in
> the calls, moving from connect() to sendmsg()/sendto(). Should this become
> the default, or should it be configurable? It sounds like the kind of thing
> that could live in your .wgetrc, but I honestly can't find any reason to
> force conventional TCP: the fallback to it should just happen
> automatically if the remote server doesn't support TFO.
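A minimal C sketch of the sendto()-based approach with an automatic fallback, assuming a blocking socket and a platform that may or may not define `MSG_FASTOPEN` (Linux does). The names `tfo_connect_and_send` and `send_all` are hypothetical; this is not how Wget's connect code is actually structured.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>   /* htons()/htonl(), for callers filling in ADDR */
#include <errno.h>
#include <netinet/in.h>  /* struct sockaddr_in */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all LEN bytes of BUF on a connected socket. */
static int send_all(int fd, const char *buf, size_t len)
{
  while (len > 0)
    {
      ssize_t n = send(fd, buf, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      buf += n;
      len -= (size_t) n;
    }
  return 0;
}

/* Hypothetical helper: connect to ADDR and send the first request in BUF,
   using TCP Fast Open when the platform defines MSG_FASTOPEN.
   Returns the connected socket, or -1 on error. */
static int tfo_connect_and_send(const struct sockaddr *addr, socklen_t addrlen,
                                const char *buf, size_t len)
{
  int fd = socket(addr->sa_family, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

#ifdef MSG_FASTOPEN
  /* sendto() with MSG_FASTOPEN does an implicit connect.  With a cached
     cookie the data goes out in the SYN; without one the kernel requests
     a cookie and sends the data after a normal handshake. */
  ssize_t n = sendto(fd, buf, len, MSG_FASTOPEN, addr, addrlen);
  if (n >= 0)
    {
      /* Push whatever did not fit into the first segment(s). */
      if (send_all(fd, buf + n, len - (size_t) n) == 0)
        return fd;
      close(fd);
      return -1;
    }
  /* Failure, e.g. EOPNOTSUPP when client-side TFO is disabled: retry
     with plain TCP on a fresh socket.  A genuine network error will
     simply be reported again by connect(). */
  close(fd);
  fd = socket(addr->sa_family, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;
#endif

  /* Conventional path: full three-way handshake, then send. */
  if (connect(fd, addr, addrlen) != 0 || send_all(fd, buf, len) != 0)
    {
      close(fd);
      return -1;
    }
  return fd;
}
```

A caller would fill in a `struct sockaddr_in` (or `sockaddr_in6`) exactly as for `connect()`, pass the HTTP request as BUF, and then read the response from the returned socket as usual. Because the fallback is silent, nothing needs to be configured for servers or kernels without TFO.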

I think we would like TFO to be the default option when it is
supported by the server. The first thing Wget needs to do is identify
whether the server supports TFO and then calibrate the remaining
requests accordingly. You're right though: there's no reason to force
conventional TCP if both ends support TFO. You must also try to
account for the situation where Wget is talking to a proxy that
supports TFO while the actual end of the connection doesn't. These are
corner cases and shouldn't be your major focus right now, but it helps
to keep them in mind.
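One way the "identify, then calibrate" step might look, as a hedged C sketch. It assumes Linux's `TCP_INFO` socket option and its `TCPI_OPT_SYN_DATA` flag, which reports whether data sent in the SYN was acknowledged; where either is missing the sketch reports "unknown". The per-host record `tfo_host_state`, the function `tfo_record`, and the give-up-after-two-tries policy are all hypothetical. Note that behind a proxy this only describes the hop to the proxy, not the origin server.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Returns 1 if the kernel reports that data sent in our SYN was
   acknowledged (TFO was really used on this connection), 0 if not,
   and -1 if the platform cannot tell. */
static int tfo_syn_data_acked(int fd)
{
#if defined(TCP_INFO) && defined(TCPI_OPT_SYN_DATA)
  struct tcp_info info;
  socklen_t len = sizeof info;

  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0)
    return -1;
  return (info.tcpi_options & TCPI_OPT_SYN_DATA) ? 1 : 0;
#else
  (void) fd;
  return -1;
#endif
}

/* Hypothetical per-host record (not an existing Wget structure). */
struct tfo_host_state
{
  int tries;       /* connections whose outcome was observed */
  bool disabled;   /* stop using MSG_FASTOPEN for this host */
};

/* Record one connection's outcome.  The first connection to a host
   usually only obtains a cookie, so SYN data can be acknowledged from
   the second connection on; give up once that one also reports no SYN
   data.  An unknown outcome (-1) teaches nothing and is ignored. */
static void tfo_record(struct tfo_host_state *h, int acked)
{
  if (acked < 0)
    return;
  h->tries++;
  if (acked == 0 && h->tries >= 2)
    h->disabled = true;
}
```

Disabling TFO per host is a judgment call: since the kernel falls back transparently, continuing to try costs little, but a cut-off avoids repeated SYN-with-data attempts toward hosts that never accept them.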
>
> I would love to hear your ideas and comments to improve upon my proposal
> draft. In the meantime, I will start reading the codebase and try fixing
> small bugs, as already suggested in the list.

I'm assuming you've seen the GitHub wiki page for GSoC '15? You've
started out on the right track towards your proposal. Eventually it
will require a more detailed discussion and a timeline. But we will
work on those things later.
>
> Many thanks in advance,
>
> Laura



-- 
Thanking You,
Darshit Shah


