bug-wget

Re: [Bug-wget] [RFC] Extend concurrency support


From: Tim Ruehsen
Subject: Re: [Bug-wget] [RFC] Extend concurrency support
Date: Tue, 20 May 2014 09:45:45 +0200
User-agent: KMail/4.12.4 (Linux/3.14-1-amd64; KDE/4.12.4; x86_64; ; )

On Tuesday 20 May 2014 01:23:55 Jure Grabnar wrote:
> Hello,
> 
> for GSoC project I will do the following:
> 1. implement downloading one file through a mirror-list
> 2. implement downloading multiple files from multiple servers
> 3. fix Metalink support.
> 
> I'd like to get your opinions regarding the implementation of the first
> one; I will send an RFC for the second one soon as well.
> 
> 1. Single file through a mirror-list
> 
> a) Backend
> A user would specify a number of threads N and a list of mirror servers.
> A flowchart would look like this:
> 
> 1) Go through the mirrors and find the first available server (available =
> responds in fewer than MAX_RETRIES retries).
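
Step 1 could look roughly like the following C sketch. The value of MAX_RETRIES, the probe_fn callback type and fake_probe() are hypothetical: a real probe would send something like an HTTP HEAD request, while fake_probe() only simulates reachable and unreachable mirrors for the example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RETRIES 3  /* assumed value; the proposal does not fix one */

/* Hypothetical probe callback: returns 1 if the mirror answered on
 * this attempt, 0 otherwise. */
typedef int (*probe_fn)(const char *url, int attempt);

/* Return the index of the first mirror that responds within
 * MAX_RETRIES attempts, or -1 if no mirror does. */
static int first_available_mirror(const char *const *mirrors, size_t count,
                                  probe_fn probe)
{
    for (size_t i = 0; i < count; i++)
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++)
            if (probe(mirrors[i], attempt))
                return (int) i;
    return -1;
}

/* Simulated probe for the example only: URLs containing "down" never
 * answer; every other mirror answers on its second attempt. */
static int fake_probe(const char *url, int attempt)
{
    if (strstr(url, "down") != NULL)
        return 0;
    return attempt >= 1;
}
```

Probing mirrors strictly in list order keeps the sketch simple; probing them in parallel would find a live server faster at the cost of more connections.
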
> 
> 2) Try to figure out the file size from the Content-Length header. If the
> size is unknown, fall back to a single-threaded download. Would it be
> sensible to allow the user to specify the file size with some switch?
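
A minimal sketch of step 2, assuming the raw Content-Length value has already been extracted from the response headers. user_size stands in for the hypothetical file-size switch asked about above (no such option exists); -1 means "not given". The thread count is also capped at the file size so that no chunk would be empty.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a Content-Length header value; returns -1 if it is missing
 * or malformed. */
static long long parse_content_length(const char *value)
{
    if (value == NULL)
        return -1;
    while (*value == ' ' || *value == '\t')
        value++;
    if (!isdigit((unsigned char) *value))
        return -1;
    char *end;
    errno = 0;
    long long n = strtoll(value, &end, 10);
    while (*end == ' ' || *end == '\t')
        end++;
    if (errno == ERANGE || *end != '\0')
        return -1;
    return n;
}

/* Decide how many threads to use. user_size is a hypothetical
 * override from a command-line switch, or -1 if none was given. */
static int threads_for_download(long long content_length, long long user_size,
                                int requested)
{
    long long size = content_length >= 0 ? content_length : user_size;
    if (size <= 0 || requested < 1)
        return 1;               /* unknown size: single-threaded download */
    if (size < requested)
        return (int) size;      /* never more chunks than bytes */
    return requested;
}
```
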
> 
> 3) The main thread maintains a pool of available servers. It spawns at
> most min(N, M) threads, where M is the number of available mirrors.
> Every thread downloads its own chunk from its own mirror, using the
> current implementation of concurrent download for Metalink. If a mirror
> becomes unavailable while the i-th thread is downloading from it, that
> thread terminates and notifies the main thread. The main thread then
> spawns a new thread with one of the available mirrors; if none is
> available at the moment, it waits until a mirror becomes available
> (i.e. whenever some other thread finishes downloading its chunk).
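
The pool in step 3 maps naturally onto a mutex plus a condition variable. The sketch below is one possible shape, not the existing Metalink code: struct mirror_pool, MAX_MIRRORS and the pool_* functions are hypothetical names. A worker claims an idle mirror with pool_acquire(), which blocks while none is free, and hands it back with pool_release(), marking it as failed if the transfer broke.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_MIRRORS 16  /* arbitrary limit for the sketch */

/* Hypothetical pool of mirrors shared by the download threads. */
struct mirror_pool {
    pthread_mutex_t lock;
    pthread_cond_t  changed;              /* signalled on every state change */
    int             busy[MAX_MIRRORS];    /* 1 while a thread uses mirror i */
    int             usable[MAX_MIRRORS];  /* 0 once mirror i has failed */
    size_t          count;
};

static void pool_init(struct mirror_pool *p, size_t count)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->changed, NULL);
    p->count = count < MAX_MIRRORS ? count : MAX_MIRRORS;
    for (size_t i = 0; i < p->count; i++) {
        p->busy[i] = 0;
        p->usable[i] = 1;
    }
}

/* Number of worker threads to start: min(N, M). */
static size_t initial_threads(size_t n_threads, size_t n_mirrors)
{
    return n_threads < n_mirrors ? n_threads : n_mirrors;
}

/* Block until some usable mirror is idle, claim it, return its index.
 * If every mirror has failed, this waits until pool_revive() is called. */
static size_t pool_acquire(struct mirror_pool *p)
{
    pthread_mutex_lock(&p->lock);
    for (;;) {
        for (size_t i = 0; i < p->count; i++) {
            if (p->usable[i] && !p->busy[i]) {
                p->busy[i] = 1;
                pthread_mutex_unlock(&p->lock);
                return i;
            }
        }
        pthread_cond_wait(&p->changed, &p->lock);
    }
}

/* Hand a mirror back after a chunk; failed != 0 removes it from use. */
static void pool_release(struct mirror_pool *p, size_t i, int failed)
{
    pthread_mutex_lock(&p->lock);
    p->busy[i] = 0;
    if (failed)
        p->usable[i] = 0;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Put a previously failed mirror back into the pool. */
static void pool_revive(struct mirror_pool *p, size_t i)
{
    pthread_mutex_lock(&p->lock);
    p->usable[i] = 1;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}
```

With this shape the main thread does not have to track failures itself: a replacement worker simply calls pool_acquire() and sleeps until a mirror is free.
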
> 
> A mirror that was unavailable might become available again during the
> download. Such mirrors should be added back to the pool of available
> mirrors. I was thinking about creating another thread that would
> occasionally "poke" unavailable servers and add them to the pool if they
> respond.
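
To keep the "poke" thread from hammering dead servers, one option is exponential backoff per mirror. This is a sketch of that technique, not something the proposal specifies: struct mirror_state, the POKE_* constants and the helper names are all hypothetical. The poke thread would loop over the mirrors, probe those for which poke_due() is true, and report each outcome to record_poke().

```c
#include <time.h>

#define POKE_BASE_SECONDS 5    /* assumed delay after the first failure */
#define POKE_MAX_SECONDS  300  /* assumed upper bound on the delay */

/* Hypothetical per-mirror bookkeeping for the poke thread. */
struct mirror_state {
    int    usable;     /* 1 = in the pool of available mirrors */
    int    failures;   /* consecutive failed pokes */
    time_t next_poke;  /* earliest time to poke this mirror again */
};

/* Delay before the next poke: doubles with each failure, capped. */
static long poke_delay(int failures)
{
    long delay = POKE_BASE_SECONDS;
    for (int i = 0; i < failures && delay < POKE_MAX_SECONDS; i++)
        delay *= 2;
    return delay < POKE_MAX_SECONDS ? delay : POKE_MAX_SECONDS;
}

/* Should the poke thread probe this mirror now? */
static int poke_due(const struct mirror_state *m, time_t now)
{
    return !m->usable && now >= m->next_poke;
}

/* Record a poke result; a response puts the mirror back in the pool. */
static void record_poke(struct mirror_state *m, int responded, time_t now)
{
    if (responded) {
        m->usable = 1;
        m->failures = 0;
    } else {
        m->failures++;
        m->next_poke = now + poke_delay(m->failures);
    }
}
```
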
> 
> When M < N, and therefore only M threads were spawned, a fresh mirror
> might be added to the pool (see the previous paragraph). In this case
> it's probably best to divide the file into N pieces regardless, with
> only M threads active at the beginning. The newly added server can then
> be used to spawn another thread.
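
Dividing the file into N pieces up front could be done as below. This is a sketch under the assumption that each piece is fetched with an HTTP Range request; struct chunk, split_file() and range_header() are hypothetical names. Pieces not yet taken by one of the M active threads would simply wait in a queue for the next free worker or newly added mirror.

```c
#include <stddef.h>
#include <stdio.h>

/* One piece of the file as an inclusive byte range. */
struct chunk {
    long long first;
    long long last;
};

/* Split size bytes into n near-equal pieces; the first (size % n)
 * pieces get one extra byte. Assumes 1 <= n <= size. */
static void split_file(long long size, int n, struct chunk *out)
{
    long long base = size / n, extra = size % n, start = 0;
    for (int i = 0; i < n; i++) {
        long long len = base + (i < extra ? 1 : 0);
        out[i].first = start;
        out[i].last = start + len - 1;
        start += len;
    }
}

/* Format the Range header value for a chunk, e.g. "bytes=0-3".
 * Returns the snprintf() result (length of the full string). */
static int range_header(const struct chunk *c, char *buf, size_t bufsize)
{
    return snprintf(buf, bufsize, "bytes=%lld-%lld", c->first, c->last);
}
```

For a 10-byte file and N = 3 this yields the ranges 0-3, 4-6 and 7-9.
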
> 
> 4) The file would be downloaded into a single temporary file as described
> here: http://lists.gnu.org/archive/html/bug-wget/2014-05/msg00025.html
> I'm still fixing the patch, because at least one memory corruption bug
> is still lurking around and has yet to be found.
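
One common way to let several threads fill a single temporary file is to preallocate it and have each thread write at its chunk's absolute offset with POSIX pwrite(). This is a general sketch, not necessarily what the referenced patch does; open_temp_file() and write_chunk_data() are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reuse) the temporary file and size it to the full
 * download, so every chunk can be written at its final offset. */
static int open_temp_file(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write received data at an absolute offset. pwrite() does not move
 * the shared file offset, so threads can share one descriptor. */
static int write_chunk_data(int fd, const char *data, size_t len, off_t offset)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, data, len, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the write */
            return -1;
        }
        data += n;
        len -= (size_t) n;
        offset += n;
    }
    return 0;
}
```
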
> 
> b) Front end
> What would be a good way to specify a mirror list? Specifying a switch
> and listing all mirrors on the command line could be quite awkward.
> Should we introduce some sort of simple file format?
> I believe we should also take item 2 into consideration: downloading
> multiple files from multiple servers. Do we want to apply different
> switches (options) to different files?
> And what if we want to combine 1. and 2.: multiple files from multiple
> mirror lists? The simplest way would be to use a Metalink file for such
> a purpose, but is it the most elegant?
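
If a simple file format were chosen, something in the spirit of Wget's existing -i/--input-file (one URL per line) would be easy to parse. The format below is purely hypothetical: one mirror URL per line, with blank lines and lines starting with '#' ignored. parse_mirror_list() works on an in-memory buffer so the sketch stays self-contained.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse a hypothetical mirror-list format: one URL per line, blank
 * lines and '#' comment lines ignored, surrounding whitespace (incl.
 * '\r') trimmed. text is modified in place; up to max pointers into
 * it are stored in urls. Returns the number of URLs found. */
static size_t parse_mirror_list(char *text, char **urls, size_t max)
{
    size_t n = 0;
    char *line = text;
    while (line != NULL && n < max) {
        char *next = strchr(line, '\n');
        if (next != NULL)
            *next++ = '\0';
        while (isspace((unsigned char) *line))
            line++;
        char *end = line + strlen(line);
        while (end > line && isspace((unsigned char) end[-1]))
            *--end = '\0';
        if (*line != '\0' && *line != '#')
            urls[n++] = line;
        line = next;
    }
    return n;
}
```

Per-file options or several files with several mirror lists would need more structure than this, which is where Metalink's XML format has the advantage.
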
> 
> All your suggestions are greatly appreciated.

Hi Jure,

most of this is already solved in https://github.com/rockdaboot/mget, which was
originally intended as a 'modern' Wget. I would like to see Mget and Wget merge
into something like 'Wget2'. At the very least, feel free to move code from Mget
into Wget as you wish (I am the author and copyright holder of Mget, and both
projects have the same license).

History...
I was at the same point as you some years ago. After looking at Wget, I found
that Wget's code had to be redesigned. I had two choices: struggle with the
grown code or restart from scratch. I chose the latter because I didn't see a
chance of getting huge code changes into Wget. Either you have to discuss every
little change, or you end up with your own code branch, which might be
integrated into master some time during the next few years.

It has been asked many times and I'll ask again: shouldn't we start Wget2
development, maybe with Jure as "project leader" (if you want)? I made a
start with Mget (e.g. consistently putting reusable code into a library)...
and I would spend some time helping to merge Mget and Wget.
Due to the library-based character of Mget, it shouldn't be too hard.

I hope I am not too OT for you, Jure. But if I am, just ignore my comments ;-)

Regards, Tim



