mldonkey-users

Re: [Mldonkey-users] [pango 20030105a] new chunks order


From: Lionel Bouton
Subject: Re: [Mldonkey-users] [pango 20030105a] new chunks order
Date: Tue, 07 Jan 2003 23:18:16 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212

Goswin Brederlow wrote:

I'm not yet familiar with the inner workings of mldonkey or the details of the eDonkey protocol, but here's an idea that could help, if I didn't miss something.

Pierre Etchemaite <address@hidden> writes:

On Sun, 05 Jan 2003 20:16:53 +0100, Sven Hartge <address@hidden>
wrote:

         "new chunks order": experimental. Try to optimize file
         completion
               time using cost-benefit algorithm. The estimations could
               be improved a lot, but the basic algorithm is there.
               To allow for experimenting, the new algorithm is only
               used when random_order_download is enabled.

That's what I found in the ChangeLog, but this explanation is a bit short.

What are the benefits of this new algorithm?
I'm not sure yet :)

There are many discussions about improving file propagation: how not
downloading in a totally random order is bad (including trying to get the first
and last chunks in priority, or trying too hard to get rare chunks), how
completing chunks quickly is good, etc.

Getting the first and last chunk might not be good for propagation, but
it is vital for previewing. With mplayer under Linux it's vital to get
the beginning of an AVI. With Windows players it is usually vital to also
get the index (which is at the end of the AVI). Considering all the
files I killed after a few MB of download because they were fakes, I
think getting the first/last chunk first saves a lot.

The only download order (for chunks other than the first/last) that could maximize the file's spreading speed is the one where you download a chunk that few people on the network already have (in order to maximize its availability).

This is why, in my opinion, to maximize the spreading speed you'd better control the uploads: when you start spreading a file, you don't want to send the same chunk to several peers; you want all chunks out as fast as possible (to make new peers distribute your file and lower the equivalent of a "slashdot effect"). eDonkey peers should maintain upload statistics *per chunk* and give priority to peers asking for the least-downloaded chunks.
This way the order of the downloads doesn't matter anymore.
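
To make the idea concrete, here is a minimal OCaml sketch of an uploader that keeps one counter per chunk and serves the pending request for its least-uploaded chunk first. All the names (upload_stats, pick_request, the peer/chunk record) are invented for illustration; this is not mldonkey's actual upload code.

(* Sketch only: per-chunk upload counters and request selection.
   Every name here is hypothetical, not taken from mldonkey. *)

(* One counter per chunk, incremented each time we upload that chunk. *)
type upload_stats = int array

let make_stats nchunks : upload_stats = Array.make nchunks 0

(* A pending upload request: which peer wants which chunk. *)
type request = { peer : string; chunk : int }

(* Serve the request whose chunk we have uploaded the least so far,
   so that every chunk gets out onto the network as soon as possible. *)
let pick_request (stats : upload_stats) (pending : request list) =
  match pending with
  | [] -> None
  | first :: rest ->
      let best =
        List.fold_left
          (fun best r -> if stats.(r.chunk) < stats.(best.chunk) then r else best)
          first rest
      in
      stats.(best.chunk) <- stats.(best.chunk) + 1;
      Some best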

This need not use much memory: you don't have to count many downloads per chunk, since they will be spread across all your chunks. So for each chunk, one byte should be enough to store the download count (or maybe 4 bits, if you consider that very few chunks would hit 16 downloads, and at that point making the difference between them won't matter). You could reset the counters regularly, on a period roughly equal to the time needed to upload your whole share 16 (or 256) times, computed from your max_upload_rate and share size. For a 9GB share of big vacation videos this is 1000+ chunks, so about 1000 (or 500) bytes used.

The problem is what counts as a download. Whole-chunk downloads are clearly the best case: we know the whole chunk is now available from another peer, so we've hit our goal. Partial chunk downloads are harder to account for. We could enlarge the download count structure to get more precision and count a partial download with the following rule:

download_score = (download_size / chunk_size) * whole_download_score

With 4 bits of precision:
- whole chunk downloaded -> whole_download_score = 16
- 4MB -> score = 7
- 200 bytes -> score = 0

Very small downloads over unreliable links won't be counted...
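
A rough sketch of that scoring rule, assuming the usual eDonkey chunk size of 9,728,000 bytes and rounding to the nearest score; the function and counter names are made up:

(* Sketch of the partial-download scoring rule described above.
   9,728,000 bytes is the usual eDonkey chunk size; everything else
   is made up for illustration. *)

let chunk_size = 9_728_000        (* bytes per eDonkey chunk *)
let whole_download_score = 16     (* score for a fully uploaded chunk *)

(* download_score = (download_size / chunk_size) * whole_download_score,
   rounded to the nearest integer. *)
let download_score download_size =
  let exact =
    float_of_int download_size /. float_of_int chunk_size
    *. float_of_int whole_download_score
  in
  int_of_float (exact +. 0.5)

(* Add the score of one (possibly partial) upload to the per-chunk counter. *)
let record_upload counters chunk download_size =
  counters.(chunk) <- counters.(chunk) + download_score download_size

(* Reset all counters; to be called on a period roughly equal to the time
   needed to upload the whole share 16 (or 256) times, computed from
   max_upload_rate and the share size. *)
let reset_counters counters =
  Array.fill counters 0 (Array.length counters) 0

let () =
  assert (download_score chunk_size = 16);   (* whole chunk -> 16 *)
  assert (download_score 4_194_304 = 7);     (* 4MB -> 7 *)
  assert (download_score 200 = 0)            (* tiny uploads don't count *)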

Newest chunks on our systems are automatically given maximum priority, and we help distribute them across the P2P network. Ideally, eDonkey peers should never honor a chunk request if someone else is asking for another, less-downloaded chunk.

The idea (which is certainly wrong, I'm now almost sure) was that since the
whole file is completed when all chunks are, the "work per source" should be
as uniform as possible over all chunks.

If you're selfish, the work should be distributed so that all chunks
finish at the same time: calculate an ETA for each chunk and request
the chunk that will take longest next. But don't be selfish.



Being selfish only hurts the network you use, and so hurts you too, so it's quite good advice...


This is certainly wrong, because we want to complete some chunks as soon as
possible. And since a chunk is often much smaller than the whole file, only the scheduling of the last remaining chunks will determine when the
whole file completes (= completing other chunks a bit sooner or a bit later
doesn't matter).

Instead, I think we should try to use all the available sources for the
smallest number of chunks at a time, within some reasonable limit (no need
to use all clients for the same chunk). Work per source should be as uniform
as possible over those chunks only.
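
One possible reading of that scheduling rule, as a sketch only (the cap of 4 sources per chunk and all the names are my own assumptions, not anything from mldonkey):

(* Sketch: assign all available sources to as few incomplete chunks as
   possible, with a cap per chunk so we don't pile everyone onto one chunk.
   [chunks] lists incomplete chunk ids, most nearly complete first;
   [sources] are the peers we can download from right now. *)

let max_sources_per_chunk = 4   (* "some reasonable limit" *)

let assign_sources chunks sources =
  let rec fill chunks sources acc =
    match chunks, sources with
    | _, [] | [], _ -> List.rev acc
    | c :: rest_chunks, _ ->
        (* take up to max_sources_per_chunk sources for this chunk *)
        let rec take n srcs acc =
          match srcs with
          | s :: tl when n > 0 -> take (n - 1) tl ((s, c) :: acc)
          | _ -> (acc, srcs)
        in
        let acc, remaining = take max_sources_per_chunk sources acc in
        fill rest_chunks remaining acc
  in
  fill chunks sources []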

I still don't know how hard we should try to get rare chunks when there's an
opportunity...

For rare files, getting rare chunks is vital. For common files it
probably doesn't matter whether you have 20 or 30 sources for a
chunk. The 1000 sources you are not connected to could completely
reverse the rareness.

I don't think it's a good idea: everybody would flood the poor peers that happen to own the rare file chunks.
This would only lower their bandwidth and hurt those chunks' propagation.

I think rareness should really be calculated over all known sources and
not just the connected ones.


Overall I think two things should be considered:

1. Complete chunks as fast as possible. Once they are shared, you free
up slots on other clients because people will connect to you instead.
You can also compute the MD4 hash and correct errors, and fragmentation
of the file isn't too bad.

Agreed.

2. Request chunks in a way that leaves the maximum number of sources
for the remaining chunks. For example, suppose you have a rare chunk with
one source and 100 sources for the other chunks. If you first complete all
the other chunks, you will have only one source left for the last chunk and
100 competitors. If you grab that rare chunk first instead, when possible,
you still have 100 sources left for the other chunks.


I don't agree: trust the network. Some other peer will complete the chunk and distribute it, and thereby help you get it if you can't.

To combine the two, I would suggest keeping a count of the overall
rareness of chunks (over connected and unconnected clients) and trying to get
rare chunks first. Get as many sources as possible for the rarest chunk, down
to a minimum sub-chunk size of, say, 256K; i.e., as long as there is a gap of
256K in the file where only one source is downloading, try to split it
into two parts with two sources. That's how lopster does it.

The 256K limit should be set low enough to allow many sources per chunk,
but high enough that you don't end up asking for very small blocks all
the time and spend more time asking than downloading.
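
A sketch of the rarest-first pick and the 256K gap-splitting rule, under the same caveat (hypothetical types and names; not how lopster or mldonkey actually represent gaps or availability):

(* Sketch only: rarest-first chunk choice and 256K gap splitting. *)

let min_block_size = 256 * 1024   (* 256K lower bound for a sub-block *)

(* Pick, among the chunks we still want, the one with the fewest known
   sources, counted over connected and unconnected clients. *)
let rarest_chunk (availability : int array) (wanted : int list) =
  match wanted with
  | [] -> None
  | c :: rest ->
      Some
        (List.fold_left
           (fun best c ->
              if availability.(c) < availability.(best) then c else best)
           c rest)

(* A gap is a half-open byte range [lo, hi) currently served by one source. *)
type gap = { lo : int; hi : int }

(* Split a gap so a second source can take the upper half, but only if
   both halves stay at least 256K, to avoid asking for tiny blocks. *)
let split_for_new_source g =
  let len = g.hi - g.lo in
  if len < 2 * min_block_size then None
  else
    let mid = g.lo + (len / 2) in
    Some ({ lo = g.lo; hi = mid }, { lo = mid; hi = g.hi })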


Reconnecting to clients should also consider rareness of chunks.
Clients with rare chunks should be tried more often, clients with
common chunks less often.

Please don't hammer them. Every time you try to get something before others, you are being selfish and hurting the network (this is what you do by requesting those chunks more often than others). Doing so makes your client better for you in the short term, but other clients will be upgraded too, and you'd end up on the same level playing field, only with the rare-chunk owners busier answering requests and uploading their rare chunks less...

Hope it helped in some way.

LB




