mldonkey-users

Re: [Mldonkey-users] [pango 20030105a] new chunks order


From: Lionel Bouton
Subject: Re: [Mldonkey-users] [pango 20030105a] new chunks order
Date: Tue, 07 Jan 2003 23:18:16 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021212

Goswin Brederlow wrote:

I'm not yet familiar with the inner workings of mldonkey or the details of the eDonkey protocol, but here's an idea that could help, if I didn't miss something.

Pierre Etchemaite <address@hidden> writes:

On Sun, 05 Jan 2003 20:16:53 +0100, Sven Hartge <address@hidden>
wrote:

         "new chunks order": experimental. Try to optimize file
         completion
               time using cost-benefit algorithm. The estimations could
               be improved a lot, but the basic algorithm is there.
               To allow for experimenting, the new algorithm is only
               used when random_order_download is enabled.

That's what I found in the ChangeLog, but this explanation is a bit short.

What are the benefits of this new algorithm?
I'm not sure yet :)

There are many discussions about improving file propagation: how not
downloading in a totally random order is bad (including trying to get the first
and last chunks in priority, or trying too hard to get rare chunks), how
completing chunks quickly is good, etc.

Getting the first and last chunk might not be good for propagation, but
it is vital for previewing. With mplayer under Linux it's vital to get
the beginning of an AVI. With Windows players it is usually vital to also
get the index (which is at the end of the AVI). Considering all the
files I killed after a few MB of download because they were fakes, I
think getting the first/last chunk first saves a lot.

The only download order (for chunks other than the first/last) that could maximize the file's spreading speed is the one where you download a chunk that few people on the network already have (in order to maximize its availability).

This is why, in my opinion, to maximize the spreading speed you'd better control the uploads: when you start spreading a file, you don't want to send the same chunk to several peers; you want all chunks out as fast as possible (to make new peers distribute your file and lower the equivalent of a "slashdot effect"). eDonkey peers should maintain upload statistics *per chunk* and give priority to peers asking for the least-downloaded chunks.
This way the order of the downloads doesn't matter anymore.
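
To make the idea concrete, here is a minimal OCaml sketch of an uploader that keeps one counter per chunk and serves the pending request for its least-uploaded chunk first. All the names (upload_stats, pick_request, the peer/chunk record) are invented for illustration; this is not mldonkey's actual upload code.

(* Sketch only: per-chunk upload counters and request selection.
   Every name here is hypothetical, not taken from mldonkey. *)

(* One counter per chunk, incremented each time we upload that chunk. *)
type upload_stats = int array

let make_stats nchunks : upload_stats = Array.make nchunks 0

(* A pending upload request: which peer wants which chunk. *)
type request = { peer : string; chunk : int }

(* Serve the request whose chunk we have uploaded the least so far,
   so that every chunk gets out onto the network as soon as possible. *)
let pick_request (stats : upload_stats) (pending : request list) =
  match pending with
  | [] -> None
  | first :: rest ->
      let best =
        List.fold_left
          (fun best r -> if stats.(r.chunk) < stats.(best.chunk) then r else best)
          first rest
      in
      stats.(best.chunk) <- stats.(best.chunk) + 1;
      Some best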

This need not use much memory: you don't have to count many downloads per chunk, since they will be spread across all your chunks. So for each chunk, one byte should be enough to store the download count (or maybe 4 bits, if you consider that very few chunks would hit 16 downloads, and at that point making the difference between them won't matter). You could reset the counters regularly, on a period roughly equal to the time needed to upload your whole share 16 (or 256) times, computed from your max_upload_rate and share size. For a 9GB share of big vacation videos this is 1000+ chunks, so about 1000 (or 500) bytes used.

The problem is what counts as a download. Whole-chunk downloads are clearly the best case: we know the whole chunk is now available from another peer, so we've hit our goal. Partial chunk downloads are harder to account for. We could enlarge the download count structure to get more precision and count a partial download with the following rule:

download_score = (download_size / chunk_size) * whole_download_score

With 4 bits of precision:
- whole chunk downloaded -> whole_download_score = 16
- 4MB -> score = 7
- 200 bytes -> score = 0

Very small downloads over unreliable links won't be counted...
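
A rough sketch of that scoring rule, assuming the usual eDonkey chunk size of 9,728,000 bytes and rounding to the nearest score; the function and counter names are made up:

(* Sketch of the partial-download scoring rule described above.
   9,728,000 bytes is the usual eDonkey chunk size; everything else
   is made up for illustration. *)

let chunk_size = 9_728_000        (* bytes per eDonkey chunk *)
let whole_download_score = 16     (* score for a fully uploaded chunk *)

(* download_score = (download_size / chunk_size) * whole_download_score,
   rounded to the nearest integer. *)
let download_score download_size =
  let exact =
    float_of_int download_size /. float_of_int chunk_size
    *. float_of_int whole_download_score
  in
  int_of_float (exact +. 0.5)

(* Add the score of one (possibly partial) upload to the per-chunk counter. *)
let record_upload counters chunk download_size =
  counters.(chunk) <- counters.(chunk) + download_score download_size

(* Reset all counters; to be called on a period roughly equal to the time
   needed to upload the whole share 16 (or 256) times, computed from
   max_upload_rate and the share size. *)
let reset_counters counters =
  Array.fill counters 0 (Array.length counters) 0

let () =
  assert (download_score chunk_size = 16);   (* whole chunk -> 16 *)
  assert (download_score 4_194_304 = 7);     (* 4MB -> 7 *)
  assert (download_score 200 = 0)            (* tiny uploads don't count *)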

Newest chunks on our systems are automatically given maximum priority, and we help distribute them across the P2P network. Ideally, eDonkey peers should never honor a chunk request if someone else is asking for another, less-downloaded chunk.

The idea (which is certainly wrong, I'm now almost sure) was that since the
whole file is completed when all chunks are, the "work per source" should be
as uniform as possible over all chunks.

If you're selfish, the work should be distributed so that all chunks
finish at the same time: calculate an ETA for each chunk and request
the chunk that will take longest next. But don't be selfish.



Being selfish only hurts the network you use, and so hurts you too, so it's quite good advice...


This is certainly wrong, because we want to complete some chunks as soon as
possible. And since a chunk is often much smaller than the whole file, only the scheduling of the last remaining chunks will determine when the
whole file completes (= completing other chunks a bit sooner or a bit later
doesn't matter).

Instead, I think we should try to use all the available sources for the
smallest number of chunks at a time, within some reasonable limit (no need
to use all clients for the same chunk). Work per source should be as uniform
as possible over those chunks only.
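
One possible reading of that scheduling rule, as a sketch only (the cap of 4 sources per chunk and all the names are my own assumptions, not anything from mldonkey):

(* Sketch: assign all available sources to as few incomplete chunks as
   possible, with a cap per chunk so we don't pile everyone onto one chunk.
   [chunks] lists incomplete chunk ids, most nearly complete first;
   [sources] are the peers we can download from right now. *)

let max_sources_per_chunk = 4   (* "some reasonable limit" *)

let assign_sources chunks sources =
  let rec fill chunks sources acc =
    match chunks, sources with
    | _, [] | [], _ -> List.rev acc
    | c :: rest_chunks, _ ->
        (* take up to max_sources_per_chunk sources for this chunk *)
        let rec take n srcs acc =
          match srcs with
          | s :: tl when n > 0 -> take (n - 1) tl ((s, c) :: acc)
          | _ -> (acc, srcs)
        in
        let acc, remaining = take max_sources_per_chunk sources acc in
        fill rest_chunks remaining acc
  in
  fill chunks sources []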

I still don't know how hard we should try to get rare chunks when there's an
opportunity...

For rare files, getting rare chunks is vital. For common files it
probably doesn't matter whether you have 20 or 30 sources for a
chunk. The 1000 sources you are not connected to could completely
reverse the rareness.

I don't think it's a good idea: everybody would flood the poor peers that happen to own the rare file chunks.
This would only lower their bandwidth and hurt those chunks' propagation.

I think rareness should really be calculated over all known sources and
not just the connected ones.


Overall I think two things should be considered:

1. Complete chunks as fast as possible. Once they are shared, you free
up slots on other clients because people will connect to you instead.
You can also compute the MD4 hash and correct errors, and fragmentation
of the file isn't too bad.

Agreed.

2. Request chunks in a way that leaves the maximum number of sources
for the remaining chunks. For example, suppose you have a rare chunk with
one source and 100 sources for the other chunks. If you first complete all
the other chunks, you will have only one source left for the last chunk and
100 competitors. If you grab that rare chunk first instead, when possible,
you still have 100 sources left for the other chunks.


I don't agree: trust the network. Some other peer will complete the chunk and distribute it, and thereby help you get it if you can't.

To combine the two, I would suggest keeping a count of the overall
rareness of chunks (over connected and unconnected clients) and trying to get
rare chunks first. Get as many sources as possible for the rarest chunk, down
to a minimum sub-chunk size of, say, 256K; i.e., as long as there is a gap of
256K in the file where only one source is downloading, try to split it
into two parts with two sources. That's how lopster does it.

The 256K limit should be set low enough to allow many sources per chunk,
but high enough that you don't end up asking for very small blocks all
the time and spend more time asking than downloading.
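
A sketch of the rarest-first pick and the 256K gap-splitting rule, under the same caveat (hypothetical types and names; not how lopster or mldonkey actually represent gaps or availability):

(* Sketch only: rarest-first chunk choice and 256K gap splitting. *)

let min_block_size = 256 * 1024   (* 256K lower bound for a sub-block *)

(* Pick, among the chunks we still want, the one with the fewest known
   sources, counted over connected and unconnected clients. *)
let rarest_chunk (availability : int array) (wanted : int list) =
  match wanted with
  | [] -> None
  | c :: rest ->
      Some
        (List.fold_left
           (fun best c ->
              if availability.(c) < availability.(best) then c else best)
           c rest)

(* A gap is a half-open byte range [lo, hi) currently served by one source. *)
type gap = { lo : int; hi : int }

(* Split a gap so a second source can take the upper half, but only if
   both halves stay at least 256K, to avoid asking for tiny blocks. *)
let split_for_new_source g =
  let len = g.hi - g.lo in
  if len < 2 * min_block_size then None
  else
    let mid = g.lo + (len / 2) in
    Some ({ lo = g.lo; hi = mid }, { lo = mid; hi = g.hi })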


Reconnecting to clients should also consider rareness of chunks.
Clients with rare chunks should be tried more often, clients with
common chunks less often.

Please don't hammer them. Every time you try to get something before others, you are being selfish and hurting the network (this is what you do by requesting those chunks more often than others). Doing so makes your client better for you in the short term, but other clients will be upgraded too, and you'd end up on the same level playing field, only with the rare-chunk owners busier answering requests and uploading their rare chunks less...

Hope it helped in some way.

LB




