
Re: [Mldonkey-users] [pango 20030105a] new chunks order


From: Goswin Brederlow
Subject: Re: [Mldonkey-users] [pango 20030105a] new chunks order
Date: 08 Jan 2003 00:36:09 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Military Intelligence)

Lionel Bouton <address@hidden> writes:

> Goswin Brederlow wrote:
> 
> I'm not yet familiar with the inner workings of mldonkey and the
> eDonkey protocol details but here's an idea that could help if I
> didn't miss something.
> 
> >Pierre Etchemaite <address@hidden> writes:
> >
> >>On Sun, 05 Jan 2003 20:16:53 +0100, Sven Hartge <address@hidden>
> >>wrote:
> >>
> >>>    "new chunks order": experimental. Try to optimize file completion
> >>>    time using cost-benefit algorithm. The estimations could be
> >>>    improved a lot, but the basic algorithm is there. To allow for
> >>>    experimenting, the new algorithm is only used when
> >>>    random_order_download is enabled.
> >>>
> >>>in the ChangeLog, but this explanation is a bit short.
> >>>
> >>>What are the benefits of this new algorithm?
> >>>
> 
> >>I'm not sure yet :)
> >>
> >>There are many discussions about improving file propagation: how
> >>deviating from a totally random download order is bad (including
> >>trying to get first and last chunks as a priority, or trying too hard
> >>to get rare chunks), how completing chunks quickly is good, etc.
> >
> >Getting the first and last chunk might not be good for propagation but
> >it is vital for previewing. With mplayer under Linux it's vital to get
> >the beginning of an avi. With Windows players it is usually vital to
> >get the index (which is at the end of the avi) too. Considering all
> >the files I killed after a few MB of download because they were fakes,
> >I think getting the first/last chunk first saves a lot.
> 
> The only download order (for chunks other than first/last) that could
> maximize the file spreading speed is the one where you download a
> chunk few people already have on the network (in order to maximize its
> availability).
> 
> This is why, in my opinion, to maximize the spreading speed you'd
> better control the uploads:
> 
> When you start spreading a file, you don't want to send the same chunk
> to several peers:

Afaik mldonkey already does that. If it's uploading a chunk, it hides
that chunk from other clients: it appears as if it doesn't have it, and
thus they won't download it.
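
If mldonkey really does this, the logic amounts to masking the
availability bitmap before answering another client. A minimal OCaml
sketch (names are assumptions, not mldonkey's actual code):

    (* Hide chunks that are currently being uploaded when advertising
       availability, so other clients fetch them from someone else. *)
    let advertised_chunks ~have ~uploading =
      (* have.(i): we have chunk i; uploading.(i): chunk i is in flight *)
      Array.mapi (fun i h -> h && not uploading.(i)) have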
 
> you want all chunks out as fast as possible (to make new peers
> distribute your file and lower the equivalent of a "slashdot effect").
> 
> Edonkey peers should maintain upload statistics *per chunk* and give
> priority to peers asking for the least-downloaded chunks.

I don't think you actually ask for a chunk. At least I hope not. You
should ask for a file, and only when you are connected for an actual
download do you say what byte offset you want. Otherwise you'd ask for
a chunk and then, during the time you sit in that client's queue, get
the chunk somewhere else.
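
A sketch of that two-phase request as I understand it (type and field
names are made up, not the real eDonkey messages):

    (* You enqueue for a *file*; a concrete byte range is only chosen
       at the moment an upload slot is actually granted, so ranges
       completed elsewhere while queuing are never requested. *)
    type request =
      | QueueForFile of string                 (* file hash *)
      | DownloadRange of string * int64 * int  (* hash, offset, length *)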

> This way the order of the downloads doesn't matter anymore.
> 
> This doesn't need much memory: you don't need to count many downloads,
> as you'll spread them across all your chunks. So for each chunk one
> byte should be enough to store the download count (or maybe 4 bits if
> you consider that very few chunks would hit 16 downloads, and at that
> point distinguishing between them won't matter). You could reset the
> counters regularly (on a period roughly equal to the time needed to
> upload your whole share 16 or 256 times, computed from your
> max_upload_rate and share size).
> 
> For a 9GB share of big vacation videos this would be 1000+ chunks ->

I don't deliberately share anything and still have 20-30 GB shared due
to temporary files.
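
Either way the counters are cheap. A minimal OCaml sketch of the scheme
(the chunk count, reset period and saturation limit are assumptions):

    (* One byte per chunk, saturating at 255; reset roughly every
       share_size * 256 / max_upload_rate seconds, as proposed above. *)
    let counters = Bytes.make 1000 '\000'

    let record_upload chunk =
      let c = Char.code (Bytes.get counters chunk) in
      if c < 255 then Bytes.set counters chunk (Char.chr (c + 1))

    (* Serve the peer asking for the least-uploaded chunk first
       (assumes a non-empty request list). *)
    let least_uploaded requested =
      List.fold_left
        (fun best c ->
           if Bytes.get counters c < Bytes.get counters best then c
           else best)
        (List.hd requested) requested

    let reset_counters () =
      Bytes.fill counters 0 (Bytes.length counters) '\000'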

...
> Newest chunks on our systems are automatically given maximum priority,
> and we help distribute them across the P2P network.

Newest? In what way? The one I got last? The time when you get a chunk
is completely unrelated to rareness.

> Ideally edonkey peers should never honor a chunk request if someone
> asks for another, less-downloaded chunk.
> 
> 
> >>The idea (which I'm now almost sure is wrong) was that since the
> >>whole file is completed when all chunks are, the "work per source"
> >>should be as uniform as possible over all chunks.
> >>
> >
> >If you're selfish, the work should be distributed so that all chunks
> >finish at the same time: calculate an ETA for each chunk and request
> >the chunk that will take longest next. But don't be selfish.
> >
> 
> Being selfish only hurts the network you use, and so hurts you, so
> it's quite good advice...

Only if everyone does it, but that's what it boils down to. You have to
balance being selfish with helping others. If you go to extremes, the
network dies or people get frustrated and start exploiting it.

...
> >For rare files getting rare chunks is vital. For common files it
> >probably doesn't matter whether you have 20 or 30 sources for a
> >chunk. The 1000 sources you are not connected to could completely
> >reverse the rareness.
> >
> 
> I don't think it's a good idea: everybody would flood the poor peers
> that happen to own the rare file chunks.
> 
> This would only lower their bandwidth and hurt those chunks'
> propagation.

You connect and get queued, and you retry at the normal interval for
rare chunks. Same with frequent chunks.

A difference only appears if a client is offline. A client that has
only frequent chunks and is offline is irrelevant; try it every few
hours. If it has a rare chunk, it's important to get it as soon as the
client comes back online. Be a little bit selfish. Get in ahead of the
crowd.

The main difference would be to query uninteresting clients less
frequently, not to query interesting clients more often. There are far
too many connects all the time already.
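
A sketch of that weighting (constants invented): scale the reconnect
interval by how many sources are known for the rarest chunk the client
could give us, so rare-chunk owners are retried at the normal rate and
nobody is queried more often than today:

    let retry_interval ~base ~sources_of_rarest_needed_chunk =
      (* base: normal reconnect interval in seconds *)
      if sources_of_rarest_needed_chunk <= 3 then base        (* rare *)
      else if sources_of_rarest_needed_chunk <= 20 then base * 4
      else base * 16                            (* common: back off *)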

> >I think rareness should really be calculated over all known sources
> >and not just connected ones.
> >
> >
> >Overall I think two things should be considered:
> >
> >1. Complete chunks as fast as possible. Once they are shared you free
> >up slots on other clients because people will connect to you instead.
> >You can also compute the md4 and correct errors, and fragmentation of
> >the file isn't too bad.
> >
> 
> Agreed.
> 
> >2. Request chunks in a way that leaves the maximum number of sources
> >for the remaining chunks. For example, suppose you have a rare chunk
> >with one source and 100 sources for the other chunks. If you first
> >complete all the other chunks, you will have only one source left for
> >the last chunk and 100 competitors. If you grab that rare chunk first
> >instead, when possible, you still have 100 sources left for the other
> >chunks.
> >
> 
> I don't agree: trust the network. Some other peer will complete the
> chunk, distribute it, and thereby help you get it if you can't.

I think you misunderstood me here.

You are connecting to the client with the rare chunk and some frequent
chunks. You got your slot and now you have to decide what to get: the
rare chunk or a frequent chunk. You will block the client one way or
the other, so it's better to get the rare chunk. It helps you finish
the file earlier (because you can get the other chunks somewhere else),
and it helps spreading, because you can spread the rare chunk earlier.
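
In other words, once the slot is granted, request the chunk with the
fewest known sources among those this client has and we still need. A
sketch (source_count is assumed to cover known, not just connected,
clients):

    let pick_chunk ~needed_from_this_client ~source_count =
      match needed_from_this_client with
      | [] -> None
      | c :: rest ->
          Some (List.fold_left
                  (fun best x ->
                     if source_count x < source_count best then x
                     else best)
                  c rest)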

> >To combine the two I would suggest keeping a count of the overall
> >rareness (connected and unconnected clients) of chunks and trying to
> >get rare chunks first. Get as many sources for the rarest chunk up to
> >a minimum sub-chunk size of, say, 256K, i.e. as long as there is a
> >gap of >256K in the file where only one source downloads, try to
> >split that into two parts with two sources. That's how lopster does
> >it.
> >
> >The limit of 256K should be set low enough to allow many sources per
> >chunk but high enough that you don't end up asking for very small
> >blocks all the time and spend more time asking than downloading.
> >
> >
> >Reconnecting to clients should also consider rareness of chunks.
> >Clients with rare chunks should be tried more often, clients with
> >common chunks less often.
> >
> 
> 
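The lopster-style splitting quoted above fits in a few lines; a sketch
(the 256K constant comes from the text, the rest is assumption):

    let min_split = 256 * 1024

    (* [lo, hi) is a gap currently downloaded by a single source. *)
    let split_gap ~lo ~hi =
      if hi - lo > min_split then
        let mid = lo + (hi - lo) / 2 in
        Some ((lo, mid), (mid, hi))  (* two halves, two sources *)
      else
        None                         (* too small: keep one source *)
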
> Please don't hammer them. Every time you try to get something before
> others you are selfish and hurt the network (this is what you do by
> requesting these chunks more often than others).
> 
> Doing so makes your client better for you in the short term, but other
> clients will be upgraded and you'd end up on the same level playing
> field, only with the rare chunk owners busier answering requests and
> uploading their rare chunks less...

"More often" and "less often" are always meant in comparison to each
other and NOT as "more often than now". Usually I'm hitting the
"max_clients_per_second" limit with the current rate of reconnects
anyway. I want to sort out uninteresting clients and save myself the
trouble of connecting to them. Free up my bandwidth and theirs.

Regards,
        Goswin



