Re: [Mldonkey-users] New source management


From: Goswin Brederlow
Subject: Re: [Mldonkey-users] New source management
Date: 02 Jan 2003 20:31:38 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Military Intelligence)

wh <address@hidden> writes:

>    Goswin Brederlow wrote:
> 
>  >Let's throw in my 2 c of thoughts:
>  >
>  >1. New sources should be tried within a reasonable time limit, but they
>  >should not flood out older sources. Each file could get a next_new
>  >counter. Every time a source is added, the earliest time it might be
>  >tried would be the next_new counter, and the counter is increased. If
>  >the next_new is too low it should be advanced to, say, a second from now
>  >or something.
>  >
> Therefore, you would need two (or three) separate queues: one for known
> good sources, one for new and untried ones, and maybe another one for bad
> sources that have been deleted from the normal sources and should only
> be retried very rarely.

Yes, I realised that while writing. A global list for new sources
would probably be enough though: ask one new source to identify itself
and then try one old source. A global list conserves memory and eases
handling. If there are 1000 new sources for one file and 1 source for
another, trying the 1000 new sources of the first file and then the
single one of the other shouldn't be too bad.
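
To make that concrete, a rough sketch in OCaml (everything here is made
up, nothing of it is from the current code):

  (* Sketch: one global queue of untried sources and one queue of known
     sources (per file in reality).  Each tick asks one new source to
     identify itself, then tries one known source. *)

  type source = { ip : string }

  let new_sources : source Queue.t = Queue.create ()    (* global, untried *)
  let known_sources : source Queue.t = Queue.create ()  (* per file in reality *)

  let try_connect s = Printf.printf "connecting %s\n" s.ip

  (* one scheduler tick: one new source, then one known source *)
  let tick () =
    if not (Queue.is_empty new_sources) then
      try_connect (Queue.take new_sources);
    if not (Queue.is_empty known_sources) then
      try_connect (Queue.take known_sources)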
 
>  >2. Each file should have its own queue. Otherwise popular files would
>  >kill sources of sparse files.
>  >
> Yes, a queue of its own per file is definitely necessary
> 
>  >3. Each file should have a priority queue.
>  >- The priority should be initialised with some small offset depending
>  >  on the IP. Known dynamic IPs get a penalty.
>  >
>  >
> Are you sure that this would work?

I know that anything *.dip.t-dialin.net is a dynamic IP. It's slow and
it has a short lifetime. If I just found that source and a connect
fails, the source should be blacklisted for the next few hours. No
sense trying it again; the user has gone offline or gotten a new IP.

IPs not in that range are to be preferred. More important than an
initial penalty is getting rid of those IPs once they fail to work.
It's a waste of my bandwidth to keep trying, and annoying to the next
person getting that IP (unless he's a donkey user too :).
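
Even something as dumb as a suffix match would do for a start. A sketch
(the suffix list, numbers and names are all made up):

  (* Sketch: initial score penalty for known dynamic-IP ranges and a long
     blacklist delay after a failed connect.  Values are made up. *)

  let dynamic_suffixes = [ ".dip.t-dialin.net" ]

  let ends_with suffix s =
    let ls = String.length s and lsuf = String.length suffix in
    ls >= lsuf && String.sub s (ls - lsuf) lsuf = suffix

  let is_dynamic hostname =
    List.exists (fun suffix -> ends_with suffix hostname) dynamic_suffixes

  (* small initial malus for dynamic ranges *)
  let initial_score hostname =
    if is_dynamic hostname then -10 else 0

  (* after a failed connect a dynamic IP is almost certainly gone *)
  let blacklist_delay hostname =
    if is_dynamic hostname then 4 * 3600 else 600   (* seconds *)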
 
>  >- When a connection is made or fails the priority gets adjusted up
>  >  or down.
>  >- When the client is identified, the client and version give some small
>  >  adjustment to the priority. Nothing big, but some.
>  >- If a client has chunks I need it gets a bonus for each. _Otherwise_
>  >  for each chunk it needs from me it gets a malus. (It's more likely
>  >  the client will get a chunk I already have than a new one.)
>  >  Sparseness of chunks could be considered here.
>  >
> Yes, this kind of dynamic source prioritization should help with our
> download speed problems, although it needs a lot of testing...
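
Roughly what I mean with the chunk bonus/malus, as a sketch (the weighting
of one point per chunk is made up):

  (* mine.(i)   = I already have chunk i,
     theirs.(i) = the source has chunk i (same length assumed).
     Bonus for every chunk he has that I need, malus for every chunk
     he needs that I have. *)

  let chunk_score ~mine ~theirs =
    let score = ref 0 in
    Array.iteri (fun i have ->
        if theirs.(i) && not have then incr score       (* chunk I still need *)
        else if have && not theirs.(i) then decr score  (* chunk he needs from me *)
      ) mine;
    !score

  (* example: I have chunks 0 and 1, he has 1 and 2 -> score 0 *)
  let _ = chunk_score ~mine:[| true; true; false |]
                      ~theirs:[| false; true; true |]
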
> 
>  >- If I get some upload from a client the source should get a
>  >  bonus. Maybe the speed could be factored in too.
>  >
> This factor shouldn't be very high, else we would get a credit system
> again. And I don't think a credit system is a good thing for many reasons.

Why? Why shouldn't I leech off those high-bandwidth bastards and only
fall back to low-bandwidth ordinary people when all else fails?
Sources that gave me data before are preferable to sources that put me
into a queue for ages and then went offline.

I'm not (at this time and not here) suggesting uploading more to
clients that upload to me, which would be a credit system.

>  >- The time since the last upload and last connect should be factored
>  >  in. A successful connect without a download isn't necessarily a
>  >  good thing.
>  >
>  >- The queue should have a high and a low watermark (as it has now). If
>  >  the high mark is reached the worst sources are purged one by one
>  >  until the low mark is reached. If that means purging new sources you
>  >  never connected to, so be it. One could just purge the worst source
>  >  whenever the queue is full, but then it might always be the newfound
>  >  super fast but untried source. With a high/low mark system there is a
>  >  certain time between finding more sources and purging where new
>  >  connects can be tested.
>  >
> ACK
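
For the purging I imagine something like this (sketch; the marks and the
list representation are just for illustration):

  (* When the per-file queue grows past high_mark, keep only the
     low_mark best-scored sources and drop the rest. *)

  type source = { ip : string; score : int }

  let high_mark = 500
  let low_mark  = 400

  let rec take n = function
    | [] -> []
    | _ when n <= 0 -> []
    | x :: tl -> x :: take (n - 1) tl

  let purge sources =
    if List.length sources >= high_mark then
      (* best sources first, then cut the tail *)
      take low_mark
        (List.sort (fun a b -> compare b.score a.score) sources)
    else
      sources
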
> 
>  >
>  >Additionally I think the penalty for a connection failure for known
>  >dynamic IPs should be very high. It's quite unlikely that a dial-in user
>  >has his donkey down for an hour and then starts it up again. It's far
>  >more likely he got a new IP.
>  >
> This could perhaps be tried, but I don't think the positive effect would
> be very big. But as you say, this could be one of the priority factors
> after a restart of MLDonkey.

A known dynamic IP that's older than 24h (or whatever the known
turnaround time for that range is) should be discarded. No sense
trying to reach a *.dip.t-dialin.net client after 24h, since the ISP
does a disconnect and assigns a new IP every 24h.

>  >4. All file queues should be in a priority queue. Its priority there
>  >should depend on the top entry of each queue and the file itself. X
>  >connects/second are then tried from that queue.
>  >- each file should have a next_try counter.
>  >- the priority of a file should be factored in. High priority files
>  >  should be tried more often.
>  >- the priority of the top element of each queue should be factored in
>  >
>  >The easiest way would be to increase the next_try counter according to
>  >the file's priority and the priority of the top element of that queue
>  >each time the queue is tried.
>  >
> ACK, but there is one problem: the first file (with low priority) has
> 1000 sources. The second file (high priority, very rare) has 20 sources.
> The 20 sources of the second file would be flooded with requests. There
> should be a min_retry_delay for each source, but I think this feature
> already exists in MLDonkey right now.

At the start both files would get 20 queries, alternating between the
files. Then, for the next say 5 minutes, only the first file with the
1000 sources would have sources left that are ready for a retry. Then
it would alternate again for 20 sources each.

The idea is to query sources for blocks at a constant rate: a steady,
slow stream, with no peaks that mess up download/upload rates.
Also, each file should have a steady rate of queries according to its
priority.
And last, each source should see a steady, slow rate of queries, or at
least no flooding, implemented by a simple retry delay.

There would be 3 mechanisms at work (roughly sketched below):
1. space out queries more or less evenly across files (main queue)
2. space out queries more or less evenly across a file's sources
   (per-file source queue)
3. don't flood a source (retry delay)
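
For mechanism 1 I think of something this simple (sketch, all names and
numbers are made up):

  (* A next_try timestamp per file, pushed into the future by an
     interval that shrinks with the file's priority. *)

  type file = {
    name : string;
    priority : int;             (* user-set, higher = more important *)
    mutable next_try : float;   (* earliest time of the next query *)
  }

  let base_interval = 60.0      (* seconds between queries at priority 1 *)

  let query_file f = Printf.printf "querying sources of %s\n" f.name

  (* one pass of the main loop over all files *)
  let schedule now files =
    List.iter (fun f ->
        if f.next_try <= now then begin
          query_file f;
          f.next_try <- now +. base_interval /. float_of_int (max 1 f.priority)
        end)
      files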

I would implement the per-file queues with two values per source: a time
when the source may next be queried and a score for how good the source
is.

The queue should be doubly sorted, once by time (for querying) and
once by score (to purge the worst sources). Every time a source gets
queried its timestamp would be set to now+offset, where the offset
depends on the score.
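
Per file that would look roughly like this (sketch; a real version would
keep two balanced sets instead of sorting a list each time):

  type source = {
    ip : string;
    mutable score : int;        (* how good the source has been so far *)
    mutable not_before : float; (* earliest time of the next query *)
  }

  type file_queue = { mutable sources : source list }

  (* next source to query: the earliest not_before that is already due *)
  let next_to_query q now =
    match List.sort (fun a b -> compare a.not_before b.not_before) q.sources with
    | s :: _ when s.not_before <= now -> Some s
    | _ -> None

  (* worst source, for purging at the high watermark *)
  let worst q =
    match List.sort (fun a b -> compare a.score b.score) q.sources with
    | s :: _ -> Some s
    | [] -> None

  (* after a query the timestamp becomes now + offset, offset depending
     on the score (mechanisms 2 and 3: spacing plus retry delay) *)
  let requeue s now =
    let offset = 600. -. 5. *. float_of_int s.score in
    s.not_before <- now +. max 60. offset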

>  >5. To prevent source flooding (a queue holding only new sources between
>  >the high and low watermark, or in other words no new source having been
>  >tried since the last purge), the queue could be locked to not accept
>  >any new sources till all existing ones have been tried. Sources that
>  >get a malus on their first try would be removed from the queue, the
>  >others kept. When all new sources have been tried and the queue is
>  >still at the high watermark, the normal purging is done.
>  >
> MLdonkey proposed that every new source (no matter how many there are)
> should be tried once before it is rated. I don't really know which way
> is better...

Yes. Unless there are really, really a lot of them and we run out of
RAM. A global list of new sources should do the trick. The main loop
could then alternate between old sources and new sources and sort them
into the priority queue once they are identified. New sources that are
dead could be discarded after one try, I think.

Split old and new sources as I hinted below:
>  >6. The next_new counter should probably be used for other sources
>  >too. Every time a source disconnects it should get a next_retry_time
>  >from the next_new counter if that's greater than the normal retry time,
>  >or the average of both. Alternatively two queues could be used, one
>  >for existing sources and one for new ones, and the two could be used
>  >alternately (unless the retry time is not yet reached). Actually I
>  >like that more.
>  >
> ACK
> 
>  >
>  >7. A source that has multiple files should be added to the file with
>  >the highest priority. This could be just the user-set priority or
>  >derived from the user-set priority, the sparseness of the file/blocks,
>  >the age of the file, the number of active connects and sources for the
>  >file, ... The user-set priority should probably be the main factor here
>  >and the other factors should only make up for a bit, say +-10 points
>  >on the user-set priority. Setting a user-set priority of 5 would mean
>  >prefer this but also get others. Setting it to 100 would mean
>  >absolutely get this.
>  >
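
For 7. the derived priority could be as simple as this (sketch, the
weights and the clamp to +-10 are made up):

  let clamp lo hi x = max lo (min hi x)

  (* user priority plus a small correction from rareness and activity *)
  let derived_priority ~user_priority ~rare_chunks ~active_sources =
    user_priority + clamp (-10) 10 (rare_chunks - active_sources / 10)

  (* a source seen for several files goes to the file with the highest
     derived priority; files are (name, derived priority) pairs here *)
  let best_file files =
    List.fold_left (fun best f ->
        match best with
        | Some b when snd b >= snd f -> best
        | _ -> Some f)
      None files

  (* example *)
  let _ = best_file [ "rare.iso", 15; "popular.avi", 5 ]
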
> How does eMule banning work? Does it ban for asking sources too
> frequently overall, or does it ban for asking too frequently per source?

No clue.

Uptime: 6823 seconds (0+01:53)
                Total seens:               3302
                    eDonkey:                128 (3.88 %)
               old mldonkey:                 22 (0.67 %)
               new mldonkey:                 94 (2.85 %)
                    Overnet:                 38 (1.15 %)
                  old eMule:                  0 (0.00 %)
                  new eMule:               3020 (91.46 %)
                     server:                  0 (0.00 %)
Total filerequests received:              17422
                    eDonkey:                138 (0.79 %)
               old mldonkey:               2275 (13.06 %)
               new mldonkey:               4405 (25.28 %)
                    Overnet:               1801 (10.34 %)
                  old eMule:                  0 (0.00 %)
                  new eMule:               8803 (50.53 %)
                     server:                  0 (0.00 %)
            Total downloads:           45361616
                    eDonkey:           12593119 (27.76 %)
               old mldonkey:            1668096 (3.68 %)
               new mldonkey:                  0 (0.00 %)
                    Overnet:           22785437 (50.23 %)
                  old eMule:                  0 (0.00 %)
                  new eMule:            8314964 (18.33 %)
                     server:                  0 (0.00 %)
              Total uploads:           43614813
                    eDonkey:             485376 (1.11 %)
               old mldonkey:            8088765 (18.55 %)
               new mldonkey:            2552240 (5.85 %)
                    Overnet:                  0 (0.00 %)
                  old eMule:                  0 (0.00 %)
                  new eMule:           32488432 (74.49 %)
                     server:                  0 (0.00 %)
                Total banneds:                 81
                    eDonkey:                  0 (0.00 %)
               old mldonkey:                  0 (0.00 %)
               new mldonkey:                  0 (0.00 %)
                    Overnet:                  0 (0.00 %)
                  old eMule:                  0 (0.00 %)
                  new eMule:                 81 (100.00 %)
                     server:                  0 (0.00 %)

Shouldn't Overnet upload work in today's CVS + pango?

What are those 81 banned? Did I ban them? Did they ban me?

Regards,
        Goswin

PS: I know the client has only been running for 2 hours, so the stats are crap. :)


