
Re: [Mldonkey-users] HDD use


From: Brett Dikeman
Subject: Re: [Mldonkey-users] HDD use
Date: Sat, 15 Mar 2003 23:15:34 -0500

At 1:53 PM +0100 3/13/03, Ficlu wrote:

> I am wondering about how the mldonkey client uses the disk. A friend of
> mine (a PC seller) told me that a lot of p2p users crash their disks
> because the client makes too many accesses to the disk.

This is pure BS. The head motion involved in, say, launching an application (especially if the system is low on memory and the drive is fragmented) is certainly more than the activity generated by a p2p client, which reads and writes in a very predictable pattern. Assuming no fragmentation, almost everything a p2p client does is linear. Hashing a chunk is a perfect example: you read a single chunk from start to finish. Little head motion is involved.
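
To make the access pattern concrete, here is a minimal sketch of chunk hashing: one sequential read per chunk, and nothing else. The 9,728,000-byte size is the eDonkey network's chunk convention, and md5 stands in for the MD4 hash eDonkey actually uses, since MD4 support in Python's hashlib depends on the local OpenSSL build.

    import hashlib

    CHUNK_SIZE = 9_728_000  # eDonkey chunk size; md5 stands in for MD4 here

    def hash_chunks(path):
        """Hash a file chunk by chunk -- a purely sequential read pattern."""
        digests = []
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)  # one linear read, start to finish
                if not chunk:
                    break
                digests.append(hashlib.md5(chunk).hexdigest())
        return digests

Note there is no seeking at all: the head sweeps forward through the file once, which is exactly the workload desktop drives are built for.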

Desktop drives are designed for linear access; server-market/enterprise drives are optimized for random access (i.e., databases). If you've got a desktop drive, chances are it will favor leaving the head where it is after a read or write, since chances are you're going to write another block nearby. This assumption is almost -always- correct for a file transfer, unless the disk is fragmented. This is why, for example, an IDE drive in a file server will appear to offer better performance with only one client than its much more expensive SCSI cousin. However, toss 10 clients at the IDE-powered server, or fragment the drive, and watch it slow to a crawl; the IDE drive's firmware isn't designed for that sort of thing, and the SCSI drive will wipe the floor with it. Give me 4 SCSI drives and a SCSI RAID card, and I'll embarrass the pants off your IDE RAID card with 4 IDE drives, on damn near everything except sequential IO.

Things are changing, however. With the market/economy in a crunch, IDE is expanding upscale, and we're seeing IDE drives aimed at the low-end server market, with higher MTBFs and enterprise-class firmware borrowed from their SCSI cousins. To simplify: only the interface is different.

> After 6 months of 24h/day utilization, a kazaa client could destroy the
> disk (it happened to a significant number of people, and that's the reason
> I am wondering about the mldonkey client).

Pure BS; your reseller friend is picking the wrong common factor among the users. It's a very common mistake when trying to figure out why something happens. A gross analogy: "everyone who eats tomatoes dies!"

I suspect a greater factor is that p2p power users may have more than one drive, and each time you add a drive, you increase your chances of a drive failure; this is why plain RAID striping is almost never done anymore. As other people mentioned, folks running p2p apps might also leave their computers on longer. It has nothing to do with the drives not being "designed" to run continuously; they're simply not designed to have as high an MTBF as server/enterprise drives.

Basically, take the MTBF of your drives (for simplicity, we'll assume they're all the same) and divide by the number of drives. That's your effective MTBF for the cluster as a whole: one of them will kick the bucket in approximately that time.

Combine longer running time with more drives (say, two drives at 24x7 instead of one drive at 8x5) and you set yourself up for a much higher chance of failure: two drives at 168 hours a week versus one at 40 accumulate 336/40 = 8.4 times the drive-hours, so call it roughly 8x more likely.
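
A back-of-the-envelope sketch of both calculations; the MTBF figure is a made-up example number, and the model assumes identical, independent drives:

    # Effective MTBF of a group of identical, independent drives,
    # plus the drive-hours comparison from the paragraph above.
    # The 500,000-hour MTBF is purely illustrative.

    DRIVE_MTBF_HOURS = 500_000

    def effective_mtbf(mtbf_hours, num_drives):
        """One of `num_drives` identical drives fails, on average, this often."""
        return mtbf_hours / num_drives

    print(effective_mtbf(DRIVE_MTBF_HOURS, 2))  # 250000.0 -- two drives halve it

    # Exposure comparison: two drives at 24x7 vs. one drive at 8x5.
    always_on  = 2 * 24 * 7   # 336 drive-hours per week
    office_use = 1 * 8 * 5    # 40 drive-hours per week
    print(always_on / office_use)  # 8.4 -- roughly "8x more likely"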

Dirty Harry said it best- "Are you feelin' lucky, punk?  Well, are ya?" :-)


> Is there any policy in the mldonkey client when writing to the disk? I
> know that Linux can be configured to synchronize the buffer cache with
> the disk less often.

What you are referring to is bdflush. On laptops, some users prefer to run it with a longer period between flushes, to minimize clashes between the drive's power-saving mechanisms and the repetitive flushing. (Buffers are not to be confused with the drive's own disk cache.)
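
For the curious, here is a minimal sketch of inspecting the bdflush tunables on a 2.4-era kernel. The /proc path is real on those kernels, but the meaning and ordering of the fields vary by kernel version, so the "interval" labeling below is an assumption; check your kernel's Documentation/sysctl/vm.txt before changing anything.

    # Read the bdflush tunables on a Linux 2.4-era kernel.
    # Field semantics differ between kernel versions -- verify against
    # Documentation/sysctl/vm.txt for the kernel you actually run.

    with open("/proc/sys/vm/bdflush") as f:
        fields = f.read().split()

    print(fields)
    # On many 2.4 kernels the fifth field ("interval") is the delay, in
    # jiffies, between periodic flushes -- the knob laptop users raise.
    # That indexing is an assumption here, not something to rely on blindly.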

What affects mechanism movement more is the disk IO elevator. Its goal is to cluster reads and writes in a queue, and to delay requests based on what the drive is currently doing (reading or writing), since it is 'expensive' to switch between reading and writing. A -grossly- simplified example:

r w r w r w r w    <- interleaved: seven switches (each gap is switch time)
rrrr wwww          <- grouped: one switch


Notice one is significantly shorter than the other. For ANY non-zero switching cost, the first scenario will always be slower --overall--, which is why Linux tries to group types of IO. Also, in this particular example, every request except the first write completed at least as soon in the second schedule! (This is due mostly to the fact that in my example a 'switch' is as expensive as a single IO transaction; I don't think that is a realistic assumption on my part, so there's your caveat.)
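
A toy model of the comparison above, using the same simplification as the diagram: each request costs one time unit, and each read/write switch costs one unit too.

    # Toy model of IO scheduling cost: each request takes 1 time unit,
    # and switching between reads and writes costs SWITCH_COST units.
    # Unit costs are the same simplification used in the diagram above.

    SWITCH_COST = 1

    def completion_times(schedule):
        """Return each request's completion time for a given op sequence."""
        t, times = 0, []
        for i, op in enumerate(schedule):
            if i > 0 and op != schedule[i - 1]:
                t += SWITCH_COST          # pay the read<->write switch penalty
            t += 1                        # the request itself
            times.append(t)
        return times

    print(completion_times(list("rwrwrwrw")))  # [1, 3, 5, 7, 9, 11, 13, 15]
    print(completion_times(list("rrrrwwww")))  # [1, 2, 3, 4, 6, 7, 8, 9]
    # Total time: 15 vs 9 -- grouping wins for any non-zero switch cost.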

The maximum age a disk IO request may reach before it must be serviced can be tuned, but it's best left alone unless you really know what you're doing; I believe in 2.4 it is even self-tuning to some degree, but I'm not sure about that.

IMHO mldonkey shouldn't do anything specific or special with regard to IO; it is the job of the operating system (and its drivers) to decide what the 'best' way is. If you start 'tuning', you also set yourself up to re-tune later when things change, or to make things worse in situations you didn't think of or dismissed.

Case in point: the 2.5 kernel. The anticipatory disk IO scheduler it introduces behaves rather differently from 2.4's scheduler. Wouldn't it suck to optimize for 2.4, then find your tricks don't work under 2.5, and maybe even make things worse? Whoooops.

Brett
--
----
"They that give up essential liberty to obtain temporary
safety deserve neither liberty nor safety." - Ben Franklin
http://www.users.cloud9.net/~brett/



