From: gordan
Subject: Re: [Gluster-devel] Improving real world performance by moving files closer to their target workloads
Date: Fri, 16 May 2008 15:36:51 +0100 (BST)
User-agent: Alpine 1.10 (LRH 962 2008-03-14)

On Fri, 16 May 2008, Derek Price wrote:

> address@hidden wrote:
>> Isn't that effectively the same thing? Unless there is quorum, DLM locks out the entire FS (it also does this when a node dies, until it gets definitive confirmation that it has been successfully fenced). For normal file I/O all nodes in the cluster have to acknowledge a lock before it can be granted.

> Why? It requires a meta-data cache, but as long as every node in the quorum stores a given file's most recent revision # when any lock is granted, even if it doesn't actually sync the file data, then any quorum should be able to agree on what the version number of the most up-to-date copy of a file is. All nodes are required to report only if you assume that any given file has a small number of "owners" and that the querier doesn't know who the owner is.

That's to do with file versioning, not locking, though. What am I missing?

> To remain fault tolerant, this requires that servers make some effort to stay up-to-date with the meta-data cache, but maybe this could be dealt with efficiently with the DHT someone else brought up?

I'm not sure that so much metadata caching is actually necessary. If a file open brings the file to the local machine (this cannot be guaranteed because the local machine may be out of space, and it may be unable to free space by expunging an old file due to that file not being redundant enough in the network), then the metadata of that file, being attached to the file, is implicitly "cached". But this isn't really caching at all - it's migration.

The algorithm for opening a file might be as follows:
1) node broadcasts/multicasts an open request to all peers
2) peers that have the file available respond with the metadata (size, version, etc.) they have, and possibly their current load (to assist with load balancing by fetching the file from the least loaded peer)
3.1) if the file is available locally, agree a lock with other nodes, and use it
3.2) if the file is not available locally, but there is enough space, fetch it and do 3.1)
3.3) if there isn't enough space locally to fetch the file, see if enough space can be freed; if this succeeds, do 3.2)
3.4) if space cannot be freed, use the file remotely from the least loaded peer
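
To make that concrete, here is a rough sketch in Python. The PeerReply type and the broadcast_open_request/acquire_lock/fetch/open_remote helpers are all made up for illustration; none of this is GlusterFS API, just the shape of the algorithm above:

    from dataclasses import dataclass

    # Illustrative types and helpers only; nothing here is real
    # GlusterFS API.

    @dataclass
    class PeerReply:
        peer: str      # responding peer's address
        size: int      # file size in bytes
        version: int   # file revision number
        load: float    # peer's current load, for balancing

    def open_file(path, store, peers):
        # 1) broadcast/multicast an open request; peers holding the
        #    file reply with metadata and load (assumed helper)
        replies = broadcast_open_request(path, peers)

        # 3.1) already local: agree a lock with the other nodes, use it
        if store.has(path):
            acquire_lock(path, peers)
            return store.open(path)

        # pick the least loaded peer as the source
        src = min(replies, key=lambda r: r.load)

        # 3.3) not enough space: try to expunge files that are
        #      sufficiently redundant elsewhere in the network
        if store.free_bytes() < src.size:
            store.try_free(src.size)

        # 3.2) enough space (possibly after freeing): migrate, then 3.1)
        if store.free_bytes() >= src.size:
            fetch(path, src.peer)
            acquire_lock(path, peers)
            return store.open(path)

        # 3.4) space cannot be freed: use the file remotely
        return open_remote(path, src.peer)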

The expunging algorithm would be similar - broadcast a file status request (similar to 1) above). If enough nodes respond with the latest version of the file (set some threshold depending on how much redundancy is required), the local file can be removed and the space freed for a file that is more useful locally. This shouldn't really happen until the local data store starts to get full.
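
A minimal sketch of that check (broadcast_status_request() is an assumed helper returning records with a version field, and REDUNDANCY is a made-up threshold, not a real tunable):

    REDUNDANCY = 3  # copies that must exist elsewhere before expunging

    def can_expunge(path, local_version, peers):
        # ask peers what version of the file they hold (assumed helper)
        replies = broadcast_status_request(path, peers)
        # count peers that hold the latest version
        up_to_date = sum(1 for r in replies if r.version >= local_version)
        return up_to_date >= REDUNDANCY

    def maybe_expunge(path, store, peers):
        # only worth doing once the local data store starts to fill up
        if can_expunge(path, store.version(path), peers):
            store.remove(path)  # free space for a more useful file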

Locking could be handled somewhat lazily - a lock request gets broadcast, and as long as a quorum of peers responds and no peer says "no, I have that lock!", the lock can be granted. A lock could carry a TTL (in case a node dies while holding it), with the holder expected to refresh it if it wants to keep the lock. This could be used to speed up locking (each node would have a list of currently valid locks without needing to check explicitly, for example - it would only need to broadcast a lock request when it looks like the lock can be granted).
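
Roughly, the per-node bookkeeping could look like this (LOCK_TTL is an arbitrary figure; quorum agreement and objection handling happen on the wire and aren't shown - this is just the local table with TTL expiry and refresh):

    import time

    LOCK_TTL = 30.0  # seconds; an arbitrary figure for illustration

    class LockTable:
        def __init__(self):
            self.locks = {}  # path -> (holder, expiry time)

        def grant(self, path, holder):
            self.locks[path] = (holder, time.time() + LOCK_TTL)

        def refresh(self, path, holder):
            # a holder must refresh before expiry to keep its lock
            if self.holder(path) == holder:
                self.grant(path, holder)

        def holder(self, path):
            entry = self.locks.get(path)
            if entry is None:
                return None
            who, expiry = entry
            if time.time() > expiry:
                # holder presumed dead; the lock lapses on its own
                del self.locks[path]
                return None
            return who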

For file delta writes, an AFR-type mechanism could be used to send the deltas to all the nodes that have the file. This could all get quite tricky, because it might require a separate multicast group to be set up for potentially every subset of nodes, in order to keep the network bandwidth down (otherwise you'd just end up broadcasting to all nodes, which means things wouldn't scale the way switches should - it would be more like using hubs).

This would potentially have the problem that there are only 24 bits of IP multicast address space, but that should provide enough groups at sensible redundancy levels to cover all node combinations. This may or may not be way OTT complicated, though. There is probably a simpler and more sane solution.
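
As a back-of-envelope check (G reuses the 24-bit figure above; N and R are made-up parameters): one group per arbitrary subset of nodes overflows almost immediately, but one group per R-node replica set fits with plenty of room, which is presumably what "sensible redundancy levels" buys you:

    from math import comb

    G = 2 ** 24   # multicast groups available (24-bit figure above)
    N = 100       # hypothetical cluster size
    R = 3         # hypothetical redundancy level (copies per file)

    # All subsets of even 25 nodes already outnumber the groups...
    print(2 ** 25 > G)        # True

    # ...but one group per R-node replica set is easily affordable.
    print(comb(N, R))         # 161700
    print(comb(N, R) <= G)    # True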

Gordan



