[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Nel] Data Ownership
From: |
John Hayes |
Subject: |
[Nel] Data Ownership |
Date: |
Wed, 30 Jan 2002 23:35:59 -0500 |
Who owns the data? Where is the data? Who's using the data?
Really, at the end of the day, the database owns the data - but that
isn't good enough for some applications. Databases can be difficult to
partition, they have some innefficient locking semantics, and the
built-in 4GLs are interpretted and unsuitable for some applications ...
like 3D math. Databases are really good at writing stuff to disk,
providing fast online remodelling, online backups/restores and
validation. Doing these things well is pretty hard, so server
applications (especially games) create partitions into the database and
an application server. The application server takes over some of the
game logic, locking and interactions and scaling across multiple
machines. But what are some of the ways application servers can be
implemented and what are the consequences?
---
Load Balancing: Symmetric vs. Asymetric
Symmetric load Balancing is like Symmetric Multi-Processing, every
processor (or server) is equally capable of processing any request. To
load balance this system, you either move connections around from one
server to the next - or in a connectionless system like HTTP you fire
each connection to the least balanced machine. The principal assumptions
that make this system efficient are:
1. More objects are read than written - there is usually a core of
objects that are not written at all, the static state.
2. A connection access a small pool of data. This is true for many
business applications where a person only looks at the invoices/work
orders/accounts that interest or involve them. In turn, these objects
touch only a few other objects or draw from the static state.
3. Servers in general don't care about new objects created on other
servers.
4. Objects do not have to broadcast their changes to a large number of
other objects.
To make it work, there must be a central locking server - this can be
inside the database or another application server. Since any server can
vie for modification of an object (being symmetrical) some entity must
act as the gatekeeper to prevent two transactions from clobbering each
other.
The benefits of such a system are:
1. Flexibility - it's very easy to add and remove servers to the
cluster. Connections can be easily migrated to other servers through
whatever load balancing mechanism is present.
2. Simplicity - the application servers don't talk to each other or
otherwise coordinate their activities.
3. Speed - if the assumptions hold true, a single transaction only has
to leave the application server to talk to the database (to write
changes) and the lock server (to coordinate with other servers).
The downside is scalability. As application servers are added, the
amount of additional useful memory does not increase proportionally. If
the size of the static state or the read objects is very large compared
to the size modified objects, read-only objects will consume most of the
memory. Since objects are arranged randomly and not according to their
relationships this is likely to be the case with a random distribution
of connections. Also, as servers are added, the probability for lock
collisison increases exponentially - the lock server becomes a
bottleneck.
---
Asymmetric load balancing places each object in a server where it
"lives" and processes transactions. The server a particular object lives
on is often decided statically, but the goal is to place an object on
the same server as other objects it interacts with. Games will often use
geography, since a character is much less likely to interact with
another character far away as a near one. The user's connection follows
the object they want to control.
The assumptions that make this work well:
1. It is possible to determine which objects will interact most often
with which other objects.
2. The user will not frequently switch objects or won't frequently
switch objects away from what's nearby.
3. Communication between objects can be well expressed with lage
commands (instead of many small commands)[1].
The great thing about this system is every object (with the exception of
the objects that form your static state) lives on a particular server.
Unlike the Symmetric load balancing, which has a centralized lock server
the coordinate activities, each server manages it's own locking. In it's
place, is a locator service which keeps a directory matching objects to
servers[2].
Load on the database server and memory requirements are smaller for
asynchronous servers becasue there's only one copy of any object. This
means it's only been loaded once *and* you can guarantee that this copy
is always the most recent. In the symmetric scheme, objects are loaded
on every server that happens to read them and if it changes on another
server it must be reread from the database becasue the application
servers do not talk to their peers.
The price for this wonderment is complexity. If you can imagine a
client-to-server protocol for stimulating objects, there also has to be
a server-to-server protocol for when one object wants to use another
that happens to live on another server. If the server's fail to
partition their objects efficiently, these server-to-server
conversations increase exponentially as do the transaction processing
times. Symmetrical servers guarantee that all application processing for
a server happens on a single node - the maximum for an asymmetrical
server is each call is remoted, the optimal case is as good and the
worst case is way worse. Even on a very high speed LAN, a single network
call adds an order of magnitude to the response time when compared to an
entirely local transaction.
The system as a whole is more fragile than symmetric load balancing.
Every server hold a disjoint part of the state, so if one of them fails
- you lose all of the state on that server until whatever management is
present can restore it. A symmetric system is more fluid - all can be
restored by the very next transaction. This assumes a "shared nothing"
partitioning; an alternative is doing replication to one of the other
servers, but the demon complexity is back again.[3]
---
John
[1] This is always good interface design - so nothing like a *huge*
performance issue to enforce it
[2] Another possibility is having a copy of the directory on every
server - then every server has to update every other server. If you
don't have many objects and they don't move very often this can save a
single point of failure.
[3] AD, NDS and Oracle all do a shared data ownership scheme - the
master server operates on the data and replicates to the backup server.
In these system the cost of replication is pretty small since reads by
far outnumber writes.
<<winmail.dat>>
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Nel] Data Ownership,
John Hayes <=