[Gluster-devel] Duplicate request cache (DRC)


From: Rajesh Amaravathi
Subject: [Gluster-devel] Duplicate request cache (DRC)
Date: Mon, 20 Aug 2012 10:27:58 -0400 (EDT)


In RHS 2.1's road-map is the DRC (hereafter, the cache), which has the
following requirement in Docspace:

NFSv3 duplicate reply cache for non-idempotent ops,
cluster-aware, persistent across reboots. (performance, correctness)

* For persistence across reboots, one needs to implement a DRC that caches
  the replies in files. However, this will significantly degrade the overall
  performance of non-idempotent operations (write, rename, create, unlink,
  etc.). Having an in-memory cache eliminates the overhead of writing each
  reply to persistent storage, but at the obvious cost of losing the DRC on
  crashes and reboots. AFAIK, the Linux kernel's implementation is currently
  in-memory only. As such, we need to evaluate the actual impact on
  performance and weigh it against the advantages of a persistent cache.

* Cluster-aware DRC, i.e. one where, if a server (say, A) goes down, another
  server (say, B) takes over A's cache to serve requests on behalf of A.
  For this, both A and B need shared persistent storage for the DRC, along
  the lines of CTDB. One way of achieving shared persistent storage would be
  to simply use a gluster volume.

* Cache writes to the disk/gluster volume could go the usual two ways:
  write-back and write-through (see the sketch after this list).
  a. write-back: this avoids the delay of waiting for synchronous writes to
     the cache, which would be significant given that we need one for every
     request, and to glusterfs at that (one network round-trip). It does
     leave a small window for failure, if a cache write is lost in network
     transit just as the writing server goes down.
  b. write-through: this essentially adds at least one more network
     round-trip to every non-idempotent request. Implementing it, IMO, is
     not worth the performance loss incurred.
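
The following is only a rough sketch of that choice, not existing GlusterFS
code: cached_reply, drc_persist() and the flush queue are hypothetical
names. Write-through persists the reply before the NFS response goes back
to the client; write-back just queues it and lets a background thread flush
later.

/* Sketch only: write-through vs write-back persistence of a cached reply.
 * cached_reply, drc_persist() and the flush queue are hypothetical, not
 * part of the GlusterFS NFS server. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct cached_reply {
        uint32_t             xid;
        char                 client[64];
        void                *data;
        size_t               len;
        struct cached_reply *next;   /* flush-queue link (write-back only) */
};

/* Pretend this writes one reply to the local disk or the gluster volume.
 * In write-through mode this cost is paid once per non-idempotent request. */
static int drc_persist(struct cached_reply *r)
{
        printf("persisting xid=%u from %s (%zu bytes)\n",
               (unsigned) r->xid, r->client, r->len);
        return 0;
}

/* write-through: persist before the NFS reply goes back to the client. */
static int drc_cache_writethrough(struct cached_reply *r)
{
        return drc_persist(r);
}

/* write-back: queue the reply and return at once; a flusher thread persists
 * it later. Small loss window if the server dies before the flush. */
static struct cached_reply *flush_q;
static pthread_mutex_t      flush_lock = PTHREAD_MUTEX_INITIALIZER;

static int drc_cache_writeback(struct cached_reply *r)
{
        pthread_mutex_lock(&flush_lock);
        r->next = flush_q;
        flush_q = r;
        pthread_mutex_unlock(&flush_lock);
        return 0;                        /* reply can be sent right away */
}

static void *drc_flusher(void *arg)
{
        (void) arg;
        for (;;) {
                pthread_mutex_lock(&flush_lock);
                struct cached_reply *r = flush_q;
                flush_q = NULL;
                pthread_mutex_unlock(&flush_lock);
                for (; r; r = r->next)
                        drc_persist(r);
                sleep(1);        /* a real flusher would wait on a condvar */
        }
        return NULL;
}

int main(void)
{
        pthread_t t;
        struct cached_reply r1 = { .xid = 1, .client = "10.0.0.1",
                                   .data = "reply-1", .len = 7 };
        struct cached_reply r2 = { .xid = 2, .client = "10.0.0.1",
                                   .data = "reply-2", .len = 7 };

        drc_cache_writethrough(&r1);     /* persisted before replying      */
        pthread_create(&t, NULL, drc_flusher, NULL);
        drc_cache_writeback(&r2);        /* reply sent now, flushed later  */
        sleep(2);                        /* let the flusher run once       */
        return 0;
}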

We could implement the DRC this way:
1. Have the DRC turned OFF by default.
2. Implement the DRC in three modes (five, counting the write-back and
   write-through variants; sketched below):
    * in-memory,
    * local disk cache (cache local to the server), and
    * cluster-aware cache (using glusterfs),
   the last two of which could be write-back or write-through.
3. We also need to empirically derive an optimal default value for the
   cache size for each mode.
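
For concreteness, the modes could be enumerated along these lines; the
names are illustrative only, not existing GlusterFS options:

/* Illustrative only: possible DRC modes, with the cache off by default. */
enum drc_mode {
        DRC_MODE_OFF = 0,        /* default: no duplicate request cache    */
        DRC_MODE_MEMORY,         /* in-memory only, lost on crash/reboot   */
        DRC_MODE_LOCAL_DISK,     /* persisted to a cache on the local disk */
        DRC_MODE_CLUSTER         /* persisted to a shared gluster volume   */
};

enum drc_write_policy {
        DRC_WRITE_BACK = 0,      /* queue, flush later: faster, loss window */
        DRC_WRITE_THROUGH        /* persist before replying: slower, safer  */
};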


Choice of data structures I have in mind (rough sketches of both follow
this list):
1. For in-memory:
   Two tables/hash tables of pointers to cached replies, one sorted/hashed
   on the {XID, client hostname/IP} pair, the other on time (for LRU
   eviction of cached replies). Considering that
   n(cache look-ups) >> n(cache hits), we need the fastest look-ups
   possible. I need suggestions for faster look-up data structures.

2. For on-disk storage of cached replies, I was thinking of a per-client
   directory, with each reply stored in a separate file whose name is the
   XID. This makes retrieval of cached replies by the fail-over server(s)
   easy. One problem with this approach in a cluster-aware DRC is that if
   two clients on the same machine are connected to different servers,
   XIDs may collide. This can be avoided by appending the server IP/FQDN
   to the XID in the file name. Also, having to cache multiple replies in
   one single file would be cumbersome.
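
A rough sketch of point 1, with all names hypothetical (this is not
existing GlusterFS code): here a single hash table keyed on the
{XID, client} pair does the look-ups, and an intrusive LRU list stands in
for the second, time-ordered table.

/* Sketch of the in-memory DRC, hypothetical names throughout: a hash table
 * keyed on {XID, client} for look-ups, plus an LRU list for eviction. */
#include <stdint.h>
#include <string.h>

#define DRC_BUCKETS 4096

struct drc_entry {
        uint32_t          xid;
        char              client[64];          /* client hostname or IP    */
        void             *reply;               /* cached reply buffer      */
        size_t            reply_len;
        struct drc_entry *hash_next;           /* chain within a bucket    */
        struct drc_entry *lru_prev, *lru_next; /* LRU list links           */
};

struct drc_cache {
        struct drc_entry *buckets[DRC_BUCKETS];
        struct drc_entry *lru_head;            /* most recently used       */
        struct drc_entry *lru_tail;            /* next eviction candidate  */
};

/* Hash the {XID, client} pair; look-ups vastly outnumber hits, so this
 * path has to be as cheap as possible. */
static unsigned drc_hash(uint32_t xid, const char *client)
{
        unsigned h = (unsigned) xid;
        while (*client)
                h = h * 31u + (unsigned char) *client++;
        return h % DRC_BUCKETS;
}

static struct drc_entry *drc_lookup(struct drc_cache *c, uint32_t xid,
                                    const char *client)
{
        struct drc_entry *e = c->buckets[drc_hash(xid, client)];

        for (; e; e = e->hash_next)
                if (e->xid == xid && strcmp(e->client, client) == 0)
                        return e;   /* hit: resend e->reply, skip the op    */
        return NULL;                /* miss: execute the op, cache the reply */
}

int main(void)
{
        static struct drc_cache cache;         /* zero-initialised          */
        struct drc_entry e = { .xid = 42, .client = "192.168.1.5",
                               .reply = "cached reply", .reply_len = 12 };
        unsigned b = drc_hash(e.xid, e.client);

        /* insertion and LRU upkeep kept minimal for the sketch */
        e.hash_next = cache.buckets[b];
        cache.buckets[b] = &e;

        return drc_lookup(&cache, 42, "192.168.1.5") ? 0 : 1;
}

Eviction would unlink lru_tail from both its hash bucket and the list, and
a hit would move the entry to lru_head; both are omitted above.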
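
And a sketch of point 2's naming scheme, with the root path and names
purely illustrative: one directory per client, one file per reply, and the
server identity appended to the XID so that two clients on the same machine
talking to different servers cannot collide.

/* Sketch of the on-disk layout: <drc_root>/<client>/<xid>-<server>.
 * The root path and naming are assumptions, not an existing convention. */
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

static int drc_reply_path(char *buf, size_t len, const char *drc_root,
                          const char *client, uint32_t xid,
                          const char *server_id)
{
        /* e.g. /gluster/drc/192.168.1.5/0000002a-server-b */
        int n = snprintf(buf, len, "%s/%s/%08x-%s",
                         drc_root, client, (unsigned) xid, server_id);

        return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

int main(void)
{
        char path[PATH_MAX];

        if (drc_reply_path(path, sizeof(path), "/gluster/drc",
                           "192.168.1.5", 42, "server-b") == 0)
                printf("%s\n", path);
        return 0;
}

A fail-over server would then only need the shared drc_root mount to read
back the failed server's replies.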

We will start with the in-memory implementation and proceed to the other
modes.
I look forward to suggestions for changes and improvements to the design.

Thanks & Regards, 
Rajesh Amaravathi, 
Software Engineer, GlusterFS 
Red Hat




