
Re: [Gluster-devel] zero-copy readv


From: Anand Avati
Subject: Re: [Gluster-devel] zero-copy readv
Date: Wed, 09 Jan 2013 22:25:15 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0

(cc'ing gluster-devel)

Bharata, I got back from vacation only today. Apologies for the delay. Please find my reply inline.


On 01/09/2013 07:21 PM, Bharata B Rao wrote:
Avati,

I have some time to work on this item now and I would appreciate any quick
inputs from you on this.

Regards,
Bharata.

On Thu, Jan 03, 2013 at 12:27:33PM +0530, Bharata B Rao wrote:
Hi,

Wish you all a happy new year!

Avati and I had a brief chat regarding zero-copy readv last year and I would
like to spend some effort now on this. To be sure that we aren't duplicating
efforts, I would like to ask if anybody in your org is already working on it.

During my first glance through the code, I see that the read data coming in
from the rpc socket is put into an iov which travels through several translators
before being presented as @iov in glfs_preadv. I can see the read data being
copied onto the user-supplied iov in glfs_preadv via iov_copy. So there are,
after all, not as many copies happening as I had assumed earlier. There is only
one copy in glfs_readv, and the rest of the xlators (I haven't looked at cluster
xlators though) just work on the iov generated in the rpc layer.

So Avati, when you talked about zero-copy, did you mean that the data from
the rpc socket should be read directly into the user-supplied iov buffer? I guess
that's not such an easy thing to do, and I am not sure if that is even
preferred.

That is correct. Currently we have one copy (not two or more), and we need to bring that down to zero. The extra memory copy can hurt the system caches (L1/L2/L3) by evicting hot data and defeating the optimizations performed by the hardware.
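To make the cost concrete, here is a minimal, self-contained illustration of that one copy: the reply payload sitting in the RPC layer's iobuf-backed iovec being copied into the caller's iovec, in the spirit of the iov_copy() call in glfs_preadv() that you point at. This is simplified demonstration code of my own, not the actual GlusterFS source:

    /* Simplified demo (not actual GlusterFS source) of the one copy
     * in today's read path: the RPC layer's iovec is copied into the
     * caller's iovec. Zero-copy readv would eliminate this loop. */
    #include <string.h>
    #include <sys/uio.h>

    static size_t
    copy_to_user_iov (const struct iovec *src, int src_cnt,
                      const struct iovec *dst, int dst_cnt)
    {
            size_t copied = 0, s_off = 0, d_off = 0;
            int    s = 0, d = 0;

            while (s < src_cnt && d < dst_cnt) {
                    size_t avail = src[s].iov_len - s_off;
                    size_t space = dst[d].iov_len - d_off;
                    size_t len   = avail < space ? avail : space;

                    /* the memcpy we want to get rid of */
                    memcpy ((char *) dst[d].iov_base + d_off,
                            (char *) src[s].iov_base + s_off, len);

                    copied += len;
                    s_off  += len;
                    d_off  += len;

                    if (s_off == src[s].iov_len) {
                            s++;
                            s_off = 0;
                    }
                    if (d_off == dst[d].iov_len) {
                            d++;
                            d_off = 0;
                    }
            }

            return copied;
    }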

Given that I don't see any avenues to reduce the number of iov/buffer copies
in the read path, can you throw some light on any other places in the read
path where there are redundant copies that could be removed or optimized?

Zero-copy needs to be implemented in both the read path and the write path. The underlying principles are very similar to those the kernel follows when implementing O_DIRECT, i.e.:

- provide special variants of the read/write fops (new fops?): one that follows the zero-copy (direct) path with special iobufs holding pointers to user-provided memory, and another with regular iobufs into/out of which user data is copied. The new fop does NOT require a change in the protocol or affect version compatibility between client and server. Both versions can interoperate (think of how the NFS protocol is agnostic to O_DIRECT behavior on both the client and server side). A sketch of such a user-memory iobuf follows after this list.

- On the write side, things are relatively simple. Use special iobufs around user-provided memory, and make it a synchronous write in write-behind (i.e., have it act as a barrier for previous incomplete writes on the overlapping region, and avoid the small_write_collapse() optimization on it).

- On the read side, things are a little more complicated. In rpc-transport/socket, there is a call to iobuf_get() to create a new iobuf for reading in the readv reply data from the server. We will need framework changes where, if the readv request (of the xid for which the readv reply is being handled) happened to be a "direct" variant (i.e., zero-copy), the "special iobuf around the user's memory" gets picked up and the read() from the socket is performed directly into the user's memory (see the read-side sketch below). Equivalent changes will have to be made in RDMA (Raghavendra on CC can help). Since the goal is to avoid the memory copy, this data will bypass io-cache (purging pre-cached data of those regions along the way).
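To illustrate the first two points, here is a hypothetical sketch of such a "special iobuf" wrapping user-provided memory, next to the regular allocated kind. The names (iobuf_sketch_t, iobuf_wrap_user_mem(), etc.) are stand-ins of my own, not the real struct iobuf API in libglusterfs; the essence is the ownership flag that keeps the buffer from being freed or copied:

    #include <stdlib.h>

    typedef struct iobuf_sketch {
            void   *ptr;        /* payload memory                         */
            size_t  size;       /* payload length                         */
            int     user_owned; /* 1: caller's memory; never free/copy it */
    } iobuf_sketch_t;

    /* Regular iobuf: allocate memory that the payload is copied into. */
    static iobuf_sketch_t *
    iobuf_get_sketch (size_t size)
    {
            iobuf_sketch_t *iob = calloc (1, sizeof (*iob));

            if (!iob)
                    return NULL;
            iob->ptr = malloc (size);
            if (!iob->ptr) {
                    free (iob);
                    return NULL;
            }
            iob->size = size;
            return iob;
    }

    /* "Direct" iobuf: borrow the user's buffer; no allocation, no copy. */
    static iobuf_sketch_t *
    iobuf_wrap_user_mem (void *user_ptr, size_t size)
    {
            iobuf_sketch_t *iob = calloc (1, sizeof (*iob));

            if (!iob)
                    return NULL;
            iob->ptr        = user_ptr;
            iob->size       = size;
            iob->user_owned = 1;
            return iob;
    }

    static void
    iobuf_put_sketch (iobuf_sketch_t *iob)
    {
            if (!iob)
                    return;
            if (!iob->user_owned)
                    free (iob->ptr); /* only our own allocation is freed */
            free (iob);
    }

A write fop carrying such a user-owned iobuf is what write-behind would then have to treat synchronously, as described above.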

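And for the read side, a hedged sketch of the dispatch decision in the transport once the reply header has been parsed. Again, every name here (pending_req_t, read_reply_payload()) is a hypothetical stand-in; in the real rpc-transport/socket code the pending request would be looked up by the reply's xid, and a real transport would loop on short reads:

    #include <stdlib.h>
    #include <sys/uio.h>
    #include <unistd.h>

    typedef struct {
            int          is_direct; /* was the readv a "direct" variant?   */
            struct iovec user_iov;  /* caller's buffer, valid if is_direct */
    } pending_req_t;

    /* Called after the reply header is parsed and the matching request
     * found (by xid, in the real transport). Chooses where the payload
     * lands before issuing read(2) on the socket. */
    static ssize_t
    read_reply_payload (int sock, pending_req_t *req, size_t payload_size)
    {
            struct iovec dst;

            if (req->is_direct && req->user_iov.iov_len >= payload_size) {
                    /* zero-copy path: read() straight into user memory */
                    dst = req->user_iov;
            } else {
                    /* today's path: a fresh buffer, copied out later
                     * (demo only; ownership/freeing is elided here) */
                    dst.iov_base = malloc (payload_size);
                    dst.iov_len  = payload_size;
                    if (!dst.iov_base)
                            return -1;
            }

            return read (sock, dst.iov_base, payload_size);
    }

Since the direct path bypasses io-cache, the affected region would also be purged from the cache so that later cached reads do not serve stale data.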
Hope that helps,
Avati




