
Re: [Gluster-devel] HA translator questions


From: Martin Fick
Subject: Re: [Gluster-devel] HA translator questions
Date: Thu, 1 Jan 2009 11:17:41 -0800 (PST)

--- On Thu, 1/1/09, Krishna Srinivas <address@hidden> wrote:
> > Hmm, I don't see this looping on failure in the code, but my
> > understanding of the translator design is fairly minimal.  I will
> > have to look harder.  I was hoping to be able to modify the
> > subvolume looping to be able to loop back upon itself indefinitely
> > if all the subvolumes failed.  If this could be done, it seems like
> > this would be an easy way to achieve NFS style blocking when the
> > server is down (see my other thread on this), by simply using the
> > HA translator with only one subvolume.
> 
> Just curious, why do you want the application to hang till the
> server comes back up?  The indefinite hang is not desirable to most
> users.

Because very few applications are written to recover from intermittent errors.  
Once they see an error, they give up.  Picture a bunch of clients relying on 
the filesystem exported by the server: if the server crashes, they will likely 
all be hosed.  But since the client machines did not crash, they will likely 
never recover until someone reboots them.  Simply hanging and then recovering 
when the server comes back up is an essential feature for most networked 
filesystem clients.

> In case of NFS if the NFS server is down, won't the client
> error out saying that server is down?

No, it will hang indefinitely until the server comes back up.  The clients 
will therefore not fail; they simply continue about their business as usual 
when the server returns, with only a delay, no errors, and no application 
restarts/reboots required.
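
This is essentially what an NFS "hard" mount gives you.  As a rough 
illustration only (Python, with made-up names, not actual GlusterFS or NFS 
code), the client side effectively does something like:

    import errno
    import time

    def hard_retry(op, *args, retry_interval=5.0):
        # Mimic NFS 'hard' mount semantics: transport failures never reach
        # the application; the call simply blocks until the server is back.
        while True:
            try:
                return op(*args)
            except OSError as e:
                if e.errno in (errno.ENOTCONN, errno.ETIMEDOUT,
                               errno.EHOSTUNREACH):
                    time.sleep(retry_interval)  # server down: wait and retry
                    continue
                raise  # other errors still propagate to the caller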


> > Also, how about failure due to replies that do not return because
> > the link is down?  Are the requests saved after they are sent,
> > until the reply arrives, so that they can be resent on the other
> > link if the original link successfully sends the request but goes
> > down afterwards and cannot receive the reply?
> 
> Yes, requests are saved so that they can be retried on the other
> subvol if the current subvol goes down during the operation.

Cool, but this brings up one last extreme corner case that concerns me.  What 
if client A sends a write request to file foo through HA to subvolume 1, and 
the link goes down after subvolume 1 services the request but before it can 
successfully reply that the write completed?  You have confirmed that in this 
case client A will retry on subvolume 2.  If subvolumes 1 & 2 share the same 
backend, the write to file foo will already have taken place at this point.  
That makes it possible for client B to read file foo and write something new 
to it before client A's HA translator resends the original write on 
subvolume 2.  When client A's resend finally reaches subvolume 2, it could 
replay client A's original write to file foo, overwriting client B's write, 
which depended on client A's first write.
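
To make the interleaving concrete, here is a toy model of the scenario 
(Python, names invented for illustration; this is only a model of the race, 
not the translator's actual code path):

    backend = {}                     # storage shared by subvolumes 1 and 2

    def write(path, data):           # what either subvolume does for a write
        backend[path] = data

    # 1. Client A's write reaches subvolume 1 and is applied ...
    write("foo", "A-v1")
    # ... but the link drops before the ACK gets back to client A.

    # 2. Client B reads foo, sees A's data, and writes something based on it.
    write("foo", backend["foo"] + "+B")

    # 3. Client A's HA translator retransmits the unacknowledged write on
    #    subvolume 2, replaying the original data.
    write("foo", "A-v1")

    print(backend["foo"])            # -> "A-v1": client B's update is lost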

Is the scenario above possible?  Or would both subvolumes 1 & 2 somehow know 
not to process client B's write request until they know that client A has 
received an ACK for its original write request and therefore will not resend 
it?  I know this is somewhat of a far-fetched corner case, but if it is 
possible, I believe that unfortunately this would be non-POSIX-compliant 
behavior.  This is the same concern I had with case #3 in my proposed fixes 
on my NFS blocking thread.  Does that make sense?

I wonder how NFS deals with a similar potential problem?  It seems like this 
(case #3, not the HA case) might be possible with NFS as well, unless the 
server keeps track of every write for which it knows the client has not yet 
received an ACK, and does not allow other writes to the same place until 
then?
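
To sketch the kind of bookkeeping I mean (Python, names invented for 
illustration): the server remembers the reply for any request it cannot be 
sure the client has seen, so a retransmission is answered from the cache 
instead of being executed again.  I believe this is roughly what NFS 
implementations call a duplicate request cache.

    reply_cache = {}                 # (client_id, xid) -> saved reply

    def handle_request(client_id, xid, apply_op):
        key = (client_id, xid)
        if key in reply_cache:       # retransmission: do NOT re-execute
            return reply_cache[key]
        reply = apply_op()           # first time this request is seen
        reply_cache[key] = reply     # remember the reply for later retries
        return reply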

Thanks again,

-Martin
