From: Brent A Nelson
Subject: Re: [Gluster-devel] Re: [bug #19614] System crashes when node fails even with xfr
Date: Fri, 11 May 2007 12:50:52 -0400 (EDT)

I haven't seen this issue since before patch-164. If you think that patch might have fixed it, you can probably consider the failed reconnect (the one requiring a kill -9 of glusterfs) closed; I think I would have run across it by now if it were still present. I just tried a quick test, and it did fine (except for the issue discussed in the next two paragraphs).

Reconnect has been very nice ever since, except for the other issue I described (present since the initial reconnect patches, I believe): after a disconnect (say, killing and restarting glusterfsd), the client may not reconnect until the next I/O attempt, which is reasonable. That next I/O attempt (a df or an ls, say) does trigger a reconnect, but it gets an error back instead of waiting for the reconnect to complete, issuing the request, and returning the valid result. The I/O attempt after that works fine.

So it seems that, when an I/O request comes in while the nodes it needs are disconnected, the reconnect should be given a moment to succeed before an error is returned for that I/O. If the reconnect succeeds, go ahead and perform the I/O and return its result.
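
Something along these lines, say (purely a sketch of the idea; the function and helper names below are made up for illustration and aren't from the actual client code):

    /* sketch only: the transport_* helpers are assumed stand-ins
     * for whatever the real client transport layer provides */
    #include <errno.h>
    #include <stdbool.h>
    #include <unistd.h>

    #define RECONNECT_POLL_USEC 100000   /* check every 100 ms...   */
    #define RECONNECT_POLLS     10       /* ...for about one second */

    bool transport_connected (void *trans);
    void transport_reconnect (void *trans);
    int  transport_submit (void *trans, void *req);

    int client_request (void *trans, void *req)
    {
      if (!transport_connected (trans)) {
        int i;

        /* kick off a reconnect and give it a moment to finish,
           rather than failing the I/O immediately */
        transport_reconnect (trans);
        for (i = 0; i < RECONNECT_POLLS; i++) {
          if (transport_connected (trans))
            break;
          usleep (RECONNECT_POLL_USEC);
        }
        if (!transport_connected (trans))
          return -ENOTCONN;   /* only now give up on the I/O */
      }

      /* connection is (back) up: issue the I/O, return its result */
      return transport_submit (trans, req);
    }

Sleeping in a loop is just the simplest way to write it down; queueing the blocked request and re-submitting it from the reconnect notification would presumably be cleaner.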

Or perhaps there's a better way to handle it?

Thanks,

Brent

On Fri, 11 May 2007, Krishna Srinivas wrote:

Hi Brent,

Did you see that problem again? What kind of setup were you
using? I am not sure which part of the code might have caused
the problem; further details about the setup would help.

Thanks
Krishna

On 5/8/07, Brent A Nelson <address@hidden> wrote:
I just had two nodes go down (not due to GlusterFS).  The nodes were
mirrors of each other for multiple GlusterFS filesystems (all unify on top
of afr), so the GlusterFS clients were understandably unhappy (one of the
filesystems was 100% served by these two nodes, others were only
fractionally served by the two nodes).  However, when the two server nodes
were brought back up, some of the client glusterfs processes recovered,
while others had to be kill -9'ed so the filesystems could be remounted
(they were blocking df and ls commands).
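
For reference, each client spec was laid out roughly like this (the host
and volume names below are placeholders, not my real config):

    # one protocol/client volume per server brick
    volume node1-brick
      type protocol/client
      option transport-type tcp/client
      option remote-host node1          # first of the mirrored pair
      option remote-subvolume brick
    end-volume

    volume node2-brick
      type protocol/client
      option transport-type tcp/client
      option remote-host node2          # its mirror
      option remote-subvolume brick
    end-volume

    # afr mirrors the pair
    volume mirror0
      type cluster/afr
      subvolumes node1-brick node2-brick
    end-volume

    # unify on top; for the filesystem served 100% by these two nodes,
    # mirror0 was the only data subvolume (namespace volume omitted here)
    volume unify0
      type cluster/unify
      option namespace ns
      option scheduler rr
      subvolumes mirror0
    end-volume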

I don't know if it's related to the bug below or not, but it looks like
client reconnect after failure isn't 100%...

This was with a tla checkout from yesterday.

Thanks,

Brent

On Mon, 7 May 2007, Krishna Srinivas wrote:

> Hi Avati,
>
> There was a bug: when the first node went down, it would cause
> problems. This bug might be the same one, though the reporter has
> not given enough details to confirm. We can move the bug to the
> unreproducible or fixed state.
>
> Krishna
>
> On 5/6/07, Anand Avati <address@hidden> wrote:
>>
>> Update of bug #19614 (project gluster):
>>
>>                 Severity:              3 - Normal => 5 - Blocker
>>              Assigned to:                    None => krishnasrinivas
>>
>>     _______________________________________________________
>>
>> Follow-up Comment #1:
>>
>> krishna,
>>   can you confirm if this bug is still lurking?
>>
>>     _______________________________________________________
>>
>> Reply to this item at:
>>
>>   <http://savannah.nongnu.org/bugs/?19614>
>>
>> _______________________________________________
>>   Message sent via/by Savannah
>>   http://savannah.nongnu.org/
>>
>>
>
>
> _______________________________________________
> Gluster-devel mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>