Re: [Gluster-devel] Server Side AFR gets transport endpoint is not conne
From: Krishna Srinivas
Subject: Re: [Gluster-devel] Server Side AFR gets transport endpoint is not connected
Date: Sat, 30 Aug 2008 22:18:28 +0530
James,
It is planned for the later releases of 1.4.
Let us wait for Avati's reply regarding the timeframe.
Krishna
On Thu, Aug 28, 2008 at 7:03 PM, James E Warner <address@hidden> wrote:
> Thanks for the prompt reply. One final question.... is the HA translator
> still planned for the upcoming 1.4 release and if not do you have a rough
> idea of what release it is going into?
>
> Thanks Again,
>
> James Warner
> Computer Sciences Corporation
> Registered Office: 3170 Fairview Park Drive, Falls Church, Virginia 22042,
> USA
> Registered in Nevada, USA No: C-489-59
>
>
> From: "Krishna Srinivas" <address@hidden>
> Sent by: krishna.srinivas@gmail.com
> To: James E Warner/DEF/address@hidden
> cc: address@hidden
> Date: 08/28/2008 01:03 AM
> Subject: Re: [Gluster-devel] Server Side AFR gets transport endpoint is not connected
>
> On Thu, Aug 28, 2008 at 12:45 AM, James E Warner <address@hidden> wrote:
>>
>> Hi,
>>
>> I'm currently testing gluster to see if I can make it work for our HA
>> filesystem needs. In initial testing things seem to be very good,
>> especially with client side AFR performing replication to our server
>> nodes. However, we would like to keep our client network free of
>> replication traffic, so I set up server side AFR with three storage
>> bricks replicating data between themselves and round robin DNS for the
>> node failover. The round robin DNS is working and the failover between
>> the nodes is kind of working, but if I pull the network cable on the
>> currently active server (the host that the glusterfs client is connected
>> to) the next filesystem operation (such as ls /mnt/glusterfs) fails with
>> a "transport endpoint is not connected" error. Similarly, if I have a
>> large copy operation in progress, the copy will exit with a failure. All
>> of the operations after that work fine and netstat shows that the node
>> has failed over to the next server in the list, but by that point the
>> current file system operation has failed. Anyway, this leads me to a few
>> questions:
>>
>> 0. Do my config files look OK or does it look like I've configured this
>> thing incorrectly? :)
>> 1. Is this the expected behavior or is this a bug? From reading the
>> mailing list I had the impression that on failure the operation would be
>> tried on the remaining IPs that were cached in the client's list, so I
>> was surprised that the operation failed. I think that it is probably a
>> bug, but I could see an argument for how this might be considered normal
>> operation.
>
> That is the expected behavior.
>
>>
>> 2. If this is expected behavior, is there any plan to change the behavior
>> in the future or is server side AFR always expected to work this way?
>> I've seen references to round robin DNS being an interim measure on the
>> mailing list, so I'm not sure if there is another translator in the
>> works or not. If there is something in the works, is that available in
>> the current glusterfs 1.4 snapshot releases or is that planned for a
>> much later version?
>
> Yes, we plan to bring in an HA translator, which will make this work fine.
>
>>
>> 3. Can you think of any option that I might have missed that would
>> correct the problem and allow the currently running file operation to
>> succeed during a failover?
>>
>> 4. Once again, if this is as designed, can you explain the reason that it
>> works this way? As I said, I really expected it to transparently fail
>> over in much the same way that client side AFR seems to, so I was
>> surprised that it didn't.
>
> If AFR is on the client side, it maintains a separate connection to each
> of its subvolumes, so if one node fails it still has connections to the
> other subvolumes. However, if AFR is on the server side and that server
> goes down, AFR can not do anything about it. Once the HA xlator is in the
> picture, it will sit on the client and take care of a seamless transition
> when the connection fails.
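
The general idea of handling a failed connection on the client side can be sketched in a few lines of Python. This is only an illustration of retrying an operation against the next server in a list; it is not the HA translator's implementation, and the names `with_failover`, `AllServersFailed`, and `fake_ls`, as well as the server names, are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when the operation failed on every server in the list."""


def with_failover(servers: Sequence[str], operation: Callable[[str], T]) -> T:
    """Run operation(server) on each server in turn until one succeeds.

    Only connection errors trigger a retry on the next server; any other
    exception is passed straight through to the caller.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return operation(server)
        except ConnectionError as err:
            # The connection to this server is gone; try the next one
            # instead of failing the operation the caller asked for.
            last_error = err
    raise AllServersFailed(f"tried {len(servers)} server(s)") from last_error


def fake_ls(server: str) -> str:
    """Hypothetical stand-in for a filesystem operation on one server."""
    if server == "server1":
        raise ConnectionError("transport endpoint is not connected")
    return f"listing from {server}"


result = with_failover(["server1", "server2", "server3"], fake_ls)
```

Here the caller never sees the error from `server1`, because the operation is retried on `server2` before a result is returned. Without such a layer on the client, the first error is returned to the application and only later operations go to the next server.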
>
>>
>> Since I hope that this is a bug, the configuration files and the relevant
>> sections of the client log are below. I have used this configuration on
>> the gluster 1.3.11 version and the latest snapshot from August 27, 2008.
>>
>> Client Log Snippet:
>> ================
>>
>> 2008-08-27 12:53:34 D [fuse-bridge.c:839:fuse_err_cbk] glusterfs-fuse: 62: