[Gluster-devel] AFR setup with Virtual Servers crashes


From: Urban Loesch
Subject: [Gluster-devel] AFR setup with Virtual Servers crashes
Date: Thu, 10 May 2007 12:15:13 +0200
User-agent: Thunderbird 1.5.0.10 (X11/20070302)

Hi,

I'm new to this list.
First: sorry for my bad english.

I was searching for an easy and transparent cluster filesystem with a failover feature, and I found the GlusterFS project on Wikipedia. It's a nice project, and I tried it in my test environment. I thought that if it works well, I would use it in production too.

A very nice feature for me is the AFR setup, because I can replicate all the data over 2 servers in RAID-1 mode. But it seems that I am doing something wrong, because "glusterfsd" crashes on both nodes.
Let me explain from the beginning.

Here's my setup:
Hardware:
2 different servers for storage
1 server as client
On top of the servers I use a virtual server setup (details: http://linux-vserver.org).

OS:
Debian Sarge with a self-compiled 2.6.19.2 kernel (uname -r: 2.6.19.2-vs2.2.0) and the latest stable virtual server patch.
glusterfs-1.3.0-pre3.tar.gz

What I'm trying to do:
- Create an AFR mirror over the 2 servers.
- Mount the volume on Server 3 (client).
- Install the whole virtual server (Apache, MySQL, and so on) on the mounted volume.
This way I have a fully redundant virtual server mirrored over two bricks.

Here is my current configuration:
- Server configuration on Server 1 (brick)

### Export volume "brick" with the contents of the "/gluster" directory.
volume brick
 type storage/posix                   # POSIX FS translator
 option directory /gluster            # Export this directory
end-volume

### File Locking
volume locks
 type features/posix-locks
 subvolumes brick
end-volume

### Add network serving capability to the above brick.
volume server
 type protocol/server
 option transport-type tcp/server     # For TCP/IP transport
 option listen-port 6996              # Default is 6996
 subvolumes locks
 option auth.ip.locks.allow *         # Allow access to the "locks" volume
end-volume

- Server configuration on Server 2 (brick-afr)

### Export volume "brick-afr" with the contents of the "/gluster-afr" directory.
volume brick-afr
 type storage/posix                   # POSIX FS translator
 option directory /gluster-afr        # Export this directory
end-volume

### File Locking
volume locks-afr
 type features/posix-locks
 subvolumes brick-afr
end-volume

### Add network serving capability to the above brick.
volume server
 type protocol/server
 option transport-type tcp/server     # For TCP/IP transport
 option listen-port 6996              # Default is 6996
 subvolumes locks-afr
 option auth.ip.locks-afr.allow *     # Allow access to the "locks-afr" volume
end-volume

- Client configuration on Server 3 (client)
### Add client feature and attach to remote subvolume of server1
volume brick
 type protocol/client
 option transport-type tcp/client     # for TCP/IP transport
 option remote-host 192.168.0.1      # IP address of the remote brick
 option remote-port 6996              # default server port is 6996
 option remote-subvolume locks        # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick-afr
 type protocol/client
 option transport-type tcp/client     # for TCP/IP transport
 option remote-host 192.168.0.2      # IP address of the remote brick
 option remote-port 6996              # default server port is 6996
 option remote-subvolume locks-afr        # name of the remote volume
end-volume

### Add AFR feature to brick
volume afr
 type cluster/afr
 subvolumes brick brick-afr
 option replicate *:2                 # All files 2 copies (RAID-1)
end-volume

----------------------------------------------------------------------------------------------------------------------
I started the two bricks in debug mode, and both start without problems.

- Server1
glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
....
[May 10 11:52:11] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
[May 10 11:52:11] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
[May 10 11:52:11] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so

- Server2
glusterfsd --no-daemon --log-file=/dev/stdout --log-level=DEBUG
....
[May 10 11:51:44] [DEBUG/proto-srv.c:2919/init()] protocol/server:protocol/server xlator loaded
[May 10 11:51:44] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/server
[May 10 11:51:44] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/server.so
------------------------------------------------------------------------------------------------------------------------------

So far so good.

Then I mounted the volume on server 3 (the client). It mounts without any problems:

glusterfs --no-daemon --log-file=/dev/stdout --log-level=DEBUG --spec-file=/etc/glusterfs/glusterfs-client.vol /var/lib/vservers/mastersql
...
[May 10 13:59:00] [DEBUG/client-protocol.c:2796/init()] protocol/client:defaulting transport-timeout to 120
[May 10 13:59:00] [DEBUG/transport.c:83/transport_load()] libglusterfs/transport:attempt to load type tcp/client
[May 10 13:59:00] [DEBUG/transport.c:88/transport_load()] libglusterfs/transport:attempt to load file /usr/lib/glusterfs/1.3.0-pre3/transport/tcp/client.so
[May 10 13:59:00] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 8
[May 10 13:59:00] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
[May 10 13:59:00] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 8 in progress (non-blocking)
[May 10 13:59:00] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 8 still in progress - try later

OK. Nice.
A short check on the client:
df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda1     ext3      13G   2.6G   8.9G  23% /
tmpfs        tmpfs     1.1G      0   1.1G   0% /lib/init/rw
udev         tmpfs      11M    46k    11M   1% /dev
tmpfs        tmpfs     1.1G      0   1.1G   0% /dev/shm
glusterfs:24914
             fuse     9.9G   2.5G   6.9G  27% /var/lib/vservers/mastersql

Wow, it works. Now I can add, remove, or edit files and directories without problems. The files are written to both bricks, and performance is good too.
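To convince myself that AFR really keeps both copies identical, I compare the brick contents directly. This is only a sketch: the two directories below are stand-ins for /gluster on server 1 and /gluster-afr on server 2, which in reality live on different machines (there you would checksum each export directory on its own server and compare the results).

```shell
#!/bin/sh
# Sketch: check that two brick directories hold identical contents.
# BRICK1/BRICK2 are local stand-ins for the real export directories
# (/gluster on server 1, /gluster-afr on server 2), so this runs anywhere.
BRICK1=$(mktemp -d)
BRICK2=$(mktemp -d)

# Simulate a write through the AFR mount: the file lands on both bricks.
echo "hello from mastersql" > "$BRICK1/testfile"
cp "$BRICK1/testfile" "$BRICK2/testfile"

# diff -r exits non-zero on any mismatch between the trees.
if diff -r "$BRICK1" "$BRICK2" >/dev/null; then
    echo "bricks in sync"
else
    echo "bricks differ"
fi
```

On the real setup the same idea works with `find ... | xargs md5sum` on each server and a diff of the two checksum lists.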

But then I tried to start my virtual server (called mastersql).
The virtual server does not start, and I get a lot of the following debug output on the client:

[May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 4
[May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1023'
[May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 4 in progress (non-blocking)
[May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 4 still in progress - try later
[May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
[May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x8076cf0
[May 10 14:04:43] [DEBUG/tcp-client.c:174/tcp_connect()] transport: tcp: :try_connect: socket fd = 7
[May 10 14:04:43] [DEBUG/tcp-client.c:196/tcp_connect()] transport: tcp: :try_connect: finalized on port `1022'
[May 10 14:04:43] [DEBUG/tcp-client.c:255/tcp_connect()] tcp/client:connect on 7 in progress (non-blocking)
[May 10 14:04:43] [DEBUG/tcp-client.c:293/tcp_connect()] tcp/client:connection on 7 still in progress - try later
[May 10 14:04:43] [ERROR/client-protocol.c:204/client_protocol_xfer()] protocol/client:transport_submit failed
[May 10 14:04:43] [DEBUG/client-protocol.c:2604/client_protocol_cleanup()] protocol/client:cleaning up state in transport object 0x80762d0

The two mirror servers crash with the following debug output:

[May 10 11:54:26] [DEBUG/tcp-server.c:134/tcp_server_notify()] tcp/server:Registering socket (5) for new transport object of 192.168.0.3
[May 10 11:55:22] [DEBUG/proto-srv.c:2418/mop_setvolume()] server-protocol:mop_setvolume: received port = 1022
[May 10 11:55:22] [DEBUG/proto-srv.c:2434/mop_setvolume()] server-protocol:mop_setvolume: IP addr = *, received ip addr = 192.168.0.3
[May 10 11:55:22] [DEBUG/proto-srv.c:2444/mop_setvolume()] server-protocol:mop_setvolume: accepted client from 192.168.0.3

Trying to set: READ Is grantable: READ Inserting: READ
Trying to set: UNLOCK Is grantable: UNLOCK Conflict with: READ
Trying to set: WRITE Is grantable: WRITE Inserting: WRITE
Trying to set: UNLOCK Is grantable: UNLOCK Conflict with: WRITE
Trying to set: WRITE Is grantable: WRITE Inserting: WRITE
Trying to set: UNLOCK Is grantable: UNLOCK Conflict with: WRITE
Trying to set: WRITE Is grantable: WRITE Inserting: WRITE
[May 10 12:00:09] [CRITICAL/common-utils.c:215/gf_print_trace()] debug-backtrace:Got signal (11), printing backtrace
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(gf_print_trace+0x2e) [0xb7f53a7e]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:[0xb7f60420]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75d1192]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/glusterfs/1.3.0-pre3/xlator/protocol/server.so [0xb75cded7]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(transport_notify+0x1d) [0xb7f54ecd]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(sys_epoll_iteration+0xe9) [0xb7f55b79]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/usr/lib/libglusterfs.so.0(poll_iteration+0x1d) [0xb7f54f7d]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x804924e]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xc8) [0xb7e17ea8]
[May 10 12:00:09] [CRITICAL/common-utils.c:217/gf_print_trace()] debug-backtrace:glusterfsd [0x8048c51]
Segmentation fault (core dumped)
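Since glusterfsd dumps core, the crash could be inspected further with gdb. A small sketch of the preparation step (assuming the hard limit permits it): raise the core-file size limit in the shell that launches glusterfsd, because the soft limit defaults to 0 on many systems, which suppresses the core file.

```shell
# Allow core dumps in the shell that will start glusterfsd.
# Without this, "Segmentation fault (core dumped)" may leave no usable
# core file because the soft limit is often 0.
ulimit -c unlimited
ulimit -c    # should now report the raised limit
```

After the next crash, `gdb glusterfsd core` followed by `bt` gives a symbolic backtrace, which is usually much more useful to the developers than the raw addresses above.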

It seems that there are some conflicts with "READ, WRITE, UNLOCK". But I'm not an expert on filesystems and locking features.

As you can see, the filesystem is still mounted but no longer connected to the two bricks:
df -HT
Filesystem    Type     Size   Used  Avail Use% Mounted on
/dev/sda1     ext3      13G   2.6G   8.9G  23% /
tmpfs        tmpfs     1.1G      0   1.1G   0% /lib/init/rw
udev         tmpfs      11M    46k    11M   1% /dev
tmpfs        tmpfs     1.1G      0   1.1G   0% /dev/shm
df: `/var/lib/vservers/mastersql': Transport endpoint is not connected
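The "Transport endpoint is not connected" error is what a FUSE mountpoint reports after its backing daemon has died; the stale mount has to be unmounted before it can be used again. A minimal cleanup sketch (mountpoint path taken from the setup above; `fusermount` comes with the FUSE userspace tools):

```shell
#!/bin/sh
# Clean up a dangling FUSE mount after the glusterfs client or its
# servers crashed. MNT is the mountpoint from the setup above.
MNT=/var/lib/vservers/mastersql
if mountpoint -q "$MNT" 2>/dev/null; then
    # Fall back to a lazy unmount if the dead endpoint is still busy.
    fusermount -u "$MNT" || umount -l "$MNT"
    echo "unmounted $MNT"
else
    echo "$MNT is not mounted"
fi
```

Once the servers are running again, the volume can simply be remounted with the same glusterfs command as before.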

I'm not sure whether I am doing something wrong (configuration) or whether this is a bug.
Can you experts please help me?

If you need any further information or something please let me know.

Thanks and regards
Urban
