gluster-devel

Re: [Gluster-devel] problems running glusterfs 2.5 patch 800 and xen


From: Jordi Moles Blanco
Subject: Re: [Gluster-devel] problems running glusterfs 2.5 patch 800 and xen
Date: Wed, 10 Dec 2008 09:39:04 +0100
User-agent: Thunderbird 2.0.0.18 (X11/20081125)

Anand Avati wrote:
>> Right now, with this new version, when I ask Xen to create a new machine in
>> "/mnt/gluster", it fails.
>
> Can you describe the failure better?
>
>> Even if it doesn't, the machine will appear
>> "broken" on boot without any chance of recovering it.
>>
>> What I have to do then is create it on my local disk and then move it to
>> /mnt/gluster, where it will start and "freeze" at the point I described in
>> the previous mails.
>>
>> Also, if I copy an image file of a VM from another Xen server straight into
>> /mnt/gluster with the "scp" command, it won't work. Instead, I have to "scp"
>> the file to my local hard disk from an external machine and then move it into
>> /mnt/gluster.
>>
>> I don't know if this helps you identify the problem.
>
> Can you post your client log files somewhere? That can help us debug your issue.
>
> avati

Hi,

The information you are asking for was posted in the first message of this thread. In that mail I included the spec files for both the nodes and the Xen servers (clients), as well as the log files for both.

I'm pasting the content of my first mail below, and I'll update it with the new things I've experienced since then.


*****************************************************************************************************************

I'm having trouble running Xen virtual machines on glusterfs 2.5, patch 800.

I've got two Xen servers, version 3.2, which store their machines on gluster. They run Debian Lenny. I also have 3 nodes which provide the storage unit with glusterfs, also running Lenny.

The thing is that when I ran "./configure --enable-kernel-module" for fuse 2.7.3glfs10 on the server (node) side, I got this:

***********
warning: fuse module is already present on kernel, it won't compile
***********

So...

I ran:

*********
./configure
make
make install
**********



When compiling glusterfs--patch-800 I didn't get any errors or warnings at all.

On the Xen side, I ran the proposed configure with "enable-fuse-client" and so on, and had no problems.

Anyway...

The nodes have these specs:

***************

volume esp
   type storage/posix
   option directory /glu0/data
end-volume

volume espai
   type performance/io-threads
   option thread-count 15
   option cache-size 512MB
   subvolumes esp
end-volume

volume nm
   type storage/posix
   option directory /glu0/ns
end-volume

volume ultim
  type protocol/server
  subvolumes espai nm
  option transport-type tcp/server
  option auth.ip.espai.allow *
  option auth.ip.nm.allow *
end-volume


***************

and the Xen servers have these specs:

***********

volume espai1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.0.0.3
      option remote-subvolume espai
end-volume

volume espai2
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.0.0.4
      option remote-subvolume espai
end-volume

volume espai3
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.0.0.5
      option remote-subvolume espai
end-volume

volume namespace1
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.0.0.4
      option remote-subvolume nm
end-volume

volume namespace2
      type protocol/client
      option transport-type tcp/client
      option remote-host 10.0.0.5
      option remote-subvolume nm
end-volume

volume grup1
      type cluster/afr
      subvolumes espai1 espai3
end-volume

volume grup2
      type cluster/afr
      subvolumes espai2
end-volume

volume nm
      type cluster/afr
      subvolumes namespace1 namespace2
end-volume

volume g01
      type cluster/unify
      subvolumes grup1 grup2
      option scheduler rr
      option namespace nm
end-volume

volume io-cache
      type performance/io-cache
      option cache-size 512MB
      option page-size 1MB
      option force-revalidate-timeout 2
      subvolumes g01
end-volume
***********
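To rule out a simple naming mistake between the two specs, here is a minimal Python sketch (a hypothetical helper, not a glusterfs tool). It parses volfiles written in the "volume ... end-volume" syntax used above, assuming one statement per line and the same server spec on all three nodes, and reports any "protocol/client" volume whose "remote-subvolume" is not exported by a "protocol/server" volume or has no matching "auth.ip.<name>.allow" option, plus any "subvolumes" entry that is not defined in the client spec. It does not reproduce glusterfs's own parser, and it cannot tell whether the nodes and the Xen servers run matching glusterfs/fuse builds.

```python
import re
from typing import Dict, List, Optional


def parse_volfile(text: str) -> Dict[str, dict]:
    """Parse the simple 'volume ... end-volume' syntax used in the specs above.

    Assumes one statement per line; unrecognised lines (e.g. '****'
    separators) are ignored. Returns {name: {"type", "options", "subvolumes"}}.
    """
    volumes: Dict[str, dict] = {}
    current: Optional[dict] = None
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        keyword = words[0]
        if keyword == "volume":
            current = {"name": words[1], "type": None, "options": {}, "subvolumes": []}
        elif current is None:
            continue  # text outside any volume block
        elif keyword == "end-volume":
            volumes[current["name"]] = current
            current = None
        elif keyword == "type":
            current["type"] = words[1]
        elif keyword == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif keyword == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def check_specs(server_text: str, client_text: str) -> List[str]:
    """List client volumes that point at names the server does not export or authorise."""
    server = parse_volfile(server_text)
    client = parse_volfile(client_text)
    exported, authorized = set(), set()
    for vol in server.values():
        if vol["type"] == "protocol/server":
            exported.update(vol["subvolumes"])
            for key in vol["options"]:
                # e.g. "auth.ip.espai.allow" grants access to the volume "espai"
                m = re.fullmatch(r"auth\.ip\.(.+)\.allow", key)
                if m:
                    authorized.add(m.group(1))
    problems = []
    for name, vol in client.items():
        if vol["type"] == "protocol/client":
            remote = vol["options"].get("remote-subvolume")
            if remote not in exported:
                problems.append(f"{name}: remote-subvolume {remote!r} is not exported")
            elif remote not in authorized:
                problems.append(f"{name}: no auth.ip.{remote}.allow on the server")
        for sub in vol["subvolumes"]:
            if sub not in client:
                problems.append(f"{name}: subvolume {sub!r} is not defined")
    return problems
```

With the two specs saved to files (for example server.vol and client.vol, names hypothetical), check_specs(open("server.vol").read(), open("client.vol").read()) should return an empty list for the specs in this mail, since every remote-subvolume used ("espai" and "nm") is exported and authorised by "ultim".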

So... everything seems to work fine at first; the Xen servers are able to mount the glusterfs unit, but after a few seconds I keep getting this on the Xen side:


*********
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2: /domains: returning EINVAL
2008-12-04 18:48:56 E [client-protocol.c:4579:client_checksum] espai2: /domains/xen-gluton02: returning EINVAL
*********

There's nothing more about the problem in the log, only that.

On the nodes' side:

************
2008-12-04 19:48:50 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:48:51 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:48:53 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:48:56 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:49:01 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:49:09 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:49:22 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:49:43 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:50:17 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:51:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:51:01 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:51:12 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:51:12 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:51:15 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:52:41 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:55:05 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:55:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:56:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:56:10 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:56:14 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 19:58:58 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:00:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:00:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:01:03 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:01:06 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:05:15 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:05:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:06:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:06:09 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:06:13 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:10:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:11:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:11:08 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:11:12 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:15:25 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:15:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:16:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:16:05 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:16:12 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:20:59 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:21:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:21:07 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:21:12 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:26:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:26:00 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:26:07 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null
2008-12-04 20:26:13 E [server-protocol.c:6050:server_protocol_interpret] ultim: bound_xl is null

***********
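Because these entries tend to get pasted run together, a small parser makes it easier to see how often the error repeats. This is a minimal Python sketch, not a glusterfs tool; it assumes the entry layout visible in the logs of this thread (timestamp, one-letter level, [file:line:function], translator name, message), tolerates entries that are concatenated or wrapped across lines, and reports the gaps in seconds between consecutive entries carrying a given message.

```python
import re
from datetime import datetime
from typing import List, NamedTuple

# Timestamp layout used by the glusterfs logs quoted in this thread.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# One entry: timestamp, level letter, [file:line:function], translator, message.
# The message runs until the next "timestamp level [" or the end of the text,
# so entries pasted on one line (or wrapped mid-entry) are still split apart.
ENTRY = re.compile(
    rf"({TS})\s+([A-Z])\s+\[([^\]]+)\]\s+(\S+):\s+(.*?)(?=\s+{TS}\s+[A-Z]\s+\[|\s*$)",
    re.S,
)


class LogEntry(NamedTuple):
    when: datetime
    level: str     # e.g. "E" for error
    location: str  # e.g. "server-protocol.c:6050:server_protocol_interpret"
    xlator: str    # translator name, e.g. "ultim" or "espai2"
    message: str


def parse_entries(text: str) -> List[LogEntry]:
    """Split a (possibly run-together) glusterfs log excerpt into entries."""
    return [
        LogEntry(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, loc, xl,
                 " ".join(msg.split()))  # collapse whitespace left by line wrapping
        for ts, level, loc, xl, msg in ENTRY.findall(text)
    ]


def gaps_seconds(entries: List[LogEntry], message: str) -> List[float]:
    """Seconds between consecutive entries whose message is exactly `message`."""
    times = [e.when for e in entries if e.message == message]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Running gaps_seconds(parse_entries(text), "bound_xl is null") on the node log above gives the spacing between consecutive "bound_xl is null" errors, which makes it easier to compare how often they appear while a virtual machine is running and while none is.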

At first I didn't pay attention to that because we can operate on the storage unit; for example, we can move several GB into it. But when I try to run the virtual machine, it freezes after a few seconds and the error I'm reporting appears more often than before. However, I don't have to run a machine to make it appear; it appears from the beginning.

Finally... when I mount gluster from Xen, I do it this way:

**********
glusterfs -l /var/log/glusterfs/glusterfs.log -L WARNING -d disable -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
**********

I mean, with the "-d disable" option, which is supposed to be the thing to do with Xen.

And this is the point where my virtual machine freezes:

***********
[    1.104884] blkfront: sda2: barriers enabled
[    1.189612] XENBUS: Device with no driver: device/console/0
[    1.189620] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    1.189630] Freeing unused kernel memory: 216k freed
[    1.461468] thermal: Unknown symbol acpi_processor_set_thermal_limit
[    2.047300] md: raid1 personality registered for level 1
[    2.089518] md: md0 stopped.
[    2.091932] md: md1 stopped.
[    2.096341] md: md2 stopped.
[    2.244344] EXT3-fs: INFO: recovery required on readonly filesystem.
[    2.244358] EXT3-fs: write access will be enabled during recovery.
[    2.286384] kjournald starting.  Commit interval 5 seconds
[    2.286398] EXT3-fs: recovery complete.
[    2.287274] EXT3-fs: mounted filesystem with ordered data mode.
[    3.128883] Adding 524280k swap on /dev/sda1. Priority:-1 extents:1 across:524280k
[    3.208470] EXT3 FS on sda2, internal journal
[    3.641153] device-mapper: uevent: version 1.0.3
[    3.641208] device-mapper: ioctl: 4.13.0-ioctl (2007-10-18) initialised: address@hidden
[    4.756035] NET: Registered protocol family 10
[    4.756035] lo: Disabled Privacy Extensions
***********

*********************

So... I've also seen that I can't create virtual machines directly in /mnt/glusterfs; I have to create them on my local disk and then move them to the gluster mount point. Is this normal behaviour? It's not really a problem, but it would be much easier if I could create them directly on the glusterfs storage unit.

Finally... the main issue is that virtual machines don't work as they did with earlier versions of fuse/gluster; they just freeze, and gluster logs the error "bound_xl is null" all the time.

I've been trying to find out why I can't run "./configure --enable-kernel-module" on the nodes' side. Is it possible that Lenny's kernel ships a fuse module "by default"? Could that cause the "bound_xl is null" problem? I mean, you say that this is mainly caused by using different versions of gluster or fuse. Well... the only thing I can think of is that this "default fuse module" in Lenny is from an older version and interacts with fuse on the Xen side, causing this message. Does that make any sense?
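As a sketch of how to check the "fuse already in the kernel" question on a node: /proc/filesystems lists every filesystem type the running kernel has registered, so "fuse" appears there when the module is loaded or built in. A module that exists on disk but has not been loaded yet will not show up until it is (for example after "modprobe fuse"). The helper below is hypothetical and only reads that list; it says nothing about which fuse version the kernel carries.

```python
def fuse_in_kernel(proc_filesystems: str) -> bool:
    """Return True if "fuse" is a registered filesystem type.

    Each line of /proc/filesystems is an optional "nodev" flag followed by
    the filesystem name (e.g. "nodev<TAB>fuse"); "fuseblk" is a separate
    entry and is not counted here.
    """
    for line in proc_filesystems.splitlines():
        fields = line.split()
        if fields and fields[-1] == "fuse":
            return True
    return False
```

On a node, fuse_in_kernel(open("/proc/filesystems").read()) returns True when the running kernel already has fuse registered.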

I would be very pleased if anyone could throw some light on this.

Thanks.



