
Re: [Gluster-devel] difficult bug in 2.5 mainline


From: Harris Landgarten
Subject: Re: [Gluster-devel] difficult bug in 2.5 mainline
Date: Mon, 2 Jul 2007 06:54:12 -0400 (EDT)

Backup test run on patch 252:

Zimbra client crashed with BT:

#0  0xb7f81f4c in raise () from /lib/libpthread.so.0
(gdb) bt
#0  0xb7f81f4c in raise () from /lib/libpthread.so.0
#1  0xb7fac628 in gf_print_trace (signum=6) at common-utils.c:211
#2  <signal handler called>
#3  0xb7e7e986 in raise () from /lib/libc.so.6
#4  0xb7e80043 in abort () from /lib/libc.so.6
#5  0xb7e7812d in __assert_fail () from /lib/libc.so.6
#6  0xb7fafe90 in inode_unref (inode=0x80b5fc8) at inode.c:336
#7  0x0804b7a4 in fuse_loc_wipe (fuse_loc=0x847e678) at fuse-bridge.c:97
#8  0x0804b82d in free_state (state=0x847e670) at fuse-bridge.c:129
#9  0x0804efb4 in fuse_entry_cbk (frame=0x84cb380, cookie=0x84ce360, 
this=0x8058db0, op_ret=8, op_errno=107, inode=0x80b5fc8, buf=0x84b8d90) at 
fuse-bridge.c:368
#10 0xb7fa9cac in default_lookup_cbk (frame=0x84ce360, cookie=0x84bb7e8, 
this=0x80587f0, op_ret=8, op_errno=107, inode=0x80b5fc8, buf=0x84b8d90) at 
defaults.c:40
#11 0xb7fa9cac in default_lookup_cbk (frame=0x84bb7e8, cookie=0x847dc28, 
this=0x8058760, op_ret=8, op_errno=107, inode=0x80b5fc8, buf=0x84b8d90) at 
defaults.c:40
#12 0xb75edbbd in unify_sh_opendir_cbk (frame=0x847dc28, cookie=0x8052500, 
this=0x80579e8, op_ret=8, op_errno=17, fd=0x843ffa0) at unify-self-heal.c:380
#13 0xb75f5f62 in client_opendir_cbk (frame=0x84b9088, args=0x80929b8) at 
client-protocol.c:3213
#14 0xb75f9077 in notify (this=0x8052a68, event=2, data=0x80902b8) at 
client-protocol.c:4191
#15 0xb7fada27 in transport_notify (this=0x6e5d, event=6) at transport.c:152
#16 0xb7fae499 in sys_epoll_iteration (ctx=0xbffcfff8) at epoll.c:54
#17 0xb7fadafd in poll_iteration (ctx=0xbffcfff8) at transport.c:260
#18 0x0804a170 in main (argc=6, argv=0xbffd00d4) at glusterfs.c:341
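
For reference, frames #5/#6 show a libc assertion firing inside inode_unref(). A minimal, self-contained sketch of that failure mode (the names and the exact check are assumptions; the real test at inode.c:336 may differ) looks like this:

/* Sketch only: models the kind of refcount assertion that aborts in frame #6. */
#include <assert.h>

typedef struct {
        int ref;   /* references held by fds, dentries, in-flight lookups */
} inode_sketch_t;

static void
inode_unref_sketch (inode_sketch_t *inode)
{
        assert (inode->ref > 0);   /* an extra unref aborts via __assert_fail */
        inode->ref--;
}

One way such a check can fire is if the error path (op_errno=107 here) already released a reference that fuse_loc_wipe() later drops again; that is only a guess from the trace, not a confirmed diagnosis.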

Brick2, which holds the namespace, crashed as well.
Brick1 stayed up.
Client2 recovered when brick2 was restarted.

No data was written from gluster to the backup tmp directory.

Harris


----- Original Message -----
From: "Harris Landgarten" <address@hidden>
To: "Harris Landgarten" <address@hidden>
Cc: "gluster-devel" <address@hidden>, "Amar S. Tumballi" <address@hidden>
Sent: Sunday, July 1, 2007 10:03:02 PM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] difficult bug in 2.5 mainline

The backup hung as first described. No data was written from the secondary 
volume on gluster to the backup tmp dir.

Harris

----- Original Message -----
From: "Harris Landgarten" <address@hidden>
To: "Amar S. Tumballi" <address@hidden>
Cc: "gluster-devel" <address@hidden>
Sent: Sunday, July 1, 2007 9:46:18 PM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] difficult bug in 2.5 mainline

Amar,

The rm -rf bug is still there. See Daniel's last comment on the mailing list, in 
reply to the rm -rf problem post. BTW, files are being deleted, but at a rate of 
about one every 3 seconds, with lots of lookups in the logs. I am going to check 
the other problem now.

Harris
 
----- Original Message -----
From: "Amar S. Tumballi" <address@hidden>
To: "Harris Landgarten" <address@hidden>
Cc: "gluster-devel" <address@hidden>
Sent: Sunday, July 1, 2007 7:55:09 PM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] difficult bug in 2.5 mainline

Hi Harris, 
With the latest patch this bug is fixed. I hope it fixes the 'rm -rf' problem 
too; please confirm. 

I am looking into the other strange bug you reported. 

-bulde 


On 7/2/07, Harris Landgarten <address@hidden> wrote: 

Disabling posix-locks changes the problem 

The client crashes along with the lock-server brick 

Here is the bt from the client: 

#0 unify_bg_cbk (frame=0xe080168, cookie=0xe1109c8, this=0x8057730, op_ret=0, 
op_errno=13) at unify.c:83 
83 callcnt = --local->call_count; 
(gdb) bt 
#0 unify_bg_cbk (frame=0xe080168, cookie=0xe1109c8, this=0x8057730, op_ret=0, 
op_errno=13) at unify.c:83 
#1 0xb75b96e5 in client_unlink_cbk (frame=0xe1109c8, args=0x8059248) at 
client-protocol.c:2969 
#2 0xb75beff5 in notify (this=0x8057730, event=2, data=0x8095338) at 
client-protocol.c:4184 
#3 0xb7f73827 in transport_notify (this=0x0, event=235405672) at 
transport.c:152 
#4 0xb7f74299 in sys_epoll_iteration (ctx=0xbfb96248) at epoll.c:54 
#5 0xb7f738fd in poll_iteration (ctx=0xbfb96248) at transport.c:260 
#6 0x0804a170 in main (argc=6, argv=0xbfb96324) at glusterfs.c:341 
(gdb) print local 
$1 = (unify_local_t *) 0x0 
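
The gdb output above pins the crash: frame->local is NULL, so the first statement of unify_bg_cbk() ("callcnt = --local->call_count;" at unify.c:83) dereferences a null pointer. Below is a self-contained sketch of the failure and a defensive guard; the _sketch names and the guard itself are illustrative assumptions, not the actual unify.c code:

#include <stdio.h>

typedef struct {
        int call_count;
} unify_local_sketch_t;

typedef struct {
        unify_local_sketch_t *local;   /* NULL in the crashing frame, per gdb */
} frame_sketch_t;

static int
unify_bg_cbk_sketch (frame_sketch_t *frame, int op_ret, int op_errno)
{
        unify_local_sketch_t *local = frame->local;

        if (!local) {
                /* the case gdb shows: (unify_local_t *) 0x0 */
                fprintf (stderr, "unify_bg_cbk: local is NULL (op_ret=%d, op_errno=%d)\n",
                         op_ret, op_errno);
                return -1;
        }

        return --local->call_count;   /* the line that crashes when local == NULL */
}

A guard like this would only mask the crash; the real question is why frame->local was never set (or was already freed) on the unlink callback path.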

Harris 

----- Original Message ----- 
From: "Harris Landgarten" < address@hidden > 
To: "gluster-devel" < address@hidden > 
Sent: Sunday, July 1, 2007 10:56:05 AM (GMT-0500) America/New_York 
Subject: [Gluster-devel] difficult bug in 2.5 mainline 

I am trying to track down a bug that is causing hangs in 2.5-patch-249 and all 
previous. 

This happens during a full Zimbra backup of certain accounts to 
/mnt/glusterfs/backups 

The first stage of the backup copies indexes and primary storage to 
/mnt/glusterfs/backups/tmp 
All of this data resides in local storage and the writing to gluster is 
successful. 

The next stage copies secondary storage to /mnt/glusterfs/backups/tmp 
This fails in the following way: 

Brick1 hangs with no errors 
Brick2 hangs with no errors 
Zimbra client hangs with no errors 
Second client loses connectivity 

The second client bails after 2 min but cannot connect 
The Zimbra client never bails 

I then restart the bricks 

After both bricks are restarted, the second client reconnects and a hung df -h 
completes 

The Zimbra client stays in a hung, unconnected state 

ls -l /mnt/glusterfs hangs 

The only way to reset is: 

kill -9 `pidof glusterfs` 
umount /mnt/glusterfs 

glusterfs 

Post-mortem examination of /mnt/glusterfs/backups/tmp shows that only a few files 
were written from the secondary storage volume. In this case over 15,000 files 
should have been written. 

Note: this only happens with large mailboxes containing some large (>10M) files. 

Note: with patch-247 the Zimbra client would segfault. With 249 it just hangs 
in an unrecoverable state. 


Harris 



-- 
Amar Tumballi 
http://amar.80x25.org 
[bulde on #gluster/irc.gnu.org] 


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel