
Re: [Gluster-devel] Crash - 2.0.git-2009.06.16


From: Shehjar Tikoo
Subject: Re: [Gluster-devel] Crash - 2.0.git-2009.06.16
Date: Sun, 05 Jul 2009 13:23:18 +0530
User-agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

Hi

Firstly, a fix for this crash is under review.
See http://patches.gluster.com/patch/672/

Secondly, I saw in the logs you provided that the number of
outstanding/pending requests on a single thread was more than 64.
This could be because of a large number of concurrent meta-data
operations, a large number of files being open at the same
time, or both.

I suggest that you increase the number of io-threads to 8 at
both the client and the server, in order to spread the large
number of pending requests over more threads. It might result
in better performance.
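In the volfile that would mean raising thread-count in the io-threads
section on both sides. A rough sketch (the volume and subvolume names
below are placeholders; use whatever names your volfiles already have):

  volume iothreads
    type performance/io-threads
    option thread-count 8    # raised from 4
    subvolumes locks         # placeholder: whichever subvolume io-threads wraps
  end-volume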

-Shehjar

NovA wrote:
Hi everybody!



Recently I've migrated our small 24-node HPC cluster from GlusterFS 1.3.8 unify to 2.0 distribute. Performance really seems to have increased a lot. Thanks for your work!

I use the following translators. On servers: posix->locks->iothreads->protocol/server; on clients: protocol/client->distribute->iothreads->write-behind. The io-threads translator uses 4 threads, with NO autoscaling.
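For reference, the client-side chain above corresponds to a volfile
roughly like the following sketch (volume names, hostnames, and the
remote subvolume are placeholders, not the actual configuration):

  volume client1
    type protocol/client
    option transport-type tcp
    option remote-host server1        # placeholder hostname
    option remote-subvolume brick     # placeholder export name
  end-volume

  volume dist
    type cluster/distribute
    subvolumes client1                # plus the other 23 clients
  end-volume

  volume iothreads
    type performance/io-threads
    option thread-count 4
    subvolumes dist
  end-volume

  volume writebehind
    type performance/write-behind
    subvolumes iothreads
  end-volume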



Unfortunately, after the upgrade I've got new issues. First, I've noticed
very high memory usage. Right now GlusterFS on the head node eats 737 MB of RES memory and doesn't return it. The memory usage grew during the migration, which was done with the command "cd ${namespace_export} && find . | (cd ${distribute_mount} && xargs -d '\n' stat -c '%n')". Note that the provided migrate-unify-to-distribute.sh script (with its "execute_on" function) doesn't work...



The second problem is more important. A client on one of the nodes crashed today with the following backtrace:

------
Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00002b8039bec860 in ?? () from /lib64/libc.so.6
#1  0x00002b8039bedc0c in malloc () from /lib64/libc.so.6
#2  0x00002b8039548732 in fop_writev_stub (frame=<value optimized out>,
    fn=0x2b803ab6c160 <iot_writev_wrapper>, fd=0x2aaab001e8a0, vector=0x2aaab0071d50,
    count=<value optimized out>, off=105432, iobref=0x2aaab0082d60) at common-utils.h:166
#3  0x00002b803ab6ec00 in iot_writev (frame=0x4, this=0x6150c0, fd=0x2aaab0082711,
    vector=0x2aaab0083060, count=3, offset=105432, iobref=0x2aaab0082d60)
    at io-threads.c:1212
#4  0x00002b803ad7a3de in wb_sync (frame=0x2aaab0034c40, file=0x2aaaac007280,
    winds=0x7fff717a5450) at write-behind.c:445
#5  0x00002b803ad7a4ff in wb_do_ops (frame=0x2aaab0034c40, file=0x2aaaac007280,
    winds=0x7fff717a5450, unwinds=<value optimized out>, other_requests=0x7fff717a5430)
    at write-behind.c:1579
#6  0x00002b803ad7a617 in wb_process_queue (frame=0x2aaab0034c40, file=0x2aaaac007280,
    flush_all=0 '\0') at write-behind.c:1624
#7  0x00002b803ad7dd81 in wb_sync_cbk (frame=0x2aaab0034c40,
    cookie=<value optimized out>, this=<value optimized out>, op_ret=19, op_errno=0,
    stbuf=<value optimized out>) at write-behind.c:338
#8  0x00002b803ab6a1e0 in iot_writev_cbk (frame=0x2aaab00309d0,
    cookie=<value optimized out>, this=<value optimized out>, op_ret=19, op_errno=0,
    stbuf=0x7fff717a5590) at io-threads.c:1186
#9  0x00002b803a953aae in dht_writev_cbk (frame=0x63e3e0, cookie=<value optimized out>,
    this=<value optimized out>, op_ret=19, op_errno=0, stbuf=0x7fff717a5590)
    at dht-common.c:1797
#10 0x00002b803a7406e9 in client_write_cbk (frame=0x648a80, hdr=<value optimized out>,
    hdrlen=<value optimized out>, iobuf=<value optimized out>) at client-protocol.c:4363
#11 0x00002b803a72c83a in protocol_client_pollin (this=0x60ec70, trans=0x61a380)
    at client-protocol.c:6230
#12 0x00002b803a7370bc in notify (this=0x4, event=<value optimized out>, data=0x61a380)
    at client-protocol.c:6274
#13 0x00002b8039533183 in xlator_notify (xl=0x60ec70, event=2, data=0x61a380)
    at xlator.c:820
#14 0x00002aaaaaaaff0b in socket_event_handler (fd=<value optimized out>, idx=4,
    data=0x61a380, poll_in=1, poll_out=0, poll_err=0) at socket.c:813
#15 0x00002b803954b2aa in event_dispatch_epoll (event_pool=0x6094f0) at event.c:804
#16 0x0000000000403f34 in main (argc=6, argv=0x7fff717a64f8) at glusterfsd.c:1223
----------



Later, GlusterFS crashed again with a different backtrace:

----------
Core was generated by `glusterfs -f /etc/glusterfs/client.vol -l /var/log/glusterfs/client.log /home'.
Program terminated with signal 6, Aborted.
#0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00002ae6dfcd4b45 in raise () from /lib64/libc.so.6
#1  0x00002ae6dfcd60e0 in abort () from /lib64/libc.so.6
#2  0x00002ae6dfd0cfbb in ?? () from /lib64/libc.so.6
#3  0x00002ae6dfd1221d in ?? () from /lib64/libc.so.6
#4  0x00002ae6dfd13f76 in free () from /lib64/libc.so.6
#5  0x00002ae6df673efd in mem_put (pool=0x631a90, ptr=0x2aaaac0bc520)
    at mem-pool.c:191
#6  0x00002ae6e0c992ce in iot_dequeue_ordered (worker=0x631a20) at io-threads.c:2407
#7  0x00002ae6e0c99326 in iot_worker_ordered (arg=<value optimized out>)
    at io-threads.c:2421
#8  0x00002ae6dfa8e020 in start_thread () from /lib64/libpthread.so.0
#9  0x00002ae6dfd68f8d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
----------



Hope these backtraces help to find the issue...



Best regards,

Andrey


_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel





