From: | 蒋凯 |
Subject: | [Gluster-devel] gsyncd deadlocks in log_raise_exception |
Date: | Sun, 26 Jan 2014 10:46:41 +0000 |
Hi,

Generally, when gsyncd encounters an exception, it logs the exception and restarts. In some cases, however, it deadlocks instead. This happens in my environment about once a week: replication stops, but the geo-replication status command still shows OK.

I checked the processes on the master. The gsyncd process hangs with the backtrace below, and the ssh subprocess cannot terminate. When I manually kill the ssh subprocess with signal 9 (SIGKILL), geo-replication exits and restarts.

#3 file '/usr/lib64/python2.6/subprocess.py', in '_eintr_retry_call'
#7 file '/usr/lib64/python2.6/subprocess.py', in 'wait'
#11 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'log_raise_exception'
#14 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'twrap'
#19 file '/usr/lib64/python2.6/threading.py', in 'run'
#22 file '/usr/lib64/python2.6/threading.py', in '__bootstrap_inner'
#25 file '/usr/lib64/python2.6/threading.py', in '__bootstrap'

I think the problem is that log_raise_exception calls Popen.wait here, which may deadlock if the child writes more output to a pipe than the pipe buffer can hold. See the documentation at http://docs.python.org/2/library/subprocess.html, which recommends using Popen.communicate instead.

Thanks.
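
To illustrate the Popen.wait versus Popen.communicate point, here is a minimal standalone sketch. It is not the gsyncd code. The child command and the helper name run_and_drain are made up for the example, and it is written for current Python rather than 2.6. The child writes about 1 MiB to stderr, which is more than a typical pipe buffer holds, so calling wait() without reading the pipe could block forever. communicate() reads both pipes to EOF while waiting, so the child can always finish writing.

```python
import subprocess
import sys

# Hypothetical child for the demo: writes ~1 MiB to stderr, which is
# larger than a typical OS pipe buffer.
CHILD = "import sys; sys.stderr.write('x' * (1 << 20))"


def run_and_drain(cmd):
    """Run cmd and return (returncode, stdout_bytes, stderr_bytes).

    Both pipes are read while waiting, so a child that produces a lot
    of output cannot block on a full pipe.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads stdout and stderr until EOF, then waits for exit.
    # Calling proc.wait() here instead, without reading the pipes, is the
    # pattern that can hang once the child fills the pipe buffer.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    rc, out, err = run_and_drain([sys.executable, "-c", CHILD])
    print("exit code:", rc, "stderr bytes:", len(err))
```

The trade-off is that communicate() holds all of the output in memory, so it suits error output of modest size rather than unbounded streams.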