
[Gluster-devel] Re: more bugs (was Re: io-threads...)


From: Brent A Nelson
Subject: [Gluster-devel] Re: more bugs (was Re: io-threads...)
Date: Tue, 1 May 2007 16:27:37 -0400 (EDT)

Wow, it almost looked like the patch fixed the stat-prefetch issue, but see below. I was almost unable to get it to crash with du's or rm's on complex directories, whereas before it crashed fairly easily.

Also, I think it fixed a tiny anomaly that I had noticed but ignored. Previously, even without stat-prefetch, multiple du's on a complex directory could give slightly different total sizes (a few KB out of many GB). Now, there is no such fluctuation at all.

I WAS able to get a crash with one machine running du while a different machine removed files in the same area. The du machine is the one where the glusterfs client died (the first du completed; the second died). The glusterfs client left a backtrace in the log but no core, perhaps because I compiled with CFLAGS=-O3. See the attached backtrace.

Stat-prefetch did at least withstand a great deal more torture than before the patch, so the patch seems to be a significant improvement. Note that I haven't tried the new patch without stat-prefetch, so heavy testing might be able to kill it even then; I'm not sure.

Thanks,

Brent

PS Alas, there was no effect on the NFS reexport issue.
PPS The AFR client failover works pretty well, but I noticed something: the first attempt to access the glusterfs after losing contact with a glusterfsd is sometimes faulty (e.g., the first df may say it's not connected or report a smaller size for the volume, and catting a file may fail on the first try). The very next attempt succeeds, however.

On Tue, 1 May 2007, Anand Avati wrote:


I was wondering if you could describe patch-134 a little. I was curious whether it could be related to the stat-prefetch or the NFS reexport issues.

This was a bug in AFR which could have been triggered by anybody who used AFR and accessed a directory. The functions forming the reply path of a transaction are called through function pointers, and AFR's opendir reply callback prototype had an extra parameter; the callback dereferenced that pointer, which is a junk pointer. So far all of us were lucky that the dereferenced pointer happened to point to some allocated memory (though nothing was altered or used). It is very much possible that this could be related to the stat-prefetch issue.
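
Roughly, the failure mode looks like this (a minimal sketch, not the actual AFR code; the type and function names are made up for illustration):

#include <stdio.h>

/* The reply path invokes callbacks through a generic pointer type. */
typedef int (*reply_fn_t) (void *frame, int op_ret);

/* Buggy prototype: one extra parameter compared to reply_fn_t.
 * The caller never passes it, so 'extra' holds whatever junk is
 * left in that register or stack slot. */
static int
opendir_cbk (void *frame, int op_ret, int *extra)
{
        /* Dereferencing the junk pointer may happen to hit allocated
         * memory (and "work" by luck) or may segfault. */
        printf ("op_ret=%d extra=%d\n", op_ret, *extra);
        return 0;
}

int
main (void)
{
        reply_fn_t fn = (reply_fn_t) opendir_cbk;  /* prototype mismatch */
        fn (NULL, 0);  /* undefined behaviour: 'extra' is junk */
        return 0;
}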
The latest glusterfs codebase now prints a backtrace of a segfault in the log as well as dumping a core; next time you get a segfault, please pass on the core and/or the log. I do not see how NFS re-export could be affected, but you never know whether this could have triggered a side effect somewhere else.
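
For reference, the backtrace-on-segfault behaviour can be had with a handler along these lines (a rough sketch using glibc's execinfo, not the actual glusterfs code):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void
segv_handler (int signum)
{
        void *frames[32];
        int   count;

        /* backtrace_symbols_fd writes straight to an fd and avoids
         * malloc, which matters inside a signal handler. */
        count = backtrace (frames, 32);
        backtrace_symbols_fd (frames, count, STDERR_FILENO);

        /* Restore the default action and re-raise, so the kernel
         * still dumps a core (subject to the core-size ulimit). */
        signal (signum, SIG_DFL);
        raise (signum);
}

int
main (void)
{
        signal (SIGSEGV, segv_handler);
        /* ... */
        return 0;
}

Note that whether a core actually appears also depends on the core-size ulimit on the client machine.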

I have done only very limited checking of NFS re-export. Once the next 1.3 release is done, I will do a more thorough check.

regards,
avati

--
ultimate_answer_t
deep_thought (void)
{
 sleep (years2secs (7500000));
 return 42;
}

Attachment: glusterfs-crash.txt
Description: Text document

