
Re: [Gluster-devel] 3.5.0beta3 memory leak problem


From: Yuan Ding
Subject: Re: [Gluster-devel] 3.5.0beta3 memory leak problem
Date: Fri, 21 Feb 2014 12:01:41 +0800

Hi Vijay,

I ran the following test:
I started the glusterfs volume, killed glusterfsd, and restarted glusterfsd under valgrind with this command:

valgrind --log-file=/root/dingyuan/logs/valgrind.log /usr/sbin/glusterfsd -s server241 --volfile-id vol1.server241.fsmnt-fs1 -p /var/lib/glusterd/vols/vol1/run/server241-fsmnt-fs1.pid -S /var/run/4f8241255dc7142a794af68d66dcedeb.socket --brick-name /fsmnt/fs1 -l /var/log/glusterfs/bricks/fsmnt-fs1.log --xlator-option *-posix.glusterd-uuid=41da2eae-c2c8-41a0-8873-5286699a8b95 --brick-port 49153 --xlator-option vol1-server.listen-port=49153 -N

The command line is identical to the default one except for the valgrind wrapper and the trailing -N (run in foreground) option.
Then I mounted the nfs client and ran the ltp test.
After a few minutes, valgrind appears to enter an infinite loop. top shows the following (glusterfsd runs as the process 'memcheck-amd64-'):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21255 root      20   0  309m 106m 4328 R 100.1  1.4   1121:42 memcheck-amd64- 

The process cannot be killed with SIGTERM. SIGKILL does kill it, but then no valgrind report is generated...

Is there something wrong with my test procedure? Or is there another way to capture more information?
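For what it's worth, one way to get a leak report out of a valgrind process that never exits cleanly is valgrind's embedded gdbserver. This is only a sketch, reusing the paths from the command above; option behavior may vary with the valgrind version:

```shell
# Start glusterfsd under valgrind with the embedded gdbserver enabled.
# --vgdb=yes and the "monitor leak_check" command are standard memcheck
# features; the paths are the ones from the test above (long options
# elided as "...").
valgrind --leak-check=full --vgdb=yes \
    --log-file=/root/dingyuan/logs/valgrind.log \
    /usr/sbin/glusterfsd -s server241 --volfile-id vol1.server241.fsmnt-fs1 ... -N

# From a second shell, attach without killing the process and ask
# memcheck for an on-demand leak report (written to the same log file):
gdb /usr/sbin/glusterfsd \
    -ex 'target remote | vgdb' \
    -ex 'monitor leak_check full reachable any' \
    -ex detach -ex quit
```

This avoids depending on a clean process exit to trigger the report.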

Thanks!



On Wed, Feb 19, 2014 at 2:20 PM, Vijay Bellur <address@hidden> wrote:
On 02/18/2014 03:18 PM, Yuan Ding wrote:
I tested the gluster nfs server with 1 nfs client and ran ltp's fs test
cases on that client. There seem to be 2 memory leak problems.
(My nfs server & 2 glusterfsd config files are attached.)
The 2 problems are described below:

1. The glusterfs process that runs as the nfs server exhausts system memory (1GB) within
minutes. After disabling drc, this problem no longer occurs.
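(For anyone reproducing this: drc is toggled per volume. The volume name vol1 is taken from the attached configs, and nfs.drc is, to my knowledge, the option name in the 3.5 series:)

```shell
# Disable the NFS duplicate request cache on volume "vol1".
gluster volume set vol1 nfs.drc off
```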

2. With drc disabled, the test ran for 1 day with no problem, but I found
glusterfsd using more than 50% of system memory (ps output
below). Stopping the test does not release the memory.

address@hidden ~]# ps aux | grep glusterfsd
root      7443  3.7 52.8 1731340 539108 ?      Ssl  Feb17  70:01
/usr/sbin/glusterfsd -s server155 --volfile-id vol1.server155.fsmnt-fs1
-p /var/lib/glusterd/vols/vol1/run/server155-fsmnt-fs1.pid -S
/var/run/5b7fe23f0aec78ffa0e6968dece0a8b0.socket --brick-name /fsmnt/fs1
-l /var/log/glusterfs/bricks/fsmnt-fs1.log --xlator-option
*-posix.glusterd-uuid=d4f3d342-dd41-4dc7-b0fc-d3ce9998d21f --brick-port
49152 --xlator-option vol1-server.listen-port=49152

I used kill -SIGUSR1 7443 to collect some dump information (attached as
fsmnt-fs1.7443.dump.1392711830).

Any help is appreciated!

Thanks for the report; there seem to be a lot of dict_t allocations as seen from the statedump. Would it be possible to run the tests after starting glusterfsd with valgrind and share the report here?
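As a side note, the per-pool counts in a statedump can be ranked with a short pipeline. The field names below (pool-name, hot-count) follow the statedump mempool section as I remember it, and the sample dump is made up for illustration; real dumps are written under /var/run/gluster/ when the process receives SIGUSR1:

```shell
#!/bin/sh
# Hypothetical excerpt of a statedump mempool section.
cat > /tmp/sample.dump <<'EOF'
[mempool]
pool-name=glusterfs:dict_t
hot-count=4096
cold-count=0
pool-name=glusterfs:data_t
hot-count=1024
cold-count=512
EOF

# Rank pools by live (hot) allocations to spot the leaking object type.
awk -F= '/^pool-name=/ { name=$2 }
         /^hot-count=/ { print $2, name }' /tmp/sample.dump | sort -rn
```

On this sample the dict_t pool sorts to the top, matching what the statedump suggested.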

-Vijay

