gluster-devel

Re: [Gluster-devel] no such file of directory errors running find -exec


From: Harris Landgarten
Subject: Re: [Gluster-devel] no such file of directory errors running find -exec on large trees
Date: Wed, 6 Jun 2007 07:25:32 -0400 (EDT)

Avati,

Patch 176 fixed the spurious disconnect problem, and I was able to run the find 
-exec test to completion, as well as a reindex on a large Zimbra mailbox. I 
consider this bug closed. Good job.
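
For anyone who wants to rerun the test quoted below and get a per-file failure count, here is a rough Python equivalent of find /mnt/gluster -type f -exec head -n1 {} \; . It is only a sketch: the function name read_first_lines is made up for illustration, and it records the errno of each failure instead of stopping at the first one.

```python
import os
import stat


def read_first_lines(root):
    """Read the first line of every regular file under root.

    Returns (files_read, failures), where failures is a list of
    (path, errno) tuples for files that could not be stat'ed or read.
    Directories that cannot be listed are silently skipped by os.walk.
    """
    files_read = 0
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Like `-type f`: only regular files, symlinks not followed.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, "rb") as fh:
                    fh.readline()  # same effect as `head -n1`
                files_read += 1
            except OSError as exc:
                failures.append((path, exc.errno))
    return files_read, failures
```

Calling read_first_lines("/mnt/gluster") on the client mount returns the number of files read and the list of failures, so the point at which errors begin can be compared between runs.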

I am still having a problem that shows up when doing a full backup from Zimbra. 
This causes many files, some large, to be copied from one folder tree on the 
cluster to another on the same cluster. I am getting disconnects after failures 
on large blocks. This ultimately caused the glusterfs client to stall in a 
state where it had to be killed. Any thoughts?

Harris

----- Original Message -----
From: "Anand Avati" <address@hidden>
To: "Harris Landgarten" <address@hidden>
Cc: "gluster-devel" <address@hidden>
Sent: Monday, June 4, 2007 8:44:54 AM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] no such file of directory errors running find 
-exec on large trees

Harris,
  a fix has been committed for a bug in the timer code that used to
cause spurious transport timeout events, which in turn caused false
bailing. Please try with the latest tla checkout from
glusterfs--mainline--2.4.

thanks,
avati

2007/5/26, Harris Landgarten <address@hidden>:
> I have been running some tests on a gluster tree with over 100,000 files 
> using find. When I run find /mnt/gluster -type f, it completes without error. 
> When I combine the find with -exec so that the files are actually read (e.g. 
> find /mnt/gluster -type f -exec head -n1 {} \; ), the process always fails 
> before completing, with "No such file or directory" errors. The errors do not 
> occur until after at least 30,000 files have been processed. After the 
> process fails, the clients and servers still appear to be running properly. 
> The errors seem to coincide with the following in the logs:
>
> Client running process:
>
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> ...
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: 
> connection to server disconnected
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:34] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - 
> error: Transport endpoint is not connected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=104)
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: 
> connection to server disconnected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> ...
>
> brick1:
>
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:110/full_rwv()] 
> libglusterfs:full_rwv: 50680 bytes r/w instead of 124559 (Broken pipe)
> [May 25 10:11:26] [ERROR/proto-srv.c:117/generic_reply()] 
> protocol/server:transport_writev failed
> [May 25 10:11:26] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - 
> error: Transport endpoint is not connected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=107)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> ...
>
> brick2:
>
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:43] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
> connection to server disconnected
> ...
>
> Client2 running (logging = debug)
>
> [May 25 10:12:06] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:06] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
> to break on blocked socket (if any)
> [May 25 10:12:06] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=115)
> [May 25 10:12:06] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header 
> failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:06] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: 
> connection to server disconnected
> [May 25 10:12:06] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
> :try_connect: socket fd = 4
> [May 25 10:12:06] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
> :try_connect: finalized on port `1023'
> [May 25 10:12:06] [DEBUG/tcp-client.c:226/tcp_connect()] 
> tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:12:06] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect 
> on 4 in progress (non-blocking)
> [May 25 10:12:06] [DEBUG/tcp-client.c:301/tcp_connect()] 
> tcp/client:connection on 4 still in progress - try later
> [May 25 10:12:06] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
> protocol/client:transport_submit failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:26] [DEBUG/tcp-client.c:310/tcp_connect()] 
> tcp/client:connection on 4 success, attempting to handshake
> [May 25 10:12:26] [DEBUG/tcp-client.c:54/do_handshake()] 
> transport/tcp-client:dictionary length = 50
> [May 25 10:12:26] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:12:26] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
> to break on blocked socket (if any)
> [May 25 10:13:11] [CRITICAL/client-protocol.c:218/call_bail()] 
> client/protocol:bailing transport
> [May 25 10:13:11] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write 
> to break on blocked socket (if any)
> [May 25 10:13:11] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
> bytes r/w instead of 113 (errno=115)
> [May 25 10:13:11] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header 
> failed
> [May 25 10:13:11] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:13:11] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: 
> connection to server disconnected
> [May 25 10:13:11] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
> :try_connect: socket fd = 4
> [May 25 10:13:11] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
> :try_connect: finalized on port `1022'
> [May 25 10:13:11] [DEBUG/tcp-client.c:226/tcp_connect()] 
> tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:13:11] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect 
> on 4 in progress (non-blocking)
> [May 25 10:13:11] [DEBUG/tcp-client.c:301/tcp_connect()] 
> tcp/client:connection on 4 still in progress - try later
> ...
>
> Both clients are running with read-ahead and write_behind on and 
> stat_prefetch off. All machines are on current mainline-2.4 as of patch 174. 
> Since both bricks are seeing the same error, does that mean the client is 
> causing the error? The errors do not always start at the same file, but they 
> always happen after a lot of files have been processed.
>
> Are there any specific tests you would like run?
>
> Best,
>
> Harris
>
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
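
The quoted logs repeat a handful of errno values (104, 9, 2, 107, 115). As a reading aid, the sketch below tallies log events by level and function and maps those numbers to their symbolic names. Two assumptions: the machines run Linux (the thread does not say), so the table uses Linux errno numbering, and the names summarize, HEADER, ERRNO and LINUX_ERRNO are invented for this example. The logged errno may also be stale when full_rw reports 0 bytes, so the names are a hint, not a diagnosis.

```python
import re
from collections import Counter

# Linux errno numbers seen in the logs above. Hard-coded on purpose:
# the numbering differs on other platforms (assumption: Linux hosts).
LINUX_ERRNO = {
    2: "ENOENT",         # No such file or directory
    9: "EBADF",          # Bad file descriptor
    104: "ECONNRESET",   # Connection reset by peer
    107: "ENOTCONN",     # Transport endpoint is not connected
    115: "EINPROGRESS",  # Operation now in progress
}

# Matches the "[LEVEL/file.c:line/function()]" part of each log entry.
HEADER = re.compile(
    r"\[(?P<level>[A-Z]+)/(?P<file>[\w.-]+):(?P<line>\d+)/(?P<func>\w+)\(\)\]"
)
ERRNO = re.compile(r"\(errno=(\d+)\)")


def summarize(log_text):
    """Return (events, errnos) counters for a chunk of glusterfs log text.

    Works on the mail-wrapped, "> "-quoted form as well, because both
    patterns are searched across the whole text, not line by line.
    """
    events = Counter((m["level"], m["func"]) for m in HEADER.finditer(log_text))
    errnos = Counter(
        LINUX_ERRNO.get(int(n), "errno %s" % n) for n in ERRNO.findall(log_text)
    )
    return events, errnos
```

Running summarize over each of the client and brick excerpts separately makes it easier to see which side reports which error first.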


-- 
Anand V. Avati




