[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Gluster-devel] no such file of directory errors running find -exec
From: |
Harris Landgarten |
Subject: |
Re: [Gluster-devel] no such file of directory errors running find -exec on large trees |
Date: |
Wed, 6 Jun 2007 07:25:32 -0400 (EDT) |
Avati,
Patch 176 fixed the spurious disconnect problem and I was able to run the find
-exec test to completion as well as a reindex on a large zimbra mailbox. I
consider this bug closed. Good job.
I am still having a problem that shows up when doing a full backup from Zimbra.
This causes many files, some large, to be copied from one folder tree on the
cluster to another on the same cluster. I am getting disconnects after failures
on large blocks. This ultimately caused the glusterfs client to stall in a
state where it had to be killed. Any thoughts?
Harris
----- Original Message -----
From: "Anand Avati" <address@hidden>
To: "Harris Landgarten" <address@hidden>
Cc: "gluster-devel" <address@hidden>
Sent: Monday, June 4, 2007 8:44:54 AM (GMT-0500) America/New_York
Subject: Re: [Gluster-devel] no such file of directory errors running find
-exec on large trees
Harris,
a bug fix in the timer code has been committed which used to cause
spurious transport timeout events causing false bailing. please try
with the latest tla checkout from glusterfs--mainline--2.4.
thanks,
avati
2007/5/26, Harris Landgarten <address@hidden>:
> I have been running some tests on a gluster tree with over 100,000 files
> using find. When I run find /mnt/gluster -type f it completes without error.
> When I combine the find with -exec that causes the files to be read (ie find
> /mnt/gluster -type f -exec head -n1 {} \; ) The process fails to complete
> 100% of the time with "No such file or directory" errors. The errors do not
> occur until after at least 30,000 files have been processed. After the
> process fails the clients and servers still appear to be running properly.
> The errors seem to coincide with the following in the logs:
>
> Client running process:
>
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> ...
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2:
> connection to server disconnected
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:34] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () -
> error: Transport endpoint is not connected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=104)
> [May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2:
> connection to server disconnected
> [May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> ...
>
> brick1:
>
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:26] [ERROR/common-utils.c:110/full_rwv()]
> libglusterfs:full_rwv: 50680 bytes r/w instead of 124559 (Broken pipe)
> [May 25 10:11:26] [ERROR/proto-srv.c:117/generic_reply()]
> protocol/server:transport_writev failed
> [May 25 10:11:26] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () -
> error: Transport endpoint is not connected
> [May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=107)
> [May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> ...
>
> brick2:
>
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> [May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:11:43] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=2)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> [May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=9)
> [May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server:
> connection to server disconnected
> ...
>
> Client2 running (logging = debug)
>
> [May 25 10:12:06] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:06] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write
> to break on blocked socket (if any)
> [May 25 10:12:06] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=115)
> [May 25 10:12:06] [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header
> failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()]
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:06] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1:
> connection to server disconnected
> [May 25 10:12:06] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp:
> :try_connect: socket fd = 4
> [May 25 10:12:06] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp:
> :try_connect: finalized on port `1023'
> [May 25 10:12:06] [DEBUG/tcp-client.c:226/tcp_connect()]
> tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:12:06] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect
> on 4 in progress (non-blocking)
> [May 25 10:12:06] [DEBUG/tcp-client.c:301/tcp_connect()]
> tcp/client:connection on 4 still in progress - try later
> [May 25 10:12:06] [ERROR/client-protocol.c:204/client_protocol_xfer()]
> protocol/client:transport_submit failed
> [May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()]
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:12:26] [DEBUG/tcp-client.c:310/tcp_connect()]
> tcp/client:connection on 4 success, attempting to handshake
> [May 25 10:12:26] [DEBUG/tcp-client.c:54/do_handshake()]
> transport/tcp-client:dictionary length = 50
> [May 25 10:12:26] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:12:26] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write
> to break on blocked socket (if any)
> [May 25 10:13:11] [CRITICAL/client-protocol.c:218/call_bail()]
> client/protocol:bailing transport
> [May 25 10:13:11] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write
> to break on blocked socket (if any)
> [May 25 10:13:11] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0
> bytes r/w instead of 113 (errno=115)
> [May 25 10:13:11] [DEBUG/protocol.c:244/gf_block_unserialize_transport()]
> libglusterfs/protocol:gf_block_unserialize_transport: full_read of header
> failed
> [May 25 10:13:11] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()]
> protocol/client:cleaning up state in transport object 0x8077418
> [May 25 10:13:11] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1:
> connection to server disconnected
> [May 25 10:13:11] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp:
> :try_connect: socket fd = 4
> [May 25 10:13:11] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp:
> :try_connect: finalized on port `1022'
> [May 25 10:13:11] [DEBUG/tcp-client.c:226/tcp_connect()]
> tcp/client:try_connect: defaulting remote-port to 6996
> [May 25 10:13:11] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect
> on 4 in progress (non-blocking)
> [May 25 10:13:11] [DEBUG/tcp-client.c:301/tcp_connect()]
> tcp/client:connection on 4 still in progress - try later
> ...
>
> Both clients are running with read-ahead and write_behind on and
> stat_prefetch off. All machines are on current mainline-2.4 as of patch 174.
> Since both bricks are seeing the same error does that mean the client is
> causing the error? The errors do not alway start at the same file but always
> happen after a lot of files are processed.
>
> Are there any specific tests you would like run?
>
> Best,
>
> Harris
>
>
>
>
>
> _______________________________________________
> Gluster-devel mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
--
Anand V. Avati