From: Harris Landgarten
Subject: [Gluster-devel] no such file of directory errors running find -exec on large trees
Date: Fri, 25 May 2007 16:14:47 -0400 (EDT)

I have been running some tests with find on a gluster tree of more than
100,000 files. Running find /mnt/gluster -type f completes without error. When
I add an -exec that causes each file to be read (i.e. find /mnt/gluster
-type f -exec head -n1 {} \; ), the process fails to complete 100% of the
time, with "No such file or directory" errors. The errors do not occur until
at least 30,000 files have been processed. After the process fails, the
clients and servers still appear to be running properly. The errors seem to
coincide with the following in the logs:

Client running process:

[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:11:14] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
...
[May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: 
connection to server disconnected
[May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:12:34] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:34] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - 
error: Transport endpoint is not connected
[May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=104)
[May 25 10:12:34] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client2: 
connection to server disconnected
[May 25 10:12:34] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
...

brick1:

[May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:26] [ERROR/common-utils.c:110/full_rwv()] libglusterfs:full_rwv: 
50680 bytes r/w instead of 124559 (Broken pipe)
[May 25 10:11:26] [ERROR/proto-srv.c:117/generic_reply()] 
protocol/server:transport_writev failed
[May 25 10:11:26] [ERROR/tcp.c:110/tcp_except()] transport/tcp:shutdown () - 
error: Transport endpoint is not connected
[May 25 10:11:26] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=107)
[May 25 10:11:26] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
...

brick2:

[May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:25] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
[May 25 10:11:25] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:11:43] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:11:43] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=2)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
[May 25 10:12:46] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=9)
[May 25 10:12:46] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:server: 
connection to server disconnected
...

Client2 running (logging = debug)

[May 25 10:12:06] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:06] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[May 25 10:12:06] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=115)
[May 25 10:12:06] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
[May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
protocol/client:cleaning up state in transport object 0x8077418
[May 25 10:12:06] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: 
connection to server disconnected
[May 25 10:12:06] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
:try_connect: socket fd = 4
[May 25 10:12:06] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
:try_connect: finalized on port `1023'
[May 25 10:12:06] [DEBUG/tcp-client.c:226/tcp_connect()] 
tcp/client:try_connect: defaulting remote-port to 6996
[May 25 10:12:06] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect on 
4 in progress (non-blocking)
[May 25 10:12:06] [DEBUG/tcp-client.c:301/tcp_connect()] tcp/client:connection 
on 4 still in progress - try later
[May 25 10:12:06] [ERROR/client-protocol.c:204/client_protocol_xfer()] 
protocol/client:transport_submit failed
[May 25 10:12:06] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
protocol/client:cleaning up state in transport object 0x8077418
[May 25 10:12:26] [DEBUG/tcp-client.c:310/tcp_connect()] tcp/client:connection 
on 4 success, attempting to handshake
[May 25 10:12:26] [DEBUG/tcp-client.c:54/do_handshake()] 
transport/tcp-client:dictionary length = 50
[May 25 10:12:26] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:12:26] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[May 25 10:13:11] [CRITICAL/client-protocol.c:218/call_bail()] 
client/protocol:bailing transport
[May 25 10:13:11] [DEBUG/tcp.c:123/cont_hand()] tcp:forcing poll/read/write to 
break on blocked socket (if any)
[May 25 10:13:11] [ERROR/common-utils.c:55/full_rw()] libglusterfs:full_rw: 0 
bytes r/w instead of 113 (errno=115)
[May 25 10:13:11] [DEBUG/protocol.c:244/gf_block_unserialize_transport()] 
libglusterfs/protocol:gf_block_unserialize_transport: full_read of header failed
[May 25 10:13:11] [DEBUG/client-protocol.c:2605/client_protocol_cleanup()] 
protocol/client:cleaning up state in transport object 0x8077418
[May 25 10:13:11] [CRITICAL/tcp.c:81/tcp_disconnect()] transport/tcp:client1: 
connection to server disconnected
[May 25 10:13:11] [DEBUG/tcp-client.c:180/tcp_connect()] transport: tcp: 
:try_connect: socket fd = 4
[May 25 10:13:11] [DEBUG/tcp-client.c:202/tcp_connect()] transport: tcp: 
:try_connect: finalized on port `1022'
[May 25 10:13:11] [DEBUG/tcp-client.c:226/tcp_connect()] 
tcp/client:try_connect: defaulting remote-port to 6996
[May 25 10:13:11] [DEBUG/tcp-client.c:262/tcp_connect()] tcp/client:connect on 
4 in progress (non-blocking)
[May 25 10:13:11] [DEBUG/tcp-client.c:301/tcp_connect()] tcp/client:connection 
on 4 still in progress - try later
...
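
For reading the errno values in the excerpts above, a small decoder may help.
This is only a sketch: the table assumes Linux errno numbering on a common
architecture such as x86 (the OS and architecture are not stated here, though
the "Transport endpoint is not connected" text matches the Linux message for
ENOTCONN), and the names tally_errnos and describe are made up for this
example. It only counts and labels the errno=N values; it does not explain
why full_rw reported them.

```python
import re
from collections import Counter

# Linux errno numbers (asm-generic numbering, e.g. x86). This is an
# assumption: the OS and architecture of the machines are not stated.
LINUX_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    9: ("EBADF", "Bad file descriptor"),
    104: ("ECONNRESET", "Connection reset by peer"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    115: ("EINPROGRESS", "Operation now in progress"),
}

# Matches the "(errno=N)" suffix that full_rw appends to its messages.
ERRNO_RE = re.compile(r"\(errno=(\d+)\)")


def tally_errnos(log_text):
    """Count each errno=N occurrence in a chunk of log text."""
    return Counter(int(n) for n in ERRNO_RE.findall(log_text))


def describe(code):
    """Return a one-line label for an errno number from the table above."""
    name, text = LINUX_ERRNO.get(code, ("?", "not in table"))
    return f"errno={code} {name}: {text}"


if __name__ == "__main__":
    # Three lines in the shape of the client log above, unwrapped.
    sample = (
        "libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=104)\n"
        "libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)\n"
        "libglusterfs:full_rw: 0 bytes r/w instead of 113 (errno=9)\n"
    )
    for code, count in sorted(tally_errnos(sample).items()):
        print(f"{count:3d} x {describe(code)}")
```

Feeding each machine's full log through tally_errnos separately would show
whether the bricks and the clients report the same mix of codes.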

Both clients are running with read-ahead and write_behind on and stat_prefetch 
off. All machines are on current mainline-2.4 as of patch 174. Since both 
bricks are seeing the same error, does that mean the client is causing the 
error? The errors do not always start at the same file, but they always happen 
after a lot of files have been processed.

Are there any specific tests you would like me to run?
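
One possible test is a reader that does roughly the same work as the find
-exec head -n1 command but records how many files were read before each
failure and which errno came back. The following is a minimal sketch, not a
replacement for the find run: the function name read_first_lines is made up
for this example, and it opens files in a single process instead of spawning
head for each file, so it may load the mount differently.

```python
import os
import stat
import sys


def read_first_lines(root):
    """Walk root and read the first line of every regular file.

    Returns (ok_count, failures). Each failure is a tuple of
    (files_read_ok_before_failure, path, errno).
    """
    ok = 0
    failures = []

    def on_walk_error(exc):
        # os.walk ignores directory listing errors unless given a callback.
        failures.append((ok, exc.filename, exc.errno))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Like "find -type f": regular files only, symlinks skipped.
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, "rb") as fh:
                    fh.readline()  # same amount of reading as "head -n1"
                ok += 1
            except OSError as exc:
                failures.append((ok, path, exc.errno))
    return ok, failures


if __name__ == "__main__":
    # /mnt/gluster is the mount point used in the find command above.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/gluster"
    ok, failures = read_first_lines(root)
    print(f"read {ok} files, {len(failures)} failures")
    for count_before, path, err in failures[:20]:
        print(f"after {count_before} files: errno={err} {path}")
```

Running it several times and comparing the count before the first failure
would show whether the failures start at a consistent point.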

Best,

Harris






