
Re: [Gluster-devel] Re: NFS reexport status


From: Brent A Nelson
Subject: Re: [Gluster-devel] Re: NFS reexport status
Date: Wed, 8 Aug 2007 12:25:15 -0400 (EDT)

On Wed, 8 Aug 2007, Krishna Srinivas wrote:

Hi Brent,

Thanks. So if you use storage/posix under afr, you don't see the
problem with NFS reexport.

Correct, that worked fine. Once I introduced protocol/client and protocol/server, though, rsync -aH /usr/ /mount/nfs0/ gives I/O errors and an inconsistent copy.

We are not able to reproduce this behaviour here.

Did you try with the spec files I sent you (they only need two directories available on a single machine), with an rsync of your /usr partition to the NFS reexport (this can also be done via localhost, no additional machines needed)? You are using the kernel NFS server, I assume, not one of the user-mode NFS servers?

Can you give us access to your machines? Is that possible?


Yes, if the above doesn't do the trick, we can coordinate some way to get you access. Do you have an SSH public key I could add as an authorized key?

Thanks,

Brent

On 8/8/07, Brent A Nelson <address@hidden> wrote:
Today, I tried switching to the Gluster-modified fuse-2.7.0, but I still
encountered the same misbehavior with NFS reexport.  Heads-up: like
someone else on the mailing list, I found that GlusterFS performance is
MUCH slower with 2.7.0 than with my old 2.6.3, at least for simple "du"
tests...

Failing that, I thought I'd try to figure out the simplest specs to
exhibit the issue; see attached.  I first tried glusterfs (no glusterfsd);
it worked for a simple afr as well as unification of two afrs with no NFS
reexport trouble.  As soon as I introduced a glusterfsd exporting to the
glusterfs via protocol/client and protocol/server (via localhost),
however, the rsync problems appeared.  I didn't see the issues with du in
this simple setup, though (perhaps that problem will disappear when this
problem is fixed, perhaps not).
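For readers following along, a minimal localhost setup in the style described above might look roughly like this. These are not the actual attached spec files; all volume names and directory paths are hypothetical, and the syntax follows GlusterFS 1.3-era volfiles (storage/posix bricks, protocol/server and protocol/client over TCP, cluster/afr mirroring the two bricks):

```
# server.vol (hypothetical names/paths): two local directories exported
volume brick0
  type storage/posix
  option directory /data/export0
end-volume

volume brick1
  type storage/posix
  option directory /data/export1
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes brick0 brick1
  option auth.ip.brick0.allow 127.0.0.1
  option auth.ip.brick1.allow 127.0.0.1
end-volume

# client.vol (hypothetical): reach both bricks via localhost, mirror with afr
volume client0
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1
  option remote-subvolume brick0
end-volume

volume client1
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1
  option remote-subvolume brick1
end-volume

volume mirror
  type cluster/afr
  subvolumes client0 client1
end-volume
```

Started with something like `glusterfsd -f server.vol` and `glusterfs -f client.vol /mnt/gluster`, the mount point can then be exported through the kernel NFS server and mounted back over localhost, so a single machine suffices for the reproduction.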

Thanks,

Brent

On Tue, 7 Aug 2007, Krishna Srinivas wrote:

Hi Brent,

Those messages in log are harmless, I have removed them from the
source. Can you mail the spec files? I will see again if it can be
repro'd

Thanks
Krishna


On 8/7/07, Brent A Nelson <address@hidden> wrote:
I added debugging to all the AFR subvolumes.  On the du test, all it
produced were lines like this over and over:
2007-08-06 17:23:41 C [dict.c:1094:data_to_ptr] libglusterfs/dict: @data=(nil)

For the rsync (in addition to the @data=(nil) messages):
rsync -a /tmp/blah/usr0/ /tmp/blah/nfs0/
rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
rsync: writefd_unbuffered failed to write 2672 bytes [sender]: Broken pipe (32)
rsync: close failed on "/tmp/blah/nfs0/games/.banner.vl3iqI": Operation not permitted (1)
rsync: connection unexpectedly closed (98 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(454) [sender=2.6.9]

The debug output is:
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-0) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] mirror3: (path=/nfs0/games/.banner.vl3iqI child=share3-1) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-0) op_ret=-1 op_errno=61
2007-08-06 17:33:58 E [afr.c:1389:afr_selfheal_getxattr_cbk] ns0: (path=/nfs0/games/.banner.vl3iqI child=ns0-1) op_ret=-1 op_errno=61

This is new behavior; rsync didn't actually die before, it just made
incomplete copies.


On Tue, 7 Aug 2007, Krishna Srinivas wrote:

Hi Brent,

Can you put "option debug on" in the afr subvolume, try the
du/rsync operations, and mail the log?

We are not able to reproduce the problem here; NFS is working
fine over AFR.

Thanks
Krishna

On 8/4/07, Krishna Srinivas <address@hidden> wrote:
rsync was failing for me without no_root_squash, so I thought that
might have been the culprit.

With no_root_squash, NFS over AFR works fine for me.

Yes, you are right: for some reason readdir() is not functioning
properly, which I think is why paths are getting corrupted.

Will get back to you.

Thanks
Krishna

On 8/4/07, Brent A Nelson <address@hidden> wrote:
All of my tests were done with no_root_squash already, and all tests were
done as root.

Without AFR, gluster and NFS reexports work fine with du and rsync.

With AFR, gluster by itself is fine, but du and rsync from an NFS client
do not work properly.  rsync gives lots of I/O errors and occasional "file
has vanished" messages for paths where the last element is junk.  du gives
incorrect sizes (smaller than they should be) and occasionally gives "no
such file or directory", also for paths where the last element is junk.
See the output below for examples of this junk from both.  Perhaps if you
can figure out how those paths are getting corrupted, the whole problem
will be resolved...

Thanks,

Brent

On Sat, 4 Aug 2007, Krishna Srinivas wrote:

Hi Brent,

Can you add no_root_squash to the exports file, reexport and mount
using NFS, and try the rsync as root to see if it works?

like: "/mnt/gluster *(rw,no_root_squash,sync,fsid=3)"

Thanks
Krishna

On 8/4/07, Brent A Nelson <address@hidden> wrote:
Whoops, scratch that.  I accidentally tested the second GlusterFS directory,
not the final NFS mount.  Even with the GlusterFS reexport of the original
GlusterFS, the issue is still present.

Thanks and sorry for the confusion,

Brent

On Fri, 3 Aug 2007, Brent A Nelson wrote:

I do have a workaround which can hide this bug, thanks to the wonderful
flexibility of GlusterFS and the fact that it is itself POSIX-compliant.  If I
mount the GlusterFS as usual, but then use another glusterfs/glusterfsd pair to
export and mount it and NFS reexport THAT, the problem does not appear.

Presumably, server-side AFR instead of client-side would also bypass the
issue (not tested)...

Thanks,

Brent

On Fri, 3 Aug 2007, Brent A Nelson wrote:

I turned off self-heal on all the AFR volumes, remounted and reexported (I
didn't delete the data; let me know if that is needed).

du -sk /tmp/blah/* (via NFS)
du: cannot access `/tmp/blah/usr0/include/c++/4.1.2/\a': No such file or directory
171832  /tmp/blah/usr0
109476  /tmp/blah/usr0-copy
du: cannot access `/tmp/blah/usr1/include/sys/\337O\004': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/\v': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/include/asm-ia64/&\324\004': No such file or directory
du: cannot access `/tmp/blah/usr1/src/linux-headers-2.6.20-16/drivers/\006': No such file or directory
117472  /tmp/blah/usr1
58392   /tmp/blah/usr1-copy

It appears that self-heal isn't the culprit.
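As an aside, junk last elements like the \a and \v above can be spotted mechanically from the NFS client side. A small sketch (a hypothetical helper, not part of any GlusterFS tooling) that flags directory entries containing control bytes:

```python
import os

def junk_entries(path):
    """Return directory entries whose names contain control bytes,
    like the \\a and \\v seen in the du errors above."""
    flagged = []
    for name in os.listdir(path):
        raw = os.fsencode(name)  # inspect the raw bytes of each filename
        if any(b < 0x20 or b == 0x7f for b in raw):
            flagged.append(name)
    return flagged
```

Run against the NFS mount, this would list exactly the corrupted names without needing a full rsync pass.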

Thanks,

Brent

On Fri, 3 Aug 2007, Krishna Srinivas wrote:

Hi Brent,

Can you turn self-heal off (option self-heal off) and see how it
behaves?

Thanks
Krishna

On 8/3/07, Brent A Nelson <address@hidden> wrote:
A hopefully relevant strace snippet:

open("share/perl/5.8.8/unicore/lib/jt", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
getdents64(3, /* 6 entries */, 1048576) = 144
lstat64("share/perl/5.8.8/unicore/lib/jt/C.pl", {st_mode=S_IFREG|0644, st_size=220, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/U.pl", {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/D.pl", {st_mode=S_IFREG|0644, st_size=438, ...}) = 0
lstat64("share/perl/5.8.8/unicore/lib/jt/R.pl", {st_mode=S_IFREG|0644, st_size=426, ...}) = 0
getdents64(3, /* 0 entries */, 1048576) = 0
munmap(0xb7c63000, 1052672)             = 0
close(3)                                = 0
open("share/perl/5.8.8/unicore/lib/gc_sc", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
mmap2(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7c63000
getdents64(3, 0xb7c63024, 1048576)      = -1 EIO (Input/output error)
write(2, "rsync: readdir(\"/tmp/blah/usr0/s"..., 91rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8/unicore/lib/gc_sc"): Input/output error (5)) = 91
write(2, "\n", 1)                       = 1
munmap(0xb7c63000, 1052672)             = 0
close(3)                                = 0
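The strace shows getdents64() itself returning EIO, i.e. the failure is in directory reading, not in per-file lstat calls. A quick probe (a hypothetical helper, assuming you just want to confirm this from the NFS client without rsync) that walks a tree and collects every directory whose listing fails with EIO:

```python
import errno
import os

def readdir_eio_dirs(root):
    """Walk root and return directories whose listing fails with EIO,
    mirroring what getdents64() reported in the strace."""
    bad = []
    for dirpath, dirnames, _files in os.walk(root, onerror=lambda e: None):
        for d in dirnames:
            full = os.path.join(dirpath, d)
            try:
                os.listdir(full)
            except OSError as e:
                if e.errno == errno.EIO:
                    bad.append(full)
    return bad
```

On a healthy mount this returns an empty list; on the misbehaving NFS reexport it should enumerate the same directories rsync complained about.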

Thanks,

Brent

On Thu, 2 Aug 2007, Brent A Nelson wrote:

NFS reexport of a unified GlusterFS seems to be working fine as of TLA 409.
I can make identical copies of a /usr area local-to-glusterfs and
glusterfs-to-glusterfs, hardlinks and all.  Awesome!

However, this is not true when AFR is added to the mix (rsync
glusterfs-to-glusterfs via NFS reexport):

rsync: readdir("/tmp/blah/usr0/lib/perl/5.8.8/auto/POSIX"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/perl/5.8.8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/i18n/locales"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/locale-langpack/en_GB/LC_MESSAGES"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/groff/1.18.1/font/devps"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man1"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man8"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/man/man7"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/X11/xkb/symbols"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Africa"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/Asia"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/right/America"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/zoneinfo/Asia"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/doc"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/share/consolefonts"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/bin"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc64"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/linux"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-mips"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-parisc"): Input/output error (5)
file has vanished: "/tmp/blah/usr0/src/linux-headers-2.6.20-16/include/asm-sparc/\#012"
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/config"): Input/output error (5)
rsync: readdir("/tmp/blah/usr0/src/linux-headers-2.6.20-16-server/include/linux"): Input/output error (5)
...

Any ideas?  Meanwhile, I'll try to track it down in strace (the output
will be huge, but maybe I'll get lucky)...

Thanks,

Brent



_______________________________________________
Gluster-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/gluster-devel