Re: [Gluster-devel] error while reading from an open file


From: Brian Hirt
Subject: Re: [Gluster-devel] error while reading from an open file
Date: Tue, 1 Sep 2009 20:25:45 -0600

Vijay,

I haven't heard back from anyone yet. I have some more information about one of the problems.

I have a program that write()s to a file, keeping the file open. While this program is writing, I restart the nodes one by one. After the nodes have been restarted, no new data is written to the file. However, the program doing the write() still gets the correct number of bytes returned by the system call and behaves as if everything is working when it clearly isn't.

Meanwhile, if I tail this same file on another client while I reboot the nodes, I eventually get "tail: /gluster/m/test: File descriptor in bad state"

At some point gluster realizes it can't deal with this file and reports back file descriptor in bad state to the reader, but continues to happily report success to the program doing the writes.
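One way to make the silent failure visible from the writer's side is to compare the byte count that write() reports with the growth in file size seen through a fresh stat() on the path. The following is only a sketch: the function name `reported_vs_stored` is hypothetical, and a stat() issued from the same client may not reflect what the servers actually hold, so checking from a second client or the export directory would be a stronger test.

```python
import os


def reported_vs_stored(path, chunks):
    """Write chunks through one open descriptor and compare two byte counts.

    Returns (reported, stored): the bytes write() claimed to accept, and the
    growth in file size observed through a fresh stat() on the path.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        size_before = os.stat(path).st_size
        reported = 0
        for chunk in chunks:
            reported += os.write(fd, chunk)
        # fsync may raise OSError if the filesystem reports a failure that
        # the individual write() calls did not surface.
        os.fsync(fd)
        stored = os.stat(path).st_size - size_before
    finally:
        os.close(fd)
    return reported, stored
```

On a healthy filesystem the two numbers should match. A mismatch, or an OSError from fsync, would point to data that write() accepted but that never reached the file.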

The first part of this problem (open files not surviving gluster restarts) seems like a pretty major design flaw that needs to be fixed. The second part (gluster not reporting the error to the writer when gluster chokes) is a critical problem that needs to be fixed. However, it seems that there isn't much interest in fixing these types of things. I've spent some time reading back in the mail archives and there seems to be a pattern of instability and silence on the part of the developers. This really isn't the way to make your project a success and get advocates of your software.

I want to help identify issues and provide information to help get things fixed, but I feel like I'm talking to deaf ears.

Please advise on how I can help on these issues.

--brian

On Aug 31, 2009, at 12:58 PM, Brian Hirt wrote:

Vijay,

Yes, I am using the same distributed-replicate scenario.

The file in the export directory does contain the correct information, but somewhere along the line something being communicated to the operating system by gluster must be wrong. I say this because the client trying to read from an open file is not getting the proper data returned from the system calls, which seems to point to a bug in glusterfs.

I've also run into something that might be related but seems much more serious. A program writing to a glusterfs file will fail when you restart the gluster servers. You can recreate the problem by:

1) have a program open a file on a glusterfs and write data to the file periodically
2) while the file is being written to, restart all the gluster servers one by one, waiting for the previous server to come back online before restarting the next
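The writer side of these steps could look roughly like the sketch below. It is not the program used in the original test; the function name `write_periodically` and the line format are assumptions made for illustration. It keeps a single descriptor open, checks every write() for a short count, and calls fsync after each write so that an error the filesystem holds back has a chance to surface.

```python
import os
import time


def write_periodically(path, count, interval=1.0):
    """Append numbered lines to path, keeping one descriptor open throughout.

    Returns the total number of bytes that write() reported as written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    total = 0
    try:
        for i in range(count):
            line = f"line {i}\n".encode()
            written = os.write(fd, line)
            if written != len(line):
                raise OSError(f"short write: {written} of {len(line)} bytes")
            # fsync asks the kernel to push the data to the filesystem; an
            # I/O error that write() did not report may surface here instead.
            os.fsync(fd)
            total += written
            time.sleep(interval)
    finally:
        os.close(fd)
    return total
```

Running something like `write_periodically("/path/on/gluster/mount/test", count=600)` (the path is a placeholder) while restarting the servers one at a time gives a window of about ten minutes in which to watch whether lines stop appearing in the file.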

At all points in time, three of the four gluster servers are up and running, however the program trying to write data to the file fails. This is a huge issue for any program that keeps a file open for writing for more than a second or two.

As for the temporary files created by rsync, I'm willing to believe they are benign in this particular situation. However, something seems wrong with the idea that gluster would expect to have a file, try to lstat it, only to find it's not there. Shouldn't gluster know where the files it maintains are? It really feels like a race condition that will be triggered in other situations where it's not so benign.

Thanks for any help you can provide.

--brian

On Aug 30, 2009, at 10:05 AM, Vijay Bellur wrote:

Brian Hirt wrote:

I'm running into some problems where one process is writing a log file to a and another is reading from it. The process reading the file is not behaving as expected.
I am assuming you are using the distributed-replicate scenario that you mentioned in the previous mail. Can you please confirm if the file in the export directory contains data that you did not intend to create?

I'm also continuing to get hundreds of the errors I mentioned in that message with rsync.

[2009-08-28 10:21:20] E [posix.c:1155:posix_chmod] posix: lstat on /gluster/exports/redacted/.1218486082-01.jpg.nkOkw9 failed: No such file or directory

These are usually to do with temporary files created during an rsync. These error messages would be benign in nature unless you notice a discrepancy between the original and rsync'd directories.
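A check for such a discrepancy could be sketched with Python's standard `filecmp` module, as below. The function name `tree_differences` is hypothetical. Note that `filecmp.dircmp` treats two files as equal when their type, size, and modification time all match, and only compares contents otherwise, so this is a quick check and not a byte-for-byte guarantee.

```python
import filecmp
import os


def tree_differences(original, copy):
    """Return sorted (kind, relative_path) pairs that differ between trees."""
    differences = []

    def walk(cmp, prefix):
        for name in cmp.left_only:
            differences.append(("only in original", os.path.join(prefix, name)))
        for name in cmp.right_only:
            differences.append(("only in copy", os.path.join(prefix, name)))
        # diff_files: contents differ; funny_files: could not be compared;
        # common_funny: same name but different type on each side.
        for name in cmp.diff_files + cmp.funny_files + cmp.common_funny:
            differences.append(("differs", os.path.join(prefix, name)))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(original, copy), "")
    return sorted(differences)
```

An empty result suggests the two trees agree. A leftover rsync temporary file would show up as an "only in copy" entry.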

Regards,
Vijay









