gluster-devel

Re: [Gluster-devel] split brain: how should it be cured?


From: Pranith Kumar Karampuri
Subject: Re: [Gluster-devel] split brain: how should it be cured?
Date: Mon, 18 Jun 2012 13:02:41 -0400 (EDT)

This is a regression, tracked as bug 832305.

Patch: http://review.gluster.com/#change,3583

Pranith.
----- Original Message -----
From: "Emmanuel Dreyfus" <address@hidden>
To: address@hidden
Sent: Monday, June 18, 2012 5:49:04 PM
Subject: [Gluster-devel] split brain: how should it be cured?

Hi

I get this split brain:

$ ls -l /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile
-rw-r--r--  1 manu  manu  165 Dec  8  2002 /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile
$ head -1  /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile
head: /pfs/manu/netbsd/usr/src/tools/mktemp/Makefile: Input/output error

Client log is at the end of the message.

On brick1:
trusted.gfid               6d 6c 04 a5 a8 bb 40 09 a4 a4 76 5e 83 28 63 6e
trusted.afr.pfs-client-0   00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.pfs-client-1   00 00 00 00 00 00 00 00 00 00 00 00

On brick2:
trusted.gfid               6b db b7 73 cc e7 46 a8 9d fc 96 40 2c 6a fe e8
trusted.afr.pfs-client-0   00 00 00 00 00 00 00 00 00 00 00 00
trusted.afr.pfs-client-1   00 00 00 00 00 00 00 01 00 00 00 00
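
A small sketch for reading the dumps above. It assumes the changelog layout described in the GlusterFS split-brain documentation: each 12-byte trusted.afr.<volume>-client-<N> value holds three 32-bit big-endian counters of pending data, metadata and entry operations, in that order. The helper names (decode_changelog, decode_gfid) are hypothetical. The script only decodes the values and does not touch the bricks.

```python
"""Decode AFR changelog xattrs and compare gfids between two bricks.

Assumption: a changelog value is 12 bytes, i.e. three 32-bit big-endian
counters for pending data, metadata and entry operations, in that order.
"""
import struct
import uuid


def parse_hex(text):
    """Turn a space-separated hex dump such as '00 00 00 01' into bytes."""
    return bytes.fromhex(text.replace(" ", ""))


def decode_changelog(text):
    """Return the (data, metadata, entry) pending counters of one changelog xattr."""
    raw = parse_hex(text)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def decode_gfid(text):
    """Format a 16-byte trusted.gfid dump as a canonical UUID string."""
    return str(uuid.UUID(bytes=parse_hex(text)))


# Values copied from the xattr dumps in the message.
bricks = {
    "brick1": {
        "trusted.gfid": "6d 6c 04 a5 a8 bb 40 09 a4 a4 76 5e 83 28 63 6e",
        "trusted.afr.pfs-client-0": "00 00 00 00 00 00 00 00 00 00 00 00",
        "trusted.afr.pfs-client-1": "00 00 00 00 00 00 00 00 00 00 00 00",
    },
    "brick2": {
        "trusted.gfid": "6b db b7 73 cc e7 46 a8 9d fc 96 40 2c 6a fe e8",
        "trusted.afr.pfs-client-0": "00 00 00 00 00 00 00 00 00 00 00 00",
        "trusted.afr.pfs-client-1": "00 00 00 00 00 00 00 01 00 00 00 00",
    },
}

if __name__ == "__main__":
    gfids = {}
    for brick, xattrs in bricks.items():
        gfids[brick] = decode_gfid(xattrs["trusted.gfid"])
        print(brick, "gfid", gfids[brick])
        # Print the decoded pending counters for every changelog xattr.
        for name, value in xattrs.items():
            if name.startswith("trusted.afr."):
                print("  ", name, decode_changelog(value))
    print("gfids match:", len(set(gfids.values())) == 1)
```

Run against the values above, it reports a metadata counter of 1 in brick2's trusted.afr.pfs-client-1, all other counters at zero, and two different gfids (6d6c04a5-a8bb-4009-a4a4-765e8328636e on brick1, 6bdbb773-cce7-46a8-9dfc-96402c6afee8 on brick2).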

Since the split brain bit is set on brick2, I removed the file there. When I then
run ls -l on the client, the file is re-created, but it still has the split
brain flag in trusted.afr.pfs-client-1.


Client log when attempting to open the file for reading:

[2012-06-18 14:02:42.697447] W [afr-common.c:1226:afr_detect_self_heal_by_lookup_status] 0-pfs-replicate-0: split brain detected during lookup of /manu/netbsd/usr/src/tools/mktemp/Makefile.
[2012-06-18 14:02:42.697699] I [afr-common.c:1340:afr_launch_self_heal] 0-pfs-replicate-0: background  meta-data data gfid self-heal triggered. path: /manu/netbsd/usr/src/tools/mktemp/Makefile, reason: lookup detected pending operations
[2012-06-18 14:02:42.698958] I [afr-self-heal-common.c:1197:afr_sh_missing_entry_call_impunge_recreate] 0-pfs-replicate-0: no missing files - /manu/netbsd/usr/src/tools/mktemp/Makefile. proceeding to metadata check
[2012-06-18 14:02:42.699622] I [afr-self-heal-common.c:1002:afr_sh_missing_entries_done] 0-pfs-replicate-0: split brain found, aborting selfheal of /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:02:42.699919] E [afr-self-heal-common.c:2158:afr_self_heal_completion_cbk] 0-XXX: calling afr_set_split_brain
[2012-06-18 14:02:42.700114] E [afr-self-heal-common.c:2167:afr_self_heal_completion_cbk] 0-pfs-replicate-0: background  meta-data data gfid self-heal failed on /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:02:42.700720] W [afr-open.c:213:afr_open] 0-pfs-replicate-0: failed to open as split brain seen, returning EIO
[2012-06-18 14:02:42.701066] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 461378: OPEN() /manu/netbsd/usr/src/tools/mktemp/Makefile => -1 (Input/output error)


Client log when doing ls -l on the file after it was removed from brick2:

[2012-06-18 14:15:14.596053] I [afr-common.c:1215:afr_detect_self_heal_by_lookup_status] 0-pfs-replicate-0: entries are missing in lookup of /manu/netbsd/usr/src/tools/mktemp/Makefile.
[2012-06-18 14:15:14.596357] I [afr-common.c:1340:afr_launch_self_heal] 0-pfs-replicate-0: background  meta-data data entry missing-entry gfid self-heal triggered. path: /manu/netbsd/usr/src/tools/mktemp/Makefile, reason: lookup detected pending operations
[2012-06-18 14:15:14.598599] E [afr-self-heal-common.c:1095:afr_sh_common_lookup_resp_handler] 0-pfs-replicate-0: path /manu/netbsd/usr/src/tools/mktemp/Makefile on subvolume pfs-client-0 => -1 (No such file or directory)
[2012-06-18 14:15:14.600608] I [afr-self-heal-common.c:1002:afr_sh_missing_entries_done] 0-pfs-replicate-0: split brain found, aborting selfheal of /manu/netbsd/usr/src/tools/mktemp/Makefile
[2012-06-18 14:15:14.600816] E [afr-self-heal-common.c:2158:afr_self_heal_completion_cbk] 0-XXX: calling afr_set_split_brain
[2012-06-18 14:15:14.601012] E [afr-self-heal-common.c:2167:afr_self_heal_completion_cbk] 0-pfs-replicate-0: background  meta-data data entry missing-entry gfid self-heal failed on /manu/netbsd/usr/src/tools/mktemp/Makefile
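
To pick the warnings and errors out of a longer client log, a minimal sketch follows. The line pattern, [timestamp] LEVEL [file:line:function] translator: message, is inferred from the excerpts above and is an assumption rather than a documented format; parse_line and problems are hypothetical helper names. It expects each entry on a single line, as the log file stores them.

```python
"""Split GlusterFS client log lines into fields and keep warnings/errors.

Assumption: the pattern below is inferred from sample log lines, not taken
from an official format specification:
  [YYYY-MM-DD HH:MM:SS.micro] LEVEL [file.c:line:function] xlator: message
"""
import re
from typing import Iterable, Iterator, NamedTuple, Optional

LOG_RE = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str     # single letter, e.g. I, W, E
    location: str  # source file:line:function that emitted the entry
    xlator: str    # translator name, e.g. 0-pfs-replicate-0
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return the fields of one log entry, or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    location = f"{m['file']}:{m['line']}:{m['func']}"
    return LogEntry(m["ts"], m["level"], location, m["xlator"], m["msg"])


def problems(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield only W (warning) and E (error) entries, in log order."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level in ("W", "E"):
            yield entry
```

Fed the first excerpt, problems() would keep the afr_detect_self_heal_by_lookup_status warning, the two afr_self_heal_completion_cbk errors, and the afr_open and fuse_fd_cbk warnings, dropping the informational self-heal progress lines.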

NB: The XXX log lines are an addition I made while trying to figure out what
is going on.

-- 
Emmanuel Dreyfus
address@hidden

_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel


