[Gluster-devel] important change to syncop infra

gluster-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gluster-devel] important change to syncop infra

From:	Pranith Kumar Karampuri
Subject:	[Gluster-devel] important change to syncop infra
Date:	Wed, 11 Dec 2013 03:51:55 -0500 (EST)

hi,
    We found a day-1 bug when syncop_xxx() infra is used inside a synctask with 
compilation optimization (CFLAGS -O2). This bug has been dormant for at least 2 
years.
There are around ~400(rebalance, replace-brick, bd, self-heal-daemon, quota, 
fuse lock/fd migration) places where syncop is used in the code base all of 
which are potential candidates which can take the hit.

I sent first round of patch at http://review.gluster.com/6475 to catch 
regressions upstream.
These are the files that are affected by the changes I introduced to fix this:

 api/src/glfs-fops.c                             |  36 
++++++++++++++++++++++++++++++++++
 api/src/glfs-handleops.c                        |  15 ++++++++++++++
 api/src/glfs-internal.h                         |   7 +++++++
 api/src/glfs-resolve.c                          |  10 ++++++++++
 libglusterfs/src/syncop.c                       | 117 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
 xlators/cluster/afr/src/afr-self-heald.c        |  45 
+++++++++++++++++++++++++++++-------------
 xlators/cluster/afr/src/pump.c                  |  12 ++++++++++--
 xlators/cluster/dht/src/dht-helper.c            |  24 +++++++++++++++--------
 xlators/cluster/dht/src/dht-rebalance.c         | 168 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
 xlators/cluster/dht/src/dht-selfheal.c          |   6 ++++--
 xlators/features/locks/src/posix.c              |   3 ++-
 xlators/features/qemu-block/src/bdrv-xlator.c   |  15 ++++----------
 xlators/features/qemu-block/src/qb-coroutines.c |  14 ++++++++++----
 xlators/mount/fuse/src/fuse-bridge.c            |  16 ++++++++++-----

Please review your respective component for these changes in gerrit.

Thanks
Pranith.

Detailed explanation of the Root cause:
We found the bug in 'gf_defrag_migrate_data' in rebalance operation:

Lets look at interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                        dict_t *migrate_data)
{
.....
code section - [ Loop ]
        while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                       &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a 
thread)
                /* Need to keep track of ENOENT errno, that means, there is no
                   need to send more readdirp() */
                readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                        ret = syncop_getxattr (this, &entry_loc, &dict,
                                               GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This 
may not always be executed in same thread which executed [SYNCOP-1])
                        if (ret < 0) {
                                if (errno != ENODATA) {
                                        loglevel = GF_LOG_ERROR;
                                        defrag->total_failures += 1;
.....
}

the function above could be executed by thread(t1) till [SYNCOP-1] and code 
from [ERRNO-2] can be executed by a different thread(t2) because of the way 
syncop-infra schedules the tasks.

when the code is compiled with -O2 optimization this is the assembly code that 
is generated:
 [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded 
as *(__errno_location())
   0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 <address@hidden>
   0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address 
returned by __errno_location() is stored in a special location in stack for 
later use.
   0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
   0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
 [ERRNO-2]
1281                                        if (errno != ENODATA) {
   0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because 
it already stored the address returned by __errno_location(), it just 
dereferences the address to get the errno value. BUT THIS CODE NEED NOT BE 
EXECUTED BY SAME THREAD!!!
   0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
   0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
   0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that __errno_location() value of t1 and t2 are different. So 
[ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is 
executing [ERRNO-2] code section.

When code is compiled without any optimization for [ERRNO-2]:
1281                                        if (errno != ENODATA) {
   0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300 
<address@hidden><<--- As it is calling __errno_location() again it gets the 
location from t2 so it works as intended.
   0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
   0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
   0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1 
<gf_defrag_migrate_data+2287>

Fix:
We decided to make syncop_xxx() return (-errno) value as the return value in 
case of errors and all the functions which make syncop_xxx() will need to use 
(-ret) to figure out the reason for failure in case of syncop_xxx() failures.

Pranith

[Prev in Thread]

Current Thread

[Next in Thread]

[Gluster-devel] important change to syncop infra, Pranith Kumar Karampuri <=
- Re: [Gluster-devel] important change to syncop infra, Xavier Hernandez, 2013/12/11
  - Re: [Gluster-devel] important change to syncop infra, Pranith Kumar Karampuri, 2013/12/11
    - Re: [Gluster-devel] important change to syncop infra, Xavier Hernandez, 2013/12/11
    - Re: [Gluster-devel] important change to syncop infra, Pranith Kumar Karampuri, 2013/12/11
    - Re: [Gluster-devel] important change to syncop infra, John Mark Walker, 2013/12/11
- Re: [Gluster-devel] important change to syncop infra, Anand Subramanian, 2013/12/12
  - Re: [Gluster-devel] important change to syncop infra, Anand Avati, 2013/12/12
    - Re: [Gluster-devel] important change to syncop infra, Pranith Kumar Karampuri, 2013/12/12

Prev by Date: Re: [Gluster-devel] Standardizing interfaces for BD xlator
Next by Date: Re: [Gluster-devel] Could we add a MAINTAINERS file to the glusterfs sources?
Previous by thread: [Gluster-devel] Standardizing interfaces for BD xlator
Next by thread: Re: [Gluster-devel] important change to syncop infra
Index(es):
- Date
- Thread