gluster-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gluster-devel] important change to syncop infra


From: Xavier Hernandez
Subject: Re: [Gluster-devel] important change to syncop infra
Date: Wed, 11 Dec 2013 11:04:42 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.1

Hi Pranith,

it's really a very interesting and hard to find bug, but I'm wondering how we can prevent this to happen in the general case. There might be other operations based on pointers to thread local storage that will suffer this problem. Probably 'errno' is one of the most dangerous, but TLS is also used to resolve 'THIS' in a very similar way to errno, and there are temporary uuid and lkowner values also stored into TLS. More things could appear in the future. These are also potential cases to have problems with syncops.

An access to THIS before and after a syncop might trigger this bug, right ?

I think it's very difficult to track all these cases and handle them correctly. Another solution could be to tell the compiler to forget previous pointer values when a syncop is called, but I don't see any way to do so. Since it's storing pointers to values, any barrier or volatile seems useless.

Xavi


El 11/12/13 09:51, Pranith Kumar Karampuri ha escrit:
hi,
     We found a day-1 bug when syncop_xxx() infra is used inside a synctask 
with compilation optimization (CFLAGS -O2). This bug has been dormant for at 
least 2 years.
There are around ~400(rebalance, replace-brick, bd, self-heal-daemon, quota, 
fuse lock/fd migration) places where syncop is used in the code base all of 
which are potential candidates which can take the hit.

I sent first round of patch at http://review.gluster.com/6475 to catch 
regressions upstream.
These are the files that are affected by the changes I introduced to fix this:

  api/src/glfs-fops.c                             |  36 
++++++++++++++++++++++++++++++++++
  api/src/glfs-handleops.c                        |  15 ++++++++++++++
  api/src/glfs-internal.h                         |   7 +++++++
  api/src/glfs-resolve.c                          |  10 ++++++++++
  libglusterfs/src/syncop.c                       | 117 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
  xlators/cluster/afr/src/afr-self-heald.c        |  45 
+++++++++++++++++++++++++++++-------------
  xlators/cluster/afr/src/pump.c                  |  12 ++++++++++--
  xlators/cluster/dht/src/dht-helper.c            |  24 +++++++++++++++--------
  xlators/cluster/dht/src/dht-rebalance.c         | 168 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
  xlators/cluster/dht/src/dht-selfheal.c          |   6 ++++--
  xlators/features/locks/src/posix.c              |   3 ++-
  xlators/features/qemu-block/src/bdrv-xlator.c   |  15 ++++----------
  xlators/features/qemu-block/src/qb-coroutines.c |  14 ++++++++++----
  xlators/mount/fuse/src/fuse-bridge.c            |  16 ++++++++++-----

Please review your respective component for these changes in gerrit.

Thanks
Pranith.

Detailed explanation of the Root cause:
We found the bug in 'gf_defrag_migrate_data' in rebalance operation:

Lets look at interesting parts of the function:

int
gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, loc_t *loc,
                         dict_t *migrate_data)
{
.....
code section - [ Loop ]
         while ((ret = syncop_readdirp (this, fd, 131072, offset, NULL,
                                        &entries)) != 0) {
.....
code section - [ ERRNO-1 ] (errno of readdirp is stored in readdir_operrno by a 
thread)
                 /* Need to keep track of ENOENT errno, that means, there is no
                    need to send more readdirp() */
                 readdir_operrno = errno;
.....
code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread)
                         ret = syncop_getxattr (this, &entry_loc, &dict,
                                                GF_XATTR_LINKINFO_KEY);
code section - [ ERRNO-2]   (checking for failures of syncop_getxattr(). This 
may not always be executed in same thread which executed [SYNCOP-1])
                         if (ret < 0) {
                                 if (errno != ENODATA) {
                                         loglevel = GF_LOG_ERROR;
                                         defrag->total_failures += 1;
.....
}

the function above could be executed by thread(t1) till [SYNCOP-1] and code 
from [ERRNO-2] can be executed by a different thread(t2) because of the way 
syncop-infra schedules the tasks.

when the code is compiled with -O2 optimization this is the assembly code that 
is generated:
  [ERRNO-1]
1165                        readdir_operrno = errno; <<---- errno gets expanded 
as *(__errno_location())
    0x00007fd149d48b60 <+496>:        callq  0x7fd149d410c0 <address@hidden>
    0x00007fd149d48b72 <+514>:        mov    %rax,0x50(%rsp) <<------ Address 
returned by __errno_location() is stored in a special location in stack for later use.
    0x00007fd149d48b77 <+519>:        mov    (%rax),%eax
    0x00007fd149d48b79 <+521>:        mov    %eax,0x78(%rsp)
....
  [ERRNO-2]
1281                                        if (errno != ENODATA) {
    0x00007fd149d492ae <+2366>:        mov    0x50(%rsp),%rax <<-----  Because 
it already stored the address returned by __errno_location(), it just dereferences the 
address to get the errno value. BUT THIS CODE NEED NOT BE EXECUTED BY SAME THREAD!!!
    0x00007fd149d492b3 <+2371>:        mov    $0x9,%ebp
    0x00007fd149d492b8 <+2376>:        mov    (%rax),%edi
    0x00007fd149d492ba <+2378>:        cmp    $0x3d,%edi

The problem is that __errno_location() value of t1 and t2 are different. So 
[ERRNO-2] ends up reading errno of t1 instead of errno of t2 even though t2 is 
executing [ERRNO-2] code section.

When code is compiled without any optimization for [ERRNO-2]:
1281                                        if (errno != ENODATA) {
    0x00007fd58e7a326f <+2237>:        callq  0x7fd58e797300 
<address@hidden><<--- As it is calling __errno_location() again it gets the location 
from t2 so it works as intended.
    0x00007fd58e7a3274 <+2242>:        mov    (%rax),%eax
    0x00007fd58e7a3276 <+2244>:        cmp    $0x3d,%eax
    0x00007fd58e7a3279 <+2247>:        je     0x7fd58e7a32a1 
<gf_defrag_migrate_data+2287>

Fix:
We decided to make syncop_xxx() return (-errno) value as the return value in 
case of errors and all the functions which make syncop_xxx() will need to use 
(-ret) to figure out the reason for failure in case of syncop_xxx() failures.

Pranith

_______________________________________________
Gluster-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/gluster-devel




reply via email to

[Prev in Thread] Current Thread [Next in Thread]