Re: [Qemu-devel] strange crash in tracked_request_begin
From: Christian Borntraeger
Subject: Re: [Qemu-devel] strange crash in tracked_request_begin
Date: Mon, 7 Mar 2016 20:00:49 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
>> Folks,
>>
>> I had a crash of a QEMU guest in tracked_request_begin.
>> The test case was a guest with a ramdisk/kernel that reboots in a
>> loop (about 10 times per second) with a single null-co disk
>> attached. I have no idea how to reproduce this; it seems to have been a lucky hit.
>>
>> (gdb) bt
>> #0 0x00000000101db5ba in tracked_request_begin (address@hidden,
>> address@hidden, address@hidden, address@hidden, address@hidden)
>> at /home/cborntra/REPOS/qemu/block/io.c:390
>> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0,
>> bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>,
>> address@hidden(unknown: 0))
>> at /home/cborntra/REPOS/qemu/block/io.c:1001
>> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0),
>> qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized
>> out>, bs=<optimized out>)
>> at /home/cborntra/REPOS/qemu/block/io.c:1024
>> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at
>> /home/cborntra/REPOS/qemu/block/io.c:2173
>> #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>,
>> i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
>> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
>>
>> Looking at the code, we are at
>>
>> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> which translates to
>>
>> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
>> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
>> (&bs->tracked_requests)->lh_first = (req);
>> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
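For readers less familiar with the BSD-style list that the QLIST_* macros follow, the four statements above can be written out as a plain C function. This is a minimal sketch: `struct req`, `struct req_list`, `insert_head` and `check_list` are hypothetical names that only mirror the field names (`le_next`, `le_prev`, `lh_first`) from the expansion; they are not QEMU's actual types. Note that `le_prev` holds the address of the pointer that points at the element, which is why the insert has to dereference `lh_first` a second time to fix up the old first element's back pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request with an embedded list entry,
 * using the same field names as the macro expansion. */
struct req {
    int id;
    struct {
        struct req *le_next;   /* next element, or NULL at the tail */
        struct req **le_prev;  /* address of the pointer that points at us */
    } list;
};

/* Hypothetical stand-in for the list head. */
struct req_list {
    struct req *lh_first;
};

/* Function form of the four statements the insert macro expands to. */
static void insert_head(struct req_list *head, struct req *req)
{
    assert(req != NULL);
    if ((req->list.le_next = head->lh_first) != NULL) {
        /* Second read of head->lh_first: the load gdb showed as 0x0. */
        head->lh_first->list.le_prev = &req->list.le_next;
    }
    head->lh_first = req;
    req->list.le_prev = &head->lh_first;
}

/* Walk the list and verify that every le_prev points back at the pointer
 * that references the element. Returns the length, or -1 if broken. */
static int check_list(struct req_list *head)
{
    struct req **link = &head->lh_first;
    int n = 0;
    for (struct req *r = head->lh_first; r != NULL; r = r->list.le_next) {
        if (r->list.le_prev != link)
            return -1;          /* back pointer is inconsistent */
        link = &r->list.le_next;
        n++;
    }
    return n;
}
```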
>>
>> gdb says that (&bs->tracked_requests)->lh_first is zero in the core file:
>> (gdb) print /x bs->tracked_requests
>> $6 = {lh_first = 0x0}
>>
>> Looking at the code, I am wondering whether this can happen in parallel
>> with other code that touches tracked_requests, because gcc seems to read
>> (&bs->tracked_requests)->lh_first twice (first to check the value, then
>> to use it as a pointer).
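The suspected interleaving can be replayed deterministically in a single thread. This sketch mimics the compiled sequence: one load of `lh_first` whose value is tested, then a second load whose value is dereferenced, with a callback run in between to stand in for another thread removing the last request without holding the lock. The types and the `second_load` / `remove_first` helpers are hypothetical illustrations, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

struct req {
    struct req *le_next;
    struct req **le_prev;
};

struct head {
    struct req *lh_first;
};

/* BSD-style removal of the first element (stands in for the other path). */
static void remove_first(struct head *h)
{
    struct req *r = h->lh_first;
    assert(r != NULL);
    if (r->le_next != NULL)
        r->le_next->le_prev = r->le_prev;
    *r->le_prev = r->le_next;
}

/* Replays the start of the compiled insert. Returns what the second load
 * of lh_first observed; the real code would store through that pointer.
 * `between` (may be NULL) simulates an unlocked concurrent update. */
static struct req *second_load(struct head *h, struct req *req,
                               void (*between)(struct head *))
{
    struct req *first = h->lh_first;   /* first load: value is tested   */
    req->le_next = first;
    if (first == NULL)
        return NULL;                   /* empty list: no second load    */
    if (between != NULL)
        between(h);                    /* the race window               */
    return h->lh_first;                /* second load: gets dereferenced */
}
```

A NULL result after a non-NULL first load matches the state in the core file: the `cgij` branch was not taken, yet the second load saw `lh_first = 0x0`.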
>
> tracked_requests is protected by AioContext. Perhaps something is doing
> I/O without acquiring AioContext?
Hmm, the guest was rebooting, which resets all devices. Maybe something
in that code is still not right? I will have a look.
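Stefan's hypothesis could be made checkable with a small debugging aid: track how deeply the lock is held and flag any list update made while it is not. This is a hypothetical sketch; `struct ctx`, `ctx_acquire`, `ctx_release`, `track_begin` and `track_end` are invented stand-ins for acquiring the AioContext and for the tracked-request begin/end pair, and a counter replaces the real list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an AioContext: only records whether it is held. */
struct ctx {
    int depth;                 /* nesting depth of acquire calls */
    unsigned unlocked_updates; /* list updates seen while not held */
};

static void ctx_acquire(struct ctx *c) { c->depth++; }

static void ctx_release(struct ctx *c)
{
    assert(c->depth > 0);      /* releasing a context that was not acquired */
    c->depth--;
}

/* Called at the top of every insert into / removal from the list.
 * A real debug build would more likely abort() here than count. */
static void check_held(struct ctx *c)
{
    if (c->depth == 0)
        c->unlocked_updates++;
}

/* Hypothetical begin/end pair; `in_flight` replaces the tracked list. */
static void track_begin(struct ctx *c, int *in_flight)
{
    check_held(c);
    (*in_flight)++;
}

static void track_end(struct ctx *c, int *in_flight)
{
    check_held(c);
    (*in_flight)--;
}
```

With such a check in the reboot loop, the first unlocked caller could be caught directly instead of waiting for the rare crash.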
>
> Luckily there is only one place where items are added to and removed from
> tracked_requests. This might make debugging somewhat easier.
I have trouble reproducing the issue, which makes it hard :-/
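Because every insert and removal goes through one begin/end pair, a small ring buffer of recent operations would be cheap to add there and could be inspected post mortem in gdb, which helps when the bug is hard to reproduce. A hypothetical sketch: `struct list_trace`, `trace_record` and `trace_last` are invented for illustration and are not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_LEN 16            /* power of two so the index can be masked */

enum trace_op { OP_INSERT = 1, OP_REMOVE = 2 };

struct trace_entry {
    enum trace_op op;
    const void *req;            /* which request was added or removed */
    const void *caller;         /* e.g. __builtin_return_address(0) with gcc */
};

struct list_trace {
    unsigned next;              /* total number of records written */
    struct trace_entry ring[TRACE_LEN];
};

/* Record one operation, overwriting the oldest entry when full. */
static void trace_record(struct list_trace *t, enum trace_op op,
                         const void *req, const void *caller)
{
    struct trace_entry *e = &t->ring[t->next & (TRACE_LEN - 1)];
    e->op = op;
    e->req = req;
    e->caller = caller;
    t->next++;
}

/* Most recent entry, or NULL if nothing has been recorded yet. */
static const struct trace_entry *trace_last(const struct list_trace *t)
{
    if (t->next == 0)
        return NULL;
    return &t->ring[(t->next - 1) & (TRACE_LEN - 1)];
}
```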
>>
>> 388 qemu_co_queue_init(&req->wait_queue);
>> 0x00000000101db594 <+76>: la %r2,72(%r13)
>> 0x00000000101db598 <+80>: brasl %r14,0x1022cdc0 <qemu_co_queue_init>
>>
>> 389
>> 390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> 0x00000000101db59e <+86>: lg %r1,12744(%r12) # r1 = (&bs->tracked_requests)->lh_first
>> 0x00000000101db5a4 <+92>: stg %r1,48(%r13) # (req)->list.le_next = r1
>> 0x00000000101db5aa <+98>: cgij %r1,0,8,0x101db5c0 ---+ # if r1==0 goto
>> 0x00000000101db5b0 <+104>: lg %r1,12744(%r12) | # r1 = (&bs->tracked_requests)->lh_first (again!!)
>> 0x00000000101db5b6 <+110>: la %r2,48(%r13) |
>> => 0x00000000101db5ba <+114>: stg %r2,56(%r1) | # r1==0 bang
>> 0x00000000101db5c0 <+120>: stg %r13,12744(%r12)<-----+
>> 0x00000000101db5c6 <+126>: lay %r12,12744(%r12)
>> 0x00000000101db5cc <+132>: stg %r12,56(%r13)
>>
>>
>> Christian
>>
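Reading `lh_first` only once would avoid the NULL dereference shown in the disassembly, but it would not make the code correct. The sketch below uses hypothetical helpers `insert_head_once` and `remove_first` on simplified BSD-style types; it replays the same unlocked removal and shows that the outcome is a list that links the removed request back in, i.e. silent corruption instead of a crash. In real concurrent code a plain local copy is not even guaranteed to stay a single load (that would need something like C11 `atomic_load`). This is consistent with Stefan's point that the list relies on the AioContext for protection, not on how the compiler orders its loads.

```c
#include <assert.h>
#include <stddef.h>

struct req {
    struct req *le_next;
    struct req **le_prev;
};

struct head {
    struct req *lh_first;
};

/* BSD-style removal of the first element (stands in for the other path). */
static void remove_first(struct head *h)
{
    struct req *r = h->lh_first;
    assert(r != NULL);
    if (r->le_next != NULL)
        r->le_next->le_prev = r->le_prev;
    *r->le_prev = r->le_next;
}

/* Insert that uses a single load of lh_first via a local copy.
 * `between` (may be NULL) simulates an unlocked concurrent update
 * happening right after that load. */
static void insert_head_once(struct head *h, struct req *req,
                             void (*between)(struct head *))
{
    struct req *first = h->lh_first;     /* the only load of lh_first */
    if (between != NULL)
        between(h);
    req->le_next = first;                /* may be a stale element */
    if (first != NULL)
        first->le_prev = &req->le_next;  /* no NULL dereference here */
    h->lh_first = req;
    req->le_prev = &h->lh_first;
}
```

After the replay, the request that the other path already removed (and, in QEMU, might already have freed) is reachable from the list again.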