[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (pp
From: |
John Snow |
Subject: |
Re: [Qemu-ppc] [Qemu-devel] qtest hang in /x86_64/ahci/io/ncq/simple (ppc64 host) |
Date: |
Mon, 06 Jul 2015 15:21:12 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 |
On 07/06/2015 12:24 PM, Paolo Bonzini wrote:
>
>
> On 06/07/2015 17:41, John Snow wrote:
>> I'll take a look. More than possible there's a race in the wait
>> conditional that I have just never seen before.
>>
>> I tweaked it a little recently to wait on both NCQ and traditional
>> completion, but it doesn't have a timeout or anything. I'll go ahead and
>> fix that while I'm here.
>>
>> Should be something simple, hopefully.
The one Peter Maydell reported is indeed something simple. Sent out a patch.
>
> My hang in test_halted_ncq is different:
... Great :)
>
> (gdb) up
> #1 0x00007f3aaadb4476 in read (__nbytes=1, __buf=0x7ffc0652abbf,
> __fd=<optimized out>) at /usr/include/bits/unistd.h:44
> 44 return __read_alias (__fd, __buf, __nbytes);
> (gdb)
> #2 qtest_qmp_receive (address@hidden) at
> /home/work/upstream/qemu/tests/libqtest.c:371
> 371 len = read(s->qmp_fd, &c, 1);
> (gdb)
> #3 0x00007f3aaadb4c48 in qtest_qmp_eventwait (s=0x7f3aac017360,
> address@hidden "STOP") at
> /home/work/upstream/qemu/tests/libqtest.c:477
> 477 response = qtest_qmp_receive(s);
> (gdb)
> #4 0x00007f3aaadb3514 in qmp_eventwait (event=0x7f3aaadbdc68 "STOP") at
> /home/work/upstream/qemu/tests/libqtest.h:498
> 498 return qtest_qmp_eventwait(global_qtest, event);
> (gdb)
> #5 ahci_guest_io_halt (address@hidden,
> address@hidden '\005', address@hidden 'a',
> address@hidden, address@hidden,
> address@hidden)
> at /home/work/upstream/qemu/tests/libqos/ahci.c:604
> 604 qmp_eventwait("STOP");
> (gdb)
> #6 0x00007f3aaada9e09 in ahci_halted_io_test (cmd_read=<optimized out>,
> cmd_write=<optimized out>) at
> /home/work/upstream/qemu/tests/ahci-test.c:1244
> 1244 cmd = ahci_guest_io_halt(ahci, port, cmd_write,
>
>
> In QEMU, current_run_state is RUN_STATE_RUNNING, so the halt didn't
> happen. It's 100% reproducible with "make check-qtest-{i386,x86_64} V=1
> -j12". This is the final part of the qtest log:
>
> [R +0.066539] readl 0xe0000390
> [S +0.066553] OK 0x0000000000000000
> [R +0.066582] memset 0x10b000 0x100 0x00
> [S +0.066597] OK
> [R +0.067627] write 0x10c000 0x1000
[snip]
> [S +0.067748] OK
> [R +0.067782] readl 0xe00003b8
> [S +0.067796] OK 0x0000000000000000
> [R +0.067824] read 0x10a020 0x20
> [S +0.067843] OK
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [R +0.067874] write 0x10a020 0x20
> 0x0000000000000000000000000000000000000000000000000000000000000000
> [S +0.067897] OK
> [R +0.067926] write 0x10a020 0x20
> 0x450401000000000000e010000000000000000000000000000000000000000000
> [S +0.067940] OK
> [R +0.067964] write 0x10e000 0x14 0x2780610800000040000000000800000000000000
> [S +0.067980] OK
> [R +0.068004] write 0x10e080 0x10 0x00c010000000000000000000ff0f0080
> [S +0.068018] OK
> [R +0.068042] writel 0xe00003b4 0x2
> [S +0.068056] OK
> [R +0.068079] writel 0xe00003b8 0x2
> [S +0.069500] OK
>
> This is the /tmp/qtest-blkdebug.INHp7s file:
>
> [inject-error]
> event = "write_aio"
> errno = "5"
> state = "1"
> immediately = "off"
> once = "on"
> [set-state]
> event = "write_aio"
> new_state = "2"
>
> There is no trace of the write payload in the .qcow2 file.
Smells like the injected error prevented that successfully, but...
>
> Paolo
>
Can't reproduce: Fedora 21, 4.0.5, Today's QEMU HEAD (7edd8e56) ...
So it's a race condition of some sort.
Obvious questions: You've tried a distclean?
What was your configure invocation?
My attempt:
make distclean
../../configure --target-list="x86_64-softmmu i386-softmmu" --enable-debug
make -j9
make check-qtest-{i386,x86_64} V=1 -j12
Completes just fine for me.
That aside, I can't figure out why the machine HALT *and* the write
wouldn't occur. Can you stick a bunch of printfs inside the NCQ error
handling function and see if there's something goofing up in there, or
if it's even making it there?
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index bb6a92f..f1af6f9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -921,6 +921,7 @@ static void ncq_cb(void *opaque, int ret)
IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
if (ret == -ECANCELED) {
+ fprintf(stderr, "ncq_cb: -ECANCELED\n");
return;
}
@@ -928,12 +929,17 @@ static void ncq_cb(void *opaque, int ret)
bool is_read = ncq_tfs->cmd == READ_FPDMA_QUEUED;
BlockErrorAction action = blk_get_error_action(ide_state->blk,
is_read, -ret);
+ fprintf(stderr, "ncq_cb: ret (%d) is < 0; action is %s\n",
+ ret, BlockErrorAction_lookup[action]);
if (action == BLOCK_ERROR_ACTION_STOP) {
ncq_tfs->halt = true;
ide_state->bus->error_status = IDE_RETRY_HBA;
+ fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_STOP\n");
} else if (action == BLOCK_ERROR_ACTION_REPORT) {
ncq_err(ncq_tfs);
+ fprintf(stderr, "ncq_cb: BLOCK_ERROR_ACTION_REPORT\n");
}
+ fprintf(stderr, "ncq_cb: -> blk_error_action\n");
blk_error_action(ide_state->blk, action, is_read, -ret);
} else {
ide_state->status = READY_STAT | SEEK_STAT;
My guess is I have goofed something up above there in the error handling
section of ncq_cb, but "worksforme," so I am not sure what's going on.
If no prints show up during testing, maybe turn on AHCI_DEBUG as well
and see if any prints are showing up in the success case.
--js