Re: [Qemu-block] [Qemu-devel] Migration sometimes fails with IDE and Qem


From: Peter Lieven
Subject: Re: [Qemu-block] [Qemu-devel] Migration sometimes fails with IDE and Qemu 2.2.1
Date: Thu, 09 Apr 2015 16:54:09 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

Am 09.04.2015 um 15:43 schrieb Dr. David Alan Gilbert:
* Peter Lieven (address@hidden) wrote:
Am 07.04.2015 um 21:01 schrieb Dr. David Alan Gilbert:
* Peter Lieven (address@hidden) wrote:
Am 07.04.2015 um 17:29 schrieb Dr. David Alan Gilbert:
* Peter Lieven (address@hidden) wrote:
Hi David,

Am 07.04.2015 um 10:43 schrieb Dr. David Alan Gilbert:
Any particular workload or reproducer?
Workload is almost zero. I am trying to figure out whether there is a way to trigger it.

Possibly relevant: the machine type is -M pc-1.2, and we set -kvmclock as a
CPU flag because kvmclock seemed to be quite buggy in 2.6.16...

Exact cmdline is:
/usr/bin/qemu-2.2.1  -enable-kvm  -M pc-1.2  -nodefaults -netdev 
type=tap,id=guest2,script=no,downscript=no,ifname=tap2  -device 
e1000,netdev=guest2,mac=52:54:00:ff:00:65 -drive 
format=raw,file=iscsi://172.21.200.53/iqn.2001-05.com.equallogic:4-52aed6-88a7e99a4-d9e00040fdc509a3-XXX-hd0/0,if=ide,cache=writeback,aio=native
  -serial null  -parallel null  -m 1024 -smp 2,sockets=1,cores=2,threads=1  
-monitor tcp:0:4003,server,nowait -vnc :3 -qmp tcp:0:3003,server,nowait -name 
'XXX' -boot order=c,once=dc,menu=off  -drive 
index=2,media=cdrom,if=ide,cache=unsafe,aio=native,readonly=on  -k de  
-incoming tcp:0:5003  -pidfile /var/run/qemu/vm-146.pid  -mem-path /hugepages  
-mem-prealloc  -rtc base=utc -usb -usbdevice tablet -no-hpet -vga cirrus  -cpu 
qemu64,-kvmclock

Exact kernel is:
2.6.16.46-0.12-smp (I think this is SLES10 or something similar)

The machine does not hang. Only I/O seems to be hanging: you can type at the
console or ping the system, but you can no longer log in.

Thank you,
Peter
Interesting observation: Migrating the vServer again seems to fix the problem
(at least in the one case I could test just now).

2.6.8-24-smp is also affected.
How often does it fail - you say 'sometimes' - is it a 1/10 or a 1/1000 ?
It's more often than 1/10, I would say.
OK, that's not too bad - it's the 1/1000 that are really nasty to find.
In your setup, how easy would it be for you to try :
     with either 2.1 or current head?
     with a newer machine-type?
     without the cdrom?
It's all possible. I can clone the system and try everything on my test systems.
I hope it reproduces there.
Great.  I think the order I would go would be:
     Try head - if it works we know we've already got the fix somewhere
     Try 2.1  - if it works we know it's something we introduced between
                2.1 and 2.2.1
     Try a newer machine type - because pc-1.2 probably isn't tested much
     CDROM at the end.
Update:
  - head -> not working
  - 2.1.3 -> not working
  - without CDROM -> not working
  - with head and no machine type specified -> not working
  - with -device isa-ide -> BIOS not booting harddisk
Well, at least it's consistent....

Will now try 1.3.1 just to be sure.

Any ideas how to debug the IDE state after migration and/or check if the issue 
is similar to the ATAPI IDE
problem?
It's unlikely to be quite the same - most of the ATAPI problems were related to 
ATAPI
being quite separate and not saving much state.

The way I found the CDROM problems was to turn on most of the debugging in the 
ide and bmdma code
and on a failed migrate try and see what the state of any IO was at the point 
it migrated.
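
A minimal sketch of the kind of compile-time tracing described above, assuming one wants a one-line summary of I/O state at the point of migration. The switch `SKETCH_DEBUG_IDE`, the struct `io_snapshot`, and its fields are hypothetical names for illustration; they are not taken from the QEMU source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical switch, modelled on compile-time flags such as DEBUG_IDE. */
#define SKETCH_DEBUG_IDE 1

/* Hypothetical snapshot of fields one might want to log (assumption). */
struct io_snapshot {
    unsigned cmd;        /* last command byte issued by the guest */
    unsigned status;     /* device status register */
    unsigned bm_status;  /* bus-master DMA status register */
    int dma_in_flight;   /* non-zero if a DMA transfer had not completed */
};

/* Format the snapshot into buf. Returns the value reported by snprintf,
 * or 0 (and an empty string) when tracing is compiled out. */
static int format_io_snapshot(char *buf, size_t len,
                              const struct io_snapshot *s)
{
    assert(buf != NULL && s != NULL && len > 0);
#if SKETCH_DEBUG_IDE
    return snprintf(buf, len,
                    "cmd=0x%02x status=0x%02x bm_status=0x%02x dma_in_flight=%d",
                    s->cmd, s->status, s->bm_status,
                    s->dma_in_flight ? 1 : 0);
#else
    buf[0] = '\0';
    return 0;
#endif
}
```

Logging one such line just before the state is saved on the source and again after it is loaded on the destination would make a request that was in flight during migration easy to spot.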

That's tough. I enabled DEBUG_IDE and DEBUG_AIO at first. But I have never
debugged IDE before, so I first
have to understand how that works....

What debugging confirms is that the IDE interface indeed stalls completely.

One thing I found curious in pci.c:

#define BM_MIGRATION_COMPAT_STATUS_BITS \
        (IDE_RETRY_DMA | IDE_RETRY_PIO | \
        IDE_RETRY_READ | IDE_RETRY_FLUSH)

Why is there no IDE_RETRY_WRITE ?
Honestly, I have not yet understood what BM_MIGRATION_COMPAT_STATUS_BITS
is for.
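
To make the mechanics concrete, here is a minimal sketch of what a mask like this can do: select the retry bits from one field and fold them into another. The numeric bit values, the struct `bm_sketch`, and both helper functions are assumptions for illustration, not QEMU's definitions. The second helper shows one possible reading of the missing IDE_RETRY_WRITE (a write being encoded as "READ bit clear"); nothing in this thread confirms that reading.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values (assumption); check the QEMU tree for the real ones. */
#define IDE_RETRY_DMA   0x08
#define IDE_RETRY_PIO   0x10
#define IDE_RETRY_READ  0x20
#define IDE_RETRY_FLUSH 0x40

#define BM_MIGRATION_COMPAT_STATUS_BITS \
        (IDE_RETRY_DMA | IDE_RETRY_PIO | \
        IDE_RETRY_READ | IDE_RETRY_FLUSH)

/* Hypothetical container for the two fields being combined. */
struct bm_sketch {
    uint8_t status;         /* ordinary status bits */
    uint8_t error_status;   /* retry bits live here */
    uint8_t compat_status;  /* combined value */
};

/* Take the masked bits from error_status and all other bits from status. */
static void fold_retry_bits(struct bm_sketch *bm)
{
    const uint8_t mask = BM_MIGRATION_COMPAT_STATUS_BITS;

    assert(bm != NULL);
    bm->compat_status =
        (uint8_t)((bm->status & ~mask) | (bm->error_status & mask));
}

/* One possible reading (assumption): a DMA retry with READ clear is a write. */
static int looks_like_dma_write_retry(uint8_t error_status)
{
    return (error_status & IDE_RETRY_DMA) != 0 &&
           (error_status & IDE_RETRY_READ) == 0;
}
```

With these placeholder values, a status of 0x61 and an error_status of 0x08 fold to 0x09: the low status bit is kept, the masked status bits are dropped, and the DMA retry bit is taken from error_status.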


One other thing to check; I found the newer kernel code recovers better after
IDE problems; so on a newer guest kernel are there any log warnings about IDE 
problems,
even if the guests are otherwise apparently happy?

I will check for that.
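
A minimal sketch of such a check, assuming the guest's kernel log has been captured as text. The substrings below are only examples of messages a guest kernel might print on IDE trouble; the exact wording depends on the kernel version and driver, so the list and both function names are assumptions to adapt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example substrings only (assumption); adjust to the guest kernel in use. */
static const char *const ide_warning_patterns[] = {
    "lost interrupt",
    "DMA timeout",
    "status error",
};

/* Return 1 if the line contains any of the example substrings, else 0. */
static int line_looks_like_ide_warning(const char *line)
{
    size_t i;
    const size_t n = sizeof(ide_warning_patterns) /
                     sizeof(ide_warning_patterns[0]);

    assert(line != NULL);
    for (i = 0; i < n; i++) {
        if (strstr(line, ide_warning_patterns[i]) != NULL) {
            return 1;
        }
    }
    return 0;
}

/* Count matching lines in a newline-separated log buffer.
 * Lines longer than 511 characters are truncated before matching. */
static size_t count_ide_warnings(const char *log)
{
    size_t hits = 0;
    char line[512];

    assert(log != NULL);
    while (*log != '\0') {
        size_t len = strcspn(log, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

        memcpy(line, log, copy);
        line[copy] = '\0';
        hits += (size_t)line_looks_like_ide_warning(line);
        log += len;
        if (*log == '\n') {
            log++;
        }
    }
    return hits;
}
```

A non-zero count on a guest that otherwise looks healthy after migration would suggest the newer kernel hit the same stall and recovered from it.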

Thanks,
Peter