Thanks. I discovered today that the backup procedure I was using,
based on this script:
http://davemorris.wordpress.com/2010/11/22/kvm-snapshot-backups-with-qemu-img/
was the source of the corruption, because qemu-img snapshot should not
be used on the image file of a running VM.
It sounds like a safe alternative for live backups will be available soon:
http://www.linux-kvm.com/content/first-look-virtual-machine-online-disk-snapshots-coming-fedora-18
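Until live snapshots are available, one conservative option is a cold backup: shut the guest down, copy the image while nothing has it open, then start the guest again. The following is only a minimal sketch under stated assumptions. The function name cold_backup, the VIRSH override variable, and the 300-second default timeout are made up for illustration; it relies only on the virsh shutdown, domstate, and start subcommands. In a Pacemaker-managed cluster like this one, the guest would presumably have to be stopped through the cluster manager instead, or Pacemaker may restart it mid-copy.

```shell
#!/bin/sh
# Cold-backup sketch: copy the image only while the guest is shut off.
# VIRSH is a hypothetical override hook; by default virsh is taken from PATH.
VIRSH=${VIRSH:-virsh}

# cold_backup DOMAIN IMAGE DEST [TIMEOUT_SECONDS]
cold_backup() {
    domain=$1 image=$2 dest=$3 timeout=${4:-300}

    # Ask the guest to shut down cleanly.
    "$VIRSH" shutdown "$domain" || return 1

    # Poll until libvirt reports the guest as "shut off".
    waited=0
    while [ "$("$VIRSH" domstate "$domain")" != "shut off" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $domain to shut down" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # No process is writing to the image now, so a plain copy is consistent.
    cp -- "$image" "$dest"
    status=$?

    # Bring the guest back up whether or not the copy succeeded.
    "$VIRSH" start "$domain" || return 1
    return "$status"
}
```

It would be called as, for example, cold_backup myvm myvm.qcow2 /backup/myvm.qcow2 (the destination path is a placeholder). Note that cp copies only the overlay image; the backing file, backingfile.qcow2, would need to be backed up separately.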
Thanks again,
Andrew
------------------------------------------------------------------------
From: "Jakob Bohm" <address@hidden>
To: address@hidden
Sent: Wednesday, August 15, 2012 12:37:01 PM
Subject: Re: [Qemu-discuss] Source of QCOW Image Corruption
Please note that qemu 0.12.3 is quite old now, but "long term
stability" OS distributions are defined by freezing all software
versions at some point in time, and then maintaining those
versions with cherry-picked backported patches for 5 to 10
years, so this is normal.
However I guess that a lot of folks around here will simply
refuse to deal with such old software, because they believe
in the bleeding edge.
On 8/14/2012 10:54 PM, Andrew Martin wrote:
Some additional information:
Cloning the VM disk image with qemu-img convert results in an image that
appears to be error free and can be mounted successfully:
# qemu-img convert myvm.qcow2 -O qcow2 test.qcow2
# qemu-img check test.qcow2
No errors were found on the image.
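The clone-then-check steps above can be wrapped so that a script only trusts the copy when the check really came back clean. This is a sketch, not a tested recovery procedure: the function names and the QEMU_IMG override variable are invented here, and because I am not certain that qemu-img 0.12.3 reports check failures through its exit status, the sketch also looks for the "No errors were found on the image" line shown above.

```shell
#!/bin/sh
# QEMU_IMG is a hypothetical override hook; by default qemu-img is taken from PATH.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# image_is_clean IMAGE
# Succeeds only if "qemu-img check" exits with status 0 AND its output
# contains the "No errors were found" message.
image_is_clean() {
    output=$("$QEMU_IMG" check "$1" 2>&1) || return 1
    printf '%s\n' "$output" | grep -q 'No errors were found on the image'
}

# clone_and_verify SOURCE DEST
# Converts SOURCE into a new qcow2 image at DEST, then checks DEST.
clone_and_verify() {
    "$QEMU_IMG" convert -O qcow2 "$1" "$2" || return 1
    image_is_clean "$2"
}
```

For example, clone_and_verify myvm.qcow2 test.qcow2 && echo "clone looks clean". Keep in mind that convert reads through the backing file, so the result should be a standalone image rather than an overlay on backingfile.qcow2, and that a clean check says nothing about whether the guest data copied out of the damaged image is itself intact.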
Also, the following error is logged in /var/log/libvirt/qemu/myvm.log when
attempting to start the VM:
# tail /var/log/libvirt/qemu/myvm.log
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin
QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 8192 -smp 6 -name
myvm -uuid 14a9dd6b-7a80-b286-8558-8c0c1f0324dc -chardev
socket,id=monitor,path=/var/lib/libvirt/qemu/myvm.monitor,server,nowait
-monitor chardev:monitor -boot c -drive
file=/mnt/storage/vmstore/disks/myvm.qcow2,if=virtio,index=0,boot=on,format=qcow2,cache=none
-drive file=/dev/drbd1,if=virtio,index=1,format=raw -drive
file=/dev/drbd2,if=virtio,index=2,format=raw -net
nic,macaddr=00:16:3e:32:35:82,vlan=0,model=virtio,name=virtio.0 -net
tap,fd=55,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0
-parallel none -usb -vnc 127.0.0.1:0 -vga cirrus
char device redirected to /dev/pts/0
pci_add_option_rom: failed to find romfile "pxe-virtio.bin"
qcow2_free_clusters failed: Invalid argument
These systems are running the stock Ubuntu 10.04 version of qemu-common,
qemu-kvm, and kvm (0.12.3+noroms-0ubuntu9.19).
Thanks,
Andrew
----- Original Message -----
From: "Andrew Martin" <address@hidden>
To: address@hidden
Sent: Tuesday, August 14, 2012 1:53:44 PM
Subject: [Qemu-discuss] Source of QCOW Image Corruption
Hello,
I have two KVM virtual machine nodes in a high-availability cluster using
Pacemaker + Heartbeat on Ubuntu 10.04 Server amd64. This cluster hosts a single
Ubuntu 10.04 VM which uses a qcow2 image file, myvm.qcow2, with a backing file,
backingfile.qcow2. This morning, the VM suddenly powered off. I attempted to
start it again with virsh start domain, but it would only start briefly and
then power off again. I checked the qcow2 disk image and found countless
corruption errors:
address@hidden:/mnt/storage/vmstore/disks# qemu-img info myvm.qcow2
image: myvm.qcow2
file format: qcow2
virtual size: 9.8G (10485760000 bytes)
disk size: 13G
cluster_size: 65536
backing file: backingfile.qcow2 (actual path: backingfile.qcow2)
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
1.5G 2056-05-05 21:01:212795663:45:42.642
/archive/1006/20100627000/2il_root/save/archive/1002/20100204005/1 743M
1995-08-16 12:47:352289751:06:20.183
address@hidden:/mnt/storage/vmstore/disks# qemu-img check myvm.qcow2 2>&1 | head
ERROR OFLAG_COPIED: offset=80000002047d0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e50000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ffde0000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ff710000 refcount=0
ERROR OFLAG_COPIED: offset=8000000216ec0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000206db0000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ff720000 refcount=0
ERROR OFLAG_COPIED: offset=80000001ffdf0000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e60000 refcount=0
ERROR OFLAG_COPIED: offset=8000000212e70000 refcount=0
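Since head only shows the first ten lines, a per-kind count gives a better picture of how widespread the damage is. The helper below is a small sketch (the name summarize_check_errors is made up) and assumes every error line has the "ERROR KIND: details" shape seen above. As far as I understand the qcow2 format, the copied flag is meant to mark clusters whose refcount is exactly one, so OFLAG_COPIED errors with refcount=0 suggest the cluster tables and the refcount table disagree.

```shell
#!/bin/sh
# summarize_check_errors: read "qemu-img check" output on stdin and print
# one "KIND COUNT" line for each kind of ERROR line seen.
summarize_check_errors() {
    awk '$1 == "ERROR" { sub(/:$/, "", $2); count[$2]++ }
         END { for (kind in count) print kind, count[kind] }'
}
```

Usage would be: qemu-img check myvm.qcow2 2>&1 | summarize_check_errors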
address@hidden:/mnt/storage/vmstore/disks# qemu-img info backingfile.qcow2
image: backingfile.qcow2
file format: qcow2
virtual size: 9.8G (10485760000 bytes)
disk size: 4.8G
cluster_size: 65536
address@hidden:/mnt/storage/vmstore/disks# qemu-img check backingfile.qcow2
No errors were found on the image.
I had this happen a month ago on the same machine but a different physical
drive, so I do not believe it to be a physical disk failure. I can find nothing
in /var/log that gives any more information related to this corruption. What
other debug information can I provide to diagnose why these images are getting
corrupted and taking these running VMs offline?