qemu-block

Re: Lost partition tables on ide-hd + ahci drive


From: Fiona Ebner
Subject: Re: Lost partition tables on ide-hd + ahci drive
Date: Thu, 15 Jun 2023 09:04:19 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

On 14.06.23 at 16:48, Simon J. Rowe wrote:
> On 02/02/2023 12:08, Fiona Ebner wrote:
>> Hi,
>> over the years we've got 1-2 dozen reports[0] about suddenly
>> missing/corrupted MBR/partition tables. The issue seems to be very rare
>> and attempts to reproduce it have not succeeded so far. I'm asking here
>> in the hope that somebody has seen something similar.
>>
>> The only commonality seems to be the use of an ide-hd drive with ahci
>> bus.
>>
>> It does seem to happen with both Linux and Windows guests (one of the
>> reports even mentions FreeBSD) and backing storages for the VMs include
>> ZFS, RBD, LVM-Thin as well as file-based storages.
>>
>> Relevant part of an example configuration:
>>
>>>    -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
>>>    -drive 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
>>>    -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
>> The first reports are from before io_uring was used and there are also
>> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
>>
>> Some reports say that the issue occurred under high IO load.
>>
>> Many reports suspect backups causing the issue. Our backup mechanism
>> uses backup_job_create() for each drive and runs the jobs sequentially.
>> It uses a custom block driver as the backup target which just forwards
>> the writes to the actual target which can be a file or our backup server.
>> (If you really want to see the details, apply the patches in [1] and see
>> pve-backup.c and block/backup-dump.c).
>>
>> Of course, the backup job will read sector 0 of the source disk, but I
>> really can't see where a stray write would happen, why the issue would
>> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
>>
>> So again, just asking if somebody has seen something similar or has a
>> hunch of what the cause might be.
>>
>> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=2874
>> [1]: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;hb=HEAD
>>
>>
> We've also seen a handful of similar reports. Again, just the MBR sector
> overwritten by what looks to be guest data (e.g. log messages). The
> common thread with our incidents is again a SATA disk under the AHCI
> controller. In our case there is also a network backend (iSCSI) which
> has experienced a failure.
> 
> I've tried to repro this with blkdebug and simulated write errors,
> without success.
> 

Hi,
which version/build of QEMU are you using? Can you correlate the issue
with any block job or was the drive in use by the guest only?
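
For anyone triaging an affected disk, here is a minimal Python sketch (not
part of QEMU or of any tooling mentioned in this thread) that classifies the
first sector of a disk. It reports whether the MBR boot signature is still
present and how much of the sector is printable ASCII, which may help to tell
a zeroed sector from one overwritten with log-like guest data. The function
and field names are made up for illustration. It assumes a raw image or block
device (format=raw as in the example configuration), 512-byte sectors, and a
classic MBR with the 0x55 0xAA signature at offset 510.

```python
import string

SECTOR_SIZE = 512              # assumption: 512-byte logical sectors
MBR_SIGNATURE = b"\x55\xaa"    # boot signature at byte offset 510 of an MBR


def read_sector0(path: str) -> bytes:
    """Read the first sector of a raw image or block device (read-only)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)


def inspect_sector0(sector: bytes) -> dict:
    """Return a few heuristics about the content of sector 0."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes of sector data")
    sector = sector[:SECTOR_SIZE]

    # Set of byte values that count as printable ASCII (incl. whitespace).
    printable = set(string.printable.encode("ascii"))
    printable_ratio = sum(b in printable for b in sector) / SECTOR_SIZE

    return {
        "has_mbr_signature": sector[510:512] == MBR_SIGNATURE,
        "all_zero": sector == bytes(SECTOR_SIZE),
        # A ratio close to 1.0 hints at text (e.g. log messages), not an MBR.
        "printable_ratio": printable_ratio,
    }
```

Usage would be something like inspect_sector0(read_sector0(path)) on a copy
or snapshot of the affected disk. A missing signature together with a high
printable ratio would match the "overwritten by guest data" pattern described
above, but this is only a heuristic and proves nothing about the cause.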

Best Regards,
Fiona



