Re: Lost partition tables on ide-hd + ahci drive

qemu-devel

From:	Fiona Ebner
Subject:	Re: Lost partition tables on ide-hd + ahci drive
Date:	Thu, 15 Jun 2023 09:04:19 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.12.0

On 14.06.23 at 16:48, Simon J. Rowe wrote:
> On 02/02/2023 12:08, Fiona Ebner wrote:
>> Hi,
>> over the years we've received one to two dozen reports[0] of suddenly
>> missing or corrupted MBR/partition tables. The issue seems to be very
>> rare, and attempts to reproduce it have not succeeded so far. I'm asking
>> here in the hope that somebody has seen something similar.
>>
>> The only commonality seems to be the use of an ide-hd drive on an ahci
>> bus.
>>
>> It does seem to happen with both Linux and Windows guests (one of the
>> reports even mentions FreeBSD) and backing storages for the VMs include
>> ZFS, RBD, LVM-Thin as well as file-based storages.
>>
>> Relevant part of an example configuration:
>>
>>>    -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
>>>    -drive 'file=/dev/zvol/myzpool/vm-168-disk-0,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
>>>    -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0' \
>> The first reports are from before io_uring was used and there are also
>> reports with writeback cache mode and discard=on,detect-zeroes=unmap.
>>
>> Some reports say that the issue occurred under high IO load.
>>
>> Many reports suspect backups causing the issue. Our backup mechanism
>> uses backup_job_create() for each drive and runs the jobs sequentially.
>> It uses a custom block driver as the backup target which just forwards
>> the writes to the actual target which can be a file or our backup server.
>> (If you really want to see the details, apply the patches in [1] and see
>> pve-backup.c and block/backup-dump.c).
>>
>> Of course, the backup job will read sector 0 of the source disk, but I
>> really can't see where a stray write would happen, why the issue would
>> trigger so rarely or why seemingly only ide-hd+ahci would be affected.
>>
>> So again, just asking if somebody has seen something similar or has a
>> hunch of what the cause might be.
>>
>> [0]: https://bugzilla.proxmox.com/show_bug.cgi?id=2874
>> [1]: https://git.proxmox.com/?p=pve-qemu.git;a=tree;f=debian/patches;hb=HEAD
>>
>>
> We've also seen a handful of similar reports. Again, just the MBR sector
> overwritten by what looks to be guest data (e.g. log messages). The
> common thread with our incidents is again a SATA disk under the AHCI
> controller. We have a network backend (iSCSI) which has experienced a
> failure.
> 
> I've tried to repro this with blkdebug and simulated write errors,
> without success.
> 

Hi,
which version/build of QEMU are you using? Can you correlate the issue
with any block job or was the drive in use by the guest only?
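
For triaging affected images, the symptom described above (sector 0 overwritten by what looks like guest data such as log messages) can be checked offline. The following is only a minimal sketch, assuming a raw image and a classic MBR layout with the 0x55 0xAA boot signature at byte offsets 510-511. The function names and the 0.9 text threshold are hypothetical choices for illustration and are not part of QEMU or any existing tool.

```python
import string

SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"  # boot signature at offsets 510-511 of a classic MBR
_PRINTABLE = frozenset(string.printable.encode("ascii"))


def has_mbr_signature(sector: bytes) -> bool:
    """Return True if the sector ends with the MBR boot signature."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need a full 512-byte sector")
    return sector[510:512] == MBR_SIGNATURE


def printable_ratio(sector: bytes) -> float:
    """Fraction of bytes that are printable ASCII (rough hint for log text)."""
    if not sector:
        return 0.0
    return sum(b in _PRINTABLE for b in sector) / len(sector)


def classify_sector0(sector: bytes, text_threshold: float = 0.9) -> str:
    """Coarse classification of sector 0; the threshold is an arbitrary assumption."""
    if has_mbr_signature(sector):
        return "mbr-signature-present"
    if printable_ratio(sector[:SECTOR_SIZE]) >= text_threshold:
        return "looks-like-text"
    return "unknown"


def read_sector0(path: str) -> bytes:
    """Read the first 512 bytes of a raw image or block device."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

A present signature does not prove the partition table itself is intact, and a "looks-like-text" result is only a hint. It might still help to sort reports into "overwritten by guest data" versus "zeroed or otherwise damaged" when collecting evidence.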

Best Regards,
Fiona
