Re: [PATCH v2 09/11] vfio/migration: Reset device if setting recover sta
From: liulongfang
Subject: Re: [PATCH v2 09/11] vfio/migration: Reset device if setting recover state fails
Date: Tue, 11 Oct 2022 09:41:03 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
On 2022/5/31 1:07, Avihai Horon wrote:
> If vfio_migration_set_state() fails to set the device in the requested
> state it tries to put it in a recover state. If setting the device in
> the recover state fails as well, hw_error is triggered and the VM is
> aborted.
>
> To improve user experience and avoid VM data loss, reset the device with
> VFIO_RESET_DEVICE instead of aborting the VM.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> hw/vfio/migration.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 852759e6ca..6c34502611 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -89,8 +89,16 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
> /* Try to put the device in some good state */
> mig_state->device_state = recover_state;
> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> - hw_error("%s: Device in error state, can't recover",
> - vbasedev->name);
> + if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {
> + hw_error("%s: Device in error state, can't recover",
> + vbasedev->name);
> + }
> +
> + error_report(
> + "%s: Device was reset due to failure in changing device state to recover state %s",
> + vbasedev->name, mig_state_to_str(recover_state));
> +
> + return -1;
> }
>
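For reference, the fallback order described in the commit message can be
sketched roughly as below. This is a standalone illustration, not the QEMU
code: struct fake_dev, set_state(), reset_device() and change_state() are
hypothetical names that stand in for the VFIO_DEVICE_FEATURE and
VFIO_DEVICE_RESET ioctl calls, and the failure flags exist only so that each
path can be exercised.

```c
#include <stdio.h>

/* Possible results of one state-change attempt (names are illustrative). */
enum outcome {
    STATE_SET,      /* requested state was reached */
    RECOVERED,      /* request failed, device put in the recover state */
    DEVICE_RESET,   /* recover failed as well, device was reset */
    UNRECOVERABLE   /* reset failed too: the only case left for hw_error */
};

/* Hypothetical device model; the fail_* flags simulate ioctl failures. */
struct fake_dev {
    int fail_set;
    int fail_recover;
    int fail_reset;
    int state;
};

/* Stand-in for the VFIO_DEVICE_FEATURE ioctl; returns 0 on success. */
static int set_state(struct fake_dev *d, int state, int should_fail)
{
    if (should_fail) {
        return -1;
    }
    d->state = state;
    return 0;
}

/* Stand-in for the VFIO_DEVICE_RESET ioctl; returns 0 on success. */
static int reset_device(struct fake_dev *d)
{
    if (d->fail_reset) {
        return -1;
    }
    d->state = 0;   /* assumed post-reset state for this sketch */
    return 0;
}

/* Try the requested state, then the recover state, then a device reset. */
static enum outcome change_state(struct fake_dev *d, int new_state,
                                 int recover_state)
{
    if (set_state(d, new_state, d->fail_set) == 0) {
        return STATE_SET;
    }
    if (set_state(d, recover_state, d->fail_recover) == 0) {
        return RECOVERED;
    }
    if (reset_device(d) == 0) {
        fprintf(stderr, "device was reset after recover state %d failed\n",
                recover_state);
        return DEVICE_RESET;
    }
    return UNRECOVERABLE;
}
```

In this sketch a caller would treat every result other than STATE_SET as a
failed state change, and would abort only on UNRECOVERABLE.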
I tested qemu 7.1.50 built with this set of patches applied. Live migration
failed because the destination VM was disconnected during the migration.
When I then exited the source qemu, the kernel printed the following error:
[100337.287047] BUG: Bad page state in process qemu-system-aar pfn:82199518
[100337.295815] page:00000000356de4da refcount:-2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x82199518
[100337.306403] flags: 0xbfff80000000000(node=0|zone=2|lastcpupid=0x7fff)
[100337.314091] raw: 0bfff80000000000 dead000000000100 dead000000000122 0000000000000000
[100337.322589] raw: 0000000000000000 0000000000000000 fffffffeffffffff 0000000000000000
[100337.330630] page dumped because: nonzero _refcount
[100337.335840] Modules linked in: hisi_acc_vfio_pci hisi_sec2 hisi_zip hisi_hpre hisi_qm uacce vfio_iommu_type1 vfio_pci vfio_pci_core vfio_virqfd vfio pv680_mii(O) [last unloaded: hisi_sec2]
[100337.354564] CPU: 1 PID: 786 Comm: qemu-system-aar Tainted: G B O 6.0.0-rc4+ #1
[100337.377378] Call trace:
[100337.380382] dump_backtrace.part.0+0xc4/0xd0
[100337.385791] show_stack+0x24/0x40
[100337.389478] dump_stack_lvl+0x68/0x84
[100337.394155] dump_stack+0x18/0x34
[100337.398006] bad_page+0xf0/0x120
[100337.401796] check_free_page_bad+0x84/0x90
[100337.406404] free_pcppages_bulk+0x1bc/0x2b0
[100337.411126] free_unref_page_commit+0x120/0x15c
[100337.416935] free_unref_page+0x15c/0x254
[100337.421436] free_compound_page+0x6c/0x100
[100337.425868] free_transhuge_page+0xd4/0x140
[100337.430535] destroy_large_folio+0x30/0x40
[100337.434953] release_pages+0x1bc/0x4d0
[100337.439268] free_pages_and_swap_cache+0x68/0x80
[100337.444224] tlb_batch_pages_flush+0x5c/0x94
[100337.448976] tlb_flush_mmu+0x4c/0xd4
[100337.453062] unmap_page_range+0x8d0/0xbd0
[100337.457432] unmap_single_vma+0x90/0x12c
[100337.461673] unmap_vmas+0x84/0xfc
[100337.465354] exit_mmap+0x88/0x1b0
[100337.469008] __mmput+0x48/0x134
[100337.472637] mmput+0x44/0x50
[100337.475857] do_exit+0x2b8/0x970
[100337.479641] do_group_exit+0x40/0xac
[100337.484079] get_signal+0x8c0/0x934
[100337.488215] do_notify_resume+0x1d0/0x1570
[100337.492795] el0_svc+0xa8/0xc0
[100337.496452] el0t_64_sync_handler+0x1ac/0x1b0
[100337.501187] el0t_64_sync+0x19c/0x1a0
Can anyone see what is causing this error?
> error_report("%s: Failed changing device state to %s",
> vbasedev->name,
>
Thanks
Longfang.