qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize


From: Duan, Zhenzhong
Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
Date: Tue, 27 Jun 2023 02:38:56 +0000

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Monday, June 26, 2023 6:08 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in vfio_realize
>
>On 26/06/2023 08:02, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Duan, Zhenzhong
>>> Sent: Sunday, June 25, 2023 2:01 PM
>>> To: Joao Martins <joao.m.martins@oracle.com>
>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>> qemu-devel@nongnu.org; avihaih@nvidia.com; Peng, Chao P
>>> <chao.p.peng@intel.com>
>>> Subject: RE: [PATCH v3 1/3] vfio/pci: Fix resource leak in
>>> vfio_realize
>>>
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Wednesday, June 21, 2023 7:08 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-
>>> devel@nongnu.org;
>>>> avihaih@nvidia.com; Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH v3 1/3] vfio/pci: Fix resource leak in
>>>> vfio_realize
>>>>
>>>>
>>>>
>>>> On 21/06/2023 09:02, Zhenzhong Duan wrote:
>>>>> When adding migration blocker failed in vfio_migration_realize(),
>>>>> hotplug will fail and we see below:
>>>>>
>>>>> (qemu) device_add
>>>>> vfio-pci,host=81:11.1,id=vfio1,bus=root1,x-enable-migration=true
>>>>> 0000:81:11.1: VFIO migration is not supported in kernel
>>>>> Error: disallowing migration blocker (--only-migratable) for: VFIO
>>>>> device doesn't support migration
>>>>>
>>>>> If we hotplug again we should see same log as above, but we see:
>>>>> (qemu) device_add
>>>>> vfio-pci,host=81:11.0,id=vfio0,bus=root0,x-enable-migration=true
>>>>> Error: vfio 0000:81:11.0: device is already attached
>>>>>
>>>>> That's because some references to VFIO device isn't released, we
>>>>> should check return value of vfio_migration_realize() and release
>>>>> the references, then VFIO device will be truely released when
>>>>> hotplug failed.
>>>>>
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>  hw/vfio/pci.c | 1 +
>>>>>  1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
>>>>> 73874a94de12..c71b0955d81c 100644
>>>>> --- a/hw/vfio/pci.c
>>>>> +++ b/hw/vfio/pci.c
>>>>> @@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev,
>>>>> Error
>>>> **errp)
>>>>>          ret = vfio_migration_realize(vbasedev, errp);
>>>>>          if (ret) {
>>>>>              error_report("%s: Migration disabled",
>>>>> vbasedev->name);
>>>>> +            goto out_deregister;
>>>>>          }
>>>>>      }
>>>>>
>>>> This doesn't look right. This means that failure to support
>>>> migration will deregister the device.
>>>
>>> In my understanding, failure to support migration but success to add
>>> migration blocker will not deregister device.
>>> vfio_migration_realize() is successful in this case.
>>> But failure to add migration blocker should deregister device,
>>> because
>>> vfio_exitfn() is never called in this case(errp set), jumping to
>>> out_deregister is the best choice. Then vfio_migration_realize()
>>> should return failure in this case.
>>
>
>I was checking other devices in the tree, and I think you are right. Failure to
>add the migration blocker results in a failure of device realize. Which IIUC 
>only
>happens in 'only-migratable' setups and during snapshots.
Yes.

>
>Maybe also including a sentence explainer in the commit message is useful
>too.
Sure, will do.

>
>> I just realized the error path in vfio_realize() isn't adequate. We need to 
>> add
>more deregister code just as vfio_exitfn(), see below. I'll re-post after we 
>are
>consensus on this and get some comments of PATCH3.
>>
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3210,7 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error
>**errp)
>>          ret = vfio_migration_realize(vbasedev, errp);
>>          if (ret) {
>>              error_report("%s: Migration disabled", vbasedev->name);
>> -            goto out_deregister;
>> +            goto out_vfio_migration;
>>          }
>>      }
>>
>> @@ -3220,11 +3220,17 @@ static void vfio_realize(PCIDevice *pdev,
>> Error **errp)
>>
>>      return;
>>
>> +out_vfio_migration:
>> +    vfio_migration_exit(vbasedev);
>
>This belongs in this patch from the looks
Yes, I plan to merge above change to this patch.

>
>>  out_deregister:
>>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>      if (vdev->irqchip_change_notifier.notify) {
>>          kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
>>      }
>> +    vfio_disable_interrupts(vdev);
>> +    if (vdev->intx.mmap_timer) {
>> +        timer_free(vdev->intx.mmap_timer);
>> +    }
>
>But this one suggests another one as it looks a pre-existing issue?
Yes, it's another resource leak I just found.
Not sure if it's better to also merge above change to this patch which is 
targeting resource leak issues,
or to PATCH2 which is targeting out_deregister path, or to create a new one.
Any suggestion?

Thanks
Zhenzhong

reply via email to

[Prev in Thread] Current Thread [Next in Thread]