qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()


From: Joao Martins
Subject: Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
Date: Fri, 16 Jun 2023 19:06:14 +0100

On 16/06/2023 15:04, Cédric Le Goater wrote:
> On 6/16/23 12:12, Joao Martins wrote:
>> On 16/06/2023 10:53, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Friday, June 16, 2023 5:06 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>>>> Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>> vfio_migration_realize()
>>>>
>>>> On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Sent: Thursday, June 15, 2023 6:23 PM
>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>> vfio_migration_realize()
>>>>>>
>>>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>>>> vfio_migration_realize()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>>>>> vfio_migration_realize().
>>>>>>>>>
>>>>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't immediately obvious from commit message that this is
>>>>>>>> mainly about just printing an error message when blockers get added
>>>>>>>> and that well
>>>>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>>>>
>>>>>>>> Perhaps better wording would be:
>>>>>>>>
>>>>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>>>>> considers that migration was blocked when the latter returned an
>>>>>>>> error. Fix it by negating the return value obtained by
>>>>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>>>>> blocker is added in core migration, so this cleans up usability
>>>>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>>>>> migration blockers are
>>>>>> active.
>>>>>>>
>>>>>>> Great, I'll use your words.
>>>>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>> ---
>>>>>>>>>   hw/vfio/common.c    | 4 ++--
>>>>>>>>>   hw/vfio/migration.c | 6 +++---
>>>>>>>>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>>> @@ -399,7 +399,7 @@ int
>>>>>>>>> vfio_block_multiple_devices_migration(Error
>>>>>>>> **errp)
>>>>>>>>>           multiple_devices_migration_blocker = NULL;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> -    return ret;
>>>>>>>>> +    return !ret;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   void vfio_unblock_multiple_devices_migration(void)
>>>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>>>>           giommu_migration_blocker = NULL;
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>> -    return ret;
>>>>>>>>> +    return !ret;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   void vfio_migration_finalize(void) diff --git
>>>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>>>>> *vbasedev,
>>>>>>>> Error **errp)
>>>>>>>>>       }
>>>>>>>>>
>>>>>>>>>       ret = vfio_block_multiple_devices_migration(errp);
>>>>>>>>> -    if (ret) {
>>>>>>>>> +    if (ret || (errp && *errp)) {
>>>>>>>>
>>>>>>>> Why do you need this extra clause?
>>>>>>>
>>>>>>> Now that error happens, no need to add other blockers which will
>>>>>>> fail for same reason.
>>>>>>>
>>>>>>
>>>>>> But you don't need the (errp && *errp) for that as that's the whole
>>>>>> point of negating the result.
>>>>>>
>>>>>> And if there's an error set it means migrate_add_blocker failed to
>>>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>>>> unnecessarily?
>>>>>
>>>>> If there is an error qdev_device_add() will fail, continue execution is
>>>> meaningless here?
>>>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>>>>
>>>>> If I removed (errp && *errp) to continue, need below change to bypass
>>>>> trace_vfio_migration_probe Do you prefer this way?
>>>>>
>>>>>      if (!*errp) {
>>>>>          trace_vfio_migration_probe(vbasedev->name);
>>>>>      }
>>>>>
>>>>
>>>> I am mainly questioning that the error testing is correct to test here.
>>>>
>>>> IIUC, the only one that can propagate any *new* error in
>>>> vfio_migration_realize is the calls to migrate_add_blocker failing within 
>>>> the
>>>> vfio_block* (migration code suggests that this happens on snapshotting).
>>>> Failing to add migration blocker just means we haven't installed any 
>>>> blockers.
>>>> And the current code presents that as a "Migration disabled" case. If we 
>>>> want
>>>> to preserve that behaviour on migration_add_blocker() failures (which seems
>>>> like that's what you are doing here) then instead of this:
>>>
>>> Current behavior(without my patch):
>>> "Migration disabled" isn't printed if migrate_add_blocker succeed.
>>> "Migration disabled" is printed if migrate_add_blocker fail.
>>>
>>> I think this behavior isn't correct, I want to revert it not preserve it, so
>>> I used !ret.
>>> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker 
>>> failure
>>> will lead to hotplug fail, then the guest is still migratable as no vfio
>>> plugged.
>>> But we see "Migration disabled" which will confuse us.
>>>
>>
>> /me nods
> 
> Yes. Overall, the reason for which migration is not supported or is
> disabled is not very clear today (for the user). It might need some
> more adjustments, like this one, before we remove the experimental flag.

I think the "Migration Disabled" was an UX oversight when we refactored blockers
into vfio_migration_realize(). The actual migration is *not* disabled, it is
just that the message or tracepoint shouldn't be printed. The reasons why it is
blocked / not supported are clear in code right now.

> It will also help QE to define test scenarios and track expected results.
> 
>>   >>
>>>>     return !ret;
>>>>
>>>> you would do this:
>>>>
>>>>     ret = migration_add_blocker(...);
>>>>     if (ret < 0) {
>>>>         error_free(...);
>>>>         blocker = NULL;
>>>> +        return ret;
>>>>     }
>>>>
>>>> +    return 1;
>>>>
>>>> Or something like that. And then if you return early as you intended?
>>>
>>> Yes, this change make sense, I also want to add below:
>>>
>>>       if (!pdev->failover_pair_id) {
>>>           ret = vfio_migration_realize(vbasedev, errp);
>>> -        if (ret) {
>>> +        if (ret > 0) {
>>>               error_report("%s: Migration disabled", vbasedev->name);
>>>           }
>>>
>> Makes sense. Checking errp above before printing the tracepoint (like you
>> suggested) is also an option taking the discussion so far, but perhaps the
>> return type to be bool to make it clear to callers that this is not no 
>> longer an
>> error code? Maybe let's wait for others on what style is generally preferred 
>> in
>> error propagation.
> 
> I think the code should follow the qdev_realize() prototype, which returns
> a bool.
> 

That makes sense, it makes the decision a lot simpler.

> Thanks,
> 
> C.
> 
> 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]