[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize()
From: |
Joao Martins |
Subject: |
Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize() |
Date: |
Fri, 16 Jun 2023 19:06:14 +0100 |
On 16/06/2023 15:04, Cédric Le Goater wrote:
> On 6/16/23 12:12, Joao Martins wrote:
>> On 16/06/2023 10:53, Duan, Zhenzhong wrote:
>>>> -----Original Message-----
>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>> Sent: Friday, June 16, 2023 5:06 PM
>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>> Cc: alex.williamson@redhat.com; clg@redhat.com; qemu-devel@nongnu.org;
>>>> Peng, Chao P <chao.p.peng@intel.com>
>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>> vfio_migration_realize()
>>>>
>>>> On 16/06/2023 03:42, Duan, Zhenzhong wrote:
>>>>>> -----Original Message-----
>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>> Sent: Thursday, June 15, 2023 6:23 PM
>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>> vfio_migration_realize()
>>>>>>
>>>>>> On 15/06/2023 10:19, Duan, Zhenzhong wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Joao Martins <joao.m.martins@oracle.com>
>>>>>>>> Sent: Thursday, June 15, 2023 4:54 PM
>>>>>>>> To: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>>>>>>>> Cc: alex.williamson@redhat.com; clg@redhat.com;
>>>>>>>> qemu-devel@nongnu.org; Peng, Chao P <chao.p.peng@intel.com>
>>>>>>>> Subject: Re: [PATCH] vfio/migration: Fix return value of
>>>>>>>> vfio_migration_realize()
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 15/06/2023 09:18, Zhenzhong Duan wrote:
>>>>>>>>> We should print "Migration disabled" when migration is blocked in
>>>>>>>>> vfio_migration_realize().
>>>>>>>>>
>>>>>>>>> Fix it by reverting return value of migrate_add_blocker(),
>>>>>>>>> meanwhile error out directly once migrate_add_blocker() failed.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't immediately obvious from commit message that this is
>>>>>>>> mainly about just printing an error message when blockers get added
>>>>>>>> and that well
>>>>>>>> migrate_add_blocker() returns 0 on success and caller of
>>>>>>>> vfio_migration_realize expects the opposite when blockers are added.
>>>>>>>>
>>>>>>>> Perhaps better wording would be:
>>>>>>>>
>>>>>>>> migrate_add_blocker() returns 0 when successfully adding the
>>>>>>>> migration blocker. However, the caller of vfio_migration_realize()
>>>>>>>> considers that migration was blocked when the latter returned an
>>>>>>>> error. Fix it by negating the return value obtained by
>>>>>>>> migrate_add_blocker(). What matters for migration is that the
>>>>>>>> blocker is added in core migration, so this cleans up usability
>>>>>>>> such that user sees "Migrate disabled" when any of the vfio
>>>>>>>> migration blockers are
>>>>>> active.
>>>>>>>
>>>>>>> Great, I'll use your words.
>>>>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>> ---
>>>>>>>>> hw/vfio/common.c | 4 ++--
>>>>>>>>> hw/vfio/migration.c | 6 +++---
>>>>>>>>> 2 files changed, 5 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c index
>>>>>>>>> fa8fd949b1cf..8505385798f3 100644
>>>>>>>>> --- a/hw/vfio/common.c
>>>>>>>>> +++ b/hw/vfio/common.c
>>>>>>>>> @@ -399,7 +399,7 @@ int
>>>>>>>>> vfio_block_multiple_devices_migration(Error
>>>>>>>> **errp)
>>>>>>>>> multiple_devices_migration_blocker = NULL;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> - return ret;
>>>>>>>>> + return !ret;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> void vfio_unblock_multiple_devices_migration(void)
>>>>>>>>> @@ -444,7 +444,7 @@ int vfio_block_giommu_migration(Error **errp)
>>>>>>>>> giommu_migration_blocker = NULL;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> - return ret;
>>>>>>>>> + return !ret;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> void vfio_migration_finalize(void) diff --git
>>>>>>>>> a/hw/vfio/migration.c b/hw/vfio/migration.c index
>>>>>>>>> 6b58dddb8859..0146521d129a 100644
>>>>>>>>> --- a/hw/vfio/migration.c
>>>>>>>>> +++ b/hw/vfio/migration.c
>>>>>>>>> @@ -646,12 +646,12 @@ int vfio_migration_realize(VFIODevice
>>>>>>>>> *vbasedev,
>>>>>>>> Error **errp)
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> ret = vfio_block_multiple_devices_migration(errp);
>>>>>>>>> - if (ret) {
>>>>>>>>> + if (ret || (errp && *errp)) {
>>>>>>>>
>>>>>>>> Why do you need this extra clause?
>>>>>>>
>>>>>>> Now that error happens, no need to add other blockers which will
>>>>>>> fail for same reason.
>>>>>>>
>>>>>>
>>>>>> But you don't need the (errp && *errp) for that as that's the whole
>>>>>> point of negating the result.
>>>>>>
>>>>>> And if there's an error set it means migrate_add_blocker failed to
>>>>>> add the blocker (e.g. snapshotting IIUC), and you would be failing here
>>>> unnecessarily?
>>>>>
>>>>> If there is an error qdev_device_add() will fail, continue execution is
>>>> meaningless here?
>>>>> There is ERRP_GUARD in this path, so it looks (*errp) is enough.
>>>>>
>>>>> If I removed (errp && *errp) to continue, need below change to bypass
>>>>> trace_vfio_migration_probe Do you prefer this way?
>>>>>
>>>>> if (!*errp) {
>>>>> trace_vfio_migration_probe(vbasedev->name);
>>>>> }
>>>>>
>>>>
>>>> I am mainly questioning that the error testing is correct to test here.
>>>>
>>>> IIUC, the only one that can propagate any *new* error in
>>>> vfio_migration_realize is the calls to migrate_add_blocker failing within
>>>> the
>>>> vfio_block* (migration code suggests that this happens on snapshotting).
>>>> Failing to add migration blocker just means we haven't installed any
>>>> blockers.
>>>> And the current code presents that as a "Migration disabled" case. If we
>>>> want
>>>> to preserve that behaviour on migration_add_blocker() failures (which seems
>>>> like that's what you are doing here) then instead of this:
>>>
>>> Current behavior(without my patch):
>>> "Migration disabled" isn't printed if migrate_add_blocker succeed.
>>> "Migration disabled" is printed if migrate_add_blocker fail.
>>>
>>> I think this behavior isn't correct, I want to revert it not preserve it, so
>>> I used !ret.
>>> Imagine we hotplug a vfio device when snapshotting, migrate_add_blocker
>>> failure
>>> will lead to hotplug fail, then the guest is still migratable as no vfio
>>> plugged.
>>> But we see "Migration disabled" which will confuse us.
>>>
>>
>> /me nods
>
> Yes. Overall, the reason for which migration is not supported or is
> disabled is not very clear today (for the user). It might need some
> more adjustments, like this one, before we remove the experimental flag.
I think the "Migration Disabled" was an UX oversight when we refactored blockers
into vfio_migration_realize(). The actual migration is *not* disabled, it is
just that the message or tracepoint shouldn't be printed. The reasons why it is
blocked / not supported are clear in code right now.
> It will also help QE to define test scenarios and track expected results.
>
>> >>
>>>> return !ret;
>>>>
>>>> you would do this:
>>>>
>>>> ret = migration_add_blocker(...);
>>>> if (ret < 0) {
>>>> error_free(...);
>>>> blocker = NULL;
>>>> + return ret;
>>>> }
>>>>
>>>> + return 1;
>>>>
>>>> Or something like that. And then if you return early as you intended?
>>>
>>> Yes, this change make sense, I also want to add below:
>>>
>>> if (!pdev->failover_pair_id) {
>>> ret = vfio_migration_realize(vbasedev, errp);
>>> - if (ret) {
>>> + if (ret > 0) {
>>> error_report("%s: Migration disabled", vbasedev->name);
>>> }
>>>
>> Makes sense. Checking errp above before printing the tracepoint (like you
>> suggested) is also an option taking the discussion so far, but perhaps the
>> return type to be bool to make it clear to callers that this is not no
>> longer an
>> error code? Maybe let's wait for others on what style is generally preferred
>> in
>> error propagation.
>
> I think the code should follow the qdev_realize() prototype, which returns
> a bool.
>
That makes sense, it makes the decision a lot simpler.
> Thanks,
>
> C.
>
>
- [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Zhenzhong Duan, 2023/06/15
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Joao Martins, 2023/06/15
- RE: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Duan, Zhenzhong, 2023/06/15
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Joao Martins, 2023/06/15
- RE: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Duan, Zhenzhong, 2023/06/15
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Joao Martins, 2023/06/16
- RE: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Duan, Zhenzhong, 2023/06/16
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Joao Martins, 2023/06/16
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(), Cédric Le Goater, 2023/06/16
- Re: [PATCH] vfio/migration: Fix return value of vfio_migration_realize(),
Joao Martins <=