Re: [PATCH 0/4] vl: Prioritize device realizations


From: David Hildenbrand
Subject: Re: [PATCH 0/4] vl: Prioritize device realizations
Date: Wed, 20 Oct 2021 15:58:20 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.1.0

On 20.10.21 15:48, Daniel P. Berrangé wrote:
> On Wed, Oct 20, 2021 at 03:44:08PM +0200, David Hildenbrand wrote:
>> On 18.08.21 21:42, Peter Xu wrote:
>>> This is a long-pending issue that we haven't fixed.  The issue is that in
>>> QEMU we have an implicit device ordering requirement when realizing,
>>> otherwise some of the devices may not work properly.
>>>
>>> The initial requirement comes from when vfio-pci started to work with
>>> vIOMMUs.  To make sure vfio-pci gets the correct DMA address space, the
>>> vIOMMU device needs to be created before vfio-pci; otherwise vfio-pci will
>>> stop working when the guest enables the vIOMMU and the device at the same
>>> time.
>>>
>>> AFAIU Libvirt should have code that guarantees that.  QEMU cmdline users
>>> need to pay attention, or things will stop working at some point.
>>>
>>> Recently there's a growing and similar requirement for vDPA.  It's not a
>>> hard requirement so far, but vDPA has patches that try to work around this
>>> issue.
>>>
>>> This patchset allows us to realize the devices in an order where e.g.
>>> platform devices (bus devices, the IOMMU, etc.) are created first, then the
>>> rest of the normal devices.  It's done simply by ordering the QemuOptsList
>>> of "device" entries before realization.  The priority so far comes from the
>>> migration priorities, which could be a little bit odd, but that's really
>>> about the same problem and we can clean that part up in the future.
>>>
>>> Libvirt can still keep its ordering for sure so old QEMU will still work;
>>> however, that won't be needed for new QEMUs after this patchset, so with
>>> the new binary we should be able to order '-device' entries on the QEMU
>>> cmdline as we wish.
>>>
>>> Logically this should also work for vDPA, and the workaround code can then
>>> be replaced with a more straightforward approach.
>>>
>>> Please review, thanks.
>>
>> Hi Peter, looks like I have another use case:
>>
>> vhost devices can heavily restrict the number of available memslots:
>> e.g., upstream KVM ~64k, vhost-user usually 32 (!). With virtio-mem
>> intending to make use of multiple memslots [1] and auto-detecting how
>> many to use based on the currently available memslots when plugging and
>> realizing the virtio-mem device, this implies that realizing vhost
>> devices (especially vhost-user devices) after virtio-mem devices can
>> similarly result in issues: when trying to realize a vhost device with
>> restricted memslots, QEMU will bail out.
>>
>> So similarly, we'd want to realize any vhost-* before any virtio-mem device.
> 
> Ordering virtio-mem vs vhost-* devices doesn't feel like a good
> solution to this problem. E.g., if you start a guest with several
> vhost-* devices, and virtio-mem then auto-decides to use all/most
> remaining memslots, we've now surely broken the ability to then
> hotplug more vhost-* devices at runtime by not leaving memslots
> for them.

You can hotplug vhost-* devices as you want; they don't "consume"
memslots, they can only restrict the total number of memslots if they
support fewer.
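
To make that concrete, here is a tiny standalone sketch (not QEMU code;
apart from vhost-user's 32, the backend limits below are made up) of how
the effective limit is simply the minimum across all vhost backends:

/* Standalone illustration only -- not QEMU code. */
#include <stdio.h>

/* Memslot limits advertised by the vhost backends of the devices present;
 * only the vhost-user value of 32 is taken from the discussion above, the
 * other values are made up. */
static const unsigned int vhost_limits[] = { 509, 32, 4096 };

/* The effective limit is the minimum across all vhost devices: adding a
 * vhost device never "consumes" a memslot, it can only lower the limit
 * if its backend supports fewer memslots. */
static unsigned int effective_memslot_limit(unsigned int hypervisor_limit)
{
    unsigned int limit = hypervisor_limit;
    size_t n = sizeof(vhost_limits) / sizeof(vhost_limits[0]);

    for (size_t i = 0; i < n; i++) {
        if (vhost_limits[i] < limit) {
            limit = vhost_limits[i];
        }
    }
    return limit;
}

int main(void)
{
    /* Start from "upstream KVM ~64k" as mentioned above. */
    printf("effective memslot limit: %u\n",
           effective_memslot_limit(64 * 1024));
    return 0;
}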

We have this situation today already:

Coldplug/hotplug > 32 DIMMs to a VM. Then hotplug a vhost-user device
that's based on libvhost-user or Rust's vhost-user-backend. The hotplug
will fail.

Nothing is really different with virtio-mem, except that you can configure
how many memslots it should actually use if you care about the above
situation.

> 
> I think virtio-mem configuration needs to be stable in its memslot
> usage regardless of how many other types of devices are present,
> and not auto-adjust how many it consumes.

There is a parameter to limit the number of memslots a virtio-mem device
can use ("max-memslots") to handle such corner-case environments as you
describe.

Set to 1 - use exactly one memslot ("old behavior").
Set to 0 - auto-detect how many to use.
Set to > 1 - auto-detect and cap at the given value.
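
Spelled out as code, the three cases are roughly as follows (just a sketch
of the semantics described above, not the actual virtio-mem implementation):

/* Sketch of the "max-memslots" semantics above; not the real implementation. */
static unsigned int memslots_to_use(unsigned int max_memslots,
                                    unsigned int auto_detected)
{
    if (max_memslots == 1) {
        return 1;                 /* exactly one memslot ("old behavior") */
    }
    if (max_memslots == 0) {
        return auto_detected;     /* auto-detect */
    }
    /* > 1: auto-detect, but cap at the given value */
    return auto_detected < max_memslots ? auto_detected : max_memslots;
}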

99.999% of all users don't care about hotplugging memslot-limiting vhost
devices and will happily use "0". The remainder can be handled via
realization priority. Nothing to confuse ordinary users with, IMHO.
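
For reference, the kind of priority-ordered realization the cover letter
describes could look roughly like the standalone toy below (not Peter's
actual patches; the device names are just examples and the priority values
are invented for illustration):

/* Toy sketch of priority-ordered realization, loosely modeled on the idea
 * in the cover letter (sort the "-device" entries, then realize in that
 * order).  The priorities below are invented. */
#include <stdio.h>
#include <stdlib.h>

typedef struct DeviceEntry {
    const char *driver;
    int realize_priority;           /* higher value gets realized earlier */
} DeviceEntry;

static int compare_priority(const void *a, const void *b)
{
    const DeviceEntry *da = a, *db = b;

    return db->realize_priority - da->realize_priority;
}

int main(void)
{
    DeviceEntry devices[] = {
        { "vfio-pci",           0 },
        { "intel-iommu",       10 },  /* platform/vIOMMU devices go first */
        { "virtio-mem-pci",     0 },
        { "vhost-user-fs-pci",  5 },  /* vhost-* before virtio-mem, per above */
    };
    size_t n = sizeof(devices) / sizeof(devices[0]);

    qsort(devices, n, sizeof(devices[0]), compare_priority);
    for (size_t i = 0; i < n; i++) {
        printf("realize %s\n", devices[i].driver);
    }
    return 0;
}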

-- 
Thanks,

David / dhildenb



