From: Zhangbo (Oscar)
Subject: [Qemu-devel] CC wangxiongfeng. : RE: [PATCH] pcie: fix device hotplug failure at the meantime of VM boot
Date: Thu, 25 Jul 2019 03:19:11 +0000
>> If the PCI_EXP_LNKSTA_DLLLA capability is set by default, the Linux kernel
>> will send a PDC event to detect whether there is a device in the PCIe slot.
>> If a device is plugged into the pcie-root-port at the same time, the
>> hot-plugged device will send ABP + PDC events to the kernel. The VM kernel
>> will wrongly unplug the device if the two PDC events arrive too close
>> together. Thus we'd better set the PCI_EXP_LNKSTA_DLLLA capability only in
>> the hotplug callback.
>
>Could you please describe a reproducer in a bit more detail?
>
Step 1: start a VM with QEMU; the VM boots up within 500 ms.
/path/to/qemu-2.8.1/aarch64-softmmu/qemu-system-aarch64 \
-name test-c65961652639ccf9ce0b8476a325421811d4fdc873e90c27168497bc9e204776 \
-uuid a8ed4a86-3f49-45a3-a8ce-28d61b2f2914 \
-machine virt,usb=off,accel=kvm,gic-version=3 \
-cpu host \
-m 2048M,slots=10,maxmem=239477M \
-qmp unix:/var/run/qmp.sock,server,nowait \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device pcie-pci-bridge,id=pci.8,bus=pci.1,addr=0x0 \
-device pcie-root-port,port=0xf,chassis=9,id=pci.9,bus=pcie.0,addr=0x1.0x7 \
.......
Step 2: immediately hotplug a PCIe device:
qmp_msg='{ "execute": "qmp_capabilities" }
{"execute":"netdev_add","arguments":{"type":"tap","id":"netport_test_1","ifname":"nfs_tap"}}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","netdev":"netport_test_1","bus":"pci.4","addr":"0x0","id":"virtio-e1356802-4b9f-44bb-b8f0-5f98bf765823","mac":"02:42:20:6e:a2:59"}}'
echo "$qmp_msg" | nc -U /var/run/qmp.sock
Result expected: the hotplug succeeds and the PCIe device is visible inside
the VM.
Result in fact: we saw both a "hotplug" and an "unplug" message inside the
VM; the hotplug failed.
>
>>
>> By the way, we should clear the PCI_EXP_LNKSTA_DLLLA capability during
>> unplug, to avoid VM reboot or migration failures that would otherwise hit
>> the same abnormal scenario as above.
>>
>> Signed-off-by: address@hidden
>> Signed-off-by: address@hidden
>> Signed-off-by: address@hidden
>
>So looking at linux I see:
>
> * pciehp_card_present_or_link_active() - whether given slot is occupied
> * @ctrl: PCIe hotplug controller
> *
> * Unlike pciehp_card_present(), which determines presence solely from the
> * Presence Detect State bit, this helper also returns true if the Link Active
> * bit is set. This is a concession to broken hotplug ports which hardwire
> * Presence Detect State to zero, such as Wilocity's [1ae9:0200].
>
>so it looks like Linux actually looks at the Presence Detect State bit,
>but we have a bug just like Wilocity's, and keeping
>link active up papers over that. Can't we fix the bug instead?
>
@wangxiongfeng
>> ---
>> hw/pci/pcie.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>> index a6beb56..174f392 100644
>> --- a/hw/pci/pcie.c
>> +++ b/hw/pci/pcie.c
>> @@ -75,10 +75,6 @@ pcie_cap_v1_fill(PCIDevice *dev, uint8_t port, uint8_t type, uint8_t version)
>>                   QEMU_PCI_EXP_LNKSTA_NLW(QEMU_PCI_EXP_LNK_X1) |
>>                   QEMU_PCI_EXP_LNKSTA_CLS(QEMU_PCI_EXP_LNK_2_5GT));
>>
>> - if (dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA) {
>> - pci_word_test_and_set_mask(exp_cap + PCI_EXP_LNKSTA,
>> - PCI_EXP_LNKSTA_DLLLA);
>> - }
>>
>> /* We changed link status bits over time, and changing them across
>> * migrations is generally fine as hardware changes them too.
>
>Does this actually change anything?
>
>I don't know why we bother setting it here, but we do
>set it later in pcie_cap_slot_plug_cb, correct?
>
>I'd like to understand whether this is part of fix or
>just a cleanup.
>
>
>> @@ -484,6 +480,11 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
>> return;
>> }
>>
>> + if (pci_dev->cap_present & QEMU_PCIE_LNKSTA_DLLLA) {
>> + pci_word_test_and_clear_mask(exp_cap + PCI_EXP_LNKSTA,
>> + PCI_EXP_LNKSTA_DLLLA);
>> + }
>> +
>> pcie_cap_slot_push_attention_button(PCI_DEVICE(hotplug_dev));
>> }
>
>So this reports the data link as inactive immediately after the
>unplug request. That seems a bit questionable: the guest has not
>responded yet. I'd like to see a comment explaining why
>this makes sense.
>
>
>> --
>> 1.8.3.1