
From: Pierre Morel
Subject: Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
Date: Tue, 29 Jan 2019 17:50:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 29/01/2019 16:11, David Hildenbrand wrote:
On 29.01.19 14:50, Pierre Morel wrote:
On 29/01/2019 11:24, David Hildenbrand wrote:
I'm wondering what the architecture says regarding those events -- can
someone with access to the documentation comment?

Ping. Any comments from the IBM folks?

Hi,

Sorry to have waited so long.
At least Collin was faster.




So the idea here is that if we have a PCI device that is in the process of
being deconfigured and we are also in the middle of a reset, then let's
accelerate deconfiguring of the PCI device during the reset. Makes sense.

to me too.
However, how do we ensure that the guest got time to respond to the
first deconfigure request?

30 seconds, then the reboot. On a reboot, I don't see why we should give
a guest more time. "It's dead", rip out the card as the guest refused to
hand it back. (maybe it crashed! but after a reboot the guest state is
reset and brought back to life)

I agree that 30 seconds is more than enough,
and I also agree that in most cases the guest is dead.





Note:

The callback function will deconfigure the device and put it into
standby mode. However, a PCI device should only go into standby from the
*disabled state* (which it could already be in due to the unplug
sequence), or from a *permanent error state* (something we should
hopefully never see -- this means something went seriously wrong with
the device).

Not completely exact: the CHSC event 0x304, on the initiative of the
host, forces the "deconfigured state" from the "configured state" generally,
whatever sub-state it is in (enabled/disabled/error...).
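
To make that concrete, a tiny standalone C sketch of the transition I mean
(a model only, not QEMU code; the enum just mirrors the ZPCI_FS_* names used
in hw/s390x/s390-pci-bus.h):

/* Standalone model, not QEMU code: a host-initiated deconfigure (0x304)
 * moves the function from any configured sub-state back to standby. */
#include <assert.h>

typedef enum {
    ZPCI_FS_RESERVED,
    ZPCI_FS_STANDBY,
    ZPCI_FS_DISABLED,
    ZPCI_FS_ENABLED,
    ZPCI_FS_BLOCKED,
    ZPCI_FS_ERROR,
    ZPCI_FS_PERMANENT_ERROR,
} ZpciState;

static ZpciState host_deconfigure(ZpciState s)
{
    switch (s) {
    case ZPCI_FS_STANDBY:
    case ZPCI_FS_RESERVED:
        return s;                 /* not configured, nothing to do */
    default:
        return ZPCI_FS_STANDBY;   /* configured -> deconfigured (standby) */
    }
}

int main(void)
{
    assert(host_deconfigure(ZPCI_FS_ERROR) == ZPCI_FS_STANDBY);
    assert(host_deconfigure(ZPCI_FS_STANDBY) == ZPCI_FS_STANDBY);
    return 0;
}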


Right, this should already have been checked before setting up the timer.

Apropos timer, do we need a timer or wouldn't it be better to use a
delay / a timer + condition?

I don't think we need a timer at all.

Yes, if it is possible to wait synchronously for 30s, that seems good to me.



AFAIU we get out of the unplug without waiting for any answer from the
guest and we surely get the timer triggering after the reset has been done.
That seems bad.

This is the case right now, correct.





Two things I'm concerned about:

1)

What I would suggest is adding a check of pbdev->state for
ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these
states, then we're safe to deconfigure and put into standby. If the

We set up a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.

So for
- ZPCI_FS_DISABLED
- ZPCI_FS_ENABLED
- ZPCI_FS_BLOCKED
- ZPCI_FS_ERROR
- ZPCI_FS_PERMANENT_ERROR

We set up a timer and simply go ahead and unplug the device when the
timer expires ("forced unplug").

I agree only for ZPCI_FS_ENABLED; why do we need to be gentle for the other
states?
ZPCI_FS_DISABLED may be a candidate, even if I do not see the interest, but
the other states of the device should just issue a HP_EVENT_DECONFIGURE_REQUEST,
and we do not need a timer (or any delay).
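
To make what I mean concrete, a small standalone sketch (the action names
are invented for illustration; this is not the real unplug path):

/* Standalone sketch of the per-state decision, not the QEMU unplug path.
 * Only an enabled function needs the "ask the guest and wait" treatment;
 * standby/reserved need nothing; everything else can be deconfigured
 * right away. */
#include <assert.h>

typedef enum {
    ZPCI_FS_RESERVED, ZPCI_FS_STANDBY, ZPCI_FS_DISABLED, ZPCI_FS_ENABLED,
    ZPCI_FS_BLOCKED, ZPCI_FS_ERROR, ZPCI_FS_PERMANENT_ERROR,
} ZpciState;

typedef enum {
    UNPLUG_NOTHING_TO_DO,    /* already standby or reserved */
    UNPLUG_ASK_GUEST,        /* guest may be using it: request and wait */
    UNPLUG_DECONFIGURE_NOW,  /* guest cannot be using it: no delay needed */
} UnplugAction;

static UnplugAction unplug_action(ZpciState s)
{
    switch (s) {
    case ZPCI_FS_STANDBY:
    case ZPCI_FS_RESERVED:
        return UNPLUG_NOTHING_TO_DO;
    case ZPCI_FS_ENABLED:
        return UNPLUG_ASK_GUEST;
    default:
        return UNPLUG_DECONFIGURE_NOW;
    }
}

int main(void)
{
    assert(unplug_action(ZPCI_FS_ENABLED) == UNPLUG_ASK_GUEST);
    assert(unplug_action(ZPCI_FS_ERROR) == UNPLUG_DECONFIGURE_NOW);
    assert(unplug_action(ZPCI_FS_STANDBY) == UNPLUG_NOTHING_TO_DO);
    return 0;
}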

You can always expect that your guest driver is dead.

Hum, the device is dead.
Maybe the guest got hit too, if the driver is not written correctly.




Changing that behavior might be more invasive. Simply not unplugging in
s390_pcihost_timer_cb() on some of these states would mean that somebody
issued a request and that request is simply lost/ignored. Not sure if
that is what we want. I think we need separate patches to change
something like that. Especially:

1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
the unplug request and moves the device to ZPCI_FS_ENABLED before the
timer expires? These are corner cases to consider.

+1, we must make sure to do the work inside the unplug CB.
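
For that corner case, one way to picture it (a standalone sketch, not a
patch): whatever forces the unplug later has to look at the *current*
pbdev state again, because the guest may have re-enabled the device in
the meantime.

/* Standalone sketch, not a patch: the forced-unplug path re-reads the
 * current state when it runs, instead of trusting the state observed
 * when the unplug was requested. */
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ZPCI_FS_RESERVED, ZPCI_FS_STANDBY, ZPCI_FS_DISABLED, ZPCI_FS_ENABLED,
    ZPCI_FS_BLOCKED, ZPCI_FS_ERROR, ZPCI_FS_PERMANENT_ERROR,
} ZpciState;

typedef struct {
    ZpciState state;          /* may change between request and expiry */
    bool unplug_requested;
} PciDevModel;

/* True if the device can be torn down without asking the guest again,
 * i.e. the guest cannot currently be using it. */
static bool safe_to_force_unplug(const PciDevModel *dev)
{
    return dev->unplug_requested && dev->state != ZPCI_FS_ENABLED;
}

int main(void)
{
    PciDevModel dev = { .state = ZPCI_FS_DISABLED, .unplug_requested = true };
    assert(safe_to_force_unplug(&dev));   /* disabled at request time */
    dev.state = ZPCI_FS_ENABLED;          /* guest re-enabled it meanwhile */
    assert(!safe_to_force_unplug(&dev));  /* must not blindly rip it out */
    return 0;
}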


2. Do we need a timer at all? Now that Patch #1 introduces
unplug_requests, we are free to ignore requests from the user if the
guest is not reacting. I would really favor getting rid of the timer
completely. Was there a special reason why this was introduced?

Yes, to give the guest a chance to smoothly relinquish the device
(for example, to sync/clean the disk).
However, I do not think it is implemented correctly.


No other architecture (e.g. ACPI) uses such a timer. They use a simple
flag to remember if a request is pending. I would really favor going
in that direction.

I am not sure that the Intel architecture is a good example. :)

Right, we all learned that zPCI did it better. (sorry ;) )

Well, I really think so.
It is designed with several guests in parallel and shared devices.

In such an architecture, ripping a device away from a guest may be of interest.
One good thing would be for the guest software to handle it :)



AFAIU they do not wait for the guest to have relinquished the device.
Or do they?
How long do they wait?

They wait forever. And I guess we should do the same thing. If the
guest driver is broken (and this is really a rare scenario!) we would
not get the device back. Which is perfectly fine from my point of view. In
all other scenarios I guess the guest will simply respond in a timely
manner. And ripping out stuff from the guest always feels wrong. (yes,
the channel subsystem works like that, but here we actually have a choice)

If we reboot, we can unplug the device. Otherwise, let's keep it simple
and not use a timer.

Thanks!


AFAIK we have no plan to operate on pools of PCI devices, so I have no objection to keeping it simple.
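
To illustrate the direction (a rough standalone sketch with invented field
and function names, not the actual patch): a flag instead of a timer,
repeated requests ignored, the guest-driven deconfigure completing the
unplug, and only a pcihost reset/reboot forcing a pending one.

/* Rough standalone sketch of the flag-instead-of-timer idea discussed
 * above; the names are invented for illustration, this is not the patch. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool configured;        /* guest still owns the function */
    bool unplug_requested;  /* deconfigure request already sent to the guest */
} PciDevModel;

/* User asks to unplug: send the deconfigure request once, then just wait. */
static void unplug_request(PciDevModel *dev)
{
    if (dev->unplug_requested) {
        return;             /* request pending, nothing to resend */
    }
    dev->unplug_requested = true;
    /* ... here QEMU would raise the deconfigure event for the guest ... */
}

/* Guest finished releasing the function: now the device can really go. */
static void guest_deconfigured(PciDevModel *dev)
{
    dev->configured = false;
    dev->unplug_requested = false;
}

/* pcihost reset/reboot: a pending request may be completed by force,
 * since the guest state is thrown away anyway. */
static void pcihost_reset(PciDevModel *dev)
{
    if (dev->unplug_requested) {
        guest_deconfigured(dev);
    }
}

int main(void)
{
    PciDevModel dev = { .configured = true, .unplug_requested = false };
    unplug_request(&dev);
    unplug_request(&dev);          /* second request is simply ignored */
    assert(dev.configured);        /* no timer: nothing is ripped out */
    pcihost_reset(&dev);           /* reboot forces the pending unplug */
    assert(!dev.configured);
    printf("ok\n");
    return 0;
}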

Regards,
Pierre



--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany



