qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining


From: David Hildenbrand
Subject: Re: [Qemu-devel] [qemu-s390x] [PATCH v3 2/2] s390x/pci: Unplug remaining devices on pcihost reset
Date: Tue, 29 Jan 2019 11:24:41 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

>>> I'm wondering what the architecture says regarding those events -- can
>>> someone with access to the documentation comment?
>>
>> Ping. Any comments from the IBM folks?
>>
> 
> 
> So the idea here is that if we have a PCI device that is the process of 
> being deconfigured and we are also in the middle of a reset, then let's 
> accelerate deconfiguring of the PCI device during the reset. Makes sense.
> 
> Note:
> 
> The callback function will deconfigure the the device and put it into 
> standby mode. However, a PCI device should only go into standby from the 
> *disabled state* (which it could already be in due to the unplug 
> sequence), or from a *permanent error state* (something we should 
> hopefully never see -- this means something went seriously wrong with 
> the device).

Right, this should already have been checked before setting up the timer.

> 
> Two things I'm concerned about:
> 
> 1)
> 
> What I would suggest is adding a check for the pbdev->state for 
> ZPCI_FS_DISABLED || ZPCI_FS_PERMANENT_ERROR. If it is in either of these 
> states, then we're safe to deconfigure and put into standby. If the 

We setup a timer if !ZPCI_FS_STANDBY and !ZPCI_FS_RESERVED.

So for
- ZPCI_FS_DISABLED
- ZPCI_FS_ENABLED
- ZPCI_FS_BLOCKED
- ZPCI_FS_ERROR
- ZPCI_FS_PERMANENT_ERROR

We setup a timer and simply go ahead and unplug the device when the
timer expires ("forced unplug").

Changing that behavior might be more invasive. Simply not unplugging in
s390_pcihost_timer_cb() on some of these states would mean that somebody
issued a request and that requests is simply lost/ignored. Not sure if
that is what we want. I think we need separate patches to change
something like that. Especially

1. What happens if the device was in ZPCI_FS_DISABLED, the guest ignores
the unplug request and moves the device to ZPCI_FS_ENABLED before the
timer expires? These are corner cases to consider.

2. Do we need a timer at all? Now that Patch #1 introduces
unplug_requests, we are free to ignore requests from the user if the
guest is not reacting. I would really favor getting rid of the timer
completely. Was there a special reason why this was introduced?

No other architecture (e.g. ACPI) uses such a timer. They use a simple
flag to remember if a request is pending. I would really favor going
into that direction.

@Christian do you think we need this "force remove after 30 seconds"
timer thingy?

> device is still in another state (such as enabled or blocked, etc) then 
> we should allow the timer to resume and give the device some more time 
> before forcing an unplug. It's also probably not a good idea to try and 
> deconfigure a device that might already be deconfigured (e.g. if it's 
> already in standby or reserved state). That might not happen though, but 
> it's good to cover our bases.
> 
> A side thought: In addition to checking the states, what would happen if 
> you forced the timer to 0? Would the callback get called? Would that 
> just accelerate the already-in-progress unplug sequence?

I guess so, but then action will be called asynchronously from the main
loop when timers are run. Calling it synchronously during the reset
makes in my point of view more sense.

> 
> and 2)
> 
> I worry that the sclp might try to deconfigure a PCI device at the same 
> time we force the callback in this patch. I noticed that the 
> sclp_deconfigure function also checks on the release timer before 
> unplugging. I think we should make sure the timer is properly stopped or 
> canceled before the sclp tries to deconfigure and before this patch 
> forces the callback, that way we hopefully won't try to do both at the 
> same time.

Not necessary. Timers are ran from the main thread, just as we (system
reset) are. We don't have any race conditions here. This would only be
possible when being triggered by a VCPU without holding the iothread
mutex. But this is never the case. All VCPUs and even timers (due to the
virtual clock being stopped) are stopped during system reset.


-- 

Thanks,

David / dhildenb



reply via email to

[Prev in Thread] Current Thread [Next in Thread]