Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads


From: Ilya Maximets
Subject: Re: [PATCH] virtio: use shadow_avail_idx while checking number of heads
Date: Tue, 26 Sep 2023 00:13:11 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 9/25/23 23:24, Michael S. Tsirkin wrote:
> On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
>> On 9/25/23 17:38, Stefan Hajnoczi wrote:
>>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets <i.maximets@ovn.org> wrote:
>>>>
>>>> On 9/25/23 17:12, Stefan Hajnoczi wrote:
>>>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets <i.maximets@ovn.org> wrote:
>>>>>>
>>>>>> On 9/25/23 16:23, Stefan Hajnoczi wrote:
>>>>>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maximets@ovn.org> wrote:
>>>>>>>>
>>>>>>>> We do not need the most up-to-date number of heads; we only want to
>>>>>>>> know if there is at least one.
>>>>>>>>
>>>>>>>> Use the shadow variable as long as it is not equal to the last
>>>>>>>> available index checked.  This avoids an expensive qatomic dereference
>>>>>>>> of the RCU-protected memory region cache, as well as the memory access
>>>>>>>> itself and the subsequent memory barrier.
>>>>>>>>
>>>>>>>> The change improves performance of the af-xdp network backend by 2-3%.
>>>>>>>>
>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>>>> ---
>>>>>>>>  hw/virtio/virtio.c | 10 +++++++++-
>>>>>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>>>>>> index 309038fd46..04bf7cc977 100644
>>>>>>>> --- a/hw/virtio/virtio.c
>>>>>>>> +++ b/hw/virtio/virtio.c
>>>>>>>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>>>>>>>  /* Called within rcu_read_lock().  */
>>>>>>>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>>>>>>>  {
>>>>>>>> -    uint16_t num_heads = vring_avail_idx(vq) - idx;
>>>>>>>> +    uint16_t num_heads;
>>>>>>>> +
>>>>>>>> +    if (vq->shadow_avail_idx != idx) {
>>>>>>>> +        num_heads = vq->shadow_avail_idx - idx;
>>>>>>>> +
>>>>>>>> +        return num_heads;
>>>>>>>
>>>>>>> This still needs to check num_heads > vq->vring.num and return -EINVAL
>>>>>>> as is done below.
>>>>>>
>>>>>> Hmm, yeah, you're right.  If the value was incorrect initially, the
>>>>>> shadow will be incorrect.  However, I think we should just not return
>>>>>> here in this case and let vring_avail_idx() grab an actual new value
>>>>>> below.  Otherwise we may never break out of this error.
>>>>>>
>>>>>> Does that make sense?
>>>>>
>>>>> No, because virtio_error() marks the device as broken. The device
>>>>> requires a reset in order to function again. Fetching
>>>>> vring_avail_idx() again won't help.
>>>>
>>>> OK, I see.  In this case we're talking about a situation where
>>>> vring_avail_idx() was called in some other place and stored a bad value
>>>> in the shadow variable, then virtqueue_num_heads() got called.  Right?
>>
>> Hmm, I suppose we also need a read barrier after all, even if we use
>> a shadow index.  Assuming the index is correct but the shadow variable
>> was updated by a call outside of this function, we may miss a barrier
>> and read the descriptor out of order, in theory.  The read barrier is
>> going to be a compiler barrier on x86, so the performance gain from
>> this patch should still be mostly there.  I'll test that.
> 
> I can't say I understand, generally.  The shadow is under QEMU's control;
> I don't think it can be updated concurrently by multiple CPUs.

It can't, I agree.  The scenario I'm thinking about is the following:

1. vring_avail_idx() is called from one of the places other than
   virtqueue_num_heads().  Shadow is updated with the current value.
   Some users of vring_avail_idx() do not use barriers after the call.

2. virtqueue_split_get_avail_bytes() is called.

3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().

4. virtqueue_num_heads() checks the shadow and returns early.

5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
   reads the descriptor.

If we do not have a read barrier between steps 1 and 5, we risk reading
descriptor data that is not yet fully written, because there is no
guarantee that the avail index read at step 1 wasn't reordered with the
descriptor read.

In the current code we always have an smp_rmb() in virtqueue_num_heads().
But if we return from this function without a barrier, we may have an
issue, IIUC.

I agree that it's a very unlikely scenario, and we will probably have
a control dependency between steps 1 and 5 that prevents the issue,
but it might be safer to just have an explicit barrier in
virtqueue_num_heads().
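
For illustration, a rough sketch of what I mean on top of the patch above,
with the bounds check we discussed and an smp_rmb() on the shadow fast path.
This is just an idea to show the placement, not a tested change; the slow
path is the existing code as I remember it:

    /* Called within rcu_read_lock().  */
    static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
    {
        uint16_t num_heads;

        if (vq->shadow_avail_idx != idx) {
            num_heads = vq->shadow_avail_idx - idx;

            /* Only use the shadow value if it was good initially. */
            if (num_heads <= vq->vring.num) {
                /*
                 * Pair with the barrier on the slow path below: order the
                 * earlier read of the avail index against the descriptor
                 * reads performed by the caller.
                 */
                smp_rmb();
                return num_heads;
            }
        }

        num_heads = vring_avail_idx(vq) - idx;

        /* Check it isn't doing very strange things with descriptor numbers. */
        if (num_heads > vq->vring.num) {
            virtio_error(vq->vdev, "Guest moved used index from %u to %u",
                         idx, vq->shadow_avail_idx);
            return -EINVAL;
        }
        /* On success, callers read a descriptor at least once, hence a barrier. */
        if (num_heads) {
            smp_rmb();
        }

        return num_heads;
    }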

Does that make sense?  Or am I missing something else here?

> 
> 
>>>>
>>>> AFAIU, we can still just fall through here and let vring_avail_idx()
>>>> read the index again and fail the existing check.  That would happen
>>>> today without this patch applied.
>>>
>>> Yes, that is fine.
>>>
>>>>
>>>> I'm just trying to avoid duplication of the virtio_error() call, i.e.:
>>>>
>>>>     if (vq->shadow_avail_idx != idx) {
>>>>         num_heads = vq->shadow_avail_idx - idx;
>>>>
>>>>         /* Check it isn't doing very strange things with descriptor numbers. */
>>>>         if (num_heads > vq->vring.num) {
>>>>             virtio_error(vq->vdev, "Guest moved used index from %u to %u",
>>>>                          idx, vq->shadow_avail_idx);
>>>>             return -EINVAL;
>>>>         }
>>>>         return num_heads;
>>>>     }
>>>>
>>>> vs
>>>>
>>>>     if (vq->shadow_avail_idx != idx) {
>>>>         num_heads = vq->shadow_avail_idx - idx;
>>>>
>>>>         /* Only use the shadow value if it was good initially. */
>>>>         if (num_heads <= vq->vring.num) {
>>>>             return num_heads;
>>>>         }
>>>>     }
>>>>
>>>>
>>>> What do you think?
>>>
>>> Sounds good.
>>>
>>>>
>>>> Best regards, Ilya Maximets.
> 



