qemu-s390x
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code


From: Thomas Huth
Subject: Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code
Date: Tue, 16 Feb 2021 12:47:40 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0

On 16/02/2021 12.43, Peter Maydell wrote:
On Tue, 16 Feb 2021 at 11:01, Thomas Huth <thuth@redhat.com> wrote:
According to the virtio specification, a memory barrier should be
used before incrementing the idx field in the "available" ring.
So far, we did not do this in the s390-ccw bios yet, but recently
Peter Maydell saw problems with the s390-ccw bios when running
the qtests on an aarch64 host (the bios panic'ed with the message:
"SCSI cannot report LUNs: response VS RESP=09"), which could
maybe be related to the missing memory barriers. Thus let's add
those barriers now. Since we've only seen the problem on TCG so far,
a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
in the TCG translate code.

(Note: The virtio spec also talks about using a memory barrier
*after* incrementing the idx field, but if I understood correctly
this is only required when using notification suppression - which
we don't use in the s390-ccw bios here)

Signed-off-by: Thomas Huth <thuth@redhat.com>

This noticeably increases the reliability of the test for me:
it goes from failing one time in 2 or 3 to failing about one time
in 5 or 6. However it does still hit the virtio_scsi_parse_req()
check on "out_size && in_size" eventually. So my guess is that this
is not a sufficient supply of barriers. I'm firmly not a virtio expert
but my reading of the spec suggested that some of the way the
s390-ccw code is doing things might be in the wrong order and
need restructuring beyond merely "add barriers to existing code".

I've had a look at the virtio 1.0 spec and looked for "barrier" there, and these two spots were the only ones that I identified that were definitely missing. The other spots in the spec that talk about memory barriers rather seem to be related to notification suppression, which we are not doing here, if I've got that right. So no clue what else could be going wrong here on your aarch64 host...

 Thomas




reply via email to

[Prev in Thread] Current Thread [Next in Thread]