qemu-devel

Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error


From: Fernando Casas Schössow
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Date: Thu, 31 Jan 2019 11:32:32 +0000

Hi,

Sorry for resurrecting this thread after so long, but I just upgraded the host
to QEMU 3.1 and libvirt 4.10 and I'm still facing this problem.
At the moment I cannot use virtio disks (neither virtio-blk nor virtio-scsi)
with my guests without hitting this issue, so as a workaround I'm using
emulated SATA storage, which is not ideal but is perfectly stable.

Do you have any suggestions on how I can make progress with troubleshooting?
QEMU is not crashing, so I don't have any dumps that can be analyzed. The guest
is just "stuck" and all I can do is destroy it and start it again.
It's really frustrating that after all this time I still haven't found the
cause of this issue, so any ideas are welcome.

Thanks.

Fernando

________________________________
From: Fernando Casas Schössow <address@hidden>
Sent: Saturday, June 24, 2017 10:34 AM
To: Ladi Prosek
Cc: address@hidden
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error

Hi Ladi,

After running for about 15 hours, two different guests (one Windows, one
Linux) crashed about an hour apart, with the same error in the QEMU log:
"Virtqueue size exceeded".

The Linux guest was already running on virtio_scsi and without virtio_balloon. :(
I compiled and attached gdbserver to the qemu process for this guest but when I 
did this I got the following warning in gdbserver:

warning: Cannot call inferior functions, Linux kernel PaX protection forbids 
return to non-executable pages!

The default Alpine kernel is a grsec kernel. I'm not sure whether this will
interfere with debugging, but I suspect it will.
If you need me to replace the grsec kernel with a vanilla one (also available
as an option in Alpine), let me know and I will do so.
Otherwise, send me an email directly so I can share the host:port details with
you and you can connect to gdbserver.

Thanks,

Fer
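
For readers following along, the attach-and-connect steps discussed in this thread can be sketched roughly as below. This is a dry-run illustration only: it prints the commands rather than running them. The helper names `attach_cmd` and `connect_cmd`, the PID 4242, and the host name `kvmhost.example` are made up for the example; port 12345 is the one used in the thread. gdbserver listens on the given port itself, so the host part of its address argument can be left empty.

```shell
#!/bin/sh
# Hypothetical helper: print the gdbserver command that attaches to an
# already-running QEMU process (run on the virtualization host).
attach_cmd() {
    port="$1"
    pid="$2"
    printf 'gdbserver --attach :%s %s\n' "$port" "$pid"
}

# Hypothetical helper: print the gdb command that connects to that
# gdbserver instance (run on the remote developer's machine).
connect_cmd() {
    host="$1"
    port="$2"
    printf 'gdb -ex "target remote %s:%s"\n' "$host" "$port"
}

# Example values only; substitute the real QEMU PID and host name.
attach_cmd 12345 4242
connect_cmd kvmhost.example 12345
```

On the remote side, gdb works best with a QEMU binary and debug symbols that match the host's build, which is an assumption worth checking before a debugging session.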

On Fri, Jun 23, 2017 at 8:29, Fernando Casas Schössow <address@hidden> wrote:
Hi Ladi,

Small update. Memtest86+ ran on the host for more than 54 hours. Eight passes
completed and no memory errors were found. For now I think we can assume that
the host memory is OK.

I just started all the guests one hour ago. I will monitor them and once one 
fails I will attach the debugger and let you know.

Thanks.

Fer

On Thu, Jun 22, 2017 at 9:43, Ladi Prosek <address@hidden> wrote:
Hi Fernando,

On Wed, Jun 21, 2017 at 2:19 PM, Fernando Casas Schössow <address@hidden> wrote:
> Hi Ladi,
>
> Sorry for the delay in my reply. I will leave the host kernel alone for now
> then. For the last 15 hours or so I've been running memtest86+ on the host.
> So far so good: two passes, no errors. I will try to leave it running for
> at least another 24 hours and report back the results. Hopefully we can
> rule out a memory issue at the hardware level.
>
> Regarding KSM, that will be the next thing I disable if guests still crash
> after removing the balloon device.
>
> About leaving a guest in a failed state for you to debug remotely, that's
> absolutely an option. We just need to coordinate so I can give you remote
> access to the host and so on. Let me know if any preparation is needed in
> advance and which tools you need installed on the host.

I think that gdbserver attached to the QEMU process should be enough. When the
VM gets into the broken state please do something like:

    gdbserver --attach host:12345 <QEMU pid>

and let me know the host name and port (12345 in the above example).

> Once again I would like to thank you for all your help and your great
> disposition!

You're absolutely welcome, I don't think I've done anything helpful so far :)

> Cheers,
>
> Fer
>
> On Tue, Jun 20, 2017 at 9:52, Ladi Prosek <address@hidden> wrote:
>> The host kernel is less likely to be responsible for this, in my opinion.
>> I'd hold off on that for now.
>>
>>> And last but not least KSM is enabled on the host. Should I disable it?
>>
>> Could be worth the try.
>>
>>> Following your advice I will run memtest on the host and report back.
>>> Just as a side comment, the host is running on ECC memory.
>>
>> I see. Would it be possible for you, once a guest is in the broken state,
>> to make it available for debugging? By attaching gdb to the QEMU process
>> for example and letting me poke around it remotely? Thanks!
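
As a side note on the KSM question raised in this thread, a minimal sketch of checking the KSM state on a Linux host follows. It assumes the standard sysfs control file `/sys/kernel/mm/ksm/run`, whose values are 0 (stopped), 1 (running), and 2 (stopped, with merged pages unmerged). The helper name `ksm_state` is made up for illustration, and the script only reads the file.

```shell
#!/bin/sh
# Hypothetical helper: map a value read from /sys/kernel/mm/ksm/run
# to a human-readable description.
ksm_state() {
    case "$1" in
        0) echo "stopped" ;;
        1) echo "running" ;;
        2) echo "stopped and unmerged" ;;
        *) echo "unknown" ;;
    esac
}

KSM_RUN=/sys/kernel/mm/ksm/run
if [ -r "$KSM_RUN" ]; then
    echo "KSM is $(ksm_state "$(cat "$KSM_RUN")")"
else
    echo "KSM control file not found; the kernel may be built without KSM"
fi

# To stop KSM and unmerge already-shared pages (as root), one would write:
#   echo 2 > /sys/kernel/mm/ksm/run
```

Writing 2 rather than 0 seems the better fit for a troubleshooting run, since 0 stops the scanner but leaves already-merged pages shared.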





