qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] tests/avocado/reverse_debugging: Disable the ppc64 tests by


From: Daniel P . Berrangé
Subject: Re: [PATCH] tests/avocado/reverse_debugging: Disable the ppc64 tests by default
Date: Wed, 15 Nov 2023 17:22:53 +0000
User-agent: Mutt/2.2.10 (2023-03-25)

On Wed, Nov 15, 2023 at 01:14:53PM +0000, Daniel P. Berrangé wrote:
> On Wed, Nov 15, 2023 at 07:23:01AM +0100, Thomas Huth wrote:
> > On 15/11/2023 02.15, Nicholas Piggin wrote:
> > > On Wed Nov 15, 2023 at 4:29 AM AEST, Thomas Huth wrote:
> > > > On 14/11/2023 17.37, Philippe Mathieu-Daudé wrote:
> > > > > On 14/11/23 17:31, Thomas Huth wrote:
> > > > > > The tests seem currently to be broken. Disable them by default
> > > > > > until someone fixes them.
> > > > > > 
> > > > > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > > > > ---
> > > > > >    tests/avocado/reverse_debugging.py | 7 ++++---
> > > > > >    1 file changed, 4 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Similarly, I suspect 
> > > > > https://gitlab.com/qemu-project/qemu/-/issues/1961
> > > > > which has a fix ready:
> > > > > https://lore.kernel.org/qemu-devel/20231110170831.185001-1-richard.henderson@linaro.org/
> > > > > 
> > > > > Maybe wait the fix gets in first?
> > > > 
> > > > No, I applied Richard's patch, but the problem persists. Does this test
> > > > still work for you?
> > > 
> > > I bisected it to 1d4796cd008373 ("python/machine: use socketpair() for
> > > console connections"),
> > 
> > Maybe John (who wrote that commit) can help?
> 
> I find it hard to believe this commit is a direct root cause of the
> problem since all it does is change the QEMU startup sequence so that
> instead of QEMU listening for a monitor connection, it is given a
> pre-opened monitor connection.
> 
> At the very most that should affect the startup timing a little.
> 
> I notice all the reverse debugging tests have a skip on gitlab
> with a comment:
> 
>     # unidentified gitlab timeout problem
> 
> this makes be suspicious that John's patch has merely made this
> (henceforth undiagnosed) timeout more likely to ocurr.

After an absolutely horrendous hours long debugging session I think
I figured out the problem. The QEMU process is blocking in

    qemu_chr_write_buffer

spinning in the loop on EAGAIN.

The Python  Machine() class has passed one of a pre-created socketpair
FDs for the serial port chardev. The guest is trying to write to this
and blocking.  Nothing in the Machine() class is reading from the
other end of the serial port console.


Before John's change, the serial port uses a chardev in server mode
and crucially  'wait=off', and the Machine() class never opened the
console socket unless the test case wanted to read from it.

IOW, QEMU had a background job setting there waiting for a connection
that would never come.

As a result when QEMU started executing the guest, all the serial port
writes get sent into to the void.


So John's patch has had a semantic change in behaviour, because the
console socket is permanently open, and thus socket buffers are liable
to fill up.

As a demo I increased the socket buffers to 1MB and everything then
succeeded.

@@ -357,6 +360,10 @@ def _pre_launch(self) -> None:
 
         if self._console_set:
             self._cons_sock_pair = socket.socketpair()
+            self._cons_sock_pair[0].setsockopt(socket.SOL_SOCKET, 
socket.SO_SNDBUF, 1024*1024);
+            self._cons_sock_pair[0].setsockopt(socket.SOL_SOCKET, 
socket.SO_RCVBUF, 1024*1024);
+            self._cons_sock_pair[1].setsockopt(socket.SOL_SOCKET, 
socket.SO_SNDBUF, 1024*1024);
+            self._cons_sock_pair[1].setsockopt(socket.SOL_SOCKET, 
socket.SO_RCVBUF, 1024*1024);
             os.set_inheritable(self._cons_sock_pair[0].fileno(), True)
 
         # NOTE: Make sure any opened resources are *definitely* freed in


The Machine class doesn't know if anything will ever use the console,
so as is the change is unsafe.

The original goal of John's change was to guarantee we capture early
boot messages as some test need that.  

I think we need to be able to have a flag to say whether the caller needs
an "early console" facility, and only use the pre-opened FD passing for
that case. Tests we need early console will have to ask for that guarantee
explicitly.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]