qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] test-filter-mirror hangs


From: Markus Armbruster
Subject: Re: [Qemu-devel] test-filter-mirror hangs
Date: Fri, 25 Jan 2019 08:12:49 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Jason Wang <address@hidden> writes:

> On 2019/1/24 下午5:47, Markus Armbruster wrote:
>> Please cc: me on QMP issues.
>
>
> Ok.
>
>
>>
>> Jason Wang <address@hidden> writes:
>>
>>> On 2019/1/24 上午3:53, Dr. David Alan Gilbert wrote:
>>>> * Jason Wang (address@hidden) wrote:
>>>>> On 2019/1/22 上午2:56, Peter Maydell wrote:
>>>>>> On Thu, 17 Jan 2019 at 09:46, Jason Wang<address@hidden>  wrote:
>>>>>>> On 2019/1/15 上午12:33, Zhang Chen wrote:
>>>>>>>> On Sat, Jan 12, 2019 at 12:15 AM Dr. David Alan Gilbert
>>>>>>>> <address@hidden  <mailto:address@hidden>> wrote:
>>>>>>>>
>>>>>>>>        * Peter Maydell (address@hidden
>>>>>>>>        <mailto:address@hidden>) wrote:
>>>>>>>>        > Recently I've noticed that test-filter-mirror has been hanging
>>>>>>>>        > intermittently, typically when run on some other TCG 
>>>>>>>> architecture.
>>>>>>>>        > In the instance I've just looked at, this was with s390x 
>>>>>>>> guest on
>>>>>>>>        > x86-64 host, though I've also seen it on other host archs and
>>>>>>>>        > perhaps with other guests.
>>>>>>>>
>>>>>>>>        Watch out to see if you really do see it for other guests;
>>>>>>>>        it carefully avoids using virtio-net to avoid vhost; but on 
>>>>>>>> s390x it
>>>>>>>>        uses virtio-net-ccw - could that hit the vhost it was trying to 
>>>>>>>> avoid?
>>>>>>>>
>>>>>>>>        > Below is a backtrace, though it seems to be pretty unhelpful.
>>>>>>>>        > Anybody got any theories ? Does the mirror test rely on dirty
>>>>>>>>        > memory bitmaps like the migration test (which also hangs
>>>>>>>>        > occasionally with TCG due to some bug I'm sure we've 
>>>>>>>> investigated
>>>>>>>>        > in the past) ?
>>>>>>>>
>>>>>>>>        I don't think it relies on the CPU at all.
>>>>>>>>     I have no idea about this currently, but Jason and I designed the
>>>>>>>> test case.
>>>>>>>> Add Jason: Have any comments about this ?
>>>>>>> I can't reproduce this locally with s390x-softmmu. It looks to me the
>>>>>>> test should be independent to any kinds of emulation. It should pass
>>>>>>> when mainloop work.
>>>>>> I've just seen a hang with ppc64 guest on s390x host, so it is
>>>>>> indeed not specific to s390x guest (and so not specific to
>>>>>> virtio-net either, since the ppc64 guest setup uses e1000).
>>>>>>
>>>>>> thanks
>>>>>> -- PMM
>>>>> Finally reproduced locally after hundreds (sometimes thousands) times of
>>>>> running.
>>>>>
>>>>> Bisection points to OOB monitor[1].
>>>>>
>>>>> It looks to me after OOB is used unconditionally we lose a barrier to make
>>>>> sure socket is connected before sending packets in test-filter-mirror.c. 
>>>>> Is
>>>>> there any other similar and simple thing that we could do to kick the
>>>>> mainloop?
>>>> Do you mean the:
>>>>
>>>>       /* send a qmp command to guarantee that 'connected' is setting to 
>>>> true. */
>>>>       qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
>>>
>>> Yes.
>>>
>>>
>>>> why was that ever sufficient to know the socket was ready?
>>>
>>> It was suggested by Fam, I don't remember the details. Can we make
>>> sure all pending events has been processed (UNIX socket was set to
>>> connected) after query-status is returned with an non OOB monitor?
>> I'm afraid I lack context.  Which socket are you talking about?  The
>> test has at least the QMP socket, the send_sock[], and recv_sock.  What
>> exactly are you trying to accomplish?
>
>
> I mean recv_sock. If mirror tries to send a packet to it before its
> is_connected is set to true, packet will be dropped.

So the *socket* is connected (in the TCP sense), but something else
(whatever owns is_connected) is not.  Can you point me to where
is_connected is set to true?

>> By the way, mkstemp(sock_path) followed by unix_connect(sock_path, NULL)
>> looks rather fishy.  Why create a temporary file only to create a Unix
>> domain socket right over it?
>
>
> I vaguely remember passing fd created by unix domain socket doesn't
> work when the test is introduced. So my understanding is the author
> needs a way to create a unique file name which will be used b Unix
> domain socket at that time.

We should really, really, really improve the test harness to run each
test program in its very own temporary directory.  Then tests can simply
create files with fixed names, and leave cleanup to the test harness.

>>   Why is ignoring errors a good idea?
>
>
> I don't get, which error is missed, it checks the return value of both
> mkstemp() and unix_connect().

Now I neglected to provide enough context for you :)

I read

    recv_sock = unix_connect(sock_path, NULL);

and immediately went "why are errors ignored".  If I had read on (as I
should've), I would've seen the are not:

    g_assert_cmpint(recv_sock, !=, -1);

Sorry for the noise.

I'd replace both lines by

    recv_sock = unix_connect(sock_path, &error_abort);

Reports the actual error, which is an obvious improvement, with the
location pointing to the failing spot within unix_connect().  To find
where unix_connect() was called, you need to examine the stack
backtrace.  Strictly more information, but your actual mileage may vary.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]