qemu-block

Re: How to tame CI?


From: Juan Quintela
Subject: Re: How to tame CI?
Date: Thu, 05 Oct 2023 16:36:10 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.3 (gnu/linux)

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> wrote:
> On 26.07.23 16:32, Thomas Huth wrote:
>> On 26/07/2023 15.00, Peter Maydell wrote:
>>> On Wed, 26 Jul 2023 at 13:06, Juan Quintela <quintela@redhat.com> wrote:
>>>> To make things easier, this is the part that shows how it breaks (this is
>>>> the gcov test):
>>>>
>>>> 357/423 qemu:block / io-qcow2-copy-before-write
>>>> ERROR           6.38s   exit status 1
>>>> >>> PYTHON=/builds/juan.quintela/qemu/build/pyvenv/bin/python3
>>>> MALLOC_PERTURB_=44
>>>> /builds/juan.quintela/qemu/build/pyvenv/bin/python3
>>>> /builds/juan.quintela/qemu/build/../tests/qemu-iotests/check -tap
>>>> -qcow2 copy-before-write --source-dir
>>>> /builds/juan.quintela/qemu/tests/qemu-iotests --build-dir
>>>> /builds/juan.quintela/qemu/build/tests/qemu-iotests
>>>> ――――――――――――――――――――――――――――――――――――― ✀  
>>>> ―――――――――――――――――――――――――――――――――――――
>>>> stderr:
>>>> --- 
>>>> /builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write.out
>>>> +++ 
>>>> /builds/juan.quintela/qemu/build/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
>>>> @@ -1,5 +1,21 @@
>>>> -....
>>>> +...F
>>>> +======================================================================
>>>> +FAIL: test_timeout_break_snapshot (__main__.TestCbwError)
>>>> +----------------------------------------------------------------------
>>>> +Traceback (most recent call last):
>>>> +  File 
>>>> "/builds/juan.quintela/qemu/tests/qemu-iotests/tests/copy-before-write", 
>>>> line 210, in test_timeout_break_snapshot
>>>> +    self.assertEqual(log, """\
>>>> +AssertionError: 'wrot[195 chars]read 1048576/1048576 bytes at
>>>> offset 0\n1 MiB,[46 chars]c)\n' != 'wrot[195 chars]read failed:
>>>> Permission denied\n'
>>>> +  wrote 524288/524288 bytes at offset 0
>>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> +  wrote 524288/524288 bytes at offset 524288
>>>> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> ++ read failed: Permission denied
>>>> +- read 1048576/1048576 bytes at offset 0
>>>> +- 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>>>> +
>>>
>>> This iotest failing is an intermittent that I've seen running
>>> pullreqs on master. I tend to see it on the s390 host. I
>>> suspect a race condition somewhere where it fails if the host
>>> is heavily loaded.
>> It's obviously a failure in an iotest, so let's CC: the
>> corresponding people (done now).
>> 
>
> Sorry for the long delay.
>
> Does it still fail?
>
> In the test we expect the copy-before-write operation to fail (because
> of throttling and the timeout), so that the snapshot is broken and the
> next read from the snapshot fails.
>
> But most probably the copy-before-write operation succeeded in this
> case for some reason. I don't think that throttling and timeouts in
> the block layer can guarantee determinism, but usually it works.
>
> We use throttling with bps-write = 300 * 1024, i.e. 300 KiB per second,
> and cbw-timeout is set to 1 second.
>
> Then we write 512K.
>
> Then the comment says:
> # We need second write to trigger throttling
>
> and we write another 512K.
>
> The first 512K are written, and we should wait 512/300 = 1.7 seconds
> from the _start_ of that write before issuing the second one. But if
> the first write was slow, we may have to wait less than a second from
> its finish to the start of the second one. Then the timeout will not
> fire.
>
> ====
>
> I see two possible ways to fix that:
>
> 1. Decrease bps-write a bit, for example to 200 * 1024 (200 KiB per
> second).
>
> 2. rework the test to use null-co instead of real images. This way we will 
> not suffer from unstable IO duration.
>
>
> So, does the problem still occur sometimes?

For me it is random.  When it happens, it keeps happening for a long time.
And then it stops, and doesn't happen for a while.

It is not happening for me now.
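
To make the timing argument quoted above concrete, here is a rough
sketch in Python.  It assumes the simplified model from that
explanation: the first 512K write passes immediately, and the second
one may not start until size / bps-write seconds after the first one
_started_.  This is not QEMU's actual throttling algorithm, and the
function names are made up for illustration; they do not exist in the
iotests.

```python
KIB = 1024


def second_write_wait(write_size, bps_write, first_write_duration):
    """Seconds the second write is held back (simplified model, an assumption).

    The throttle allows the second write write_size / bps_write seconds
    after the START of the first write, so a slow first write eats into
    the remaining wait.
    """
    earliest_start = write_size / bps_write
    return max(0.0, earliest_start - first_write_duration)


def timeout_fires(first_write_duration, bps_write=300 * KIB,
                  cbw_timeout=1.0, write_size=512 * KIB):
    """True if the wait for the second write exceeds cbw-timeout."""
    wait = second_write_wait(write_size, bps_write, first_write_duration)
    return wait > cbw_timeout


def max_tolerated_duration(bps_write, cbw_timeout=1.0, write_size=512 * KIB):
    """Longest first-write duration for which the timeout still fires."""
    return write_size / bps_write - cbw_timeout


if __name__ == "__main__":
    for rate_kib in (300, 200):
        limit = max_tolerated_duration(rate_kib * KIB)
        print(f"bps-write = {rate_kib} * 1024: first write must finish "
              f"within {limit:.2f} s for the timeout to fire")
```

Under this model, with bps-write = 300 * 1024 the first write has to
finish in roughly 0.7 seconds, which a heavily loaded host can easily
miss.  Lowering the rate to 200 * 1024 widens that margin to about
1.5 seconds, but it does not remove the dependency on real I/O timing.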

Later, Juan.



