bug-coreutils
bug#9737: misc/timeout-group: spurious test failure on SLES 10.3 (coreut


From: Pádraig Brady
Subject: bug#9737: misc/timeout-group: spurious test failure on SLES 10.3 (coreutils 8.14)
Date: Thu, 03 Nov 2011 02:11:27 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0

On 10/13/2011 11:27 PM, Voelker, Bernhard wrote:
> Pádraig Brady wrote:
> 
>> On 10/13/2011 04:58 PM, Voelker, Bernhard wrote:
>>> reopen 9737
>>> thanks
>>>
>>> Pádraig Brady wrote:
>>>
>>>> Bah, this is just a racy test I think.
>>>> Hopefully the attached fixes it.
>>>
>>> Thank you for the patch.
>>>
>>> I tried it 16 times:
>>>
>>> * 14x PASS, execution time real < 0.4s
>>>
>>> * 1x test failure (in the 5th run)
>>
>> So the command exited without receiving SIGINT.
>> Or perhaps the touch of the 'received.int' file
>> is being done asynch. Anything special about your
>> file system?
> 
> It's a virtual host on an ESX server farm in our data center.
> 
> address@hidden:~/berny/depot/coreutils-8.14/tests> uname -a
> Linux mchp320a 2.6.16.60-0.74.7-smp #1 SMP Fri Nov 26 09:16:10 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
> 
> address@hidden:~/berny/depot/coreutils-8.14/tests> df -h .
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/mapper/vg01-lvol0
>                        50G   15G   33G  31% /user
> 
> address@hidden:~/berny/depot/coreutils-8.14/tests> mount | grep /user
> /dev/mapper/vg01-lvol0 on /user type ext3 (rw,acl,user_xattr)
> 
>>> * 1x the test lasted 20s (in the 16th run)
>>
>> But this one passed, which means the command
>> did receive the SIGINT, but then didn't exit?
> 
> Sounds like one error is shadowing another.
> 
>> I'm confused, sorry,
>> Pádraig.
> 
> That's strange, indeed.
> 
> I repeated the test with < 0.2 load 100 times:
> the run #5, #18, #28, #53, #58 and #71 resulted in FAIL as above,
> and the run #24 and #25 PASSed but took 20 seconds,
> all other PASSed within <=0.3s.

I reproduced this weirdness on openSUSE 10.3 in a VM,
though much less frequently:
  delays in 10 out of 2750 runs
  signal handler call failure in 1 out of 2750 runs
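
For anyone wanting to gather similar counts, a loop along these lines
would do. This is only a minimal sketch: `tally_runs` is a hypothetical
helper, not part of coreutils or its test suite, and it assumes a
`date` that supports `+%s` (GNU date does).

```shell
# Hypothetical helper: run a command COUNT times and tally how many
# runs failed and how many took at least SLOW_SECS seconds.
# Usage: tally_runs COUNT SLOW_SECS COMMAND [ARG...]
tally_runs() {
  count=$1 slow_secs=$2
  shift 2
  fail=0 slow=0 i=0
  while [ "$i" -lt "$count" ]; do
    i=$((i + 1))
    start=$(date +%s)   # assumption: date supports +%s
    "$@" >/dev/null 2>&1 || fail=$((fail + 1))
    end=$(date +%s)
    if [ $((end - start)) -ge "$slow_secs" ]; then
      slow=$((slow + 1))
    fi
  done
  echo "runs=$count fail=$fail slow=$slow"
}
```

It would be called as `tally_runs 2750 10 COMMAND`, where COMMAND is
whatever runs the single test once; a FAIL shows up in `fail` and a
run that hit the 20 second delay shows up in `slow`.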

The delays might be due to bash, but I updated
to 4.2 and the issue still persists.
I suspect kernel issues too.

Anyway I've attached 2 patches to replace the previous one.
The first hopefully addresses any races in the test.
I don't think you hit any of these TBH.

The second should detect the signal issues and skip the test.
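
To illustrate the general idea of detecting unreliable signal handling
before running such a test, here is a rough sketch. It is not the
content of the attached patch. `signal_handler_works` is a hypothetical
name, the sketch uses USR1 rather than INT because non-interactive
shells start background jobs with INT ignored, and it assumes a `sleep`
that accepts fractional seconds (GNU sleep does).

```shell
# Hypothetical check: return 0 if a signal sent to a background shell
# runs its trap handler within about 5 seconds, 1 otherwise.
signal_handler_works() {
  dir=$(mktemp -d) || return 1
  (
    # Install the handler first, then announce readiness, so the
    # parent cannot signal before the trap is in place.
    trap 'touch "$dir/received"; exit 0' USR1
    touch "$dir/ready"
    i=0
    while [ "$i" -lt 100 ]; do sleep 0.1; i=$((i + 1)); done
  ) &
  child=$!

  # Wait (up to ~5s) for the child to install its trap.
  i=0
  while [ ! -e "$dir/ready" ] && [ "$i" -lt 50 ]; do
    sleep 0.1; i=$((i + 1))
  done
  kill -USR1 "$child" 2>/dev/null || :

  # Wait (up to ~5s) for the handler to leave its marker file.
  ok=1 i=0
  while [ "$i" -lt 50 ]; do
    if [ -e "$dir/received" ]; then ok=0; break; fi
    sleep 0.1; i=$((i + 1))
  done

  # Clean up the child (it may already have exited) and the temp dir.
  kill "$child" 2>/dev/null || :
  wait "$child" 2>/dev/null || :
  rm -rf "$dir"
  return "$ok"
}
```

A test script could then bail out early with something like
`signal_handler_works || exit 77`, since Automake's test harness
treats exit status 77 as a skipped test.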

cheers,
Pádraig.

Attachment: 1-timeout-races.diff
Description: Text document

Attachment: 2-timeout-skips.diff
Description: Text document

