chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] Scheduler fdset assert patch


From: Jonathan Chan
Subject: Re: [Chicken-hackers] Scheduler fdset assert patch
Date: Fri, 10 Jul 2015 20:45:02 -0400

Hello all,
 
Here is the scheduler assert patch, test cases, and integration with the test suite combined into a single patch!
After a bit more testing with code that used a lot of threads, I ended up looking at how make-thread uses ##sys#make-thread and added some missing code to the patch's usage of ##sys#make-thread.
 
Best,
Jonathan Chan
 
On Jul 9, 2015, at 4:29 PM, Jonathan Chan <address@hidden> wrote:
 
... and here is hopefully the final version of the patch for review. In
the end I think hijacking a thread to run the signal handler is too much
prone to error. Although the previous patches try to cope with this
behavior, as soon as the signal handler tries to do something like lock
a mutex via srfi-18 or yield, very strange things will happen depending
on what thread was hijacked, which threads were blocking, etc. Blocking
##sys#schedule from running is a partial fix, but doesn't help when it
would have been expected for ##sys#schedule not to return immediately,
e.g. if a signal handler tries to block on a socket. For example, seems
to make the the tcp unit go into a constant loop while trying to recv.
 
In this newest patch, the ##sys#interrupt-handler in scheduler.scm will
instead save the old handler in ##sys#signal-vector, change the handler
to one that makes a thread with ##sys#make-thread to call the old
handler and then schedules that thread to run, wrap state so that the
original ##sys#interrupt-handler will reset the handler and then finally
call the old hook so everything starts going.
 
In addition to the new patch, I have attached a revised version of the
old test case that should be more likely to trigger the assert failure
in unpatched versions of the scheduler as well as a new test case that
1) shows problems with the previous patches 2) ensures that the signal
handler can properly call blocking code and 3) possibly demonstrates a
bug with multiple threads blocking on one fd that I am not certain
about.
 
Instructions for scheduler-fdset.scm:
 
1. Run with csi or csc.
2. Send SIGINT (e.g. with Ctrl-C)
3. The test should write "locking... unlocking... unlocked" and then
exit without error.
 
Instructions for scheduler-fdset-2.scm:
You will need to compile sink.c with gcc sink.c -o sink
 
1. Run the test
2. Wait a short bit for the child processes to start
3. Start netcat in a new terminal with `nc -l 8080`
4. Send SIGINT (e.g. with ^C)
5. Verify that the process is not using 100% CPU
6. Type `hi` (or anything) in the terminal with netcat and press Return
7. The test should exit without error
 
Sorry for the many versions and emails! This at least lets me get back
to the work that needed the scheduler in the first place.
I hope that this finally kills the bug...
 
All the best,
-- 
 Jonathan Chan
 address@hidden
 
On Thu, Jul 9, 2015, at 03:18 AM, Jonathan Chan wrote:
I realized that if I am able to set a flag about whether or not we are
in the interrupt hook, I might as well save the previous block object of
the thread and then restore it instead of possibly causing spurious
wakeups. As an aside, it seems as if force-primordial doesn't currently
do this, and relies on the code that asked the scheduler to block on i/o
to retry on its own. As I stated in the previous email, this seems to be
what the code that uses the scheduler's blocking facilities does anyway.
 
In any case, this patch also passes my tests (as the previous one did)
but now should more cleanly avoid "spurious wakeups" of threads blocking
on I/O.
 
Also, I realized that the test case lacked instructions; you should just
be able to start it, wait a second, then send SIGINT (e.g. with Ctrl-C)
and the unpatched scheduler should cause the program to exit with an
assert error while with the patched scheduler you should just see a
nice:
 
   address@hidden tests]$ csi -s scheduler-fdset.scm
   ^Clocking...
   unlocking...
   unlocked...
   done!
 
Best regards,
-- 
 Jonathan Chan
 address@hidden
 
On Wed, Jul 8, 2015, at 11:34 PM, Jonathan Chan wrote:
Hi again,
 
It turns out that changing the current thread's state to 'running
doesn't actually work in all cases because a) unblocking from condition
variables will skip a thread if its state is running b) the call to
change the state back was never called because the call to the old hook
would never return. I found this by doing some more testing with the
program that caused me to run into the original assert bug.
 
Here is a revised patch which instead sets a variable to indicate
whether or not we are in the interrupt hook and skips the scheduler if
so. I've also included an updated test case that catches the problem
with the previous patch.
 
Best regards,
-- 
 Jonathan Chan
 address@hidden
 
On Wed, Jul 8, 2015, at 08:12 PM, Jonathan Chan wrote:
Hi all,
 
I ended up having some free time before the weekend and went back to the
scheduler assert problem. I wrote a test case for the bug and revised
the scheduler assert patch after some further testing.
 
In the previous patch, the thread in which the interrupt handler ran
would just be removed from ##sys#fd-list if the block object was changed
from a file. This didn't seem to handle some cases where the interrupt
handler ended up unblocking. In one of these cases, when mutex-lock! and
then mutex-unlock! were called in succession, it also turned out that
the "current thread"'s state needed to be saved, set to 'running during
the interrupt handler, then restored in order to avoid mutex-unlock!
from calling the scheduler and causing the program to hang/do strange
things.
 
I've attached the the test case for the bug and the revised patch.
Again, if someone experienced with the scheduler could review it and
make sure it doesn't cause strange things to happen, it would be very
much appreciated.
 
Best regards,
-- 
 Jonathan Chan
 address@hidden
 
On Sat, Jul 4, 2015, at 07:49 PM, Jonathan Chan wrote:
Hello all,
 
I was in the IRC channel earlier trying to find out why my program was 
failing due to a scheduler assertion about the fdset for poll. This 
would happen consistently when I sent SIGINT.
 
The error looks like:
 
scheduler.c:51: C_fd_ready: Assertion `fd == C_fdset_set[pos].fd' failed
 
After looking at the scheduler code, it looks like this happened because 
the SIGINT was causing the interrupt handler to be called, which ended 
up unblocking some thread to run a signal handler, which caused a 
condition variable broadcast. In the end, this made a thread that was 
previously blocking on a file descriptor to now block on a mutex.
 
When create-fdset procedure iterated over the ##sys#fd-list, it skipped 
threads in the list who were not blocking on fds while it added those 
who were to the fdset using fdset-set. However, it did not remove these 
threads from ##sys#fd-list, which resulted in the scheduler throwing an 
error because it expected a fd to be in the fdset when it was not there.
 
I have attached a patch which modifies create-fdset to remove threads 
from ##sys#fd-list who are not blocking on an fd, rather than only skip 
them. This seems to fix my problem and has not introduced any bizarre 
behvaior in the few tests I have run, but hopefully someone who is more 
familiar with the scheduler could look it over and see if it fixes the 
issue.
 
Best regards,
-- 
  Jonathan Chan
  address@hidden
_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Email had 1 attachment:
+ scheduler-assert.diff
 2k (text/plain)
_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Email had 2 attachments:
+ scheduler-assert-fix.diff
 5k (application/octet-stream)
+ scheduler-fdset.scm
 2k (application/octet-stream)
_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Email had 3 attachments:
+ scheduler-fdset.scm
 2k (application/octet-stream)
+ sink.c
 1k (text/plain)
+ scheduler-assert-fix.diff
 14k (application/octet-stream)
_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Email had 1 attachment:
+ scheduler-assert-fix.diff
 14k (application/octet-stream)
<scheduler-assert-new.diff><scheduler-fdset-2.scm><scheduler-fdset.scm><sink.c>_______________________________________________
Chicken-hackers mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
 

Attachment: scheduler-assert-final.diff
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]