chicken-hackers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] [PATCH 1/2] tcp: disable interrupts


From: Florian Zumbiehl
Subject: Re: [Chicken-hackers] [PATCH 1/2] tcp: disable interrupts
Date: Sun, 17 Mar 2013 07:58:29 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

Hi,

> I'm positive the race conditions have to do with errno,
> which is incompatible with SRFI-18 threads.  For example
> in the connect loop, there are two possible context
> switches in which errno can get overwritten:
> 
> 1) On entrance to (fail), prior to ##sys#update-errno
> 2) Inside (fail), entering the continuation of
>    (##sys#update-errno), just before the strerror
> 
> In 2)'s case the reported error message can be wrong.
> In 1)'s case the error message and errno can be wrong.
> In either case it's just the reported error that's wrong, not
> the fact that an error occurred.

> Florian, please try the attached patch.  It simply inlines
> (fail), removing the context switch at 1), and also passes
> the return value of ##sys#update-errno to strerror, eliminating 2).
> 
> Assuming this works, I can produce a patch to fix the rest
> of the ##sys#update-errno + signal-hook clauses in the same way,
> as this combination is unsafe with SRFI 18.

Somehow, this feels to me like you are applying test driven development
techniques to concurrency correctness. Are you sure that your reasoning is
correct and reliable and future-proof? Are the assumptions you make about
the compiler true now for sure and can that be expected to stay that way
even with new optimizations, say?

I don't mind testing things as a cheap way to potentially falsify
assumptions that are incorrect despite a proof to the contrary, but using
tests as a way to establish correctness, especially for concurrency stuff,
strikes me as a very bad idea indeed. If you cannot be sufficiently sure
that your approach is correct to be willing to build a full fix on it
without testing first, I would consider that a sign that the approach is
too fragile to rely on, at least without a good reason.

As for the robustness of the code in the face of future changes, I would
submit that the current code being defective could be considered evidence
that the current approach is indeed too fragile. Not only could it break
inadvertently, it _did_ already break.

Are there any problems with disabling interrupts? The potential for
starvation mentioned by Felix should be quite easy to fix by scheduling
manually after every write() or read() call and not just on
EAGAIN/EWOULDBLOCK/EINTR, I think?!

Alternatively, I guess some cli/sti mechanism would be a clean solution? I
obviously have no clue how easy that would be implement as syntax
understood and enforced by the compiler, but maybe a dynamic solution would
also be OK? That is, to not prevent the compiler from inserting scheduling
calls, but to make them check a flag that causes the actual context switch
to be deferred until sti time.

> Note that once tcp-connect returns, all bets are off on
> the validity of (errno) anyway.

Yeah, I am considerung submitting a patch that removes errno. It seems to
me that it does more harm than good--AFAICS, pretty much any use of it is a
bug that could cause silent corruption, so throwing an exception due to an
undefined symbol seems like the better alternative to me.

>                  (let ((f (##net#select-write s)))
> -                  (when (eq? f -1) (fail))
> +                  (when (eq? f -1)
> +                    (##net#close s)

That ##net#close could also overwrite errno, BTW.

Regards, Florian



reply via email to

[Prev in Thread] Current Thread [Next in Thread]