Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal

From: Jörg F . Wittenberger
Subject: Re: [Chicken-hackers] Regarding #1564: srfi-18: (mutex-unlock) Internal scheduler error
Date: 03 Dec 2018 10:46:38 +0100

Thank you so much, Kon,

reviewing these logs helped to confirm my feelings.

Feelings, not findings. Yet.

Tinkering with these scheduler/srfi-18 issues again really made me feel bad and sorry. In fact the anger cost me sleep for the better half of the night. It still enrages me.

What's going on here IMHO is that a lot of lifetime, yours and mine, is wasted. At the same time the code quality of the result is likely worse than what I'm using as the source to cut out those patches.

As I can't outright prove this statement to you, let me recap the background for a moment: Around a decade ago I ported a rather thread-heavy thing (Askemos, which technically is something partially inspired by Erlang, bearing similarities to Termite - except that those processes are all made persistent and the state is replicated and synchronized in byzantine agreement over a part of the network; you might be able to imagine that this really stresses the threading capabilities of the language in use) from rscheme to chicken. By that time the code had grown for ~7 years; that's almost 100 modules, which took some months to port. ...

...Only to learn that the threading in chicken was not at all up to the job. Hence I spent a few more weeks fixing that one, including adding a prio queue for the timeout- and fd-list.

What I could NOT produce were test cases for each of the bugs (1231, 1232, 1255, 1564 - and these are not all) I fixed in the process.

Nor was it feasible to fix them one-by-one. (Yesterday evening I failed to properly backport the fix for 1564 into the ugly code implementing the timeout queue -- while asking myself why the hell it is useful; this queue should be replaced with a better version anyway.)

I posted the result on chicken-users at that time. It was a complex fix. Sure. But those were sort of interrelated bugs.

Then for about seven years I sadly maintained a chicken fork (which I'm still using in production) for these differences in order to be able to use chicken at all. Since 4.12 it is at least _possible_ to run this code on stock chicken, partly because I changed my code to avoid triggering the remaining bugs.

So for me the question remains: wouldn't it be much, much more efficient to work sort-of hand-in-hand with one of the core developers, or maybe on the list to get the remaining things (bugs and improvements) fixed and reviewed?

It would be so much more satisfying to me to actually produce code I could approve of myself than to backport yet another hotfix -- creating a result in the process I take issue with.

(((Going into details, I'd probably do the prio-queue differently today as I have learned about chicken's performance details. And I'm ready to do so. But at least I'd like to be allowed to use a prio queue with a proper interface rather than kludging inline handling of a linear list into well tested code -- likely creating fresh bugs in the process.)))
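
To illustrate what "a prio queue with a proper interface" could look like, here is a minimal sketch in plain R5RS Scheme. It is not the patch discussed in this thread and not CHICKEN's scheduler code: the pairing-heap representation and all names (make-timeout-queue, timeout-queue-insert!, timeout-queue-next-deadline, timeout-queue-pop!) are hypothetical and chosen only for illustration. Deadlines are assumed to be plain numbers.

```scheme
;; Hypothetical sketch only; not taken from scheduler.scm or srfi-18.scm.
;; A heap is either '() or a node #(deadline item children).

(define (node-key n) (vector-ref n 0))
(define (node-value n) (vector-ref n 1))
(define (node-children n) (vector-ref n 2))

;; Merge two heaps; the node with the smaller deadline becomes the root.
(define (heap-merge a b)
  (cond ((null? a) b)
        ((null? b) a)
        ((<= (node-key a) (node-key b))
         (vector (node-key a) (node-value a) (cons b (node-children a))))
        (else
         (vector (node-key b) (node-value b) (cons a (node-children b))))))

;; Pairwise merge of the children once the root has been removed.
(define (merge-pairs heaps)
  (cond ((null? heaps) '())
        ((null? (cdr heaps)) (car heaps))
        (else (heap-merge (heap-merge (car heaps) (cadr heaps))
                          (merge-pairs (cddr heaps))))))

;; The interface: a queue is a one-slot vector holding the current heap.
(define (make-timeout-queue) (vector '()))

(define (timeout-queue-empty? q) (null? (vector-ref q 0)))

(define (timeout-queue-insert! q deadline item)
  (vector-set! q 0 (heap-merge (vector deadline item '()) (vector-ref q 0))))

;; Earliest deadline; the caller must check timeout-queue-empty? first.
(define (timeout-queue-next-deadline q) (node-key (vector-ref q 0)))

;; Remove and return the item with the earliest deadline.
(define (timeout-queue-pop! q)
  (let ((root (vector-ref q 0)))
    (vector-set! q 0 (merge-pairs (node-children root)))
    (node-value root)))
```

The point of the sketch is only the separation: callers see four procedures, so the representation behind them can be swapped without touching the code that uses the queue. Items with equal deadlines are not returned in insertion order here; a real timeout queue might need that guarantee.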



On Dec 2 2018, Kon Lovett wrote:

see attached git (C4) & svn (C5) logs

#(in C4 core local repo)
git log --follow -p -- srfi-18.scm >srfi-18.log

#(in C5 svn local repo)
svn log --diff trunk >srfi-18_trunk-diff.log


On Dec 2, 2018, at 1:19 PM, Jörg F. Wittenberger <address@hidden> wrote:

Thanks for the replies,

chicken-install -r srfi-18 ;  did the trick already

I should have stated that that's what I have; what I've been looking for is the git history. For some statements I wonder why the hell they are there at all. Two possible reasons: a) I cleaned them up for being obsolete (due to former changes I made), b) removed since I touched the file, which begs the question "why were those added".

Never mind.  I can proceed at least.

On Dec 2 2018, Kon Lovett wrote:

well, that shows me. ;-)

trying to track down why #497 $ chicken-install -r srfi-18
mapped (srfi-18) to ()
retrieving ...

On Dec 2, 2018, at 10:42 AM, Kon Lovett <address@hidden> wrote:
C5 evicted srfi-18, along w/ srfi-1, 13, 14, & 69, to the egg store.
chicken-install -retrieve.
On Dec 2, 2018, at 10:39 AM, Jörg F. Wittenberger <address@hidden> wrote:

Hi all,

when I tried to reply in a timely manner I apparently sent out a link to a broken file. Sorry for that.

Just wanted to see if I could create a patch for the current master. For this I need the srfi-18 egg source too. I just can't find it.

Jörg

On Nov 30 2018, Jörg F. Wittenberger wrote:
Hello Megane,
On Nov 30 2018, megane wrote:
Here's another version that crashes quickly with "very high
Error: (mutex-unlock) Internal scheduler error: unknown thread state
#<thread: thread1>
ready

This bears an uncanny resemblance to scheduler issues I was fighting long ago. Too long ago.
--- A fix
Just allow the 'ready state for threads in mutex-unlock!
Is this a correct fix?
Too long ago. But it feels wrong. We'd rather make sure there is no ready thread in the queue waiting for a mutex in the first place.

Diffing the changes I maintained quite a while back you will find that I added a ##sys#thread-clear-blocking-state! towards the end of scheduler.scm and used it for consistency wherever I ran into not-so-clean unlocks.

Now this is still an invasive change. But looking at the source of scheduler and srfi-18 in chicken 5 right now, I can't fight the feeling that it is working around the missing changes at several places.

Best

/Jörg

_______________________________________________
Chicken-hackers mailing list
address@hidden