
Re: [Chicken-hackers] CHICKEN in production


From: Peter Bex
Subject: Re: [Chicken-hackers] CHICKEN in production
Date: Wed, 8 Oct 2014 09:16:56 +0200
User-agent: Mutt/1.4.2.3i

On Wed, Oct 08, 2014 at 01:31:30AM +0400, Oleg Kolosov wrote:
> On Oct 7, 2014, at 10:04 PM, Peter Bex <address@hidden> wrote:
> > The overhead of calling C should be pretty minimal in the usual cases,
> > unless strings are the only problem.  If it's the only dealbreaker,
> > I think that should be fixable.

Hello again Oleg,

Thanks for taking the time to explain a few things; this is very helpful!

> Yes, FFI overhead is within 5% of the pure C program for simple cases: 
> passing around immediate values and pointers. But we have very important use 
> case - fuzzy search in the song info database: tens of thousands of records. 
> We use custom highly tuned indexing algorithm. The initial implementation was 
> written in Scheme, was small and beautiful (according to author) but unusably 
> slow. We tried to tune it, but measured that cost of passing strings through 
> FFI is still too big and unavoidable due to copying.

Avoiding this copy was discussed before on this list, and I shot it down
because of the danger.  However, I think it may be possible to change the
string representation to always include a \0 at the end, so that passing
it to C will simply be a matter of passing a pointer.  The danger could
be avoided by a taint bit: if the string is known to not contain \0,
it can be passed directly.  Otherwise, it needs to be checked and
marked if it's safe.  If it's unsafe, an exception can be thrown.

The upshot of this is that a string will need to be checked at most
once, and never copied (except as is done normally, by the GC).
This will need some looking into, because I'm not sure we have any
bits to spare in the representation :S
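
A minimal sketch of the check-once idea in C.  The sstring type, its
known_safe flag and both function names are hypothetical and only
illustrate the scheme described above; this is not CHICKEN's actual
string representation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object, NOT CHICKEN's real layout. */
typedef struct {
    size_t len;              /* payload length, excluding the trailing '\0' */
    unsigned known_safe : 1; /* set once the payload is known to hold no '\0' */
    char data[];             /* len payload bytes followed by one '\0' */
} sstring;

/* Allocate a string that always carries a trailing '\0'. */
sstring *sstring_new(const char *bytes, size_t len) {
    sstring *s = malloc(sizeof *s + len + 1);
    if (s == NULL) return NULL;
    s->len = len;
    s->known_safe = 0;       /* not scanned yet */
    memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    return s;
}

/* Return a pointer usable as a C string without copying, or NULL if the
 * payload contains an embedded '\0' (where a runtime would raise an
 * exception).  A safe string is scanned only on the first call. */
const char *sstring_as_c(sstring *s) {
    if (!s->known_safe) {
        if (memchr(s->data, '\0', s->len) != NULL)
            return NULL;     /* unsafe: refuse to hand it to C */
        s->known_safe = 1;   /* remember the result of the scan */
    }
    return s->data;
}
```

In this sketch any mutation that could store a \0 into the payload would
have to clear known_safe again, which is one more reason the idea needs
some looking into.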

> Additionally, there was some performance problems with unicode handling. So, 
> now we use libc locale functions for conversions and doing indexing and 
> processing in C. This is pain but at least 3 times faster than Chicken. There 
> are still a lot of trickery on the GUI side to provide responsive incremental 
> search because the amount of data returned is still quite large. 

Unicode is still tentatively on the TODO-list for CHICKEN 5.  If you
have any useful tips or suggestions, that would be helpful.  But to be
honest I'm not sure we'll get around to it: there are still a lot of
pending patches on this list.  I'm not through polishing the numbers egg
yet, and there hasn't been any visible effort at making the core modular
aside from the "modular compiler" patch.  So much to do, so little
manpower :(

> > It takes some more practice, but debugging C code called from CHICKEN is
> > quite doable in my experience, but then I've never done huge C & CHICKEN
> > projects, only smaller libraries.  Could you explain a bit more what the
> > problems are you ran into?
> 
> Yes, I’ve done some debugging of generated code for Windows port. It is 
> possible in principle, but requires some familiarity with the implementation 
> and used as a last resort (mysterious crashes and such). In reality call 
> stacks are almost infinite - it is hard to pinpoint interesting parts within 
> the wall of f1234 functions.

Yeah, that can certainly be a problem.  Debugging CHICKEN code can be a
bit like reading tea leaves, sometimes.  I don't see an easy way to fix
that, considering Cheney on the MTA is really an essential part of CHICKEN.

> And useful info about passed arguments and such is left in the generated 
> comments - you need to inspect the sources with the ‘list’ command to view it.

That's a good thing, right?  The sources are available and can be kept
using the -k switch.  If there are other comments that could be inserted
which might be more helpful, we're open to suggestions.

> We tried to improve this with the insertion of #line directives without much 
> success - code generator is too complex, especially where FFI is involved. We 
> are inserting logging statements everywhere. Unfortunately logging 
> considerably uglifies the code and makes some functional programming idioms 
> much harder to use (like map/fold/cut oneliners). Also various analysis tools 
> like Valgrind and libc malloc checkers fall flat when Chicken is involved.

I think Jerry mentioned he used Valgrind for debugging CHICKEN/C code,
and malloc checkers should work just fine with CHICKEN: it doesn't mess
with the C heap.

> >> We also struggled with posix and process control functions a lot (long 
> >> story), trying to be functional here backfires badly, so we ended up with 
> >> straightforward and ugly code (looking like verbose C with parentheses), 
> >> replacing some functions from standard library (namely process-run) and 
> >> customized error handling.
> > 
> > Would you care to unpack this a little?
> 
> We are trying to simulate parallel processing and separate responsibilities 
> with the worker processes communicating through sockets. There are also 
> message passing threads involved for monitoring and control. Judging by the 
> history this may be the most buggy part of the project. With numerous 
> workarounds and special case handling. SIGINT handling is still buggy, but 
> not critical for production. Yes, the task is complex, but the API is too 
> confusing and fragile too. It might be adequate for C but in Scheme a lot of 
> foots was shoot away.

Perhaps the new "hardwood" egg could be used once it grows the ability
to spawn nodes.  There's also concurrent-native-callbacks.  But yeah,
this is a part where tools are lacking.

What exactly is wrong with SIGINT handling?

> > Sounds interesting.  So at least you got something out of it aside from
> > just frustration ;)
> 
> There was some discussions about replacing Chicken scheduler with libuv event 
> loop and providing filesystem and socket API on top of it. The scheduler 
> modification is necessary to block green threads to simulate synchronous 
> calls. There are a lot of custom and confusing code in Chicken around select 
> function with workarounds for Windows. We think that libuv implementation is 
> superior. There are some concept code but we’ve not progressed too far with 
> this yet.

This has been suggested before.  (How) does libuv really fix this
situation?

> > If you can pinpoint the exact places where performance is particularly
> > bad we can (at least attempt to) fix them.
> 
> Passing large number of C strings through FFI back and forth, utf-8 (we 
> tested on uppercase conversion and trimming AFAIR). Update with defstruct is 
> horribly slow - I don’t know all the details, just heard the conversation.

defstruct, eh?  I'd have to take a look at that.  defstruct was never
optimised for speed at all, so it doesn't surprise me that's where a
bottleneck is.

> Scheduler even with disable-interrupts is still active - very hard to 
> diagnose, but mysterious bugs are fixed by going down to C and not returning 
> back until everything is settled (like fork -> exec). It would be nice to 
> have an option to get rid of it, i.e. for performance critical parts we would 
> like to have complete manual control - without interrupt handling and such 
> code inserted.

Have you tried calling C_disable_interrupts()/C_enable_interrupts()
from C?  This should give you complete control of the scheduler.  If you
(declare (disable-interrupts)), interrupt checks will not be inserted
in the generated C code of that compilation unit.  However, if you
call any external code which was compiled without that declaration,
of course interrupts will still be checked while that code is running.
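
For the fork -> exec case mentioned above, the whole sequence can be kept
in a single C function, so that no Scheme code (and therefore no interrupt
check) runs between the two calls.  A minimal sketch, assuming a
hypothetical helper named run_and_wait that would be bound from Scheme
through the FFI; it is not part of CHICKEN:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given NULL-terminated
 * argument vector, wait for it, and return its exit status.
 * Returns -1 if fork/waitpid fails or the child did not exit normally. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: only C code runs between fork() and exec. */
        execvp(argv[0], argv);
        _exit(127);          /* exec failed; leave without running atexit handlers */
    }

    /* Parent: retry waitpid() if a signal interrupts it. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that waitpid() here blocks the whole process, so green threads will
not run while the child is alive; a long-running worker would need the
wait to be done separately.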

> > performance will *always* matter.
> 
> This is true. But our new platform is even more customized for the given use 
> cases and contains various specialized hardware to assist the CPU (like DSPs 
> and ADC/DAC’s). It is still early prototype, but we are discussing how many 
> cycles we are ready to burn for supposedly faster and straightforward 
> development process.

I wish you the best of luck with your next project!

I'd like to note, though, that it's unfortunate that we're only now
having this conversation about your previous project: I'm sure that if
you'd asked earlier on the list, we could've helped you better to debug
or work around problems.  For example, the disable-interrupts and
defstruct issues could possibly have been solved as you ran into them.
One of the great things about CHICKEN is its community; I'd advise
everyone to make good use of it!

Cheers,
Peter
-- 
http://www.more-magic.net


