Re: [Ccrtp-devel] Seeking help porting ccrtp to OpenBSD 3.9

From: David Sugar
Subject: Re: [Ccrtp-devel] Seeking help porting ccrtp to OpenBSD 3.9
Date: Sat, 01 Jul 2006 23:03:59 -0400
User-agent: Thunderbird (X11/20060614)

First, some general comments:

I am not too familiar with the situation of OpenBSD and threads either
offhand.  While I am aware they have implemented SMP, you may well be
correct that they are using a userland thread library that "simulates"
threading.  If so, it's quite possible there are calls which are not
wrapped which hence block the userland system entirely.

I think the best long term strategy to deal with systems that do not
have true native threading is to add GNU Common C++ core classes that
support GNU pth, and hence adding GNU pth as a "third" threading
architecture, along with the current support for native posix pthread
and native Microsoft Windows threading.  My reason is that we could then
build against a single userland threading library for all non-native
threading cases that will have known and consistent issues we can
identify and mitigate, which will remain the same regardless of
platform, rather than trying to adapt to different userland libraries
that each may behave differently and/or have different quirks and
limitations, forcing all kinds of variant code to handle.

Of course this is not a short term solution.  I presume the currently
"unstable" rthread library will offer kernel supported posix threading,
when it is ready.  That too may be a "long term" solution for OpenBSD
:).  Short term, if there are some limited fixes or tweaks that can be
done that are OpenBSD specific, I would be willing to incorporate them
until we have a viable alternative.

In regard to signals, I try to avoid using them entirely.  The behavior
on GNU/Linux varies based on the threading implementation (LinuxThreads
vs NPTL), and is also problematic on some other "pthread" platforms,
too.  They are also entirely unportable to native Microsoft Windows.

Incidentally, it is worth noting we have changed the way semaphores
operate in GNU Common C++.  Originally we used native Linux & xBSD
semaphores.  Currently, they are constructed artificially on pthread out
of conditionals and mutexes.  Posix threading originally specified no
native Semaphore implementation of its own, so this was deemed more
portable, and is considered the reference way to do it.  This also
means that while OpenBSD probably has a working native BSD semaphore
system, if either conditionals or mutexes misbehave, then the
constructed Semaphore class may also cause problems.  You may want to
look at an older release (I think 1.2.x) of Common C++ and see how we
originally did the native BSD semaphore class vs how we do it now.
Maybe it is in this area you are getting your odd scheduling behavior.
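
To make the dependency concrete, here is a minimal sketch of a counting
semaphore built only from a pthread mutex and a condition variable, driving
the kind of "sem_wait, get event, handle event" mainloop Michel describes in
the quoted message below.  This is not the actual Common C++ Semaphore
source; the names CondSemaphore, EventQueue, worker_main, push_event and
run_event_loop_demo are all hypothetical, and a negative event is just an
assumed convention to tell the worker to stop.

```cpp
#include <pthread.h>
#include <cassert>
#include <queue>

// Hypothetical counting semaphore made from a mutex and a condition variable.
class CondSemaphore {
public:
    explicit CondSemaphore(unsigned initial = 0) : count_(initial) {
        pthread_mutex_init(&mutex_, nullptr);
        pthread_cond_init(&cond_, nullptr);
    }
    ~CondSemaphore() {
        pthread_cond_destroy(&cond_);
        pthread_mutex_destroy(&mutex_);
    }
    CondSemaphore(const CondSemaphore&) = delete;
    CondSemaphore& operator=(const CondSemaphore&) = delete;

    // Block until the count is non-zero, then take one unit.
    void wait() {
        pthread_mutex_lock(&mutex_);
        while (count_ == 0)                  // loop guards against spurious wakeups
            pthread_cond_wait(&cond_, &mutex_);
        --count_;
        pthread_mutex_unlock(&mutex_);
    }

    // Add one unit and wake one waiter, if any.
    void post() {
        pthread_mutex_lock(&mutex_);
        ++count_;
        pthread_cond_signal(&cond_);
        pthread_mutex_unlock(&mutex_);
    }

private:
    pthread_mutex_t mutex_;
    pthread_cond_t cond_;
    unsigned count_;
};

// Hypothetical event queue shared between a producer and one worker thread.
struct EventQueue {
    CondSemaphore ready;        // counts queued events
    pthread_mutex_t lock;       // protects 'events'
    std::queue<int> events;
    long sum = 0;               // written only by the worker

    EventQueue() { pthread_mutex_init(&lock, nullptr); }
    ~EventQueue() { pthread_mutex_destroy(&lock); }
};

// Worker mainloop: wait on the semaphore, take one event, handle it.
extern "C" void* worker_main(void* arg) {
    EventQueue* q = static_cast<EventQueue*>(arg);
    for (;;) {
        q->ready.wait();
        pthread_mutex_lock(&q->lock);
        int event = q->events.front();
        q->events.pop();
        pthread_mutex_unlock(&q->lock);
        if (event < 0)          // assumed stop marker
            break;
        q->sum += event;
    }
    return nullptr;
}

// Queue an event, then post so the worker's wait() can return.
void push_event(EventQueue& q, int event) {
    pthread_mutex_lock(&q.lock);
    q.events.push(event);
    pthread_mutex_unlock(&q.lock);
    q.ready.post();
}

// Send 1..n to the worker, stop it, and return the sum it computed.
long run_event_loop_demo(int n) {
    EventQueue q;
    pthread_t worker;
    int rc = pthread_create(&worker, nullptr, worker_main, &q);
    assert(rc == 0);
    for (int i = 1; i <= n; ++i)
        push_event(q, i);
    push_event(q, -1);
    pthread_join(worker, nullptr);
    return q.sum;
}
```

Note that wait() can only return once pthread_cond_signal() has actually
woken the waiter and the scheduler has run it, so any oddity in the
platform's condition variables or scheduling shows up directly as
semaphore latency.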

And also, as you said, you may have some hidden race and deadlock
conditions, and I have seen cases where those kinds of bugs do get
exposed/only assert themselves when testing on additional platforms.
For example, I once discovered in an application that I had accidentally
unlocked a recursive mutex more times than I locked it!  This mistake
actually produces no detectable error on GNU/Linux, and so I was not
even aware of it, until I had tested on FreeBSD and had ugly crashes.
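
One way such a mistake might be caught earlier is to check the return code
of every unlock instead of discarding it.  The sketch below is only an
illustration under that assumption: extra_unlock_result is a hypothetical
helper, not part of Common C++.  POSIX says an error shall be returned when
a PTHREAD_MUTEX_RECURSIVE mutex is unlocked by a thread that does not hold
it, but how a given platform or wrapper class reacts when the code is
ignored is exactly the part that varies.

```cpp
#include <pthread.h>
#include <cassert>

// Lock a recursive mutex 'depth' times, unlock it 'depth' times, then
// attempt one unlock too many and return that call's result code.
int extra_unlock_result(int depth) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    for (int i = 0; i < depth; ++i) {
        int rc = pthread_mutex_lock(&mutex);
        assert(rc == 0);
    }
    for (int i = 0; i < depth; ++i) {
        int rc = pthread_mutex_unlock(&mutex);
        assert(rc == 0);                 // balanced unlocks should succeed
    }

    // The unbalanced unlock: a checked return code is what exposes it.
    int extra = pthread_mutex_unlock(&mutex);

    pthread_mutex_destroy(&mutex);
    return extra;
}
```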

I would be happy to see what we can do to resolve OpenBSD issues.  I
would like to see Twinkle, and other packages using ccrtp, working
correctly there.

Michael Grigoni wrote:
> Hi Federico,
> Thanks much for your reply.
>> - The problem with ccrtptest seems related to port 34566 being
>>   unavailable for binding or any kind of conflict.
> There is nothing on that port but for a test I changed it to 2000
> in ccrtptest.cpp; the abort happened in exactly the same manner.
>>   Did you try
>>   'ccrtptest --send' and 'ccrtptest --recv'
> There doesn't appear to be any arg parsing on 'ccrtptest.cpp';
> I changed the variable assignments  for 'send' and 'recv'
> one at a time from 'false' to 'true':
>    for 'send', ccrtptest runs silently for about five seconds and
>    quits. It doesn't appear to bind a udp port
>    for 'recv', ccrtptest runs silently for about eleven seconds and
>    quits. It _does_ bind udp port 2000 (I changed it from 34566 --
>    see above)
>> A last question. Are you currently able to get twinkle to work even
>> though some things fail?
> Michel was concerned about OpenBSD's pthreads POSIX compliance; pthreads
> in 3.9 conform to "ISO/IEC 9945-1 ANSI/IEEE (``POSIX'') Std 1003.1
> Second Edition 1996-07-12". A suite of regression tests verifies
> this functionality. It has a working recursive mutex. According
> to recent discussions in the OpenBSD lists, fairness of scheduling
> should not be a problem for an application which conforms to the
> POSIX threads specifications.
> Here are Michel's latest responses:
> This does not look nice. I think most of these problems are due
> to thread scheduling issues. In the log file you can see that
> the far-end sends a 200 OK when you answer the call. The twinkle
> listener thread receives it and prints the message in the log.
> The listener thread will handover the message to the transaction
> manager thread. But it seems this thread does not get processing
> time for a long time. The far-end keeps retransmitting the
> 200 OK and the listener thread receives them, so the listener
> thread seems to get processing time. Then all of a sudden the
> transaction manager thread gets time and the 200 OK is
> processed. Why doesn't the transaction manager thread get
> processing time when it has work to do?
> I am not familiar with OpenBSD, but I did a quick search on Google.
> I see some articles saying that OpenBSD implements userland threads
> which do not provide true concurrency.
> The architecture of Twinkle heavily relies on fair thread
> scheduling. During a call more than 13 threads run simultaneously.
> Synchronization between the threads is mostly based on semaphores.
> On Linux with a posix thread implementation this works fine.
> I think you need an expert on OpenBSD and threading to look into
> these issues. If OpenBSD does not implement Posix compliant threads
> then I think you'll have a hard time to get Twinkle working.
> Another thing to look at is recursive mutexes. Twinkle uses recursive
> mutexes. Not all OSes support them, I believe.
> And then there is the painful coexistence of threads and
> signals. Twinkle supports LinuxThreads and NPTL to correctly
> handle SIGALRM.
> Having said this, some of the freezes may be due to bugs in
> Twinkle. I wouldn't be surprised if there are still some
> race conditions that may lead to a deadlock. I never experience
> them, but with a different thread scheduling algorithm, such
> bugs will rear their ugly heads.
>> (pretty much stuck with processes). My brief reading of LinuxThreads
>> descriptions leads me to believe that those threads are mapped to
>> processes (not very efficient?).
> Yep, that was in the 2.4 kernel, though the scheduling was good
> enough for Twinkle. Later kernels have NPTL instead of LinuxThreads.
> Threads are not mapped to processes anymore. I remember I had
> some nasty thread scheduling problems when going from LinuxThreads
> to NPTL; I had one thread holding a mutex and another thread doing
> a lock on the mutex, so the second thread gets blocked. Then the
> first thread released the mutex and quickly after it wanted to lock
> the mutex again. In LinuxThreads, the second thread got the mutex
> as soon as it was released by the first. In NPTL the first thread
> got the lock again and the second thread starved!
> There shouldn't be threads anymore in Twinkle that continuously
> lock and unlock mutexes.
>> I realize this is a daunting problem; I hope you have a few moments
>> to look at the manpage and let me know if anything in it is
>> going to be fatal to twinkle.
> I had a quick look. I find it hard to judge if there are
> real show stoppers here. The pthread calls look good to me, but
> I cannot tell how exactly the thread scheduler will schedule the
> threads.
> Most of the threads in Twinkle do something like this in
> their mainloop:
> while (true) {
>     sem_wait
>     get event from queue
>     do something with the event
> }
> When the queue is empty, the semaphore will be 0 and sem_wait blocks.
> Another thread puts a message in the queue and calls sem_post,
> so the sem_wait can now return.
> Question is: when will the thread scheduler allow the thread calling
> sem_wait to run?
> I cannot answer that question.
> The listener thread does a blocking I/O read on the UDP
> socket, so it can run when a UDP packet arrives.
> The ccrtp library also creates some threads. I don't know
> what their mainloop looks like.
> Same for the Qt mainloop.
> You might try and experiment with pthread_setschedparam and
> try round robin scheduling. I wanted to use that too, but on Linux
> a process needs root privileges to do that.
> On Linux I have 3 scheduling algorithms:
> 1) SCHED_OTHER - regular non-realtime.
> This is what Twinkle uses, so it is not realtime but good enough.
> 2) SCHED_FIFO - realtime fifo (need root privileges)
> 3) SCHED_RR - realtime round robin (need root privileges)
> In your man pages I only see SCHED_FIFO and SCHED_RR.
> Another call that looks interesting is: pthread_multi_np()
> The pthread_multi_np() function causes the process to return
> to multi-threaded scheduling mode.
> I have no idea what it exactly does. I don't have this call.
> Maybe there are some tools for debugging threads to see which
> threads run at what times. I don't know as I didn't need such
> a tool.
> --------------------------------------------------------------
> Here are the BUGS described in OpenBSD's pthreads manpage:
>      The library contains a scheduler that uses the process
>      virtual interval timer to pre-empt running threads.
>      This means that using setitimer(2) to alter the process
>      virtual timer will have undefined effects.  The
>      SIGVTALRM will never be delivered to threads in a process.
>      Some pthread functions fail to work correctly when linked
>      using the -g option to cc(1) or gcc(1).  The problems do
>      not occur when linked using the -ggdb option.
> I will relink ccrtp without '-g' and report the results.
> Regards,
> Michael