spamass-milt-list

Re: Status of CVS for 0.3.0 branch?


From: Chris Crowley
Subject: Re: Status of CVS for 0.3.0 branch?
Date: Thu, 09 Feb 2006 15:29:04 -0500
User-agent: Mozilla Thunderbird 1.0.7-1.1.fc4 (X11/20050929)

Dan -

Thanks for the e-mail.

I'll work with these suggestions:

>If you can get a stack trace out of the process (gdb it and run "thread
>apply all bt"), that would help narrow down what's hanging.  Also try
>upgrading to a less-buggy glibc, or set the environment variable
>LD_ASSUME_KERNEL=2.4.1.  Any time I see a process hang on a futex,
>setting that has fixed it (it disables futexes entirely).
>  
>

and report what I find.  First, I'll get a stack trace of the process
in its current state.  I don't use gdb often, so if you have more
specific recommendations than what you've already provided, please
e-mail me again.  After that I'll try the environment variable fix, then
the glibc upgrade if I can (I'll have to check some dependencies).  It
probably won't be until next week.
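Both suggestions can be run as one-off commands. The minimal Python sketch below assumes gdb is installed and that it runs with permission to attach to the milter (normally root). The command it builds, `gdb -batch -p PID -ex "thread apply all bt"`, attaches to the running process, prints a backtrace for every thread, and detaches when gdb exits. The helper names and the `milter_debug.py` filename are hypothetical, and LD_ASSUME_KERNEL=2.4.1 only changes the threading library on systems whose glibc still ships LinuxThreads alongside NPTL.

```python
import os
import subprocess
import sys


def backtrace_command(pid):
    """Build a non-interactive gdb command line that dumps every thread's stack."""
    # -batch: run the -ex command, then exit (gdb detaches from the process)
    # -p:     attach to an already running process by PID
    return ["gdb", "-batch", "-p", str(int(pid)), "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout=120):
    """Attach gdb to a (possibly hung) process and return everything it printed."""
    result = subprocess.run(
        backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,  # don't let a wedged gdb hang the caller as well
    )
    return result.stdout + result.stderr


def env_without_futexes(base=None):
    """Return a copy of an environment with LD_ASSUME_KERNEL=2.4.1 added.

    Assumption: where glibc ships LinuxThreads next to NPTL, this makes the
    dynamic loader choose LinuxThreads, which does not use futexes.
    """
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = "2.4.1"
    return env


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 milter_debug.py 13360
    print(dump_backtraces(sys.argv[1]))
```

For example, `python3 milter_debug.py 13360` would dump the threads of the process that strace showed stuck in `futex()`. If the milter is started from a Python wrapper, `env=env_without_futexes()` can be passed to `subprocess.Popen`; otherwise exporting the variable in the init script does the same thing.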

Chris Crowley


Dan Nelson wrote:

>In the last episode (Feb 09), Chris Crowley said:
>  
>
>>My question, "Is CVS for the 0.3.0 branch improved from the distro,
>>and stable for production use?" If not, I'll drill down into the
>>problems with the 0.3.0 tar file that I've got, otherwise, I'll
>>install the CVS version and see if the problems persist.
>>    
>>
>
>Only minor changes have been made since 0.3.0; none should affect
>stability one way or the other.  My milters never seem to crash or
>hang, but I only process 1 message every 5-10 seconds.  Each milter
>thread is independent, so (barring OS bugs) hangs/crashes due to race
>conditions should not be possible.
>
>  
>
>>...details...
>>I've been running 0.2.0, and plan to upgrade soon.  I've built 0.3.0,
>>and have noticed in some high load testing that it fails differently
>>than the 0.2.0 spamass-milter.  By failure I mean that I see error
>>messages in the log. For example:
>><log>
>>Milter (spamassassin): local socket name /var/run/sendmail/spamass.sock unsafe
>>sendmail[10000]: ###ID: Milter (spamassassin): to error state
>>spamass-milter[13360]: SpamAssassin, mi_rd_cmd: read returned -1: Connection reset by peer
>>spamass-milter[19980]: SpamAssassin: thread_create() failed: 12, try again
>></log>
>>
>>and a strace on the process shows that it is "hung":
>><strace>
>>strace -p 13360
>>Process 13360 attached - interrupt to quit
>>futex(0xc9e20c, FUTEX_WAIT, 2, NULL <unfinished ...>
>></strace>
>>    
>>
>
>If you can get a stack trace out of the process (gdb it and run "thread
>apply all bt"), that would help narrow down what's hanging.  Also try
>upgrading to a less-buggy glibc, or set the environment variable
>LD_ASSUME_KERNEL=2.4.1.  Any time I see a process hang on a futex,
>setting that has fixed it (it disables futexes entirely).
>
>  
>
>>From the logs, and a quick non-scientific assessment, I don't think
>>that 0.3.0 is failing any less frequently than 0.2.0 was.  It's just
>>that the 0.3.0 process actually persists after it fails, so my
>>restart script (which checks whether the socket exists) doesn't work to
>>repair things.
>>
>>Thanks for any insight you can provide.  Of course, I'm able to
>>provide more details if they would be beneficial.
>>    
>>
>
>  
>
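On the restart-script point: a hung 0.3.0 process keeps its socket file, and the kernel may still accept connections into the listen backlog, so neither "the socket exists" nor "connect() succeeds" proves the filter is alive. A stronger check is to send a real milter command and wait for an answer. The minimal Python sketch below assumes the libmilter wire format is a 4-byte big-endian length (covering the command byte plus payload), then the command byte, then the payload, and it uses assumed sendmail 8.13-era negotiation constants. `probe_milter` and the other names are hypothetical, and the constants should be checked against your libmilter's `mfdef.h`.

```python
import socket
import struct

SMFIC_OPTNEG = b"O"  # option-negotiation command (assumed value)
SMFIC_QUIT = b"Q"    # "this connection is finished" (assumed value)


def milter_packet(cmd, payload=b""):
    """Frame one command: 4-byte big-endian length (command + payload), command, payload."""
    return struct.pack(">I", len(payload) + 1) + cmd + payload


def optneg_packet(version=2, actions=0x3F, protocol=0x7F):
    # Assumed 8.13-era values: protocol version 2, every v2 action permitted,
    # every v2 "skip this stage" flag offered.
    return milter_packet(SMFIC_OPTNEG, struct.pack(">III", version, actions, protocol))


def probe_milter(path, timeout=10.0):
    """Classify the filter behind a Unix socket as 'reply', 'closed' or 'dead'.

    'reply'  - it answered the negotiation, so it is serving connections.
    'closed' - it read the packet and hung up (for example, it rejected the
               assumed negotiation flags); alive, but worth logging.
    'dead'   - connect failed or nothing came back before the timeout, which
               is how a process stuck in futex(FUTEX_WAIT) would look.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to connect, send and recv
    try:
        try:
            sock.connect(path)
            sock.sendall(optneg_packet())
            data = sock.recv(1024)
        except OSError:  # includes socket.timeout and a missing socket file
            return "dead"
        if not data:
            return "closed"
        try:
            sock.sendall(milter_packet(SMFIC_QUIT))
        except OSError:
            pass  # the filter may already have closed its end
        return "reply"
    finally:
        sock.close()
```

A restart script could call `probe_milter("/var/run/sendmail/spamass.sock")` (the socket path from the log above) and restart only on `"dead"`; the 10-second timeout is an arbitrary choice. Because a hung process still holds the socket, the script would also need to kill the old PID before starting a new milter.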


-- 
Christopher Crowley
Network Administrator
Tulane Technology Services
address@hidden
phone: (504) 324-2249
