
From: Matej Cepl
Subject: [Pan-users] Re: Big XML files... (was Re: Re: Better processing of very large groups?)
Date: Sun, 5 Jul 2009 06:18:07 +0000 (UTC)
User-agent: Pan/0.133 (House of Butterflies)

Steven D'Aprano, Sun, 05 Jul 2009 11:58:28 +1000:
> The missus uses Thunderbird, and as near as we can tell, its spam
> filtering is crap. She found false negative rates approaching 50% (half
> the actual spam was flagged as good) and false positive rates
> approaching 10% (one out of ten good emails was flagged as spam).

No, it isn't, but the problem is that, like every Bayesian filter (and I am
a big fan of them), it needs a lot of training. Thunderbird, trying to be
easy to use, hides this ugly fact[1] from its users and delivers some kind
of generalized set of trained data for some generalized user, which
throws away the biggest strength of Bayesian filters: that they are
unpredictable by spammers and individualized to one's own mailing
patterns. If properly trained (on several THOUSAND EACH of BOTH spam and
ham messages), it can work pretty well.
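
To make the point about individualized training concrete, here is a toy naive Bayes filter in Python. It is only a sketch under my own assumptions: the class name TinyBayes, the whitespace tokenizer and the add-one smoothing are all made up for illustration, and none of it is the code Thunderbird or bogofilter actually uses. What it shows is that the filter knows nothing except the word counts of the messages you have personally fed it.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy per-user naive Bayes filter (hypothetical, for illustration only)."""

    def __init__(self):
        # Per-class word counts and message counts, built only from
        # the messages this particular user has classified.
        self.words = {"spam": Counter(), "ham": Counter()}
        self.msgs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message the user marked as 'spam' or 'ham'."""
        self.words[label].update(text.lower().split())
        self.msgs[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that text is spam."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_msgs = self.msgs["spam"] + self.msgs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Class prior, smoothed so an untrained filter stays neutral.
            score = math.log((self.msgs[label] + 1) / (total_msgs + 2))
            n_words = sum(self.words[label].values())
            for word in text.lower().split():
                # Add-one smoothing: unseen words never give probability 0.
                count = self.words[label][word]
                score += math.log((count + 1) / (n_words + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into P(spam | text); clamp to avoid overflow.
        diff = log_score["ham"] - log_score["spam"]
        return 1.0 / (1.0 + math.exp(min(diff, 700.0)))
```

An untrained TinyBayes answers 0.5 for everything, and with a handful of messages its answers swing wildly on single words. That is why the counts need thousands of examples of both classes before the verdicts become trustworthy.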

Best,

Matěj
------------------
[1] See for example
http://bogofilter.svn.sourceforge.net/viewvc/bogofilter/trunk/bogofilter/doc/bogofilter-tuning.HOWTO.html
for a thorough discussion of the related issues.




