
Re: [Ifile-discuss] usage of ifiles threshold option?


From: Paolo
Subject: Re: [Ifile-discuss] usage of ifiles threshold option?
Date: Tue, 8 Mar 2005 14:44:16 +0100
User-agent: Mutt/1.3.28i

On Mon, Mar 07, 2005 at 12:48:13PM +0100, C. Fischer wrote:
> could somebody please give an example of using ifiles `-T' (--threshold)
> option?  i want to know how to derive a specific number for it.

hello Clemens,

-T was introduced to allow for a 'grey zone' between the 2 winning 
categories (among 2 or more in the database). I.e., in a sense, it creates
one further bin 'on the fly', into which the test item is thrown whenever 
the 2 topmost ranks are closer than the threshold, in relative terms, 
according to the formula you get with --help:

R=(r0-r1)/(r0+r1), R*1000 < THRESH

if THRESH > 0.
Actually, you get 2 'grey zones', as you'd get a response like cat1,cat2
or cat2,cat1 according to which rank is the absolute max.
In spam filtering, e.g., you can do a coarse classification with a large 
threshold and computationally cheaper preprocessing, then, on a 2nd pass, 
reprocess whatever landed in the 'unsure' bin with a narrower threshold, 
better preprocessing, MIME decoding etc.
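
A minimal sketch of the rule and of the two-pass idea above, in Python
rather than ifile's own code. It assumes at least 2 categories and
positive scores where higher means a better match; the function names,
the preprocessing hooks and the scorer are all hypothetical and only
mirror the formula quoted from --help, not ifile's internals.

```python
def top_categories(scores, thresh):
    """Return [best], or [best, runner_up] when inside the grey zone.

    scores: dict of category -> score (assumed positive, higher is better).
    thresh: the THRESH value; 0 disables the grey zone.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (cat0, r0), (cat1, r1) = ranked[0], ranked[1]
    if thresh > 0:
        rel = (r0 - r1) / (r0 + r1)  # R from the --help formula
        if rel * 1000 < thresh:
            return [cat0, cat1]      # too close to call: report both
    return [cat0]


def two_pass(messages, score, cheap_prep, full_prep, coarse, fine):
    """Coarse pass on everything, finer pass only on the 'unsure' items.

    score, cheap_prep and full_prep are caller-supplied callables
    (hypothetical stand-ins for the classifier and its preprocessing).
    """
    results = {}
    for msg_id, raw in messages.items():
        verdict = top_categories(score(cheap_prep(raw)), coarse)
        if len(verdict) > 1:
            # Landed in the grey zone: redo with better preprocessing
            # and a narrower threshold.
            verdict = top_categories(score(full_prep(raw)), fine)
        results[msg_id] = verdict
    return results


if __name__ == "__main__":
    print(top_categories({"spam": 0.52, "ham": 0.48}, 100))
    print(top_categories({"spam": 0.90, "ham": 0.10}, 100))
```

Read this way, a specific number follows from the relative gap you are
willing to call 'unsure': since R*1000 is compared with THRESH, a value
of 50 would flag any pair whose relative difference is below 5%.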

- In a previous msg you mentioned MIME processing: AFAICT, that's not very 
effective WRT spam/ham classification - see reports in other projects, e.g.
CRM114 (crm114.sf.net) - see there as well for a link to 'normalizemime', a 
tool to mangle/sanitize an RFC [2]822 msg in UTF-*.

- For possible algos/how to implement BCR, besides ifile itself and related 
papers, see comments in the crm114 code, and you may also want to have a 
look at dbacl / L.Breyer sw/site : http://www.lbreyer.com/emailtut.html

hope this helps - if you come up with anything new/interesting pls report
back :)


-- 
 paolo
 
 GPG/PGP id:0x21426690 kfp:EDFB 0103 A8D8 4180 8AB5  D59E 9771 0F28 2142 6690
 "Indeed, it does come with warranty: it *will* fail, sometimes, somehow..."
                                                           - software vendor



