Re: [Ifile-discuss] Improving classification of spams


From: Booker Bense
Subject: Re: [Ifile-discuss] Improving classification of spams
Date: Fri, 10 Jan 2003 17:13:26 -0800 (PST)

On Fri, 10 Jan 2003, Jack Bertram wrote:

> Hi all
>
> I use ifile to filter into about 30 different folders and it does a very
> good job on nearly all mail.  However, it does a much less good job at
> correctly recognising spam email as spam. Now, I'm much happier with
> false negatives than false positives, so this isn't too much of a
> problem, but it does lead me to wonder why spam email in particular is a
> problem.
>
> My hypothesis is simple: my other folders are fairly homogenous, since
> they correspond to particular mailing lists, mail from particular people
> tending to talk about similar things, etc.  But spam email falls into a
> number of different categories: Nigerian spam, porn, etc, yet I put it
> in one folder.  Since ifile essentially computes an "average" for each
> folder, and compares an incoming email to that average, non-homogenous
> folders are harder to match correctly than homogenous ones.
>
> So, I'm asking two questions:
>
> 1. Is this hypothesis any good - does anyone else have the same
> experience as me, with non-spam categorised correctly but spam not
> recognised so well?

- So far, I've had very good luck with the spam/non-spam issue.
However, I have a hybrid system where I index everything but only
use ifile as a last resort (i.e. I have a bunch of prefilters
based on the sender/headers; if none of those match, I see what
ifile suggests). Every message is indexed by ifile after it gets
filtered.
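
A rough sketch of that routing order, in Python for illustration only
(it is not the Ruby module mentioned below). The rule table, folder
names, and the `classify` and `index` callables are all hypothetical
stand-ins; how they actually invoke ifile is left out on purpose.

```python
from email import message_from_string

# Hypothetical prefilter rules: (header name, substring to look for, folder).
PREFILTER_RULES = [
    ("List-Id", "ifile-discuss", "lists/ifile"),
    ("From", "boss@example.com", "work"),
]


def prefilter(msg):
    """Return the folder of the first matching header rule, or None."""
    for header, needle, folder in PREFILTER_RULES:
        value = msg.get(header, "")
        if needle.lower() in value.lower():
            return folder
    return None


def route(raw_text, classify, index):
    """Pick a folder for one message, then index the message under it.

    classify(raw_text) -> folder   assumed wrapper that asks ifile for a guess
    index(folder, raw_text)        assumed wrapper that adds the message to ifile
    """
    msg = message_from_string(raw_text)
    folder = prefilter(msg)
    if folder is None:
        # Last resort: no header rule matched, so take the classifier's guess.
        folder = classify(raw_text)
    # Every message gets indexed, whichever path chose the folder.
    index(folder, raw_text)
    return folder
```

The point of the ordering is that the statistical guess is only trusted
when nothing cheaper and more certain applies, while the index still
learns from every message.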

>
> 2. How many different sorts of spam do I have to distinguish in order to
> make spam matching work better?  Will a porn/non-porn distinction work
> well, or do I need to use more spam categories in order to get good
> matching.  What do other people on this list do?
>

- What I've done is distinguish between "ispam" and "spam". ispam
is basically anything that isn't plain ASCII, and spam is for
ASCII-readable spam.
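
One way to read that split, as a minimal Python sketch. It assumes
"plain ASCII" means every byte of the raw message is below 128; the
function name is made up for illustration, and a real setup might also
want to treat HTML-only or base64-encoded mail as "ispam".

```python
def spam_folder(raw_bytes):
    """Return "spam" for pure-ASCII messages and "ispam" for anything else."""
    try:
        raw_bytes.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte is outside the 7-bit ASCII range.
        return "ispam"
    return "spam"
```

This is only applied to mail already judged to be spam; it chooses
between the two spam folders so that each one stays more uniform.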

- Also, I throw away the .idata file and reindex things every
couple of weeks. I'm not sure whether this has any effect.

- Booker C. Bense

P.S. I have a Ruby module for using ifile with the rmail package,
if anybody is interested.



