[Ifile-discuss] Improving classification of spams
From: Jack Bertram
Subject: [Ifile-discuss] Improving classification of spams
Date: Fri, 10 Jan 2003 16:08:24 +0000
User-agent: Mutt/1.4i
Hi all,
I use ifile to filter into about 30 different folders and it does a very
good job on nearly all mail. However, it does a much worse job of
recognising spam as spam. I'm much happier with false negatives than
false positives, so this isn't too much of a problem, but it does lead
me to wonder why spam in particular is hard to classify.
My hypothesis is simple: my other folders are fairly homogeneous, since
they correspond to particular mailing lists, mail from particular people
who tend to talk about similar things, etc. But spam falls into a
number of different categories (Nigerian spam, porn, etc.), yet I put it
all in one folder. Since ifile essentially computes an "average" for each
folder and compares an incoming email to that average, non-homogeneous
folders are harder to match correctly than homogeneous ones.
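To make the hypothesis concrete, here is a toy sketch of the dilution effect. It is not ifile's actual code: it assumes a simple smoothed word-frequency model per folder, and the example messages and the names train and log_likelihood are made up for illustration. The same Nigerian-style message is scored against a folder holding only that kind of spam and against a folder that also holds pill spam.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences over all messages in one folder."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.lower().split())
    return counts

def log_likelihood(message, counts, vocab_size):
    """Sum of log word probabilities with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in message.lower().split()
    )

# Hypothetical training data: two quite different kinds of spam.
nigerian = ["urgent transfer of funds from bank",
            "prince needs bank transfer urgent"]
pills = ["cheap pills online discount",
         "discount pills cheap offer"]

nigerian_counts = train(nigerian)
merged_counts = train(nigerian + pills)   # one catch-all spam folder
vocab_size = len(merged_counts)           # distinct words seen overall

incoming = "urgent bank transfer"
split_score = log_likelihood(incoming, nigerian_counts, vocab_size)
merged_score = log_likelihood(incoming, merged_counts, vocab_size)

# The catch-all folder spreads its probability mass over more words,
# so the same message fits it less well than the dedicated folder.
print(split_score, merged_score)
```

In this toy model the dedicated folder always scores the message higher than the merged one, which is the effect I suspect is hurting my single spam folder when it competes against tightly focused list folders.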
So, I'm asking two questions:
1. Is this hypothesis any good? Does anyone else have the same
experience as me, with non-spam categorised correctly but spam not
recognised so well?
2. How many different sorts of spam do I have to distinguish in order to
make spam matching work better? Will a porn/non-porn distinction work
well, or do I need to use more spam categories in order to get good
matching? What do other people on this list do?
Cheers,
jack