[Ifile-discuss] Re: Saving ifile database source files
From: Clemens Fischer
Subject: [Ifile-discuss] Re: Saving ifile database source files
Date: 30 Aug 2003 14:54:18 +0200
User-agent: Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3 (berkeley-unix)
* Joe Kelsey:
> Currently, I plan to delete old database files to keep the directory
> sizes under control.
you don't have to do this: ifile limits the number of words it keeps
in its database. it uses a stoplist and throws out rarely used
words. back when i used ifile for spam/non-spam classification, my
database never grew beyond a few hundred kilobytes and i never had to
trim it.
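
To illustrate why such a database stays small, here is a minimal sketch of the general idea (stoplist filtering plus dropping rare words). It is not ifile's actual code: the stoplist contents, the function names `update_counts` and `prune_rare`, and the `min_count` threshold are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical stoplist for illustration; ifile's own list is not reproduced here.
STOPLIST = {"the", "a", "and", "of", "to"}

def update_counts(counts, message, stoplist=STOPLIST):
    """Add the words of one message to the running counts, skipping stoplist words."""
    for word in message.lower().split():
        if word not in stoplist:
            counts[word] += 1

def prune_rare(counts, min_count=2):
    """Return a new Counter keeping only words seen at least min_count times."""
    # The threshold is an assumed parameter, not a documented ifile setting.
    return Counter({w: c for w, c in counts.items() if c >= min_count})

counts = Counter()
update_counts(counts, "the cheap pills and cheap watches")
update_counts(counts, "cheap offer")
pruned = prune_rare(counts, min_count=2)
```

Because common words never enter the counts and rare words are dropped again, the table grows with the useful vocabulary rather than with the number of messages processed.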
> Why spend so much time on the website talking about organizing huge
> quantities of mail if all you really need is the word counts?
a "good" spam-corpus is worth a lot (to me, at least), especially if
it contains the entire "diversity" of sh*t spammers come up with. i'd
say ifile does well with tens of messages if you only have
spam/non-spam, but a few hundred of both are better. as for trusting
the person compiling the spams, i had a look at some, and they
contained nothing but real spam. the only thing that might matter to
you is this: spam sent to americans differs considerably from that
sent to europeans, and you definitely need some asian-language
spam as well these days.
clemens