ifile-discuss

From: Jack Bertram
Subject: Re: [Ifile-discuss] Effect of widely differing volumes on ifile classification
Date: Mon, 7 Apr 2003 08:59:04 +0100
User-agent: Mutt/1.4i

* Jason Rennie <address@hidden> [030323 11:28]:
> address@hidden said:
> > Skewed in what respect?  Volume?  I'm a mathematician (by training, at
> > least) so at least high-level details would be interesting. 
> 
> I've attached a draft of a paper that discusses some of the issues.  
> Section 3.1 is most relevant to skewed data---when there are more 
> examples in one class than another---but you might be interested in the 
> rest of the paper too.

This is interesting - thanks.  It does raise some problems for me as to
how best to use ifile on email: my data is very skewed, so accuracy
drops over time.

In fact, the way I organise my emails is to have two parallel
hierarchies: one of which contains the most recent 100 emails in each
folder, and one of which contains the rest.  Training ifile (from
scratch) on the first hierarchy produces superb results of 99-100%
accuracy.  Training it on the second hierarchy drops accuracy to 60-70%.
This will be partly due to other effects, such as old categories
lingering in the archive and the changing nature of the corpus over
time.  But it suggests that I might be better off cleaning out the
.idata file periodically.

Is there any way that ifile could _directly_ age data in its database to
counter this?
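
To make concrete what I mean by ageing, here is a rough sketch in
Python.  It is only an illustration of the idea, not how ifile stores or
updates its data: the class AgingCounts, the decay parameter and the
prune threshold are all names I have made up, and I am assuming a simple
per-folder table of word weights rather than the real .idata format.
Each time a message is filed into a folder, that folder's existing word
weights are scaled down before the new words are added at full weight.

```python
from collections import defaultdict


class AgingCounts:
    """Per-folder word weights that fade with age.

    Hypothetical structure for illustration only; this is not
    ifile's .idata format or its update rule.
    """

    def __init__(self, decay=0.99):
        # Multiplier applied to a folder's existing weights
        # each time a new message is added to that folder.
        self.decay = decay
        # folder -> word -> decayed count
        self.counts = defaultdict(lambda: defaultdict(float))
        # folder -> decayed number of messages
        self.messages = defaultdict(float)

    def add_message(self, folder, words):
        word_counts = self.counts[folder]
        # Age what this folder already holds...
        for word in word_counts:
            word_counts[word] *= self.decay
        self.messages[folder] *= self.decay
        # ...then add the new message at full weight.
        for word in words:
            word_counts[word] += 1.0
        self.messages[folder] += 1.0

    def prune(self, threshold=0.01):
        # Drop words whose weight has decayed to almost nothing.
        for word_counts in self.counts.values():
            stale = [w for w, c in word_counts.items() if c < threshold]
            for word in stale:
                del word_counts[word]


if __name__ == "__main__":
    db = AgingCounts(decay=0.5)
    db.add_message("lists", ["ifile", "bayes"])
    db.add_message("lists", ["ifile"])
    print(dict(db.counts["lists"]), db.messages["lists"])
```

Because the decay is applied per folder, the message weight of a busy
folder is a geometric sum bounded by 1/(1 - decay).  With decay=0.99
that bound is 100 messages, which is roughly the effect I get by hand
from keeping only the most recent 100 emails in each folder.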

jack



