Re: [Pan-users] Making Multiple Servers Happen In Your Lifetime


From: Mark H. Kraml
Subject: Re: [Pan-users] Making Multiple Servers Happen In Your Lifetime
Date: 07 Oct 2002 17:18:25 -0400

If we keep talking about this feature, we may get some good ideas down,
and actual traction on it may follow.

I just signed up for a premium news account, so this feature would clearly
be of great benefit to me. After giving it some thought, though, it is, as
stated earlier, not so simple to do, at least not without a good hit on
performance. While my regular ISP has a limited amount of storage for
news (about 1-2 days), the premium ISP has over 20 days of storage, so
some newsgroups there have over 1 million headers.

It takes a long time to download "ALL" headers from there. They also give
me a 6GB limit per 30 days, which means I only want to download from
there when I really have to.
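A minimal sketch of that "premium server only when I have to" policy, in Python. It assumes we already know which Message-IDs each server carries (a real client would have to ask, for example with the NNTP STAT command), and the `Server` class with its `cost_rank` and `quota_bytes` fields is a hypothetical name, not anything in Pan.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    name: str
    cost_rank: int                        # lower = cheaper to download from
    quota_bytes: Optional[int] = None     # None means unmetered
    carries: set = field(default_factory=set)  # Message-IDs known to be available
    used_bytes: int = 0


def pick_server(servers, msgid, size):
    """Cheapest server that carries msgid and still has quota left, else None."""
    for s in sorted(servers, key=lambda s: s.cost_rank):
        if msgid not in s.carries:
            continue
        if s.quota_bytes is not None and s.used_bytes + size > s.quota_bytes:
            continue
        return s
    return None


def download(servers, msgid, size):
    """Pick a server and charge the download against its quota."""
    s = pick_server(servers, msgid, size)
    if s is not None:
        s.used_bytes += size
    return s
```

With the regular ISP at `cost_rank=0` and the premium account at `cost_rank=1`, an article both servers carry is always taken from the ISP, and the premium quota is only spent on articles the ISP has already expired.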

On the topic of performance, I have a 1GHz P3 with 512MB RAM, not the
fastest around, but not out of the norm for the machines we would want to
support. Even today, with 1 million headers, the performance for loading
and managing groups is quite, well, unimpressive.



On Mon, 2002-10-07 at 14:46, Charles Kerr wrote:
> [Setting followups to pan-devel]
> 
> On Mon, Oct 07, 2002 at 03:06:23AM -0700, Duncan wrote:
> > As for what it can do now... On the face of it, the servers are separate.  
> > However, I was thinking about that with regard to its cache handling the 
> > other day and wondering...  I haven't investigated in detail, but at first 
> > blush, it appears Pan keeps the group data separate by server, but has a 
> > common actual message cache, which appears to be stored based on MsgID, 
> > which remains the same between servers.  Thus, it's likely that while a 
> > message read on one server won't show as read on another, because that's 
> > tracked separately, if you go to retrieve it on the second before deleting 
> > it off the first (IOW, while the physical message is still in the cache), 
> > it shouldn't have to download it again, and should immediately see it is 
> > there, already.
> > 
> > At a minimum, due to the unified cache organized by MsgID, it should be far 
> > easier to get it working that way eventually.  However, this thought is 
> > fairly new to me, and I haven't had a real chance to explore how far it 
> > works, by loading the same group on two different servers, so it's all 
> > supposition at this point.
> 
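A rough model of the arrangement Duncan describes, assuming his reading of the cache is correct: article bodies are shared and keyed by Message-ID, while read state is kept per server. `ArticleCache` and its methods are hypothetical; this is not Pan's cache code.

```python
from collections import defaultdict


class ArticleCache:
    """Hypothetical model: one body store keyed by Message-ID, shared by all
    servers, plus a separate read set per server."""

    def __init__(self):
        self.bodies = {}              # Message-ID -> article text
        self.read = defaultdict(set)  # server name -> Message-IDs read there
        self.downloads = 0            # number of cache misses (real fetches)

    def get(self, server, msgid, fetch):
        """Return the article body, calling fetch(server, msgid) only on a miss."""
        if msgid not in self.bodies:
            self.bodies[msgid] = fetch(server, msgid)
            self.downloads += 1
        return self.bodies[msgid]

    def mark_read(self, server, msgid):
        self.read[server].add(msgid)

    def is_read(self, server, msgid):
        return msgid in self.read[server]

    def delete(self, msgid):
        """Drop the cached body; the next get() will fetch it again."""
        self.bodies.pop(msgid, None)
```

Reading an article on server A and then requesting it from server B is served from the cache (`downloads` stays at 1), but B still reports it unread, which matches the behaviour guessed at above.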
> Yes, the unique Message-ID is the one thing we've got in our favor --
> it would be the key in any lookup table we use for cross-server support.
> However, cross-server support hinges upon doing index numbers right.
> 
> News servers optimize article lookups in a group by having an index
> number for each article in the group.  User-Agents like Pan ask for
> articles by index rather than by message-id.  (Though the NNTP spec says
> you can request articles by message-id, many servers don't honor the
> request or do so with varying degrees of success.)  For bonus fun,
> a crossposted article has a different index for each server+group pair.
> 
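To make the index-versus-Message-ID distinction concrete, here is a sketch of the NNTP commands involved. GROUP and ARTICLE are real NNTP commands (RFC 977); the hostnames, groups and article numbers are made up, and `plan()` is a hypothetical helper.

```python
def commands_by_number(group, number):
    # Article numbers only mean something inside the currently selected
    # group, so the group has to be selected first.
    return [f"GROUP {group}", f"ARTICLE {number}"]


def commands_by_msgid(msgid):
    # A Message-ID is global, so no group selection is needed, although
    # many servers handle this form poorly (as noted in the thread).
    return [f"ARTICLE {msgid}"]


# One crossposted article, numbered independently for each server+group
# pair (made-up hostnames and numbers).
CROSSPOST = {
    ("news.isp.example", "alt.test"): 8011,
    ("news.isp.example", "alt.test.d"): 412,
    ("news.premium.example", "alt.test"): 1903377,
}


def plan(locations):
    """server -> list of command sequences, one per known location."""
    out = {}
    for (server, group), number in locations.items():
        out.setdefault(server, []).append(commands_by_number(group, number))
    return out
```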
> So to identify an article with enough detail to search across separate
> servers, and to ensure that a single delete/save/read propagates the
> state across servers & groups, the Message-ID needs to map to tuples
> of [server,group,index].  Happily we can get this information by parsing
> the Xref headers fetched from each server.
> 
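A sketch of turning Xref headers into [server,group,index] tuples keyed by Message-ID. It assumes the header value has the usual form `host group:number group:number ...`, and it identifies the server by the connection the header came from rather than by the leading host token; `parse_xref()` and `record()` are hypothetical names.

```python
from collections import defaultdict


def parse_xref(server, xref):
    """'host grp:num grp:num ...' -> [(server, grp, num), ...]

    The leading host token is skipped; `server` is whatever server we
    fetched this header from (an assumption of this sketch).
    """
    out = []
    for token in xref.split()[1:]:
        group, sep, number = token.rpartition(":")
        if sep and group and number.isdigit():
            out.append((server, group, int(number)))
    return out


# Message-ID -> every known (server, group, number) location.
locations = defaultdict(set)


def record(msgid, server, xref):
    locations[msgid].update(parse_xref(server, xref))
```

Feeding in the Xref header for the same Message-ID from each server accumulates all of its locations, which is what a single delete/save/read would need to walk.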
> (I wrote tasks.dtd with this in mind -- see task.xml's message identifiers ;)
> 
> Chris and I have talked about replacing Pan's current data file format with
> SQLite <http://www.hwaci.com/sw/sqlite/>, which is small, fast, and portable
> enough not to scuttle the Windows port.  Letting a database map the
> message-id to [server,group,index] tuples would be much better than munging
> the current data files to do this, since currently each server+group pair
> has its own file, and read article indices are stored in a per-server file.
> 
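One possible shape for that lookup table, using Python's built-in `sqlite3` module. The table and column names are assumptions for illustration only, not a schema proposed in this thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS article_location (
    message_id TEXT    NOT NULL,
    server     TEXT    NOT NULL,
    grp        TEXT    NOT NULL,   -- "group" is an SQL keyword
    number     INTEGER NOT NULL,
    PRIMARY KEY (server, grp, number)
);
CREATE INDEX IF NOT EXISTS article_location_msgid
    ON article_location (message_id);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_location(db, msgid, server, grp, number):
    # The primary key makes re-adding the same location a no-op.
    db.execute("INSERT OR IGNORE INTO article_location VALUES (?, ?, ?, ?)",
               (msgid, server, grp, number))


def locations_of(db, msgid):
    """All known (server, group, number) tuples for one Message-ID."""
    return db.execute(
        "SELECT server, grp, number FROM article_location"
        " WHERE message_id = ? ORDER BY server, grp",
        (msgid,)).fetchall()


def forget(db, msgid):
    """Remove every location of an article in one statement."""
    db.execute("DELETE FROM article_location WHERE message_id = ?", (msgid,))
```

The index on `message_id` is what makes "given this article, where else does it live?" cheap; the primary key covers the opposite lookup by server, group and number.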
> An issue tied to managing these msgid->(server,group,index)+ relations
> is how to track read articles.  Mapping msgid to a "read" flag is easy
> to write, but it's insufficient for importing/exporting newsrc files:
> if we key off the msgid internally, any article in the newsrc string
> that's been deleted in Pan will show up as unread in Pan's exported newsrc
> file:
> 
>    (1) Pan user imports a .newsrc from her other newsreader.  It includes:
>        "alt.binaries.sounds.mp3.jackhammers: 1-8000, 8010-8014, 8020"
>    (2) User in Pan deletes some articles which had the indices 8011 and 8013
>    (3) User exits Pan, which writes the following line to .newsrc:
>        "alt.binaries.sounds.mp3.jackhammers: 1-8000, 8010,8012,8014, 8020"
>    (4) Back in the other newsreader, articles 8011 and 8013 are now unread.
> 
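The failure in steps (1)-(4) can be reproduced with a small model. It assumes the msgid-keyed design only remembers "read" for articles Pan still holds, so a deleted article's number drops out of the exported range; `parse_ranges()`, `format_ranges()` and `export_via_msgid()` are hypothetical helpers, and the output omits the spaces used in the example above.

```python
def parse_ranges(spec):
    """'1-3, 5' -> {1, 2, 3, 5}"""
    nums = set()
    for part in spec.split(","):
        part = part.strip()
        if part:
            lo, _, hi = part.partition("-")
            nums.update(range(int(lo), int(hi or lo) + 1))
    return nums


def format_ranges(nums):
    """{1, 2, 3, 5} -> '1-3,5' (collapses consecutive runs)."""
    parts = []
    start = prev = None
    for n in sorted(nums):
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = n
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def export_via_msgid(read_spec, deleted):
    # Hypothetical msgid-keyed model: once an article is deleted, Pan no
    # longer knows its number, so it cannot be written back as read.
    return format_ranges(parse_ranges(read_spec) - set(deleted))
```

`export_via_msgid("1-8000, 8010-8014, 8020", [8011, 8013])` yields `1-8000,8010,8012,8014,8020`, the same loss as in step (3).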
> To have isomorphic .newsrc import/exports, it would be better to keep
> read/unread markings in a [server,group,newsrc] tuple where the newsrc
> is some representation of a single newsrc line (in the db, a newsrc string;
> in Pan, a pan/base/Newsrc object).  This needs to be taken into account to get
> cross-server articles right.
> 
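A sketch of the per-[server,group] alternative: each newsrc line is kept as a list of read ranges, so deleting an article locally never touches it and the line round-trips unchanged. `Newsrc` here is a hypothetical Python stand-in, not the pan/base/Newsrc class; storing ranges rather than individual numbers also keeps memory modest for groups with a million headers.

```python
class Newsrc:
    """Hypothetical stand-in for one newsrc line ("1-8000,8010-8014,8020").

    Read state is stored as sorted, non-overlapping, non-adjacent [lo, hi]
    ranges of article numbers, independent of which articles Pan still holds.
    """

    def __init__(self, line=""):
        self.ranges = []
        for part in line.split(","):
            part = part.strip()
            if part:
                lo, _, hi = part.partition("-")
                self._add(int(lo), int(hi or lo))

    def _add(self, lo, hi):
        # Merge [lo, hi] with any range it overlaps or touches.
        kept = []
        for a, b in self.ranges:
            if b < lo - 1 or a > hi + 1:
                kept.append([a, b])
            else:
                lo, hi = min(lo, a), max(hi, b)
        kept.append([lo, hi])
        self.ranges = sorted(kept)

    def mark_read(self, n):
        self._add(n, n)

    def mark_unread(self, n):
        # Split the range containing n, if any.
        kept = []
        for a, b in self.ranges:
            if a <= n <= b:
                if a < n:
                    kept.append([a, n - 1])
                if n < b:
                    kept.append([n + 1, b])
            else:
                kept.append([a, b])
        self.ranges = kept

    def is_read(self, n):
        return any(a <= n <= b for a, b in self.ranges)

    def __str__(self):
        return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in self.ranges)


# One Newsrc per (server, group) pair.
newsrc = {}
key = ("news.isp.example", "alt.binaries.sounds.mp3.jackhammers")
newsrc[key] = Newsrc("1-8000, 8010-8014, 8020")
# Deleting articles 8011 and 8013 from Pan's local store does not touch
# newsrc[key], so the exported line still matches what was imported.
```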
> The next step to getting cross-server harvesting right is, IMO, to get
> the tables defined right and to move over to SQLite.  I'd be interested
> in any feedback/discussion/action on this on pan-devel.
> 
> cheers,
> Charles
> 
> 
> _______________________________________________
> Pan-users mailing list
> address@hidden
> http://mail.freesoftware.fsf.org/mailman/listinfo/pan-users
> 






