Re: [Pan-users] pan for Windows crashes when reading large newsgroup


From: Duncan
Subject: Re: [Pan-users] pan for Windows crashes when reading large newsgroup
Date: Sat, 20 Oct 2012 02:43:55 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f91bd24 /usr/src/portage/src/egit-src/pan2)

K Shen posted on Sat, 20 Oct 2012 01:49:51 +0100 as excerpted:

(Please don't top-post.  There's a reason pan has that warning, too.  But 
do trim or summarize what you quote, particularly my replies, as I am 
quite wordy and people will otherwise be paging and paging to reach your 
reply.  I did notice it's plain text this time.  Thanks.)

> I was also trying to see if there might be other reasons for the recent
> increase in memory usage; I guess the only reason is that the traffic on
> this newsgroup have increased recently.

Most likely...

> Duncan, thanks for the details on the reason why pan is using so much
> memory. I wonder if it is possible to reduce the number of headers kept
> in memory in the ways I suggested, i.e.
> 
> 1) Do not read in all headers, but only those that are specified via
> filters (e.g. to certain authors/titles). I am already filtering the
> headers I do see already, as there is no way I can deal with all the
> millions of headers, so I only ever see a very small subset of these
> anyway. I know that if a header is not read in, I will not be able to
> see the associated article, but in order to reduce the memory usage, it
> seems the first thing to give up is these filtered headers that I would
> most likely see even if they are read in anyway. Can this be done, or is
> there some reason that all headers have to be read in?

The thing here is that pan does all header processing in memory.  It 
can't filter until it has fetched and processed the headers (aka 
overviews), and for that, they must be in memory.
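To illustrate why a header has to be fetched and parsed before any filter 
can look at it, here is a minimal sketch in Python.  It is not pan's code 
(pan is written in C++); the names Overview, parse_overview and matching 
are made up for the example.  The only thing taken as given is the 
standard NNTP overview layout: tab-separated article number, subject, 
author, date, message-id, references, byte count and line count.

```python
from typing import Iterable, Iterator, NamedTuple


class Overview(NamedTuple):
    """One overview (header summary) record; hypothetical type for this sketch."""
    number: int
    subject: str
    author: str
    date: str
    message_id: str
    references: str
    byte_count: int
    line_count: int


def parse_overview(line: str) -> Overview:
    """Split one tab-separated overview line into its eight standard fields."""
    fields = line.rstrip("\r\n").split("\t")
    number, subject, author, date, message_id, references, nbytes, nlines = fields[:8]
    return Overview(int(number), subject, author, date, message_id,
                    references, int(nbytes or 0), int(nlines or 0))


def matching(lines: Iterable[str], author_substring: str) -> Iterator[Overview]:
    """Yield only records whose author contains the substring (case-insensitive).

    Every line still has to be received and parsed before it can be tested.
    """
    needle = author_substring.lower()
    for line in lines:
        record = parse_overview(line)
        if needle in record.author.lower():
            yield record
```

Either way the whole line arrives and is parsed first.  What a client 
does with the non-matching records afterwards is a design choice; pan, as 
described above, does its processing with the headers in memory.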

One possibility, tho.  What do you do with messages you're done with, 
either already read/downloaded, or scored and set to no-visibility?  Are 
you deleting them, or simply letting them expire, and if the latter, 
what's your expiration set for and could you lower it?

I ask because a deleted/expired article will require far fewer resources, 
basically, just the server/group sequence number of that post.  HOWEVER, 
if you delete, then visit a different group with the same messages 
crossposted, those messages will show up again as unread (or at least 
they used to, I think they still do, I've not done binaries in years tho, 
and I keep my text group posts unexpired).  So you might want to at least 
be sure you keep them long enough to deal with the crossposts, if you 
subscribe to multiple groups that commonly have articles crossposted 
between them.
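As a rough illustration of why an article that is only remembered by its 
sequence number is so cheap, here is a small Python sketch that collapses 
a set of article numbers into ranges, in the style of a newsrc line.  It 
is an assumption-laden example, not a description of pan's actual 
internal storage; compress and format_ranges are hypothetical names.

```python
from typing import Iterable, List, Tuple


def compress(numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse article numbers into sorted, inclusive (first, last) ranges."""
    ranges: List[Tuple[int, int]] = []
    for n in sorted(set(numbers)):
        if ranges and n == ranges[-1][1] + 1:
            # Extends the current run by one article.
            ranges[-1] = (ranges[-1][0], n)
        else:
            # Starts a new run.
            ranges.append((n, n))
    return ranges


def format_ranges(ranges: List[Tuple[int, int]]) -> str:
    """Render ranges newsrc-style, e.g. '1-3,5,7-8'."""
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
```

A long run of consecutive deleted articles costs one pair of integers 
here, where a retained header costs a subject, author, date, message-id 
and references string each.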

Where cross-posting isn't an issue, provided you have a new enough pan 
(AFAIK not a problem with SD's prebuilt MS-platform pan binaries unless 
you haven't updated in some months), take a look at preferences, actions 
tab.  There you should be able to set automated delete (and/or mark-read) 
via score level.  Presumably you'd delete ignored, and depending on how 
your scoring is set up, possibly low/negative and maybe even normal/0 
scored posts.

Similarly, be sure you're deleting headers (not just letting them expire) 
when you're done with them.

To be clear, I don't believe that changes incoming memory requirements 
much, since the information must be processed before it can be scored and 
deleted.  But pan reads ALL existing headers into memory, threads them, 
then grabs new ones and plugs them in where they belong in the existing 
hierarchy.  By deleting messages ASAP instead of letting them simply 
expire, you cut down on that existing-articles memory baseline, which 
should both decrease memory usage AND speed up sorting, etc.

FWIW, pan startup speed is also dramatically affected by retained 
messages, tho in that case I'm not sure whether it's the messages 
themselves or the headers, since I've been keeping both (text groups 
only, as I said), unexpired, for years now.

Of course shortening expiration if you can helps too, but if you can 
delete as soon as you're done with them and not wait for expiration, that 
should help even more.


One other possibly helpful thing, here, tho I'm not sure how well it 
works since I last did binaries before this feature was available.  I 
believe pan bypasses much/all of the header handling entirely, if you 
have a good source of nzb files and use them exclusively, never actually 
downloading headers.  But as I said, my binaries experience is from 
before that feature, so I'm not sure if that's a correct belief or not.  
But if you have a good nzb source available, try it and see.

> 2) Read in N day's worth of headers, but not necessarily the last N
> days, i.e. read in older headers, starting from an earlier date than
> today, e.g. 10 days of headers starting 10 days ago (i.e. headers from
> 10-20 days ago).

Pan doesn't implement any UI for doing this.  Basically, what you'd have 
to do is fetch say 10 days, then 20 days, then 30... and I don't remember 
whether deleted headers are re-fetched in that scenario or not.  For sure 
the articles would need to be marked read before delete to prevent it, 
but I don't know if marking-read and then deleting works, or not.

But if you have to keep the last 10 days to avoid re-fetching when you 
get 20 days... it rather defeats the purpose, as the memory usage will be 
about the same anyway.
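For what the "10 to 20 days ago" window would mean in terms of Date 
headers, here is a short Python sketch.  It only shows the selection 
arithmetic on records that have already been fetched; it does not 
correspond to any existing pan option, and window and in_window are 
hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Tuple


def window(now: datetime, newest_days_ago: int, span_days: int) -> Tuple[datetime, datetime]:
    """Return (oldest, newest) bounds for a window ending newest_days_ago before now."""
    newest = now - timedelta(days=newest_days_ago)
    oldest = newest - timedelta(days=span_days)
    return oldest, newest


def in_window(date_header: str, oldest: datetime, newest: datetime) -> bool:
    """True if the Date header falls in [oldest, newest).

    Assumes the header carries an explicit numeric UTC offset such as +0000,
    so the parsed value is timezone-aware like the bounds.
    """
    posted = parsedate_to_datetime(date_header)
    return oldest <= posted < newest


# Usage: 10 days of headers starting 10 days ago, as in the question.
now = datetime(2012, 10, 20, tzinfo=timezone.utc)
oldest, newest = window(now, newest_days_ago=10, span_days=10)
```

The test itself is trivial; the catch, as noted above, is that the 
headers newer than the window still have to be either kept or re-fetched.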

> These seem to be simpler to implement than changing the way pan handles
> headers (using database as you suggested). I am not familiar with
> programming newsreaders, so perhaps I missed something?

True.

Do experiment with the auto-delete (with mark-read also) action, tho, 
along with deleting as soon as you can after downloading, as described 
above.  If you've been letting them expire, you may find doing the 
deletes a sufficient solution for now.

> I am using pan for Windows because pan is the only newsreader I found
> that could handle newsgroup with large number of articles, I have an
> Intel Mac as well as the Windows 7 laptop that I can read newsgroups.
> Actually I would prefer to read news on my Mac (which has 16G of real
> memory), but I haven't found a newsreader that can read in more than
> 100K of headers. Is there any alternatives to pan that can handle huge
> number of headers? I am actually trying to see if I can run pan on my
> Mac now, but I can't find any pre-built binaries, so I am trying to
> compile it now, but this seems non-trivial....

Sci-fi's the local OSX authority.  Take a look back at his posts.  (You 
can access them via pan as news, at news.gmane.org if you like.  That's 
what I do for this list, as a newsgroup on gmane.  You may want to take a 
look around the web site, especially the FAQ, too, to see what gmane's 
all about, especially before trying to post via gmane.  http://gmane.org )

You may be able to pick up a pre-built OSX binary from him.  You'll 
certainly be able to pick up a lot of pointers on building and perhaps on 
OSX-specific bugs.  I do know there are a couple of Mac binary repos for 
at least the dependencies, MacPorts is one IIRC, I forgot the other, but 
if you pick up the pan binary from sci-fi, you'll need to use whichever 
repo he uses.

As for other news binary harvesting apps for MS, I'm *definitely* not 
current, but there used to be an app for both MS and Linux called BNR2 
that was said to work extremely well for binary harvesting, back before 
pan did 
anything near as well as it does now.  The caveat was that it would 
sometimes corrupt its database, but BNR veterans at least had no problem 
working around that, knowing just which db file to delete, and how to get 
right back to work with hardly any disruption at all.

There are certainly others, but all my first-hand MS info is from the 
turn of the century.  A bit of googling should help, if no one here pops 
in with that sort of info.

Travis is one of the local MS pan users, but he's not as technically 
inclined I think (he just uses SD's binaries), and I've no idea whether 
he does binaries or what he may know about binary harvesting alternatives 
on MS.

Another alternative would be a LiveDVD/thumb-drive, ubuntu or whatever, 
or assuming room, install it and multi-boot.  Pan's unlikely to be on the 
DVD, but you should be able to procure the appropriate packages and store 
them elsewhere for quick local reinstall, if necessary.

Yet another totally different alternative would be doing something like 
leafnode as a local news server, configuring it for only the groups 
you're interested in.  It'd pull from your existing provider and cache 
locally, and you could run pan connected via loopback (the server would 
run on 127.0.0.1 or whatever).  That way you could scale back 
dramatically on the headers you have pan managing, say expiring in 24-48 
hours.  But I really don't know if leafnode runs on MS or not, and 
depending on the level of server filtering available (whatever server you 
run), this is likely a pretty space intensive option, even for a few 
groups, if they're busy groups as this thread is discussing.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



