Re: [Pan-users] Re: Forcing an expire?


From: Ron Johnson
Subject: Re: [Pan-users] Re: Forcing an expire?
Date: Thu, 09 Jul 2009 08:24:36 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.22) Gecko/20090701 Thunderbird/2.0.0.22 Mnenhy/0.7.6.666

On 2009-07-09 03:59, Duncan wrote:
Ron Johnson <address@hidden> posted address@hidden, excerpted below, on Thu, 09 Jul 2009 01:26:03 -0500:

Of course, that assumes it's not a simple permissions issue...
Nope.

$ dir alt.binaries.dvdr*
-rw------- 1 me me 3658599777 2009-07-02 03:37:45 alt.binaries.dvdr

Just that isolated file doesn't help a lot... Is that supposed to be the <PanDataDir>/groups/ file for that group? I'm guessing so, but it would have helped if you'd indicated that some way. At first, I thought maybe it was your filesave dir for that group, but then I decided a 3.6 gig dir size didn't make sense, and (after writing the below about the groups subdir) decided it had to be the group's header file.

$ dir .pan2
total 557968
drwxr-xr-x   4 me me      4096 2009-07-09 07:52:31 ./
drwxr-xr-x 204 me me     36864 2009-07-09 07:50:02 ../
-rw-------   1 me me      6797 2009-07-09 01:11:47 accels.txt
drwxr-xr-x   2 me me     36864 2009-07-09 07:52:31 article-cache/
-rw-------   1 me me      2181 2009-07-09 01:12:01 group-preferences.xml
drwxr-xr-x   2 me me      4096 2009-07-09 00:59:06 groups/
-rw-------   1 me me   4992242 2009-07-01 18:02:53 newsgroups.dsc
-rw-------   1 me me      1419 2009-07-09 01:12:00 newsgroups.xov
-rw-------   1 me me    188829 2009-07-01 18:02:53 newsgroups.ynm
-rw-------   1 me me  18980141 2009-07-09 01:12:01 newsrc-1
-rw-------   1 me me        89 2009-07-09 01:12:01 posting.xml
-rw-------   1 me me      4914 2009-07-09 01:12:01 preferences.xml
-rw-------   1 me me       239 2009-07-02 20:35:24 Score
-rw-------   1 me me       406 2009-07-09 01:12:57 servers.xml
-rw-------   1 me me 299701100 2009-07-09 07:52:26 tasks.nzb
-rw-r--r--   1 me me 246784000 2009-07-09 07:52:49 tasks.nzb.tmp


$ dir
[snip]
drwxr-xr-x   2 me me      4096 2009-07-09 00:59:06 groups/

But what about the files in that directory? Or is that what the first dir listing was supposed to show, the perms for that specific file? If so, you didn't indicate it.

Sorry.

$ pwd
/home/me/.pan2/groups

Oh... I think I might have just spotted your issue!

That alt.binaries.dvdr group file, assuming that's what it is, is a single 3.6 gig file, right?

Yup. And tasks.nzb started out as 360+MB, which with a fast pipe to giganews was getting rewritten once a minute.

Are you on x86_32 or x86_64 (assuming Linux, maybe I should ask about that too),

64-bit kernel with 32-bit userland.

how much memory do you have,

8GB RAM

and if you are 32-bit, what are your kernel bigmem options? Also, what filesystem is that on?

ext3.

Obviously, I'm wondering if you're running into resource limits somewhere, either memory (trying to load a 3.6 gig file into memory), or possibly the filesystem filesize limit, although that would normally be a rounder number, 2 gigs, 4 gigs, or the like.

I ran pan through strace, and did see it using the 64-bit versions of the file functions.
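
As a rough check on the 2 gig and 4 gig limits mentioned above, the file size from the listing can be compared against both boundaries. This is only a minimal sketch; the function name check_limits is made up for illustration, and it says nothing about how pan itself opens or reads the file.

```python
# Compare a file size against the classic 32-bit offset boundaries.
TWO_GIB = 2**31   # first offset a signed 32-bit off_t cannot represent
FOUR_GIB = 2**32  # first offset an unsigned 32-bit value cannot represent


def check_limits(size_bytes):
    """Report which 32-bit boundaries a file of size_bytes has crossed."""
    return {
        "over_2GiB": size_bytes >= TWO_GIB,
        "over_4GiB": size_bytes >= FOUR_GIB,
        "GiB": round(size_bytes / 2**30, 2),
    }


# Size of alt.binaries.dvdr taken from the directory listing above.
print(check_limits(3658599777))
```

By this arithmetic the file is past the 2 GiB boundary but still under 4 GiB, so it would only be reachable through the 64-bit file calls, which is consistent with what strace showed.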

If that's the size of the header data you're dealing with, which seems likely, OUCH! No WONDER you're complaining about performance! I'd try to keep it under, say, 2 gigs. If that means deleting posts when you're done with them, every day, that's what you'll have to do. However, I think/hope every week or twice a week will be sufficient.

But meanwhile, you need to figure out how to reduce the filesize.

Well... that's what I'm *trying* to do...  ;)

One thing you could do would be simply delete it. I'd move it elsewhere for testing first, so you can move it back and try something else if pan goes crazy, but I think pan should be able to reconstruct it if it's loaded and doesn't find that file, by re-downloading headers, without losing your read message tracking, since that's stored in the newsrc files. You can tell pan to redownload a single day's worth, change groups so it writes, and see how big that is, before deciding how many more days to download at once.
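
The "move it elsewhere first" step could be sketched like this. The helper names move_aside and move_back and the backup directory are hypothetical; the only path taken from this thread is ~/.pan2/groups/alt.binaries.dvdr. Whether pan really rebuilds the file cleanly afterwards is the assumption being tested, not something this sketch guarantees.

```python
import shutil
from pathlib import Path


def move_aside(header_file, backup_dir):
    """Move header_file into backup_dir without overwriting anything there."""
    header_file = Path(header_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / header_file.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    shutil.move(str(header_file), str(target))
    return target


def move_back(backup_file, groups_dir):
    """Undo move_aside by moving the saved copy back into groups_dir."""
    return move_aside(backup_file, groups_dir)


# Example (not run here; the backup location is an assumption):
# saved = move_aside(Path.home() / ".pan2/groups/alt.binaries.dvdr",
#                    Path.home() / "pan-backup")
# ...test pan, and if it misbehaves:
# move_back(saved, Path.home() / ".pan2/groups")
```

Refusing to overwrite an existing target keeps the only copy of the old header data from being clobbered by accident.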

But (for now) I *really* want that old data.

FWIW, I'd be interested in seeing the size for one day's worth, or a week's worth if it'll take it (since that will eliminate daily fluctuations a bit better), just to know how big the problem we're dealing with actually is.
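
Once a one-day or one-week figure is available, a simple linear projection would show how many days of headers fit under a size budget such as the 2 gigs suggested above. This is a sketch under the assumption that header volume per day is roughly constant; the function name days_within_budget is hypothetical, and no real measurement is plugged in because none has been posted yet.

```python
def days_within_budget(sample_bytes, sample_days, budget_bytes):
    """Estimate how many days of headers fit in budget_bytes.

    Assumes the header file grows linearly with the number of days kept,
    based on a measured sample of sample_bytes covering sample_days.
    """
    if sample_bytes <= 0 or sample_days <= 0:
        raise ValueError("sample_bytes and sample_days must be positive")
    bytes_per_day = sample_bytes / sample_days
    return budget_bytes / bytes_per_day


# Usage, once a real measurement exists (placeholder names, not data):
# days_within_budget(size_of_one_week_file, 7, 2 * 2**30)
```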

It seems to me that the 3.6 gig header file is more likely to cause huge issues than the comparatively small 1/3 gig tasks.nzb file that you've been complaining about, particularly since the kernel should be caching tasks.nzb, and it will normally be updated fast enough that not all the writes will actually get to disk.

????

I see tasks.nzb getting rewritten almost on a continuous basis. The issue is much more noticeable now that I'm on giganews and my nntp bandwidth has sextupled.
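
To put a number on that, here is the arithmetic for a full rewrite once a minute, using the tasks.nzb size from the listing above. It is a sketch that assumes every rewrite writes the whole file and actually reaches the disk, which the kernel's caching may well prevent; the function name rewrite_rate is made up.

```python
def rewrite_rate(file_bytes, interval_seconds):
    """Return (bytes per second, bytes per hour) for whole-file rewrites."""
    per_second = file_bytes / interval_seconds
    return per_second, per_second * 3600


# tasks.nzb size from the listing, rewritten about once a minute.
per_second, per_hour = rewrite_rate(299701100, 60)
print(f"{per_second / 1e6:.1f} MB/s, {per_hour / 1e9:.1f} GB/hour")
```

That works out to roughly 5 MB/s of sustained writes, or about 18 GB an hour, for a file whose contents mostly do not change between rewrites.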

But a 3.6 gig single file, ESPECIALLY on 32-bit, is going to cause **HUGE** issues if pan's trying to work with the whole thing in memory at once, as I suspect it is. If you're 64-bit and are working with a decent filesystem, the issues won't be as bad, but it's still a huge amount of data to be trivially shifting around!

Which was the genesis of my "Better processing..." thread back on July 2.

Since then, I've discovered the 5-Minute Rule: anything you'll need within the next 5 minutes should be kept in RAM. Everything else should be on disk.

http://en.wikipedia.org/wiki/Five-minute_rule
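
The rule is usually stated as a break-even interval: the time between accesses at which keeping a page in RAM costs the same as re-reading it from disk. Below is a sketch of the formula as commonly given (Gray and Putzolu). The hardware figures in the example call are illustrative 1990s-era assumptions, not measurements of any machine in this thread.

```python
def break_even_seconds(pages_per_mb_ram, accesses_per_sec_per_disk,
                       price_per_disk, price_per_mb_ram):
    """Break-even reference interval in seconds.

    (pages per MB of RAM / disk accesses per second)
        * (price per disk drive / price per MB of RAM)
    Pages re-read more often than this are cheaper to keep in RAM.
    """
    return (pages_per_mb_ram / accesses_per_sec_per_disk) * (
        price_per_disk / price_per_mb_ram
    )


# Illustrative assumptions: 8 KB pages (128 per MB), 64 accesses/s per disk,
# a $2000 disk drive, and RAM at $15 per MB.
print(break_even_seconds(128, 64, 2000, 15))
```

With those assumed figures the interval comes out a little under five minutes, which is where the rule gets its name; with current prices the number would be different.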

I guess it's time to post this and see if I'm right, and perhaps get some idea of just how much daily or weekly data we're dealing with.

--
Scooty Puff, Sr
The Doom-Bringer



