Re: [Pan-users] pan for Windows crashes when reading large newsgroup
From: Duncan
Subject: Re: [Pan-users] pan for Windows crashes when reading large newsgroup
Date: Fri, 19 Oct 2012 14:26:56 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f91bd24 /usr/src/portage/src/egit-src/pan2)
K Shen posted on Fri, 19 Oct 2012 07:08:57 +0100 as excerpted:
> Hi,
Hello. =:^)
Before we get into the message, let me remind you to please turn off the
HTML. Being a pan user you probably already know how annoying it can be,
seeing that in pan... which many here use for this list, via gmane.org's
list2news service.
> I am using pan newsreader for Windows to read news for several years
> now, but in the past month or so, I have started to see regular crashes
> of pan when reading a newgroup with a large number of articles.[...]
> without such problems previously [...] traffic [may] have increased
> [...]
>
> fault module name is libcairo-2.dll. After a few crashes, I have
> noticed that the crash happens when the memory used by pan.exe is
> around 1,800,000KB. [...]
>
> I have just had another crash, while reading in the headers. [T]he
> Commit memory for pan.exe was 1,896,692KB.
>
> I have been using a 32 bit x86 Windows XP laptop with 2G of real memory
> up to 3-4 months ago, which was replaced by a 64 bit x86-64 Windows 7
> laptop with 4G of real memory. This was about 1-2 months before I
> noticed the crash problem, and I don't know if this new configuration is
> important for the crashes (I have not seen the crashes on the old
> laptop).
>
> Does anyone know if the crash is caused by the amount of memory
> used/number of headers? Is there any known reason why the crash seem to
> happen when the memory used by the process is around 1.8-1.9G?
Short version: You're very likely running into the infamous 32-bit memory
limits that are the reason the computing world is moving to 64-bit.
Rather longer version: In general, the single-byte-addressable flat-
address-space limit of a 32-bit system is 4 GB. However, this is the
total of the "virtual" address space, which must be split between several
different uses, primarily between user-space and kernel-space, with the
most common split being 2:2 user/kernel, two gigs each, user low, kernel
high.[1]
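To put rough numbers on that, here's a minimal sketch of the arithmetic, assuming a plain 2G/2G split and treating the reported commit size as a stand-in for address-space use (the two aren't identical, so take the result as a ballpark only). The function names are made up for illustration.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def address_space_split(address_bits=32, kernel_gib=2):
    """Return (total, user, kernel) sizes in bytes for a flat address space."""
    total = 2 ** address_bits      # one address per byte
    kernel = kernel_gib * GIB      # assumed kernel reservation
    user = total - kernel          # what is left for the application
    return total, user, kernel


def headroom_kib(commit_kib, user_bytes):
    """KiB of user address space left at a given commit size."""
    return user_bytes // 1024 - commit_kib


total, user, kernel = address_space_split()
print(total // GIB, user // GIB, kernel // GIB)

# Commit size reported at the time of the crash, in KB.
print(headroom_kib(1_896_692, user))
```

Under those assumptions a process at that commit size has only a couple hundred megs of address space left, and fragmentation means a single large allocation can fail well before the space is literally full.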
AFAIK, many MS 32-bit consumer/home/pro kernels have a hard 2G/2G split
(tho the server editions generally use PAE[2] mode as Linux does, and
thus have a far higher physical-memory limit, even for 32-bit). Linux
32-bit x86 kernels normally default to a 3G/1G user/kernel split instead,
but of course it's source available and can be rebuilt using one of the
other available options. These include a 2G/2G split, a 4G/4G option that
actually dedicates a separate 4-gigs to each and switches between them
every time it switches user-mode/kernel-mode (lower efficiency, but if
your 32-bit app needs >3 gigs...), and the 64-gig max PAE mode[2], also
less efficient due to the additional layer of indirection it uses.
Switching to a 64-bit kernel does allow the /kernel/ to natively access
memory above the 4 GB barrier, but if you're running 32-bit apps, they're
still limited to their old sub-4-gig, and possibly sub-2-gig, size. I
don't know enough about Windows to know how it manages user-space
limits. In theory, I /believe/ 32-bit apps should have access to a full
4 gigs of virtual userspace (they do on Linux when running on a 64-bit
kernel), but it's very likely that there remains either an MS kernel
enforced 2-gig barrier, or the default compiler options used when
building an app make that assumption, maybe both.
Getting back to pan, on large groups with many millions of headers, pan
does unfortunately use gigs of memory, because at present, it builds a
tree of all that header and threading data in memory. This is actually
rather better than it used to do... I remember when pan would run into
trouble at 100k-200k headers! One of the things done to help manage
memory usage since then, is that now it does string-combining for
repeated strings such as author and subject, keeping only one copy of an
author name string in memory and reducing the others to references to the
first, for instance, and keeping only one copy of the subject line for
multi-part posts, which it auto-combines and displays as a single entry.
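The string-combining idea can be sketched roughly like this. It's an illustration only, not pan's actual code (pan is C++), and the StringPool class and the sample header data are invented for the example.

```python
class StringPool:
    """Keep one shared copy of each distinct string."""

    def __init__(self):
        self._pool = {}

    def intern(self, text):
        # Return the stored copy if an equal string was seen before,
        # otherwise store this one and return it.
        return self._pool.setdefault(text, text)

    def __len__(self):
        return len(self._pool)


pool = StringPool()

# Three parts of one multi-part post: same author, same subject line.
# The strings are built at runtime, as they would be when parsed off the wire.
raw = [("".join(["Some ", "Poster"]), "".join(["big file ", "(3 parts)"]))
       for _ in range(3)]

headers = [(pool.intern(author), pool.intern(subject)) for author, subject in raw]

# Every header now refers to the same author object and the same subject object.
print(len(pool), headers[0][0] is headers[2][0])
```

With millions of headers and a comparatively small set of distinct authors and subjects, sharing the strings this way saves a lot, but the tree of per-article entries itself still has to fit in memory.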
For many years (since well before Charles left), there has been talk of
switching to a database backend of some type, perhaps sqlite-based, to
track all this data, so only a relatively small bit of it would need to
be in memory at once. However, Charles left as lead dev before it was
ever implemented. I suspect he wasn't familiar with coding for databases
and they're notoriously hard to get correct for the inexperienced, with
crash and data-loss bugs being extremely common, so he was hesitant.
Then pan was basically abandoned code for a couple years, then adopted by
someone who could maintain it but didn't have the time to really add new
features, and only recently (a year or so ago) has Heinrich Mueller come
along, with all the new features he has implemented at such a furious
pace!
And he's working on the disk-backed database backend, but as I said,
databases are notoriously HARD to get right the first time, so even when
he does have something out to test, it's quite likely it'll be some time
before that code is actually reasonably stable.
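For a feel of what a disk-backed header store buys you, here's a minimal sketch using Python's sqlite3 module. It is NOT Heinrich's design or pan's code; the table layout, column names and function names are all assumptions for illustration, and it uses an in-memory database only so the example is self-contained (a real store would pass a file path).

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical header store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " message_id TEXT PRIMARY KEY,"
        " author TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " posted INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS headers_posted ON headers(posted)"
    )
    return conn


def add_headers(conn, rows):
    """Insert headers in one transaction; duplicates are skipped."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO headers VALUES (?, ?, ?, ?)", rows
        )


def page(conn, offset, limit):
    """Fetch only the slice of headers the UI currently needs."""
    cur = conn.execute(
        "SELECT message_id, author, subject, posted FROM headers"
        " ORDER BY posted LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cur.fetchall()


conn = open_store()
# Feed rows from a generator so the full list never sits in memory at once.
add_headers(conn, ((f"<{i}@example.invalid>", "poster", f"subject {i}", i)
                   for i in range(10_000)))
print(len(page(conn, 5000, 50)))
```

The point is just that only the requested page of rows is pulled into the process at a time. Threading, crash safety and performance on tens of millions of rows are where the real difficulty lies.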
Meanwhile, you appear to still be running a 32-bit pan on your 64-bit MS
kernel. Once pan hits 1.8-1.9 gigs, various other overhead pushes it over
the 2-gig barrier, into what would often be kernel space on a 32-bit
system and is apparently still reserved, unavailable to your 32-bit pan,
on your now 64-bit system.
Actually, for the biggest groups on servers with a high retention
(giganews is known for this, some of the others have the problem too on
their heaviest traffic groups), even a 64-bit 8-gig system can run into
problems trying to get and process ALL headers. Someone calculated what
it would take to handle them and posted the results at one point, and
IIRC, it was something over 16 gigs, 17-ish, I think. FWIW I have 16 gig
now (tho I haven't done binary groups in years), so it'd push even my
system into swap some.
So you're kind of between a rock and a hard place. Until Heinrich comes
out with that database backend I've seen him mention a few times, your
options include switching to a 64-bit pan, continuing with the N-days
header thing, or trying something else that HAS implemented a database
backend. It's /possible/ there are some options you can tweak to let you
access a full 4 gigs with a 32-bit pan, but that's ultimately likely to
run into the same issues as well. You REALLY need either the pan database
backend that's still being coded (Heinrich would have to tell you its
status, he could be barely started, or just about ready to pop the
announcement, I simply don't know), or a 64-bit pan and likely 8 or 16
gigs RAM, or to find another news harvesting alternative other than pan
that already has such a database backend. It's really that simple.
Of course since you didn't post the server and group name (not that I
blame you, the group name can be... rather private info to be posting),
it's also possible that it's not that big after all, and that you're
running into some other problem. However, pan /is/ known to have this
problem especially on 32-bit, and that close to the 2-gig barrier on a
group you did say was heavy traffic, chances are it really /is/ the
memory barrier you're hitting. Unfortunately...
---
[1] It's not relevant here but complicating matters further is the fact
that the top of the 32-bit space, often the half-gig to gig, with high-
graphics-memory machines it can near two-gigs, is reserved for legacy 32-
bit PCI device hardware I/O address usage, even on 64-bit machines. For
machines with 3+ gigs of physical RAM, this presents a problem as the PCI
hardware I/O area masks any physical memory located at these reserved
addresses. The solution is to remap this otherwise hidden physical
memory above the 4GB barrier, but for a number of years many BIOSes
didn't come with this option, and people with these machines who upgraded
to 4 GB simply wasted between a quarter gig and a full gig of RAM, as it
was hidden behind the PCI hardware IO area and thus unusable. That's
also why say an 8-gig physical-ram machine will often count up to 9 or 10
gigs in its POST (power-on-self-test) -- it's remapping up to two gigs up
above the PCI hardware IO memory hole.
http://en.wikipedia.org/wiki/3_GB_barrier
[2] PAE, Physical Address Extension:
http://en.wikipedia.org/wiki/Physical_Address_Extension
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman