Re: [Pan-users] Segfault at exit


From: Jim Henderson
Subject: Re: [Pan-users] Segfault at exit
Date: Sat, 24 Dec 2016 17:56:17 +0000 (UTC)
User-agent: Pan/0.141 (Tarzan's Death; GIT 194f2dc git.gnome.org/pan2)

On Sat, 24 Dec 2016 07:49:09 +0000, Duncan wrote:

> While I don't see your segfault (which does support your theory that
> it's a problem with your config), pan does have a workaround for crashes
> preventing read-message, etc, writeouts, that I've been using for years
> now.  IIRC I reinforced the habit (which I /think/ I had even before
> that, but that reinforced it) back when I was having problems not with
> pan itself, but with xorg and kwin, back when the composite extension
> (and kde/kwin's use thereof) was new and had a leak that would regularly
> crash xorg... taking pan down along with it, without a chance
> to write out its current state, of course.  Yes, it was /that/ long ago.
> Anyway, by doing this, I kept the lost data to some level I considered
> reasonably manageable.
> 
> The key here is to realize that pan writes out the per-newsgroup data
> when it switches groups, so in busy groups with enough unread messages
> that I didn't want to risk losing at least semi-current read-messages
> tracking, I developed the habit of deliberately clicking to some other
> group and back every N messages or so.  On busy mostly binary groups
> with thousands of unread messages, N would be 500 messages or so, a
> couple times per thousand messages; on more technical groups where I
> went slower, N might be 50 or 100 messages.
> 
> That seemed to address the problem rather nicely, even in the above case
> where it wasn't pan, but X, taking pan with it, that was crashing.  I
> keep up with the habit today, tho I'm probably less religious about it
> than I was when X was regularly crashing, so one might say pan has
> trained me well. =:^)

Yeah, I'd observed the write-outs with strace while using pan.  The last 
thing it does before the segfault, according to strace, is ... well, 
here's the strace output:

--- snip ---

eventfd2(0, O_NONBLOCK|O_CLOEXEC)       = 28
write(28, "\1\0\0\0\0\0\0\0", 8)        = 8
write(8, "\1\0\0\0\0\0\0\0", 8)         = 8
futex(0x17f2bd0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x13fc8d0, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x17e90b8, FUTEX_WAKE_PRIVATE, 1) = 1
poll([{fd=28, events=POLLIN}], 1, 25000) = 1 ([{fd=28, revents=POLLIN}])
poll([{fd=28, events=POLLIN}], 1, 25000) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\1\0\0\0\0\0\0\0", 16)        = 8
poll([{fd=28, events=POLLIN}], 1, 25000) = 1 ([{fd=28, revents=POLLIN}])
read(28, "\1\0\0\0\0\0\0\0", 16)        = 8
write(28, "\1\0\0\0\0\0\0\0", 8)        = 8
futex(0x23bdc00, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(28)                               = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++

--- snip ---

Before that, it was deleting cache files.  But as you can see here, the 
eventfd2() file descriptor closes successfully, and the SIGSEGV arrives 
right after that close() returns.
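
For anyone who wants to pull the same information out of a longer strace 
log, here is a minimal Python sketch that lists the last few syscalls 
logged before a given signal.  The function name and the regular 
expressions are my own assumptions, based only on the line shapes in the 
excerpt above; they are not part of strace or pan.

```python
import re

# Signal lines look like:
#   --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
SIGNAL_LINE = re.compile(r"^--- (SIG[A-Z0-9]+) \{.*\} ---$")
# Syscall lines start with the call name, e.g. "close(28)   = 0"
SYSCALL_LINE = re.compile(r"^([a-z_][a-z0-9_]*)\(")


def calls_before_signal(lines, signal="SIGSEGV", count=3):
    """Return the names of the last `count` syscalls logged before `signal`.

    Hypothetical helper: returns an empty list if the signal never appears.
    """
    names = []
    for line in lines:
        line = line.strip()
        sig = SIGNAL_LINE.match(line)
        if sig and sig.group(1) == signal:
            return names[-count:]
        call = SYSCALL_LINE.match(line)
        if call:
            names.append(call.group(1))
    return []


# Tail of the excerpt above, used as sample input.
sample = [
    r'read(28, "\1\0\0\0\0\0\0\0", 16)        = 8',
    r'write(28, "\1\0\0\0\0\0\0\0", 8)        = 8',
    'futex(0x23bdc00, FUTEX_WAKE_PRIVATE, 2147483647) = 0',
    'close(28)                               = 0',
    '--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---',
    '+++ killed by SIGSEGV +++',
]

print(calls_before_signal(sample))  # ['write', 'futex', 'close']
```

This only shows what the process last asked the kernel to do; it says 
nothing about which userspace code was running when the signal arrived.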

The interesting thing is that even switching groups doesn't get the 
message counters updated properly.  But as I think about what I do in my 
setup, I wonder if fuse might be a factor here.

I may need to test that.

See, what I do with my pan installation is store the config files in an 
encfs container.  I mount the containers (one for .pan2 and one for News) 
prior to launching pan, and unmount them after pan exits.  I had toyed 
around with adding a delay to the 'fusermount -u' commands, but that 
didn't make a difference on the segfault (which makes sense, since pan 
has to exit before the volumes are unmounted).

(Mostly as a point of interest - I do this because I have secure access 
to a couple NNTP servers that hold sensitive information, and I sync the 
config between multiple systems using Dropbox - but I don't want the 
passwords stored in a format that can be read by anyone who happens to 
hack my Dropbox account for some reason.)

But maybe I need to sync before unmounting the encfs containers, in case 
unmounting without a sync is corrupting data in a way that causes the 
segfault.
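
Roughly, the wrapper I have in mind would look like the Python sketch 
below: mount the containers, run pan, sync, then unmount.  It's untested 
against the actual segfault, and the function name, the injectable 
`run`/`sync` parameters, and the example paths are assumptions for 
illustration.  The only external commands used are `encfs`, `pan`, and 
`fusermount -u`.

```python
import os
import subprocess


def run_pan_session(mounts, run=subprocess.run, sync=None):
    """Mount each encfs container, run pan, flush buffers, then unmount.

    `mounts` is a list of (encrypted_dir, mount_point) pairs.
    `run` and `sync` can be swapped out for a dry run.
    Returns pan's exit status.
    """
    if sync is None:
        sync = os.sync  # flush filesystem buffers (Unix only)
    mounted = []
    try:
        for encrypted_dir, mount_point in mounts:
            # encfs asks for the container password interactively
            run(["encfs", encrypted_dir, mount_point], check=True)
            mounted.append(mount_point)
        # Blocks until pan exits.  With subprocess.run on POSIX, death by
        # signal N is reported as returncode -N (SIGSEGV gives -11).
        result = run(["pan"])
        return result.returncode
    finally:
        # Sync first, then unmount in reverse order of mounting.
        sync()
        for mount_point in reversed(mounted):
            run(["fusermount", "-u", mount_point], check=True)
```

It would be called with something like 
`run_pan_session([("/path/to/pan2.enc", "/path/to/.pan2"), ("/path/to/News.enc", "/path/to/News")])`, 
where both pairs of paths are placeholders.  Putting the sync and unmount 
in `finally` means the containers still get unmounted when pan crashes.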

>> I built a debug build and did a backtrace in gdb, and it points to
>> pan.cc line 1140.  That seems to tie to the process of freeing up
>> secured passwords in memory, so I thought it might be something in
>> servers.xml, but I don't see anything obvious in that file that's a
>> problem (other than perhaps a missing CR/LF at the end of the file, but
>> I tried adding that and the behavior didn't change).
>> 
>> Any ideas on where I should start?
> 
> First thing I'd do is isolate whether the problem only triggers when you
> connect to the server, not if you're local-only.  Either set pan as
> offline (if that setting sticks across pan restarts, I'm not sure
> whether it does), or if necessary, toggle the get new headers on startup
> and when entering group options (in preferences, behavior, groups) to
> OFF, so you can start pan and switch groups without pan fetching
> headers, then restart pan and browse some already locally cached headers
> and messages without doing anything that actually triggers pan to
> connect to the server, and see if the problem still occurs.

That's a good idea - I hadn't thought about that as a way of isolating 
online vs. offline behaviour.  I don't think that setting is persistent 
across restarts, as it's not a preference, but I can check that easily 
enough.  I do have it configured to clear cache (to reduce data storage 
needs in Dropbox).

> That alone should help confirm whether it's password and/or server
> related.
> 
> Then, if it's still happening when pan isn't network connecting, it
> makes this next step easier.
> 
> Use the old bisect method on pan's data dir, first ensuring that the
> problem disappears with a clean config, then try bisecting the problem
> down to a single file, testing a theoretical half of the problem space
> at a time.  Of course this is dramatically easier if the test set of
> files remain static, thus the reason to test without network activity
> and downloads going on, if possible.
> 
> I'd probably test right away with a clean article cache, to see if it's
> that, and particularly if the problem only happens with a server
> connection, I'd test right away with a clean ssl_certs dir and let pan
> redownload the certs, of course comparing them to the previous certs
> manually.
> 
> You've long since updated pan and the certs from back when pan was
> writing corrupt binary files instead of the ascii-based cert files it
> should have been writing and writes in current versions, right?

None of my servers use SSL (which is silly, given the nature of 
some of the data hosted on them), so that won't be an issue, but 
bisecting the issue is easy enough - I can break out the multiple newsrc 
files from other configs and see what happens.  My guess is that it's a 
newsrc file that's having a problem (I've historically had problems with 
message counters getting corrupted, though that hasn't happened in a 
while now).
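
To make the bisect concrete, here is a small Python sketch of the 
halving logic.  It assumes a single culprit file, and that 
`reproduces(subset)` is something I'd supply myself: copy only `subset` 
into an otherwise clean config, run pan, and report whether it segfaults 
at exit.  The function name and the newsrc file names in the example are 
placeholders, not a claim about pan's actual layout.

```python
def bisect_culprit(files, reproduces):
    """Narrow `files` down to the one file that triggers the problem.

    Assumes exactly one culprit.  `reproduces(subset)` must return True
    when the problem occurs with only `subset` restored into a clean
    config.  Returns None if the full set does not reproduce it.
    """
    candidates = list(files)
    if not reproduces(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if reproduces(half):
            candidates = half  # culprit is in the first half
        else:
            candidates = candidates[len(half):]  # so it is in the rest
    return candidates[0]


# Dry run with a stand-in predicate instead of real pan launches.
files = ["servers.xml", "newsrc-1", "newsrc-2", "tasks.nzb"]
print(bisect_culprit(files, lambda subset: "newsrc-2" in subset))  # newsrc-2
```

Each round costs one pan launch, so even a few dozen files take only a 
handful of tests.  If two files are needed together to trigger the crash, 
this simple halving won't find them.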

> The groups dir and newsrc files are also suspect since it's writing them
> out that's failing.
> 
> And IIRC I had a problem with a corrupt tasks.nzb at one point, tho that
> should be regularly updated, so I wouldn't expect it to be the problem
> in this case as it has been an ongoing problem for you for some time,
> and that was a more urgent "pan won't work at all" problem for me, when
> it got corrupted.

Also good to know.  I don't tend to use nzb files, but if that's a 
standard behaviour, then that could well be something to look at.

Thanks for the ideas!

Jim

-- 
 Jim Henderson
 Please keep on-topic replies on the list so everyone benefits



