Re: [Pan-users] A bug only a nerd could love (Duncan? :;)


From: Duncan
Subject: Re: [Pan-users] A bug only a nerd could love (Duncan? :;)
Date: Sat, 16 Mar 2013 09:06:58 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT 34d5f94 /usr/src/portage/src/egit-src/pan2)

walt posted on Fri, 15 Mar 2013 19:35:02 -0700 as excerpted:

> Well (as usual for a dedicated nerd) I've changed so many things at the
> same time that I can't tell which change may (if there is a bug) have
> caused this behavior:

LOL.  I'm fighting one of those ATM as well.  (FWIW totally unrelated to 
pan; apparently either kde 4.10.[0,1] or the live-git kernel 3.9-pre I'm 
running has a resource leak of some sort and at some point will no longer 
start any new processes, altho current processes continue to run fine -- 
it's not main memory as that continues to be normal, and I'm on 64-bit, 
so the memory zones that could be the problem on 32-bit shouldn't be an 
issue.  It /might/ only trigger after a suspend (to ram) and resume, but 
I'm not sure... and it takes a day or two to trigger, so while I haven't 
seen it the last couple days I'm not sure whether that's because I've 
been rebooting into new kernels every day to try to see if it's fixed, or 
just because I've not been giving it time to trigger...)

> I'm running the latest git [d7bd6aa1] as usual.  I was fetching a
> gazillion headers from my primary news server when I decided to test
> pan's recent header-compression feature.

... Which has certainly had its share of "development issues", but seems 
to finally be working fine for me the last couple weeks...

> I opened the 'edit servers' dialog and selected the 'XZVER' option (I
> already know this particular server supports XZVER) and clicked 'okay'.
> 
> Well, pan's entire gui interface locked up instantly, including the
> 'edit servers' dialog box, and the network traffic halted at the same
> instant.
> 
> However, I saw that the pan process was still consuming 100% of (one)
> CPU even while the pan gui remained completely unresponsive.
> 
> Well, I thought, strace should give me some info, so I sicked strace on
> the pan process -- and strace gave me absolutely nothing even while the
> pan process was consuming 100% of (one) CPU.
> 
> WTF?

[The following may well be well known info for you, but for the benefit 
of others for whom it isn't, and for me too, since the process of 
explaining it solidifies my own grasp of the concept... =:^) ]

The thing to remember with strace is how it works... by inserting itself 
between the normal application and the kernel in order to trace system 
calls (the reason for that "s" in strace... "system" here generally 
referring to kernel).  If the currently executing logic makes no such 
system calls, either because it's in a tight userspace-only loop or 
because that stretch of code simply doesn't need anything from the 
kernel, there's no "s" to strace!
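
As a rough illustration of that point (a sketch only, in Python for 
brevity, and nothing to do with pan's actual code), the two hypothetical 
functions below do similar amounts of "work", but only the second one 
ever asks the kernel for anything.  The function names and the modulus 
are made up for the example.

```python
import os


def userspace_spin(iterations):
    """Pure computation: the loop body requests no kernel services."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1000003
    return total


def syscall_loop(iterations):
    """Each os.write() is a write(2) system call, visible to strace."""
    fd = os.open(os.devnull, os.O_WRONLY)
    written = 0
    try:
        for _ in range(iterations):
            written += os.write(fd, b"x")
    finally:
        os.close(fd)
    return written
```

Attaching with "strace -p <pid>" while something like userspace_spin() 
runs with a huge iteration count should show little or nothing, even 
with one CPU pegged, while syscall_loop() should scroll a write() line 
per pass.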

While that might seem perfectly obvious to a coder who knows all about 
the services provided by the system and when and how they're called, to a 
normal user, or even a relatively advanced gentoo sysadmin user such as 
myself, wrapping one's head around that does take a bit.  In part that's 
because we're so used to seeing the thousands of system calls a normal 
app typically invokes going by, often fast enough that printing them out 
is itself the bottleneck in an straced process.  Not being a coder 
particularly familiar with the internals, it's easy enough to fall into 
the trap of thinking what we're seeing is the activity of the whole app, 
NOT just the system calls that are the only thing ACTUALLY being 
reported.

Sometimes, just for perspective, it's interesting to strace only open 
calls (-feopen is commonly used here) of a typical desktop process.  
Seeing just how many library, font, icon, config... files a typical X-
based process attempts to open, and how fast it actually happens, is mind-
blowing in itself.  And that's just the tip of the system-call iceberg, 
which itself is just the tip of the iceberg of all the app is doing, 
which given a modern multi-tasking system, is just the tip of the iceberg 
of all the system is doing!   It really gives a person some perspective 
on how fast a system really operates these days... sort of like looking 
up at the night sky in a dark area and realizing that the vastness one 
sees is only a hint... after seeing some of the images produced by 
Hubble, etc...  We humans are just the dust mite on the speck of dust 
that is the earth in the solar system, itself a speck of dust in a 
galaxy, itself a speck of dust...  While computer processes are arguably 
a few recursions short, in its own way that vastness of scale is 
similarly mindblowing to think about.

But what's REALLY mindblowing is to realize that nevertheless, 
individual humans still actually program all those apps, and it's both 
possible and in some contexts routine to disassemble the machine code 
back into assembler and step thru the functionality at the individual 
machine-instruction level, instruction by instruction.


But back to present contextual reality...

Taking that high level theory back down to our particular case, however, 
the lack of such system calls in our pan instance while it's in theory 
downloading a bunch of headers is still both alarming and an important 
clue as to the problem, since typically during the header download 
there'd at minimum be the usual network access calls as well as memory 
allocation activity going on (and probably more), so an entire lack of 
strace activity really *DOES* indicate a serious problem, in the form of 
a userspace-only loop that's tight enough it's not making any system 
calls at all!

> Next I sicked gdb on the pan process and found that pan was evidently
> stuck in some infinite loop involving the glib sockets and istream libs,
> but I lack the expertise to take this any further.

... But that loop, while involving glib sockets and istream libs, is all 
userspace... no "s" calls to strace!

> As a postscript, I examined my ~/.pan2/servers.xml and found that pan
> had saved my 'XZVER' choice correctly before freezing up.
> 
> I'm thinking (maybe?) that pan needs to open a new socket for the
> compressed- header istream instead of trying to read from the old
> (uncompressed) socket connection?

That has been my experience as well, altho I didn't try while it was 
actually DOWNLOADING headers.  Some weeks ago, when I was working on 
getting back to binaries, I took it a step at a time: uncompressed 
plain-text connection working -> compressed-header plain-text connection 
working -> ssl-encrypted connection working.  I noted (I think in the 
ssl context at that point) that pan continued using the existing open 
connections as they were.  Setting pan offline and back online didn't 
fully break the existing idle connections either, so pan stayed in 
clear-text until I actually quit and restarted it.

So that would presently be a bug.  It was less serious in my case, as pan 
didn't lock up because I didn't attempt the change while it was actually 
DOWNLOADING something, but it's still a problem.

The ultimate fix, therefore, would be to have pan specifically terminate 
existing connections and renegotiate them, whenever either the ssl/
cleartext or compressed/uncompressed headers options are toggled.  
(Obviously, if pan's doing full message download at the time, not header 
download, compressed-headers should in theory be togglable without 
terminating the existing connections.  However, it may be simpler to 
simply terminate and restart all connections whenever such a setting 
changes, regardless.)
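
To make the simpler "restart everything on any change" variant concrete, 
here is a minimal sketch in Python.  It is NOT pan's code or design; 
ServerSettings, Connection, ConnectionPool and apply_settings are all 
hypothetical names, and the "connections" are plain objects rather than 
real sockets.  The only idea it shows is that a connection remembers the 
settings it was negotiated with, and that changing the settings closes 
every old connection before any new one is created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    """Per-server options that affect how a connection is negotiated."""
    use_ssl: bool = False
    compress_headers: bool = False  # e.g. the XZVER option


@dataclass
class Connection:
    """Stand-in for a real socket; remembers what it was opened with."""
    settings: ServerSettings
    is_open: bool = True

    def close(self):
        self.is_open = False


class ConnectionPool:
    def __init__(self, settings, size=2):
        self.settings = settings
        self.size = size
        self.connections = [Connection(settings) for _ in range(size)]

    def apply_settings(self, new_settings):
        """Drop and reopen all connections if anything changed.

        Returns True if the pool was restarted, False if nothing changed.
        """
        if new_settings == self.settings:
            return False
        for conn in self.connections:
            conn.close()  # never reuse a connection opened the old way
        self.settings = new_settings
        self.connections = [Connection(new_settings)
                            for _ in range(self.size)]
        return True
```

A real implementation would also have to decide what to do with a 
transfer that is in flight when the change arrives (finish it, or abort 
and requeue it); the sketch deliberately sidesteps that.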

Meanwhile, easy workaround for this bug!  As the saying goes, if it hurts 
when you bang your head against the wall, QUIT DOING THAT! =:^)  Pan may 
not be able to handle it automatically just yet, but that doesn't mean 
you have to change the settings when pan's active.  Ensure it's idle, 
change the settings, restart pan to be sure, THEN use the new settings. 
=:^)

> This is all beyond my ken and I'm off to bed now...

Yeah, when there's an active bug to trace I can't sleep properly either.  
Finally getting to sleep properly after tracing it down as far as I know 
how... is nice!  The sleep of someone who worked hard to attain a goal, 
finally attained it, and can now sleep in peace without it nagging at him 
any more! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



