[Pan-users] Re: freeze, high CPU getting new headers


From: Duncan
Subject: [Pan-users] Re: freeze, high CPU getting new headers
Date: Tue, 5 Aug 2008 05:46:32 +0000 (UTC)
User-agent: Pan/0.133 (House of Butterflies)

walt <address@hidden> posted
address@hidden, excerpted below, on  Tue, 05 Aug 2008 02:43:59
+0000:

> Ah!  What you want is a highly-threaded newsreader.  Pan2 isn't there
> yet, and don't hold your breath.  I believe I read a post from Duncan
> saying that legacy pan is multi-threaded, but I'm not certain.  Duncan?

Yes, legacy-pan was fully multi-threaded.  New-pan's main loop and tasks 
are single threaded, because Charles decided the multithreading was too 
hard to debug.  However, it splits off threads for selected tasks.  For 
example, setting up connections should be multithreaded IIRC -- assuming 
it has the tasks to use them, it spins off a thread for each connection 
setup, then feeds the negotiated connection back to the main processing 
thread.  Similarly, the combiner/decoder splits off threads so it doesn't 
interfere with the downloading.  There may be other areas as well, but 
otherwise, (new-)pan is single-threaded.  And if you note... the threaded 
jobs are all basically timing independent, so there should be few if any 
threading related race conditions or possibilities.  THAT's the big 
difference from legacy-pan, and the reason Charles set up new-pan's 
threading (or lack thereof) the way he did.
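
To make the pattern concrete, here is a rough sketch of "spin off a thread per connection setup, then hand the result back to the main thread".  It is Python rather than pan's C++, and setup_connection and open_connections are made-up names for illustration only, not anything from pan's source.

```python
import queue
import threading


def setup_connection(server, results):
    # Hypothetical stand-in for the slow, blocking connection negotiation.
    # The worker never touches shared state; it only posts its result.
    results.put(f"connection to {server}")


def open_connections(servers):
    """Start one worker per server and collect results on the calling thread."""
    results = queue.Queue()
    workers = [
        threading.Thread(target=setup_connection, args=(server, results))
        for server in servers
    ]
    for worker in workers:
        worker.start()
    # The "main loop" side: take finished connections as they arrive.
    ready = [results.get() for _ in servers]
    for worker in workers:
        worker.join()
    return ready
```

The workers only talk to the main thread through the queue, so the order they finish in doesn't matter, which is the "timing independent" property described above.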

> Meanwhile, there is a bug *somewhere* that needs fixing.  But where?

GL's suggestion, I think, was that it was just taking the time to sort 
all those headers, and there wasn't a lot that could be done about it.  I 
don't believe that's the case, because it's not behaving that way for 
everyone, and where it is, it's fairly recent -- it was working fine some 
months ago.

That implies it's either some deoptimization somewhere, or a bug where 
pan gets stuck in a loop processing the same header for a while, but 
eventually either gives up and moves on, or gets it right.

I'm guessing it's a deoptimization in something.  But... in what?

As I pointed out, the gtk_tree_model_iter_next proc has shown up near 
the top of all three traces so far.  Now, that /could/ be simply 
processing different posts, but I find it curious that it was in all 
three, as if it's spending an inordinate amount of time there.  That's my 
first guess for a deoptimization.  But those traces are indeed kind of 
like shooting blind.  It /looks/ like it may be that proc, but maybe not, 
too.

The strace thing will show kernel calls, file-opens and the like.  If 
it's looping on the same data, that might show repeated accesses to the 
same calls.  Depending on what they are, it might again be normal.  Or 
not.  But it should provide a more complete picture than we have now.
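
If the strace log gets long, something like the following could help spot repeated accesses.  It is only a sketch: it assumes the plain one-call-per-line strace output format (optionally prefixed with a bare pid), and repeated_calls is a name I made up.  It counts how often each (call, first quoted argument) pair appears.

```python
import re
from collections import Counter

# Matches the start of a line such as: open("/some/file", O_RDONLY) = 3
# An optional leading pid (as written by strace -f -o FILE) is skipped.
CALL_RE = re.compile(r'^(?:\d+\s+)?(\w+)\((?:"([^"]*)")?')


def repeated_calls(strace_text, min_count=3):
    """Return [((call, first_string_arg), count), ...], most frequent first."""
    counts = Counter()
    for line in strace_text.splitlines():
        match = CALL_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

A high count is not proof of a loop by itself; as noted, repeated calls may be perfectly normal depending on what they are.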

After that, assuming it doesn't make the problem plain, I'd suggest the 
debug backtraces again, only take several in short (30 seconds runtime 
apart) succession.  If they show similar patterns to the ones we have 
already, then find the bits similar to this:

#16 0x0000000000491bb7 in PanTreeStore::insert_sorted (this=0x2ccbad0, 
address@hidden) at pan-tree.cc:828

and compare the values for "this" and "new_parent".  If they are 
changing, then pan is simply processing a bunch of data, maybe slow, but 
it's working thru it.  If it's still processing the same post ("this"), 
trying to attach it under the same parent, 30 seconds later, THEN WE 
FOUND A PROBLEM.  We won't yet know for sure if it's looping on the same 
one (unless the rest of the trace comparison shows it verified looping) 
or if we have a serious deoptimization, but any decent 64-bit machine 
anyway should be fast enough that it shouldn't be working on threading 
the same post for 30 seconds.  If it is, and it's working on it still 30 
seconds later than that, we have it on the same one for a minute, which 
will point to an even worse bottleneck.

Here's a couple more with values that could be traced:
#20 0x00000000004f05a9 in pan::DataImpl::MyTree::apply_filter 
(this=0x292e330, address@hidden) at my-tree.cc:235
#21 0x00000000004f0d9a in pan::DataImpl::MyTree::add_articles 
(this=0x292e330, address@hidden) at my-tree.cc:351
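
Comparing the values by eye gets tedious, so here is a small sketch that pulls the pointer-valued arguments out of two saved backtraces and lists the ones that did not change.  It assumes gdb's usual "#N 0x... in func (name=value, ...)" frame layout, possibly wrapped across lines as above; pointer_args and unchanged are made-up names.  One caveat: in C++ "this" is the object the method was called on, so for a long-lived object it may well stay constant, and the other arguments (new_parent and friends) are probably the more telling ones.

```python
import re

# One frame: "#16 0x... in Class::method (args...)"; args may wrap across lines.
FRAME_RE = re.compile(r'#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([\w:]+)\s*\(([^()]*)\)')
# Pointer-like arguments only, e.g. this=0x2ccbad0 or name=@0x7fff0010.
ARG_RE = re.compile(r'(\w+)=(@?0x[0-9a-fA-F]+)')


def pointer_args(backtrace):
    """Map (function, argument name) to its value, innermost frame first."""
    values = {}
    for func, args in FRAME_RE.findall(backtrace):
        for name, value in ARG_RE.findall(args):
            values.setdefault((func, name), value)
    return values


def unchanged(earlier, later):
    """List the (function, argument) pairs with the same value in both traces."""
    a, b = pointer_args(earlier), pointer_args(later)
    return sorted(key for key in a if b.get(key) == a[key])
```

Feed it two backtraces taken 30 seconds apart; anything it reports for the insert_sorted frame beyond "this" would be worth a closer look.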

If it's showing different ones 30 seconds apart, try 10 seconds apart.  
Really, it shouldn't be taking 10 seconds, either, unless you're deep 
into swap or something and it's thrashing disk to swap.  Disk is of 
course much slower than memory, so if it's thrashing disk, that's going 
to account for the slowdown.  In that case, we're looking in the wrong 
place ATM, but the reports haven't sounded to me like disk is being hit 
that hard during all this, so I've assumed that's not it.  < 10 seconds, 
it likely depends on your system, CPU speed, memory, etc.
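
To rule swap thrashing in or out, one option on Linux is to compare the pswpin/pswpout counters from /proc/vmstat before and during the stall.  A minimal sketch, with made-up function names, that works on the text of two such snapshots:

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before_text, after_text):
    """Pages swapped in and out between the two snapshots."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before.get(name, 0) for name in after}
```

Read /proc/vmstat once, wait a few seconds while pan is busy, read it again, and pass both strings in.  Numbers near zero would suggest swap is not the culprit.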

Of course, the above assumes you know how to have gdb suspend the run 
(the interrupt does that) to do the bt, then resume it, then suspend it 
again 30 seconds later, keeping both backtraces (plus a few more in the 
suspend/bt/resume/wait30 cycle) to compare them later.  I'm not a gdb 
guru, so I'd be feeling my way on that too.
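
One way to avoid doing the suspend/bt/resume dance by hand is to let a script attach gdb in batch mode at intervals.  This is a sketch, not something I've run against pan: it assumes gdb is on the PATH and that you're allowed to attach to the process.  "gdb -p PID -batch -ex bt" attaches (which stops the process), prints the backtrace, and exits, which detaches and lets the process continue.  The function names are made up.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    # Attach to the running process, print one backtrace, then exit (detach).
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtraces(pid, count=4, interval=30,
                       run=subprocess.run, sleep=time.sleep):
    """Take `count` backtraces of process `pid`, `interval` seconds apart."""
    traces = []
    for i in range(count):
        result = run(gdb_backtrace_command(pid), capture_output=True, text=True)
        traces.append(result.stdout)
        if i < count - 1:
            sleep(interval)  # let the process run between snapshots
    return traces
```

The run and sleep parameters are only there so the loop can be tried out without a real gdb; normally you'd call collect_backtraces(pid) with pan's process ID and save the returned strings for comparison.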

As for 64-bit gcc being not ready for prime-time yet, I would have agreed 
during the gcc 3.x series, but in the 4.x series from 4.1.x anyway, I 
think it's reasonably mature.  It hasn't had the decades that x86_32 has, 
but I'd say it's pretty close, all things considered.  I know it has been 
very stable and reasonable here.

Now one thing I /don't/ know about is how good the generic (not 
chip-specifically optimized) x86_64 stuff is, or for that matter, the 
Intel em64t optimization, because I run and optimize for amd64/k8 here 
(plus sse3 since my CPUs have it, early AMD 64-bit chips didn't, so 
that's got to be added separately if you have it).  However, all x86_64 
is the same instructions in general, just ordered differently if 
specifically targeting Intel vs AMD vs generic, so (with the exception of 
SSE3/4, etc) the instructions should be the same on x86_64, regardless.

That said, I /am/ optimizing mine for amd-k8, not the generic x86_64 that 
binary distributions probably compile to, and it /is/ possible the 
generic target isn't quite as solid as the specifically targeted code 
I've been using and can thus vouch for.

Still, I'd put the chances of gcc screwing up at about the same as with 
32-bit, not more.  While it does happen, I'd consider it more likely that 
there's a hardware problem.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




