[Pan-users] Re: freeze, high CPU getting new headers


From: Duncan
Subject: [Pan-users] Re: freeze, high CPU getting new headers
Date: Wed, 6 Aug 2008 17:06:30 +0000 (UTC)
User-agent: Pan/0.133 (House of Butterflies)

Rhialto <address@hidden> posted
address@hidden, excerpted below, on  Wed, 06 Aug 2008
09:38:30 +0200:

> On Wed 06 Aug 2008 at 01:24:11 +0000, walt wrote:
>> Your 'stuck' traces show an infinite loop of file read and writes, and
>> the output includes the names ROLE_TOOL_BAR and ROLE_TOOL_TIP over and
>> over. It's easy to figure out that these symbols belong to atk
>> (accessibility tool kit), and that's no surprise because pan is linked
>> to libatk.
> 
> I thought they may be X11 connection data?

The URL for the straces again (please keep this bit as long as referring 
to them):    http://www.xs4all.nl/~benscho9/pan/

First three stalled, 4th working, 5th running but stalls.

I don't know what it all means, but here's some explanation in English of 
what's going on, for the non-programmers (like me) among us, based on the 
straces and what manpaging the various calls turned up.

You may be right about the X11 connection data.

Those writes aren't normal writes, but writevs.  The "v" is for "vector": 
a single call writes from an array (vector) of buffers, according to the 
manpage (man section 2, of course).

I read the manpages for several of the calls in the trace and know rather 
more about Linux internals than I did before! =8^)  Using this sample 
(probably the first writev after the stall in strace5, line 4850):

writev(11, [{"GIOP\1\2\1\0\214\v\0\0", 12}, 
{"\350\221\372\244\3\0\0\0\0\0\0\0\34\0\0\0\0\0\0\0C\374\200x4e\250(\300
+(("..., 2048}, {"ROLE_TOOL_TIP\0\0\0\n\0\0\0ROLE_TREE\0TI"..., 908}], 3) 
= 2968

As I guessed, the first value (11 above) is the filedescriptor it's 
writing to.  The last (3 above) is the number of vectors/buffers it's 
writing to the file, and in between is an array of that many structures, 
each one holding a buffer's start address and length.  strace reports the 
actual data at each address instead of the address itself.

So the GIOP... \350... and ROLE_TOOL_TIP... stuff above are strace's 
report of what was in those buffers, while 12, 2048 and 908 are the 
lengths of each, respectively.

The return value is the number of bytes written. From above, 
12+2048+908=2968, as expected.
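
To see the same arithmetic outside strace, here is a minimal sketch, 
assuming Python 3 on Linux.  The pipe is only a stand-in for the unknown 
fd 11, and the buffer contents are made up; only the three lengths come 
from the trace.

```python
import os

# Hypothetical stand-ins for the three buffers in the trace. Only the
# lengths (12, 2048, 908) come from the strace line; the contents do not.
header = b"GIOP" + bytes(8)   # 12 bytes
body = bytes(2048)            # 2048 bytes
tail = bytes(908)             # 908 bytes

# A pipe stands in for whatever fd 11 really was.
read_fd, write_fd = os.pipe()

# One system call, three buffers, written in the order given.
written = os.writev(write_fd, [header, body, tail])
print(written)  # the sum of the three lengths when nothing is cut short

os.close(write_fd)
os.close(read_fd)
```

Run under strace, this should show a single writev with three buffers and 
a count of 3, much like the line quoted above.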

Now we don't know what the filedescriptor referred to.  It would have 
been set up earlier, by an open call giving the path of the file for an 
ordinary file, or by socket and connect calls for a socket, but we don't 
have that part of the trace.  If you are correct, maybe it's the X 
socket, updating the tooltip, and in other calls, the toolbar.

The poll call is similarly composed of compound structures.  It waits for 
one of several open files to have something happen to it.  Each file 
substructure holds the filedescriptor, the events to wait for, and the 
r(eturned)events that actually happened.  The last two numbers are the 
number of files to poll and the timeout (-1 = wait forever).  The return 
value is the number of files that had any revents set when the program 
was awakened.

So taking this from immediately after our writev example above:

poll([{fd=6, events=POLLIN}, {fd=11, events=POLLIN|POLLPRI, 
revents=POLLIN}, {fd=12, events=POLLIN|POLLPRI}, {fd=13, events=POLLIN|
POLLPRI}], 4, -1) = 1

We have it waiting for events on four files, filedescriptors (fds) 6, 11, 
12 and 13.

POLLIN means data has appeared that can be read.  POLLPRI means priority 
data has appeared (for example out-of-band data on a TCP socket).  A 
dead connection is reported separately, as POLLHUP or POLLERR in revents, 
and there are other flags possible as well of course.

So on the fds listed, pan is mainly waiting for data to appear, but also 
for priority data if it appears as well, except on fd 6.  The call was 
made with no timeout (wait forever if necessary), and the return says 1 
fd had something to report.  strace shows the revents that were filled 
in, in this case POLLIN (incoming data) on fd 11.
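
The same behaviour can be sketched in a few lines, assuming Python 3 on 
Linux.  The two socket pairs are hypothetical stand-ins for pan's 
descriptors; only one of them is given data, so only one should come 
back from the poll.

```python
import select
import socket

# Two hypothetical connections; only the first will have data waiting.
a_local, a_peer = socket.socketpair()
b_local, b_peer = socket.socketpair()

poller = select.poll()
poller.register(a_local, select.POLLIN | select.POLLPRI)
poller.register(b_local, select.POLLIN | select.POLLPRI)

a_peer.send(b"hello")  # makes a_local readable

# None means no timeout, like the -1 in the trace.
ready = poller.poll(None)

# Each entry is (fd, revents); descriptors with nothing to report are omitted.
for fd, revents in ready:
    print(fd == a_local.fileno(), bool(revents & select.POLLIN))

for s in (a_local, a_peer, b_local, b_peer):
    s.close()
```

The length of the returned list corresponds to the return value strace 
prints after the equals sign.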

If you look at the sequence at that point (again starting with line 4850 
of strace5), it then reads 12 bytes from the same file descriptor (11) 
into a particular memory buffer (GIOP... but different than what was 
previously written), then reads 36 more bytes into another buffer 
(\350...).  It then writes to the same fd again, then polls on the same 
four fds as previously as it begins repeating the cycle.

So we have a write from buffers, poll (wait until something returns), 
read the data, then start a new cycle with another write.  If this is to 
the X socket, then that's all the system calls it's making at that 
point.  Of course, we don't know from strace what else the program was 
doing at the time that didn't involve system calls.
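
As a rough illustration of that write, poll, read cycle, here is a 
sketch assuming Python 3 on Linux.  The socket pair and the made-up 
reply are hypothetical; only the 12-byte and 36-byte read sizes come 
from the trace.  What pan and its peer actually exchange is not known.

```python
import os
import select
import socket

# Hypothetical stand-in for fd 11 and whatever is on the other end of it.
client, server = socket.socketpair()

def one_cycle(request_parts):
    # 1. Write the request from several buffers in one call.
    os.writev(client.fileno(), request_parts)

    # The peer answers. Done inline to keep the sketch single-threaded.
    server.recv(4096)
    server.sendall(b"H" * 12 + b"B" * 36)

    # 2. Poll (wait) until the reply is readable.
    poller = select.poll()
    poller.register(client, select.POLLIN)
    poller.poll(None)

    # 3. Read a 12-byte header, then a 36-byte body.
    header = client.recv(12)
    body = client.recv(36)
    return header, body

# Repeat the cycle a few times, as the trace does.
results = [one_cycle([b"req-head", b"req-body"]) for _ in range(3)]
print([(len(h), len(b)) for h, b in results])

client.close()
server.close()
```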

BTW, brk = move the program break, effectively increasing or decreasing 
(increasing, as we see just above line 4850 here) the allocated memory.  
stat just gets information on a file (size, timestamps, permissions), but 
doesn't actually read it or even open it.  I don't quite understand why 
/etc/localtime gets stat'd so many times without ever being opened, but 
as the same stats occur in the working strace, they must be fine.
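
One plausible use for a stat without an open (an assumption on my part, 
not something the trace shows) is cheap change detection: compare the 
metadata with what was seen last time, and only reread the file if it 
differs.  A minimal sketch, assuming Python 3 and using a temporary file 
rather than the real /etc/localtime:

```python
import os
import tempfile

# A temporary file stands in for the file being watched.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"zone data v1")
    path = tmp.name

def fingerprint(p):
    # os.stat fetches metadata only; the file is never opened or read.
    st = os.stat(p)
    return (st.st_mtime_ns, st.st_size, st.st_ino)

before = fingerprint(path)
unchanged = fingerprint(path) == before   # nothing touched it yet

# Simulate someone replacing the contents with something longer.
with open(path, "wb") as f:
    f.write(b"zone data version 2")

changed = fingerprint(path) != before     # the size differs, at least
print(unchanged, changed)

os.unlink(path)
```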

Contrast the stall with the non-stall, most of the strace from the top, 
where it was more or less continually reading fd 20 until it got the 
try-later error (EAGAIN), then checking fd 3 (not STDIN, which is fd 0 by 
convention; fd 3 is just the first descriptor the program opened for 
itself, possibly the one it gets user input on), and on seeing nothing 
there, polling until more data was available from fd 20 again.  The data 
it was reading on fd 20 appears to be header data, but it's just reading 
it, not writing anything.
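
That read-until-EAGAIN-then-poll pattern can be sketched as follows, 
assuming Python 3 on Linux.  The socket pair is a hypothetical stand-in 
for fd 20 and the news server, and the data is made up.  Python reports 
EAGAIN on a non-blocking socket as BlockingIOError.

```python
import select
import socket

# Hypothetical stand-in for fd 20 (reader) and the server (writer).
reader, writer = socket.socketpair()
reader.setblocking(False)

writer.sendall(b"header line\r\n" * 3)

# Keep reading until the kernel says "nothing more right now".
chunks = []
hit_eagain = False
while True:
    try:
        data = reader.recv(16)
    except BlockingIOError:   # EAGAIN / EWOULDBLOCK
        hit_eagain = True
        break
    chunks.append(data)

# Nothing left to read, so wait in poll until more data shows up.
poller = select.poll()
poller.register(reader, select.POLLIN)
writer.sendall(b"more\r\n")
ready = poller.poll(None)
print(hit_eagain, len(ready))

reader.close()
writer.close()
```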

So it reads in a bunch of headers, then stalls, apparently doing no 
system calls except for what appears to be updating X.  What it's doing 
in the background isn't obvious from the strace since it's only a 
system-call trace, after all, but the reasonable assumption is that it's 
digesting the data it got, figuring out where to plug it into the 
existing linked-list of posts as threaded, etc.

I don't see a lot more there, but then again, the above is pretty fresh 
knowledge here too, based on a quick lookup of the call manpages 
comparing it to the strace.  Maybe someone else can make more of it, but 
I don't think there's a lot more to make.

In any case, I learned something and hopefully this posting will be 
useful to others who weren't already familiar with system calls and 
strace.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




