help-cfengine

Re: Coredumping problem on Mandrake 10/2.6 kernel


From: Tod Oace
Subject: Re: Coredumping problem on Mandrake 10/2.6 kernel
Date: Mon, 8 Nov 2004 09:24:43 -0800

If there's a core dump, try getting a "backtrace" using gdb. To do that, gdb will need both the core dump and the cfengine binaries and sources that you compiled with. The backtrace will show which area of code cfservd failed in and, hopefully, why.
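As a rough sketch of that step, a small wrapper can print the backtrace without an interactive session. It assumes an sh-compatible shell and a gdb recent enough to accept -ex (older ones can read the same command from a file given with -x). The function name core_backtrace and the file names in the example are made up for illustration.

```shell
# Hypothetical helper: print a backtrace from a core file without an
# interactive gdb session. Assumes gdb is on PATH and supports -ex.
core_backtrace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: core_backtrace BINARY COREFILE" >&2
        return 2
    fi
    # -batch makes gdb exit once the commands have run; "backtrace" is
    # the same command you would type at the (gdb) prompt.
    gdb -batch -ex backtrace "$1" "$2"
}

# Example (file names are placeholders):
#   core_backtrace ./cfservd ./core.12345 > cfservd.bt 2>&1
```

Frames from code built with debugging information show a source file and line; frames from libraries built without it show only the function name.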

If you don't have a core dump handy...well... Figure out how to enable core dumps. Maybe run "unlimit" (a csh builtin) in the shell before starting cfservd from it?
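For an sh or bash shell, the equivalent of "unlimit" might look like the sketch below. It raises the soft core-file limit as far as the hard limit allows, which works without root. Whether a core file then actually appears also depends on system settings this sketch does not touch.

```shell
# Raise the soft core-file size limit to the hard limit for this shell
# and for anything started from it (sh/bash syntax).
ulimit -S -c "$(ulimit -H -c)"

# Print the soft limit now in effect; 0 means no core files are written.
ulimit -S -c

# csh/tcsh equivalent of the first command:
#   unlimit coredumpsize
# Then start the daemon from this same shell, for example: ./cfservd
```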

Another idea would be to run cfservd with "-d"ebug output into a file. Just be careful not to run out of disk space. The debug output might be useful if some particular client is triggering the problem. But the backtrace will probably be more useful.

-Tod
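One way to keep the debug file from filling the disk is to cap how much output is kept. This is only a sketch: it assumes the debug output goes to stdout/stderr and that head accepts -c (GNU and BSD versions do). The helper name run_capped and the path in the example are hypothetical.

```shell
# Hypothetical helper: run a command and keep at most MAX_BYTES of its
# combined stdout/stderr in LOG_FILE.
run_capped() {
    max_bytes=$1
    log_file=$2
    shift 2
    # head stops reading after max_bytes. The command then receives
    # SIGPIPE on its next write, which ends it unless it ignores that
    # signal.
    "$@" 2>&1 | head -c "$max_bytes" > "$log_file"
}

# Example (placeholder path, cap of roughly 100 MB):
#   run_capped 100000000 /var/tmp/cfservd.debug ./cfservd -d
```

Note the trade-off: once the cap is reached the daemon will most likely stop, so this suits a debugging run rather than normal service.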


On Nov 8, 2004, at 08:59, Brian Thomas wrote:

I'm sure you have to answer this question all the time, and I apologize
for my ignorance, but... Where do I go from here? :)

I mean, I understand that there can be (and were!) multiple libdb's on a
system, and how the libraries from one and the headers from another can
cause problems. But what I don't understand is how to get around it. Nor
do I understand why I have the specific problem I do; that of cfservd
running for hours just fine, but then deciding to crash. What in the
header mismatch might result in that?

For what it's worth, I've made sure there's only one version of libdb
on the system (4.x; I removed the 3.x), recompiled everything, and the
problems still persist. And, in fact, the version I compiled against my
own 3.x build on Friday ran OK for only about 8 hours; it crashed
sometime during the night, so my theory about a libdb4 weirdism doesn't
seem to hold water.

I believe this is more than just a libdb version collision problem,
although I wholeheartedly believe libdb is still at fault in some way. I
just don't know how to A) troubleshoot or B) work around it. Short of
removing all libdb packages entirely from the system, how can I make
sure to get a standalone cfengine compile that won't go hunting through
/usr/include for header files when I've specified a particular
--with-berkeleydb path, for the purposes of entirely removing the
possibility of a mismatch being the culprit?

Again I apologize for what I know this list gets pinged about regularly,
but I've yet to find a solid answer on this, just a lot of folks
stumbling around trying to figure out what to do.

Brian

-----Original Message-----
From: Mark Burgess [mailto:Mark.Burgess@iu.hio.no]
Sent: Friday, November 05, 2004 9:10 PM
To: Brian Thomas
Cc: help-cfengine@gnu.org
Subject: Re: Coredumping problem on Mandrake 10/2.6 kernel


A likely explanation is that you have multiple versions of Berkeley
DB on the system and you are mixing old header files with newer
libraries. This will cause core dumps, just as mixing
regex libraries will...

Mark
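A quick way to look for that kind of mismatch is to ask the compiler which db.h it really picks up and to read the version that header declares. The sketch below assumes an sh-compatible shell, awk, and a C compiler reachable as cc (or named in $CC). The function names, and the include/ subdirectory in the example, are assumptions for illustration.

```shell
# Print the version a given db.h declares, e.g. "4.2.52".
db_header_version() {
    awk '$1 == "#define" && $2 ~ /^DB_VERSION_(MAJOR|MINOR|PATCH)$/ { v[$2] = $3 }
         END { print v["DB_VERSION_MAJOR"] "." v["DB_VERSION_MINOR"] "." v["DB_VERSION_PATCH"] }' "$1"
}

# Print the path of the db.h the compiler resolves for the given flags,
# taken from the line markers in the preprocessor output.
resolved_db_header() {
    echo '#include <db.h>' |
        ${CC:-cc} -E -x c "$@" - |
        awk -F'"' '/db\.h"/ { print $2; exit }'
}

# Example (the include/ subdirectory is an assumption about the layout):
#   hdr=$(resolved_db_header -I/var/tmp/db-4.2.52/include)
#   echo "$hdr declares Berkeley DB $(db_header_version "$hdr")"
```

If the path printed is under /usr/include rather than the directory you intended, the build is using the system header. The library side can be checked at run time with Berkeley DB's db_version() call and compared against the header's numbers.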

On Fri, Nov 05, 2004 at 03:11:08PM -0800, Brian Thomas wrote:
Well... Progress, at least on the cfagent front:

I can actually run cfagent without a problem now, statically linked
against a libdb-3.3.11 compile. I don't know if cfservd will stay up
until the next run of the clients today, since that's the only time
there's enough load, but whatever the problem is, it seems
(unsurprisingly) linked to libdb. But it is NOT an issue with dynamic
loading weirdness, since as far as I can tell, it's all static.

Brian

-----Original Message-----
From: Brian Thomas
Sent: Friday, November 05, 2004 2:46 PM
To: help-cfengine@gnu.org
Subject: Coredumping problem on Mandrake 10/2.6 kernel

So I'd originally thought I'd solved my coredump problems on Mandrake
10.x, but my excitement was premature. Furthermore, in testing I
realized my locally-compiled version is not just having a problem with
cfservd; it looks like cfagent is crashing as well.

I originally was, and still am, having problems with 'cfservd'
coredumping after running for a while, usually under heavy-ish load. At
my half-hour intervals it would crap out, and the crash appeared to be
related to libdb.

So in an effort to solve this, I set out to compile statically against
libdb. I'll skip the intervening frustration; suffice to say I decided
during the ordeal that compiling my own static libdb and openssl
libraries and building against them was probably better anyway than
using the system static libdb.a. There was no problem with the compile
process itself once I did that, and I can verify (with ldd) that I am
relying on neither a dynamic libdb nor a dynamic libcrypto.

The problem is, I have twice the problems! Why? Because now cfagent is
coredumping, and much more spectacularly (Read: Immediately) than
cfservd, although cfservd is still crashing under load.

Included below is lots of relevant, maybe too much, information. I'm
not sure what to do at this point; originally I thought this was an
issue with the tls (/lib/tls) versions of the libraries, and tried
compiling/executing against each individually, with the same results
either way.

So first, the software versions. Bear in mind I have the exact same
problems when compiling against the Mandrake-installed versions of
openssl and berkeleydb:

Openssl 0.9.7e
BerkeleyDB 4.2.52
Cfengine 2.1.11

Next, configure line (After this it's just a 'make'):

./configure --with-berkeleydb=/var/tmp/db-4.2.52 \
            --with-openssl=/var/tmp/openssl-0.9.7e

Next, OS config:

# uname -a
Linux amd-usa 2.6.3-7mdk-p3-smp-64GB #1 SMP Wed Mar 17 15:34:39 CET 2004 i686 unknown unknown GNU/Linux
# cat /etc/issue
Mandrake Linux release 10.0 (Official) for i586
Kernel 2.6.3-7mdk-p3-smp-64GB on a 4-processor i686 / \l

Next, gdb output. This first one is from the cfservd crash:

# gdb -c ./core.32076 cfservd
GNU gdb 6.0-2mdk (Mandrake Linux)
[warranty deletia]
This GDB was configured as "i586-mandrake-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `./cfservd -m'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/tls/libpthread.so.0...done.
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x080b0a6f in __bam_pinsert ()
(gdb) backtrace
#0  0x080b0a6f in __bam_pinsert ()
#1  0x080af683 in __bam_page ()
#2  0x080af070 in __bam_split ()
#3  0x080f3bf9 in __bam_c_put ()
#4  0x080dc06b in __db_c_put ()
#5  0x080d588f in __db_put ()
#6  0x080e250e in __db_put_pp ()
#7  0x08063d97 in LastSeen (hostname=0x40427900 "hostfoo.shopping.com", role=cf_accept) at ip.c:443
#8  0x0804e130 in VerifyConnection (conn=0x8254e68, buf=0x4042e966 "10.20.3.50 hostfoo.shopping.com root 0")
    at cfservd.c:1777
#9  0x0804d06c in BusyWithConnection (conn=0x8254e68) at cfservd.c:1234
#10 0x0804cbc1 in HandleConnection (conn=0x8254e68) at cfservd.c:1133
#11 0x4002c7d3 in start_thread () from /lib/tls/libpthread.so.0
#12 0x40144b4a in clone () from /lib/tls/libc.so.6

Next is the output from the cfagent crash. Note, these two crashes DO
NOT happen at the same time! Usually I can crank up a cfservd, and as
long as there's no significant load it will run fine, while cfagent
will crash every time. Similarly, cfservd will always eventually crash,
whether or not I run the locally-compiled cfagent against it. I am
still guessing the two crashes have the same or similar root causes,
but they do not trigger each other!

# gdb -c ./core.32098 cfagent
GNU gdb 6.0-2mdk (Mandrake Linux)
[warranty deletia]
This GDB was configured as "i586-mandrake-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

Core was generated by `./cfagent --debug'.
Program terminated with signal 11, Segmentation fault.

warning: current_sos: Can't read pathname for load map: Input/output error

Reading symbols from /lib/libnss_nis.so.2...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0  0x40115b47 in memcpy () from /lib/libc.so.6
(gdb) backtrace
#0  0x40115b47 in memcpy () from /lib/libc.so.6
#1  0x080cf5ec in __bam_copy ()
#2  0x080cf01e in __bam_psplit ()
#3  0x080cd86c in __bam_page ()
#4  0x080cd280 in __bam_split ()
#5  0x08111e09 in __bam_c_put ()
#6  0x080fa27b in __db_c_put ()
#7  0x080f3a9f in __db_put ()
#8  0x0810071e in __db_put_pp ()
#9  0x0805ba27 in LastSeen (hostname=0xbfff4650 "serverfoo.shopping.com", role=cf_connect) at ip.c:443
#10 0x0805b265 in RemoteConnect (host=0xbfff4650 "serverfoo.shopping.com", forceipv4=110 'n') at ip.c:192
#11 0x080590c7 in OpenServerConnection (ip=0x8290c40) at client.c:57
#12 0x08054308 in MakeImages () at do.c:2435
#13 0x0804d70e in DoTree (passes=1, info=0x81cdf00 "Update") at cfagent.c:1274
#14 0x0804b435 in main (argc=2, argv=0xbfffe7a4) at cfagent.c:107




_______________________________________________
Help-cfengine mailing list
Help-cfengine@gnu.org
http://lists.gnu.org/mailman/listinfo/help-cfengine

--


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Work: +47 22453272            Email:  Mark.Burgess@iu.hio.no
Fax : +47 22453205            WWW  :  http://www.iu.hio.no/~mark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~






--
Tod Oace, Intel Corporation <tod@intel.com>




