[Sks-devel] Re: sks recon cores when it claims "reconciliation complete"
From: Yaron Minsky
Subject: [Sks-devel] Re: sks recon cores when it claims "reconciliation complete"
Date: Mon, 26 Jan 2004 15:08:19 -0500 (EST)
User-agent: SquirrelMail/1.4.2-1
Hi Chris. First off, a small thing: this kind of message should probably
go to address@hidden. (And of course, messages sent there shouldn't be
encrypted.) I'm forwarding this message there in the hope of getting
feedback from others.
Speaking of which, has anyone had experience running SKS on OpenBSD? I'm
wondering if Chris' problems are unique.
I haven't seen this particular error before, and unfortunately the error
report makes it hard to track down. The error happens between lines 177
and 180 of reconserver.ml. It might be useful to add a few more plerror
calls in there to see precisely where the error occurs.
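A minimal sketch of that kind of bracketing, for illustration only. The plerror defined here is a local stand-in that prints to stderr (it is not SKS's own definition), and suspect_step is a hypothetical placeholder for whichever call is under suspicion:

```ocaml
(* Stand-in logger, NOT the SKS plerror: formats a message and writes it
   to stderr. prerr_endline flushes, so the last message printed before a
   crash is not lost in a buffer. *)
let plerror level fmt =
  Printf.ksprintf
    (fun s -> prerr_endline (Printf.sprintf "[%d] %s" level s))
    fmt

(* Hypothetical placeholder for the call being investigated. *)
let suspect_step x = x * 2

(* Log immediately before and after a call. If "before" appears in the
   log without a matching "after", the crash happened inside f. *)
let traced name f x =
  plerror 0 "before %s" name;
  let r = f x in
  plerror 0 "after %s" name;
  r

let () =
  let r = traced "suspect_step" suspect_step 21 in
  plerror 0 "result = %d" r
```

Wrapping each call in the suspect range this way narrows the crash down to the last "before" line that has no matching "after".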
For the most part, segfaults in OCaml come from two places. The first is
the native marshalling code, when you get the type wrong; that doesn't
occur at all in my code, so we can ignore that case. The second is the
interfacing code between OCaml and C. There are two places this might
happen in SKS: the interface to Berkeley DB, and numerix, the
large-integer arithmetic package that SKS relies on. The hashconvert
function, which is called in the critical lines in question, could be the
source of the problem, so adding printout statements could help track it
down.
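For readers unfamiliar with the marshalling hazard mentioned above, here is a small illustrative sketch using only the standard Marshal module. The roundtrip function is a made-up name for this example and has nothing to do with SKS:

```ocaml
(* Marshal.from_string returns a value of whatever type the caller asks
   for; nothing checks that annotation against the bytes actually
   written. *)
let roundtrip (v : int * string) : int * string =
  let bytes = Marshal.to_string v [] in
  (* Correct use: the annotation matches the type that was marshalled. *)
  (Marshal.from_string bytes 0 : int * string)

let () =
  let (n, name) = roundtrip (1, "key") in
  Printf.printf "%d %s\n" n name
  (* Annotating the same bytes as, say, float array would also compile,
     and using the result could corrupt memory or segfault. *)
```

The point is only that the compiler cannot catch a wrong annotation here, which is why this is one of the few ways pure OCaml code can crash.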
One ugly possibility is that the error is somewhere else entirely, and the
crash shows up there only because that's when the GC happens to do a big
collection that runs over the memory in question. If that's the case, it
will be harder to track down what's going on.
The bytecode won't really help you here, since core dumps don't generate
stacktraces. I'm not sure why the bytecode isn't working for you. Can
you run the ocaml interpreter? You can invoke it by typing "ocaml" at the
command line. Also, try doing "ocaml unix.cma", and then do something
like "Unix.dup;;" and see if it throws a gasket.
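The same check can be written as a small script, sketched here on the assumption that the Unix library is loaded when it runs, for example with "ocaml unix.cma check_dup.ml" (check_dup.ml is a hypothetical file name):

```ocaml
(* Exercises the C primitive behind Unix.dup by duplicating stdout and
   closing the copy again. *)
let dup_works () =
  try
    let fd = Unix.dup Unix.stdout in
    Unix.close fd;
    true
  with Unix.Unix_error _ -> false

let () =
  if dup_works () then print_endline "Unix.dup works"
  else print_endline "Unix.dup raised Unix_error"
```

If the runtime really lacks the primitive, the failure should appear when the library is loaded, before any of this code runs, which is itself a useful data point.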
y
> Thanks for the pointer to "sks cleandb" that did the trick. anyway i'm now
> syncing with a few other machines, and whenever recon claims to have
> completed it cores.
>
> pyxis:ttyp9# tail log.recon
> 2004-01-26 10:12:45 Reconciliation complete <-- core dump
> 2004-01-26 10:13:17 Opening log <-- i restart sks recon
> 2004-01-26 10:13:17 sks_recon, SKS version 1.0.6
> 2004-01-26 10:13:17 Copyright Yaron Minsky 2002-2003
> 2004-01-26 10:13:17 Licensed under GPL. See COPYING file for details
> 2004-01-26 10:13:17 Opening PTree database
> 2004-01-26 10:13:17 Setting up PTree data structure
> 2004-01-26 10:13:18 PTree setup complete
> 2004-01-26 10:13:18 Initiating catchup
> 2004-01-26 10:13:22 Fetching filters
> 2004-01-26 10:13:26 Starting event loop
> 2004-01-26 10:14:35 Recon partner: <ADDR_INET 213.141.74.169:11370>
> 2004-01-26 10:14:35 Initiating reconciliation
>
> I don't know much about debugging ocaml, I assume that "alloc_small" is
> some sort of ocaml intrinsic? I find a bunch of things that call it in the
> sks source, but no definition thereof...
>
> pyxis:ttypa# gdb sks sks.core
> GNU gdb 4.16.1
> Copyright 1996 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-unknown-openbsd3.4"...
> Core was generated by `sks'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/libz.so.3.0...done.
> Reading symbols from /usr/local/lib/libdb.so.4.2...done.
> Reading symbols from /usr/lib/libm.so.1.0...done.
> Reading symbols from /usr/lib/libc.so.30.3...done.
> Reading symbols from /usr/libexec/ld.so...done.
> #0 0x1c0e3f6b in alloc_small ()
> (gdb) bt
> #0 0x1c0e3f6b in alloc_small ()
> #1 0x1c0ed7f0 in alloc_custom ()
> #2 0x1c0bec68 in sx_split ()
> #3 0x1c085aac in Numerix__fun_2305 ()
> #4 0x60cf7fe0 in ?? ()
> Cannot access memory at address 0x10cf7fe0.
>
> objdump -S says this was going on....
> 1c0e3f64 <alloc_small>:
> 1c0e3f64: 55 push %ebp
> 1c0e3f65: 89 e5 mov %esp,%ebp
> 1c0e3f67: 83 ec 0c sub $0xc,%esp
> 1c0e3f6a: 57 push %edi
> 1c0e3f6b: 56 push %esi   *CORE*
> 1c0e3f6c: 53 push %ebx
> 1c0e3f6d: 8b 75 08 mov 0x8(%ebp),%esi
> 1c0e3f70: 8b 7d 0c mov 0xc(%ebp),%edi
> 1c0e3f73: 8d 1c b5 04 00 00 00 lea 0x4(,%esi,4),%ebx
> 1c0e3f7a: a1 bc ad 05 3c mov 0x3c05adbc,%eax
> 1c0e3f7f: 29 d8 sub %ebx,%eax
> 1c0e3f81: a3 bc ad 05 3c mov %eax,0x3c05adbc
> 1c0e3f86: 3b 05 c0 ad 05 3c cmp 0x3c05adc0,%eax
> 1c0e3f8c: 73 12 jae 1c0e3fa0 <alloc_small+0x3c>
> 1c0e3f8e: 01 d8 add %ebx,%eax
> 1c0e3f90: a3 bc ad 05 3c mov %eax,0x3c05adbc
> 1c0e3f95: e8 f2 f6 ff ff call 1c0e368c <minor_collection>
> 1c0e3f9a: 29 1d bc ad 05 3c sub %ebx,0x3c05adbc
> 1c0e3fa0: 8b 15 bc ad 05 3c mov 0x3c05adbc,%edx
> 1c0e3fa6: c1 e6 0a shl $0xa,%esi
> 1c0e3fa9: 8d 84 3e 00 03 00 00 lea 0x300(%esi,%edi,1),%eax
> 1c0e3fb0: 89 02 mov %eax,(%edx)
> 1c0e3fb2: a1 bc ad 05 3c mov 0x3c05adbc,%eax
> 1c0e3fb7: 83 c0 04 add $0x4,%eax
> 1c0e3fba: 5b pop %ebx
> 1c0e3fbb: 5e pop %esi
> 1c0e3fbc: 5f pop %edi
> 1c0e3fbd: c9 leave
> 1c0e3fbe: c3 ret
> 1c0e3fbf: 90 nop
>
> (gdb) info registers
> eax 0x7 7
> ecx 0x0 0
> edx 0x4 4
> ebx 0x5 5
> esp 0xcf7fe000 0xcf7fe000
> ebp 0xcf7fe010 0xcf7fe010
> esi 0x3c059460 1006998624
> edi 0x8 8
> eip 0x1c0e3f6b 0x1c0e3f6b
> eflags 0x10292 66194
> cs 0x2b 43
> ss 0x33 51
> ds 0x33 51
> es 0x33 51
> fs 0x33 51
> gs 0x33 51
>
>
> I'll try run the bytecode version with backtrace turned on and see if that
> gets me any further. or not...
>
> pyxis:ttypa# ocamlrun bin/sks.bc help
> Fatal error: unknown C primitive `unix_dup'
>
> I'll see if the cvs code helps any.
>
> OS: OpenBSD 3.4-current i386
> DB: 4.2.52
> ML: Ocaml 3.07
> CC: "gcc version 2.95.3 20010125 (prerelease, propolice)"
>
|--------/ Yaron M. Minsky \--------|
|--------\ http://www.cs.cornell.edu/home/yminsky/ /--------|
Open PGP --- KeyID B1FFD916
Fingerprint: 5BF6 83E1 0CE3 1043 95D8 F8D5 9F12 B3A9 B1FF D916