sks-devel
From: Chris Kuethe
Subject: Re: [Sks-devel] Re: sks recon cores when it claims "reconciliation complete"
Date: Mon, 26 Jan 2004 15:20:17 -0700 (MST)

On Mon, 26 Jan 2004, Yaron Minsky wrote:

> I haven't seen this particular error before, and the error report is
> unhappily hard to track down.  The error happens between lines 177 and 180
> of reconserver.ml.  It might be useful to stick a few more plerror
> instructions in there to see precisely where the error is.

And a few other places too... like around line 220 of reconserver.ml

      (** CK *)
      ignore (plerror 4 "calling print_hashes (client)");
      let hashes = hashconvert elements in
      print_hashes hashes;
      (** CK *)
      ignore (plerror 4 "finished print_hashes (client)");

> For the most part, segfaults in ocaml come in two places; where you use
> the native marshalling code and you get the type wrong --- that doesn't
> occur at all in my code, so we can ignore that case --- and in the
> interfacing code between ocaml and C.  There are two places this might
> happen in SKS.  The first is the interface to the Berkeley database, and

Nope, the db routines are stable. Though I might take a kick at allowing
more concurrent writers. Also, I know you removed the (no)?transactions
options, but that would be really handy to have back for keyring loads
when there are no other db or recon processes going on.

> the second is in numerix, which is the large-integer arithmetic package
> that SKS relies on.  The hashconvert function, which occurs in the
> critical lines in question, could be the source of the problem, so adding
> printout statements could help track the problem down.

There are a couple of places where hashconvert is called, one as gossip
server and one as gossip client. Code's the same but having slightly
different messages tells me who's making the code explode. This is
important since in server mode, I do actually get diff files written
and I start requesting keys.

2004-01-26 14:01:14 Recon partner: <ADDR_INET 64.217.17.198:11370>
2004-01-26 14:01:14 Initiating reconciliation
2004-01-26 14:38:54 Reconciliation complete
2004-01-26 14:38:54 calling print_hashes (client)
*core*
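
For illustration, the "same code, different message" trick can be wrapped
in a small helper so each call site is tagged with its role. This is only a
sketch: `traced` and its `log` argument are hypothetical names, not part of
SKS, and `log` merely stands in for whatever logging call (such as plerror)
is actually in use. The stand-in logger uses `prerr_endline`, which flushes
stderr, so the "calling" line should still reach the log if the wrapped
function segfaults.

```ocaml
(* Hypothetical helper, not SKS code: log a message tagged with the
   caller's role before and after running [f x]. *)
let traced ~log ~role name f x =
  log (Printf.sprintf "calling %s (%s)" name role);
  let result = f x in
  log (Printf.sprintf "finished %s (%s)" name role);
  result

let () =
  (* prerr_endline flushes, so the message is written before any crash
     that happens inside the wrapped call. *)
  let log msg = prerr_endline msg in
  let doubled = traced ~log ~role:"client" "double" (fun n -> 2 * n) 21 in
  Printf.printf "%d\n" doubled
```

If the "calling" line shows up without a matching "finished" line, the
role tag says which of the two call sites was active at the time.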

I'll dig into this some more now that my networking guys know that my
box will be generating out-of-profile traffic and they don't need to break
my kneecaps.

> One ugly possibility is that the error is somewhere else entirely, and the
> exception hits there just because that's when the GC happens to do a big
> collection that runs over the memory in question.  If that's the case, it
> will be harder to track down what's going on.

That could be it too. :( GC can get ugly...
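
One way to probe the GC theory, sketched here under assumptions rather than
taken from SKS: force a full major collection immediately before and after
a suspect call. If the heap was already damaged by earlier C-side code, the
forced collection may crash at the first checkpoint instead of at some
later, unrelated allocation. `with_gc_checks` is a hypothetical name;
`Gc.full_major` is the standard library call.

```ocaml
(* Hypothetical helper, not SKS code: bracket a suspect call with forced
   full major collections to make latent heap corruption surface sooner. *)
let with_gc_checks ~label f x =
  Gc.full_major ();
  prerr_endline (label ^ ": GC completed before call");
  let result = f x in
  Gc.full_major ();
  prerr_endline (label ^ ": GC completed after call");
  result

let () =
  let n = with_gc_checks ~label:"length" List.length [1; 2; 3] in
  Printf.printf "%d\n" n
```

A crash before the first message would point at code that ran earlier; a
crash between the two messages would point at the wrapped call itself.
Forced collections are slow, so this is for debugging runs only.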

CK

-- 
Chris Kuethe, GCIA CISSP: Secure Systems Specialist - U of A CNS
      office: 157 General Services Bldg.    +1.780.492.8135
              address@hidden

     GDB has a 'break' feature; why doesn't it have 'fix' too?
