sks-devel

Re: [Sks-devel] SKS PTree possible corruption?


From: Yaron Minsky
Subject: Re: [Sks-devel] SKS PTree possible corruption?
Date: Tue, 13 Jul 2004 21:32:22 -0400

Hmm.  This is a rather odd outcome.  I'd be curious to know which OS
you're running, because as I understand Unix semantics, the error
you're seeing simply shouldn't happen.

I've attached the relevant snippet of code.  As you can see, alarm is
called with a value of 0 right before the block in which the exception
you saw was caught.  Calling alarm with 0 cancels any pending alarm,
so that should really be that.  Why SIGALRM is still going off I
can't really fathom.
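
To make the expected behaviour concrete, here is a minimal standalone
sketch (not SKS code) of what Unix.alarm should do on a POSIX system:
schedule an alarm, cancel it with alarm 0, and observe that no alarm
is left pending.  The exception Sig_alarm and the helper
schedule_then_cancel are hypothetical names used only for
illustration; Sig_alarm merely stands in for Eventloop.SigAlarm.

```ocaml
(* Standalone sketch, not part of SKS.  Needs the unix library. *)

(* Hypothetical stand-in for Eventloop.SigAlarm. *)
exception Sig_alarm

(* Turn SIGALRM into an exception, as an alarm-based timeout would. *)
let () =
  Sys.set_signal Sys.sigalrm (Sys.Signal_handle (fun _ -> raise Sig_alarm))

(* Schedule an alarm [secs] seconds out, cancel it immediately, and
   return the number of seconds that were still remaining. *)
let schedule_then_cancel secs =
  ignore (Unix.alarm secs);
  Unix.alarm 0

let () =
  let remaining = schedule_then_cancel 30 in
  Printf.printf "cancelled alarm with %d seconds remaining\n" remaining;
  (* No alarm is pending now, so a second cancel reports 0. *)
  Printf.printf "second cancel returned %d\n" (Unix.alarm 0)
```

If a platform still delivers SIGALRM after the cancelling call, that
would be the surprising part.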

In any case, I have some workarounds in mind that might fix it.  But
before we go down that route, I'd like to get some more information
about the platform you're running on.

Another thing worth noting is that the "PTree may be corrupted"
message is too alarmist.  What may be corrupted is the in-memory copy
of the PTree.  Since the transaction in question is aborted, there
should be no problem with the on-disk PTree, and so there should be no
need for rebuilding the PTree database.
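
As a rough illustration of why an aborted transaction leaves the
durable state alone, here is a toy model.  It is not SKS or Berkeley
DB code, and every type and function name in it is hypothetical; it
only assumes the usual rule that writes become visible in the store
at commit time and are discarded on abort.

```ocaml
(* Toy model only; all names here are hypothetical. *)

(* Stands in for the on-disk PTree. *)
type store = { mutable committed : int list }

(* Writes buffered inside a transaction, newest first. *)
type txn = { mutable pending : int list }

let begin_txn () = { pending = [] }

let write txn x = txn.pending <- x :: txn.pending

(* Make the buffered writes durable, oldest first. *)
let commit store txn =
  store.committed <- store.committed @ List.rev txn.pending;
  txn.pending <- []

(* Throw the buffered writes away; the store is never touched. *)
let abort txn = txn.pending <- []

(* Apply all updates in one transaction.  On any exception, abort
   and re-raise, so the store keeps its previous contents. *)
let apply_all store updates =
  let txn = begin_txn () in
  try
    List.iter
      (fun u ->
         if u < 0 then failwith "bad update";
         write txn u)
      updates;
    commit store txn
  with e ->
    abort txn;
    raise e
```

In this model a failure partway through apply_all loses only the
buffered writes, which is the analogue of the in-memory PTree being
suspect while the on-disk copy stays consistent.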

Yaron

(** does a single catchup-run, returning true if no results were retrieved
  by the catchup *)
let single_catchup count =
  let resp = ReconComm.send_dbmsg 
               (LogQuery (count,PTree.get_synctime (get_ptree ()))) in
  let log = 
    match resp with
      | LogResp log -> log
      | _ -> failwith "Unexpected response"
  in
  match log with
    | [] -> true
    | _ -> 
        let length = List.length log in
        let newts = last_ts log in
        let old_timeout = Unix.alarm 0 in
        let txn = new_txnopt () in
        begin
          try
            applylog txn log;
            ignore (plerror (if length = 0 then 5 else 3) 
                      "Added %d hash-updates. Caught up to %f" 
                      length newts);
            PTree.clean txn (get_ptree ());
            commit_txnopt txn
          with
            | Sys.Break ->
                abort_txnopt txn;
                raise Sys.Break
            | e ->
                ignore (eplerror 1 e 
                          "Raising Sys.Break -- PTree may be corrupted");
                abort_txnopt txn;
                raise Sys.Break
        end;
        ignore (Unix.alarm old_timeout);
        false


On Tue, 13 Jul 2004 13:03:48 +0200, Dinko Korunic <address@hidden> wrote:
> Hi all. I seem to run into this problem rather often:
> 
> 2004-07-11 19:59:57 Raising Sys.Break -- PTree may be corrupted: Eventloop.SigAlarm
> 2004-07-11 19:59:57 <get_missing_keys.catchup> callback interrupted by break.
> 2004-07-11 19:59:57 DB closed
> 
> I'm sure the process wasn't interrupted in any way, yet this occurs with
> alarming frequency. I've tried rebuilding the PTree already, which
> solved the situation; however, this occurred at least twice after that
> event.
> 
> Any help, hint, idea, solution even? Should I upgrade the DB library?
> 
> --
> |  |--.----.-----. Dinko 'kreator' Korunic       #include <stddisclaimer.h>
> |    <|   _|  -__| http://www.srce.hr/~kreator/ | http://kre.deviantart.com
> |__|__|__| |_____| PGP:0xEA160D0B | IRC:kre | ICQ:16965294 | AIM:kreatorMoo