
From: Jason Harris
Subject: Re: [Sks-devel] sks-peer.spodhuis.org catching back up
Date: Tue, 29 May 2012 22:08:42 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, May 29, 2012 at 05:42:03PM -0400, Jeffrey Johnson wrote:
> On May 29, 2012, at 5:29 PM, Phil Pennock wrote:
> > On 2012-05-29 at 14:20 -0400, Jeffrey Johnson wrote:

> > Ah.  Everything I tried was using a dbenv, as the context for opening
> > the db.  I saw nothing in the docs suggesting that opening a db designed
> > for use with a dbenv was possible without a dbenv.  *sigh*

Last time I checked, when you run two processes with DB_RECOVER,
the existing region files (__db.00{1,2,3,4,5}) are immediately removed
by the 2nd process using the database.  The 1st process keeps its
copies of these files open, but both processes believe the backing
store files (key, keyid, meta, subkeyid, time, tqueue, word) are under
their exclusive control and, of course, corruption quickly ensues.

Removing DB_RECOVER allows multiple processes to join the environment
and use the database relatively well, until deadlock occurs.  I tried
running db_deadlock with various flags, but without success.  When
deadlock occurred, both processes needed to be killed and db_recover
had to be run (manually, of course).  This could be avoided if SKS
were modified to use BDB better and retry operations in the event of
deadlock.
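Roughly, the retry-on-deadlock pattern BDB expects from applications looks like the sketch below.  This is a generic illustration, not SKS or BDB code: SKS is written in OCaml, and in real BDB a deadlock victim gets DB_LOCK_DEADLOCK, aborts its transaction, and retries.  DeadlockError, with_retry, and txn here are all stand-in names.

```python
# Generic sketch of retrying a transaction that loses deadlock
# detection.  DeadlockError stands in for BDB's DB_LOCK_DEADLOCK;
# the real fix would live in SKS's OCaml code, not Python.

class DeadlockError(Exception):
    """Stand-in for being chosen as a deadlock victim."""

def with_retry(txn_body, attempts=5):
    """Run txn_body, retrying if it is aborted as a deadlock victim."""
    for _ in range(attempts):
        try:
            return txn_body()
        except DeadlockError:
            continue  # transaction was aborted; safe to retry from scratch
    raise RuntimeError("gave up after repeated deadlocks")

# Demo: a transaction that is victimized twice, then succeeds.
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError()
    return "committed"

result = with_retry(txn)
```

The point is that the abort/retry loop belongs in the application; killing both processes and running db_recover by hand is what this pattern avoids.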

Of course, all processes (sks db, sksclient) would need to use the
same locking mechanism/flags.

> > Yes, it was opinion, covering the state of affairs.  Since SKS is
> > normally going to run as the *only* user of the DB files, and we know
> > that even using the "sks dump" command to dump current keys requires
> > stopping sks, I stand by my opinion that the locking is currently not
> > what it could be.

NB:  There is currently NO locking in SKS.

SKS normally has only 1 process using each DB/environment, so there is
no chance of deadlock.  "sks recon" just asks "sks db" what has changed,
according to the latter's "time" DB, and updates PTree with the new
key=timestamp, value=hash pairs.
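In miniature, that update flow looks like the following.  This is an illustrative in-memory model only — SKS is OCaml on Berkeley DB, and time_db, ptree, fetch_since, and last_sync are hypothetical names, not real SKS identifiers.

```python
# Hypothetical model of "sks recon" pulling changes from "sks db":
# the "time" DB maps timestamp -> key hash, and recon merges every
# pair newer than its last sync point into its PTree-like set.

def fetch_since(time_db, last_sync):
    """Return (timestamp, hash) pairs newer than last_sync, in time order."""
    return sorted((ts, h) for ts, h in time_db.items() if ts > last_sync)

def apply_updates(ptree, updates):
    """Insert each new key hash into the PTree-like set."""
    for _ts, h in updates:
        ptree.add(h)

time_db = {100: "hash-a", 105: "hash-b", 110: "hash-c"}
ptree = {"hash-a"}
updates = fetch_since(time_db, last_sync=100)
apply_updates(ptree, updates)
```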

> I'd agree that there is something fishy about how SKS uses BDB in
> the PTree store.  Nothing that can't be lived with, but I have seen
> too many deadlocks in the PTree database for me to believe "correct".

If the "time" DB timestamps don't monotonically increase, "sks recon"
"detects" a replay attack, although it is truly self-inflicted.

> Note that a "real" fix is a hugely painful amount of QA on a low-incidence
> error pathway: the existing incidence of approx 1-2 months between failures
> is more than acceptable imho.

Not QA, correcting a bad assumption in the original design.

How many keyservers run NTP, BTW?

> I can/will confirm the behavior if needed: dynamic state reproducers are
> all that is difficult.

Check the inode numbers on the existing and recreated/new
__db.00{1,2,3,4,5} files.  Note that the 1st process takes no steps
to use the new files.
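The inode behavior is ordinary POSIX semantics and is easy to demonstrate.  A small sketch (the file name is a placeholder, not a real SKS environment file): a process holding __db.00N open keeps reading the old, unlinked file, while the recreated file of the same name is a distinct inode — the old inode cannot be reused while the open descriptor pins it.

```python
# Demonstrates the stale-handle behavior described above: after the
# region file is unlinked and recreated, the 1st process's open handle
# still refers to the old inode and old contents.
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "__db.001")  # placeholder name

with open(path, "w") as f:
    f.write("old-region")
old = open(path)                     # the 1st process's still-open handle
old_ino = os.fstat(old.fileno()).st_ino

os.unlink(path)                      # the 2nd process removes the file...
with open(path, "w") as f:           # ...and recreates it under the same name
    f.write("new-region")

stale_data = old.read()              # still the old contents
new_ino = os.stat(path).st_ino       # a different inode: the old one is pinned
old.close()
```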

-- 
Jason Harris           |  PGP:  This _is_ PGP-signed, isn't it?
address@hidden _|_ Got photons? (TM), (C) 2004


