From: Daniel Kahn Gillmor
Subject: [Sks-devel] stuck in "Reconciliation attempt from <ADDR_INET [XXXXX]:33091> while gossip disabled. Ignoring."
Date: Thu, 04 Apr 2013 09:49:00 -0400
User-agent: Notmuch/0.15.2 (http://notmuchmail.org) Emacs/23.4.1 (x86_64-pc-linux-gnu)

I realized today that zimmermann.mayfirst.org (aka keys.mayfirst.org)
had dropped out of the main SKS pool. [0]
https://sks-keyservers.net/status/ suggested it was > 4K keys behind
everyone else.

Looking in the recon logs, i saw at least five days of:

   Reconciliation attempt from <ADDR_INET [XXXXX]:33091> while gossip disabled. Ignoring.

but my logs didn't go far enough back to indicate why gossip was
disabled.

keys.mayfirst.org is running SKS 1.1.3.

Restarting the server seems to have reset the ability to gossip, and
it's now catching up, but i am curious to understand the logic here.  Is
it possible that the failure of a peer i'm currently trying to reconcile
with could cause SKS to stay in a gossip_disabled state?

in recoverList.ml, i see this definition:

  let gossip_disabled () = 
    not (Queue.is_empty recover_list) || !gossip_disabled_var

gossip_disabled_var is set at the tail of recoverList.ml within a "let
update_recover_list" stanza (which i think is a sort of function
definition, but i don't know ocaml well enough to get the terminology
right; corrections welcome!)

and gossip_disabled_var is only ever cleared inside a "let rec
get_missing_keys () =" stanza in reconserver.ml, apparently only when
the Queue is empty.
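
For anyone reading along without the source handy, here's my reading of
that control flow, boiled down to a runnable sketch.  the function
bodies are paraphrases of what i *think* the code does, not the actual
SKS source, so corrections welcome here too:

  let recover_list : string Queue.t = Queue.create ()
  let gossip_disabled_var = ref false

  (* from recoverList.ml: gossip is off while the queue is non-empty
     OR the flag is set *)
  let gossip_disabled () =
    not (Queue.is_empty recover_list) || !gossip_disabled_var

  (* roughly what update_recover_list seems to do: queue the hashes we
     still need, and raise the flag *)
  let update_recover_list missing =
    List.iter (fun h -> Queue.add h recover_list) missing;
    gossip_disabled_var := true

  (* roughly what get_missing_keys in reconserver.ml seems to do: drain
     the queue; the flag is only cleared once the queue is empty *)
  let get_missing_keys fetch =
    while not (Queue.is_empty recover_list) do
      fetch (Queue.pop recover_list)  (* if this raises, we never finish *)
    done;
    gossip_disabled_var := false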

Could someone more proficient in ocaml work through the code and tell me
if this scenario seems plausible?

 A) i start a recon with peer X

 B) peer X has 200 keys more than i do.

 C) i add those keys to my recon queue, and disable gossip

 D) i fetch the first 100 keys from peer X successfully (removing them
 from my queue)

 E) X fails or goes offline

 F) I try to fetch the next 100 keys ...

 G) I never manage to do so, so my queue is never empty, and
 gossip_disabled_var is never cleared

 H) i never accept reconciliation from any other peer again
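
To make the failure mode concrete, here's a toy driver that walks
through steps A through H above.  it's a stripped-down model with
made-up names, not the real recon code, but it shows how the flag can
get wedged:

  exception Peer_offline

  let recover_list : int Queue.t = Queue.create ()
  let gossip_disabled_var = ref false

  let gossip_disabled () =
    not (Queue.is_empty recover_list) || !gossip_disabled_var

  let () =
    (* A-C: recon with peer X finds 200 missing keys; gossip goes off *)
    for key = 1 to 200 do Queue.add key recover_list done;
    gossip_disabled_var := true;
    (* D-F: the first 100 fetches succeed, then X goes offline *)
    (try
       for i = 1 to 200 do
         if i > 100 then raise Peer_offline;
         ignore (Queue.pop recover_list)
       done;
       gossip_disabled_var := false        (* never reached *)
     with Peer_offline -> ());
    (* G-H: the queue is still non-empty, so gossip stays disabled *)
    Printf.printf "gossip_disabled = %b\n" (gossip_disabled ())

running this prints "gossip_disabled = true", which is exactly the
stuck state i saw in the logs.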


If this could happen, it seems like an ill-timed failure of one peer (or
a malicious peer) could cause fairly significant damage to at least its
immediate peers in the gossip network.

Is there any mechanism for an sks instance to decide to give up its
attempt at reconciliation and start accepting gossip again from other
peers?
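
just to be concrete about what i mean, something like the following
(all hypothetical names -- as far as i can tell this is not in SKS
today) could abandon a stalled recovery after a deadline and let
gossip resume:

  (* requires the unix library for gettimeofday *)
  let recover_deadline = ref None     (* set when a recovery starts *)
  let recovery_timeout = 300.         (* seconds; arbitrary choice *)

  let maybe_abandon_recovery recover_list gossip_disabled_var =
    match !recover_deadline with
    | Some t0 when Unix.gettimeofday () -. t0 > recovery_timeout ->
        (* give up: drop the stale fetch queue and re-enable gossip *)
        Queue.clear recover_list;
        gossip_disabled_var := false;
        recover_deadline := None
    | _ -> ()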

        --dkg

[0] https://support.mayfirst.org/ticket/7058
