circle-discuss
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [circle] How are data items stored/forwarded?


From: rodrigo benenson
Subject: Re: [circle] How are data items stored/forwarded?
Date: Mon, 4 Oct 2004 12:27:05 -0400

now that you are involved in the Circle twistified rewrite maybe 
rereading 
http://lists.gnu.org/archive/html/circle-discuss/2004-05/msg00028.html

could give you some ideas.

rodrigob.


On Mon, 4 Oct 2004 08:52:44 -0700, Paul Campbell <address@hidden> wrote:
> On Mon, Oct 04, 2004 at 08:59:12AM +0200, Thomas Voegtlin wrote:
> > nice to see some activity on this list again!
> > I have been moving to a new place, new job, etc...
> > I'm afraid I will not have time to look closely at
> > circle for a few weeks (or months)
> >
> > if you find a bug or want to improve something,
> > please feel free to submit a patch
> 
> The reason that I'm asking so many questions (and checking into various
> things) is that I wrote a lot of code and tinkered with the idea of writing
> my own DHT from scratch in the twisted framework. I had mostly written
> the core. Then I started looking closely at thecircle again, and I realized
> that even after I finish rewriting the core, I've STILL got the work of
> doing the user interface, all the services, etc., etc.
> 
> It became pretty clear to me that it would be easier/faster to simply
> refactor thecircle in Twisted after all.
> 
> I started with node.py this week. Sorry Thomas, but your giant polling thread
> had to go! I know you are so disappointed. :)
> 
> I've completely removed threading from the core code (node.py) and done a lot
> of work moving things in and out of various classes to straighten it up.
> Right now, it looks like this:
> 
> UDPRPC -- Code to run the RPC's. I did away with the packet format from the
> original thecircle (decided not to really preserve much except calling
> semantics to minimize everything). Now the packet format is like this:
> 
> Byte #1: Future use flag, Request/Response flag, Long/Short packet Flag, Pkt #
> Byte #2-5: Nonce (ticket #)
> And then the rest is the usual safe_pickled info, although you don't need
> to encode the nonce in the packet anymore, and I'm removing all references
> to the nonces outside of the UDPRPC class.
> 
> This substantially shrinks the packet size down smaller than TCP packets,
> EVEN with the UDP header encapsulating it.
> 
> The protocol works like this:
> 1. Every RPC must have a unique nonce (ticket # in a lot of the existing
> code base). That is how the receiver separates them. Although there is
> absolutely nothing wrong with intermixing messages. There is a large time
> window however on receiving a complete message (30 seconds).
> 
> 2. The response *IS* the ACK. So no acking is necessary.
> 
> 3. Out of order packets and lost packets are not recovered. Basically, the
> usual semantics is to retransmit anyways and packet loss rates are typically
> very low, so all that TCP ordering/acking silliness is unnecesary.
> 
> 4. For short messages that fit within a single UDP packet (I arbitrarily
> limited the payload to 560 bytes or less since with the header, this nicely
> fits within the 576 byte limit on lots of machines), simply send the packet
> as-is with a packet # of zero.
> 
> 5. For "long" packets, send the packets in order, starting with packet #0
> as the first packet. Set the "LONG" bit on all packets except for the very
> last one. Increment the packet number (4 bits) for each packet sent. This
> limits the total message length to 8959 bytes which I felt is more than
> sufficient for an RPC protocol (you should be doing something more like
> TCP if you want to send more per packet).  All "long" packets must have a
> 560 byte payload or the protocol will toss it. The LAST packet in the chain
> has the "SHORT" bit set (actually, the long bit is simply 0 instead). I didn't
> fix it yet, but if the message length just so happens to be exactly a
> multiple of 560 bytes, it will end up sending a "short" packet with a length
> of zero.
> 
> I moved the entire "handler" code into a separate subclass of class Node,
> and I allowed two formats for functions (via "add_handler"). The first
> format is as before. The second format is to add a bare function. I did this
> for a simple reason. After creating a nice dispatcher, the existing code
> totally BREAKS the idea of a dispatcher with the giant "perform standard
> function" call which is a giant nest of if/elif's! This simple change to the
> dispatcher added flexibility and got rid of the standard function.
> 
> I loaded all of the various standard calls into a separate class (subclassed
> to Node) as well for clarity.
> 
> The "call" semantics have changed drastically. Now instead of getting a thread
> return and then calling "get_reply" later (often busy waiting), the code
> uses the twisted way...simply make the call and you get a "deferred" back.
> Then handle the deferred in continuation-passing style.
> 
> I'm slowly moving all the "bare" functions from the node.py module back into
> the Node class where they belong as well.
> 
> I added code to the Peer and Link classes to do their OWN polling & updating
> on a per-instance basis. Also, since the Peer class holds everything necessary
> to make activate/deactivate hashtable functions, I let the Peer polling tasks
> handle that function as well. Also, I moved the invariant checking code out
> into Peer/Link where it belongs.
> 
> One thing I've noticed a lot is that there is a LOT of invariant checking. And
> when you start refactoring the code, you start to notice that "hey, how many
> times do I really need to check if it's a valid IP address?" all over the
> code. So I've done away with a lot of invariant checking, when it is simply
> redundant and doesn't break semantics.
> 
> There are no more locks. Everything is single-threaded in the reactor-style
> so far. So that should help performance dramatically.
> 
> I made a small change (although I haven't really used it yet) in the
> safe_pickle code that I stole from banana/jelly in twisted's Perspective
> Broker (and yes, I considered using PB before scratching the idea for a lot
> of reasons). It now contains a code book. If safe_pickle reads a string that
> is in the thesaurus, it codes it as 128+index instead of "S<string>". That
> means that all of the usual handler calls ("who are you") are coded in a
> single byte instead of a short string. I left the parameters alone so far.
> The only trick here is that the code book has to be identical everywhere
> unless I code something into the pickled results for version-identification
> later on.
> 
> So...the reason that I'm asking a lot of questions about the data model and
> the search semantics is that those are the last routines in node.py that I
> have left to refactor. Everything else is done enough that I can probably
> test the code out on a couple machines within a week or two. After that,
> it's mostly a matter of refactoring everything else to both get rid of the
> threading code and also to adapt it to the twisted way of doing things
> (mostly, using continuation-passing-style). And then after that...I can
> componentize it to allow for hot loading/unloading of code, changing plugins
> on the fly, etc., etc.
> 
> I'm not entirely sure that I'm even going to stick with thecircle's current
> data model and node structure (it's just a ring with some vaguely
> Kademlia-like features right now as far as I can tell). Just that for the
> first few trials it may be easier to simply refactor the code rather than
> come up with something entirely new and different.
> 
> Once I get that far along, if I can find a copy of the old "circleget" code
> or make something up similar that relies on just the node.py code and I get
> it working more or less with a few nodes, I'll be happy to release a
> preliminary version of the code. Then it's "just" a matter of refactoring
> everything else with the new structure.
> 
> I'm not entirely throwing out PB by the way. I think that Perspective
> Broker is a truly awesome and simple way to handle the "daemon/client"
> setup that thecircle is moving towards. PB would make the transition almost
> trivial since it already has security semantics and handles both remote
> objects and remote procedure calls. It is much more tightly integrated than
> my own RPC code. It should work fine both over pipes and TCP/IP (where it
> was designed). Also, twisted's web interface (and/or Nevow) would make the
> HTML interface to thecircle a bit easier to construct.
> 
> So in short, if I can get the base "node.py" code off the ground along with
> simple translations the other key modules, twistifying thecircle should bring
> a lot of benefits.
> 
> BTW...I looked at the idea of writing node.py itself as a component and trying
> to "componentize" the handler interface. I can't really figure out a good
> way to do it yet. If anyone has any ideas, I'll be happy to swap some code.
> 
> 
> 
> 
> _______________________________________________
> Circle-discuss mailing list
> address@hidden
> http://lists.nongnu.org/mailman/listinfo/circle-discuss
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]