[Librefm-discuss] [Events and Plugins] Events Engine Design

librefm-discuss
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Librefm-discuss] [Events and Plugins] Events Engine Design

From:	P. J. McDermott
Subject:	[Librefm-discuss] [Events and Plugins] Events Engine Design
Date:	Wed, 22 Jun 2011 06:04:05 -0400
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20110307 Icedove/3.0.11
Hi all,

I've been working lately on the design of the events engine.  In this
e-mail, I intend to share with you the results of my work and my
thoughts and plans.


The Events Engine Interface
---------------------------

I began the design from the outside (the interface) and worked my way to
the inside (the implementation), as I usually do on non-trivial
projects.  I determined that the interface to the events engine must
offer the following functionality:
  - Register an event with its parameters.
  - Add a hook to an event.
  - Dispatch an event (call all hooks registered for the event).
  - List all events.
  - List all parameters of an event.
The last two functions will be used by the karma and badge plugins.  For
example, the administrative interface of the karma plugin will list all
registered events, and the administrator will select one of the events
and assign it a karma award.  (This is why all events and their
parameters must be registered.)


Overview of the Implementation Designs
--------------------------------------

After defining the basic interface, I had to design an implementation
that could store all relevant data with efficiency of space and retrieve
and manipulate the data with efficiency of time.  In other words, the
engine as a whole should be lightweight and fast.  I came up with three
designs toward this end and evaluated the space and time complexities of
each design.  I'll give you an overview of each design before finally
describing my choice in detail.  But first, I should explain the metrics
I use in describing algorithmic performance, since this may be something
of a confusing topic of computer science.

Computational Complexity

I will be describing the performance of each design using measures of
computational complexity, specifically in terms of execution time.
Complexity is defined to be relative to some function of n, where n (the
"problem size") in this case is the number of events registered.  O
("big-O" or "big-Omicron") means that the complexity is less than or
equal to the given function.  (The Wikipedia article on big-O notation
has much more information on computational complexity. [1])

This probably still isn't very clear, so let's see some examples.  An
algorithm with time complexity O(n) ("big-O of n") has an execution time
that is approximately proportional to the problem size, n.  We say that
this algorithm has "linear" complexity or that it completes in "linear
time".  If it takes 1.0 seconds to complete with n=100, then it should
take up to 2.0 seconds with n=200.  An algorithm with time complexity
O(1) ("big-O of one") completes in "constant time".  That is, the
problem size has no effect on the execution time and the algorithm
always takes about the same amount of time to complete.  We always try
to find algorithms with the smallest possible complexity; algorithms
with time complexity O(1) are pretty much the holy grail of computer
science.

In some cases, you'll see algorithmic complexity in terms of a problem
size "p".  I define p to be the number of parameters for a particular event.

Design 1: Parallel Arrays

In this design, the events engine class would have three parallel [2]
arrays: one enumerating the event names, one containing lists of event
hooks, and one listing event parameters.  To list all events, we would
return a reference to the first array (more on this later).  To dispatch
an event, we would find the event name in the first array, get its key
(an "event ID" perhaps), and use that key in the hooks array.  This
would work something like this:
  $hooks = &$this->hooks[array_search($ev_name, $this->events)];

There are two problems with this design.  The first has to do with PHP
itself.  When returning the events array to list all events, for the
sake of performance we should return a reference to the array rather
than a copy of it.  Returning a reference in PHP requires two things.
First, the function must specify that it will return a reference; for
example, this function returns a reference:
  function &foo($bar) {return $bar;}
Second, when calling the function, you must also use the reference
operator; see the following example:
  $baz = &foo($bar);
This is rather complicated and easy to overlook.  If a programmer
forgets the ampersand when listing events in this design, performance
degrades from constant time to linear time because the array is then
copied rather than referenced.

The second problem with this design is that it is rather complicated and
doesn't have the best performance.  Recall the example for finding the
list of event hooks to be called:
  $hooks = &$this->hooks[array_search($ev_name, $this->events)];
Many functions would have to use array_search() to find an event's
numerical key, then use that in another array.  array_search() uses a
linear search to traverse the hash table and find the desired value (I
even checked the PHP source code to be sure), so its time complexity is
O(n) on average.

Design 2: Multidimensional Array

In this design, all data related to events would be stored in a single
multidimensional array (the exact opposite of using parallel arrays).
The array's keys would be the event names and its values would be
subarrays.  The subarrays would contain arrays of hooks, arrays of
parameters, and other information about the events.  The performance of
this design for functions such as dispatching events is slightly better
than that of the first design.  In this design, to get a list of hooks,
we would do something like this:
  $hooks = &$this->events[$ev_name][2];
Easy.  No more array_search().  What is the asymptotic performance of
this line of code?  Well arrays in PHP are implemented as hash tables
[3] with separate chaining [4].  In the best case, accessing a member of
an array in PHP happens in constant time.   But if many keys produce the
same hash (that is, there are many hash collisions), then all of these
members are chained together in a linked list.  Traversal of these lists
brings the performance of the array down to linear time in the worst
case.  So some people call this performance "O(n) but really close to
O(1)" [5].

But there is a problem with this design.  How do we list events?  We
would have to run array_keys() on the multidimensional array.  This
would happen in linear time.  The first design was able to do this in
constant time (since the array was already in the desired structure).
We could say that it doesn't really matter, since in practice we'll
probably only be listing events in the administrative interfaces of the
karma and badges plugins.  But we can still do better.

Design 3: Multidimensional Array With Iterators

The data storage in this design is identical to that in the previous
design.  The main difference lies in how we list events and their
parameters.  Instead of returning arrays, we provide iterators [6] that
work with the data for us.  An events iterator would iterate over the
multidimensional events array and provide the key of the current member
as the value.  Therefore, we eliminate the need for array_keys() and
regain constant time performance.  This is the design I have chosen
because it seems to offer the best asymptotic performance in both time
and space in every function of the engine.


Design 3 in Depth
-----------------

For this design, I propose three classes: an events iterator, an event
parameters iterator, and the events engine itself.  Following are
details of the classes and their properties and methods, including
assessments of time complexities for all methods.  Note that in the case
of arrays, for verbosity I use a template syntax similar to that of
C++0x and Java.
  class EventsIterator implements Iterator
    private int $position
    private int $length
    private array<string, array<int, mixed>> $events
    public __construct(int &$events_count,
                       array<string, array<int, mixed>> &$events)
      Time complexity: O(1)
    public string current()
      Time complexity: O(1)
    public int key()
      Time complexity: O(1)
    public void next()
      Time complexity: O(1)
    public void rewind()
      Time complexity: O(1)
    public boolean valid()
      Time complexity: O(1)
  class EventParametersIterator implements Iterator
    private int $position
    private int $length
    private array<int, string> $parameters
    public __construct(int &$parameters_count,
                       array<int, string> &$parameters)
      Time complexity: O(1)
    public string current()
      Time complexity: O(1)
    public int key()
      Time complexity: O(1)
    public void next()
      Time complexity: O(1)
    public void rewind()
      Time complexity: O(1)
    public boolean valid()
      Time complexity: O(1)
  class Events
    private int $events_count
    private array<string, array<int, mixed>> $events
      The array key is an event name.
      Each array value (subarray) has four members:
        The member with key 0 is an integer to count the parameters.
        The member with key 1 is a sorted array of parameter names.
        The member with key 2 is an array of hook callbacks*.
        The member with key 3 is a Boolean value indicating whether the
          event is currently being dispatched. This is used in
          Events::dispatch() to prevent recursive event dispatching,
          which could corrupt the parameter array's internal pointer.
    public bool registerEvent(string &$ev_name,
                              array<string> &$parameters)
      Time complexity (includes sorting parameters array):
        Best case:  O(1) + O(p lb p)*
        Worst case: O(n) + O(p^2)
    public EventsIterator getEvents()
      Time complexity: O(1)
    public EventParametersIterator getEventParameters(string &$ev_name)
      Time complexity:
        Best case:  O(1)
        Worst case: O(n)
    public bool addHook(string &$ev_name, callback** &$hook)
      Time complexity:
        Best case:  O(1)
        Worst case: O(n)
    public bool dispatchEvent(string &$ev_name,
                              array<string, mixed> &$arguments)
      Time complexity (includes comparing arguments against parameters):
        Comparing unsorted arrays, best case:     O(1) + O(p^2)
        Comparing unsorted arrays, worst case:    O(n) + O(p^2)
        Sorting and comparing arrays, best case:  O(1) + O(p lb p)
        Sorting and comparing arrays, worst case: O(n) + O(p^2)
  *  "lb" is the symbol for the binary logarithm [7], as defined in ISO
     31-11 [8].
  ** This is a "pseudo-type" that represents function names and class
     method arrays, as explained in the PHP manual. [9]

Notice that the best case time complexity of almost every method is
constant!  And in the other methods, it is a sum of constant and
loglinear time, O(1) + O(p lb p) (as a result of the call to quicksort
to speed up parameter-argument comparisons).

We may delay sorting event parameter arrays (currently to be done in
Events::registerEvent()) until Events::dispatchEvent() and
Events::getEventParameters().  All events must be registered, but not
all will be dispatched during any single HTTP request.  Therefore,
sorting all parameter arrays may be wasteful.  This change would make
the complexity of Events::registerEvent() between O(1) and O(n), make
the complexity of Events::getEventParameters() between O(1) + O(p lb p)
and O(n) + O(p^2), and leave the complexity of Events::dispatchEvent()
unchanged.  This change would also require a new member of the big
subarrays to track whether each parameters array is sorted, but I think
the space and time tradeoffs here are probably worth it.


Conclusion
----------

I hope this is a clear enough explanation of the designs and the one
I've chosen.  If you have any questions or comments, or if you think
something here is wrong or something could be done better, as always
feel free to let me know.  If the design I've chosen is acceptable to
everyone, I guess my next step will be to implement it in GNU FM.


Thanks,
P. J.

E-mail:        address@hidden
Freenode IRC:  pehjota
Microblogging: http://identi.ca/pehjota
Software work: https://gitorious.org/~pehjota


1: http://en.wikipedia.org/wiki/Big_O_notation
2: http://en.wikipedia.org/wiki/Parallel_array
3: http://en.wikipedia.org/wiki/Hash_table
4: http://en.wikipedia.org/wiki/Hash_table#Separate_chaining
5:
http://stackoverflow.com/questions/2473989/list-of-big-o-for-php-functions#2484455
6: http://www.php.net/manual/en/class.iterator.php
7: http://en.wikipedia.org/wiki/Binary_logarithm
8:
http://en.wikipedia.org/wiki/ISO_31-11#Exponential_and_logarithmic_functions
9:
http://www.php.net/manual/en/language.pseudo-types.php#language.types.callback
[Prev in Thread]
Current Thread
[Next in Thread]
[Librefm-discuss] [Events and Plugins] Events Engine Design, P. J. McDermott <=
Prev by Date: [Librefm-discuss] RFC: Initial Design Work on Events and Plugins Systems
Previous by thread: [Librefm-discuss] Libre.fm GSoC Project - sephiap
Index(es):
- Date
- Thread