Re: [Watt-dev] A cross-language spec for WATT


From: Daniel P. Berrange
Subject: Re: [Watt-dev] A cross-language spec for WATT
Date: Wed, 12 Oct 2005 22:34:19 +0100
User-agent: Mutt/1.5.9i

On Tue, Oct 11, 2005 at 02:34:56PM -0700, David Lutterkort wrote:
> On Sun, 2005-09-04 at 07:36, Daniel P. Berrange wrote:
> > The concept of an operation
> > can be easily extended to cover any interaction with external systems,
> > such as remote procedure calls, or interactions with messaging services
> > like IBM MQ.
> 
> In all generality, an operation will likely be a point in the execution
> of the program, best represented by a stacktrace, plus additional
> metadata.
> 
> > For developers of a web application, live pages may be 
> > desired to provide just in time view of the data collected. For a
> > production support team, low detail, but long term aggregation of
> > operation statistics may be desired to identify potential trouble
> > spots, or abnormal runtime behaviour. 
> 
> That I think is the main value of a 'developer support' type system:
> collecting highly selective data about the execution of a program. In
> some ways, this is similar to what a profiler does, except that
> profiling data is for many purposes way too fine grained. 

The other interesting/important difference between this and an execution
profiler such as oprofile or gprof is that this is really focused on
instrumenting what is technically 'idle' time from the POV of the
program being instrumented, e.g. time waiting for Oracle to complete
a SQL query, time waiting for a remote app to complete an RPC op, etc.
It is really hard to analyse this kind of thing with traditional
profilers, particularly if the thing you're waiting on is a remote
service.
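
To make that concrete, the instrumentation is basically just wall-clock
timing wrapped around a blocking call. A rough Perl sketch (the
record_operation() hook is hypothetical, standing in for whatever WATT
ends up providing):

  use strict;
  use warnings;
  use Time::HiRes qw(gettimeofday tv_interval);

  # Stand-in for whatever WATT provides to record an operation (hypothetical).
  sub record_operation {
      my ($label, $elapsed) = @_;
      printf "operation=%s wall_time=%.3fs\n", $label, $elapsed;
  }

  # Wrap a blocking call (SQL query, RPC, message send) and record the
  # wall-clock time spent waiting on it - time a CPU profiler calls 'idle'.
  sub timed_operation {
      my ($label, $code) = @_;
      my $start   = [gettimeofday];
      my @result  = $code->();
      my $elapsed = tv_interval($start);
      record_operation($label, $elapsed);
      return wantarray ? @result : $result[0];
  }

  # e.g. timed_operation('oracle-query', sub { $dbh->selectall_arrayref($sql) });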

> > For testers operating an
> > integration test harness, reports on the data may be desired to
> > provide a qualitative view on the system wrt a previous baseline.
> 
> Interesting. Can you elaborate that a little ?

Well, consider you have a web application, and a test suite consisting
of 100 HTTP requests and expected responses. You run the test
suite through, collecting dev support data, then analyse it to generate
some metrics - for example, average SQL queries & time per request, max
time of any request, etc. Now denote this to be the baseline performance.
If you then have the test suite run automatically once a day, on the latest
nightly build of the app, you can automatically compare the latest code to the
baseline, and find out that yesterday's changes increased the average SQL
time by 1/2 second, or added 5 queries to every page. So you get very quick
turnaround on performance regressions, which previously might only have
been noticed 3 months later during QA scalability testing, by which point
it might be very hard to fix the code without major changes.
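
In code that comparison is trivial; the data collection is the hard part.
A sketch with made-up numbers (the metric names are just examples, not a
fixed schema):

  use strict;
  use warnings;

  # Hypothetical per-request metrics from the nightly run vs. the recorded
  # baseline; in reality these would be computed from the dev support data.
  my %baseline = ( avg_sql_queries => 12, avg_sql_time => 0.8, max_request_time => 2.1 );
  my %today    = ( avg_sql_queries => 17, avg_sql_time => 1.3, max_request_time => 2.2 );

  # Flag any metric that has grown beyond a tolerance relative to the baseline.
  my $tolerance = 0.10;    # allow 10% drift before complaining
  for my $metric (sort keys %baseline) {
      my ($base, $now) = ($baseline{$metric}, $today{$metric});
      next unless $now > $base * (1 + $tolerance);
      printf "REGRESSION: %s rose from %s to %s (+%.0f%%)\n",
             $metric, $base, $now, 100 * ($now - $base) / $base;
  }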

> > Object model
> > =============
> 
> Looks excellent; I would call the HTTPProcess an HTTPRequest, though.
> 
> Instead of distinguishing between a process and a script, why not just
> add the additional attributes from script to the process ? They are easy
> enough to gather from /proc/<pid>. It might be useful at some point to
> harvest the whole /proc/<pid> dir for a process. Before we do that
> though we should have some reporting use cases that clearly demonstrate
> the need.
> 
> With context attributes, it might be cleaner to let people subclass the
> standard object model with their special-purpose classes. I don't think
> a process is different from a stage in that respect. In a way, a
> transaction is a stage with special context attributes added. But all
> this points to the problem of extensibility of the object model, which I
> would just punt on for now.

I think there is value in keeping context attributes as an opaque (key, value)
pair hash. My thought is that you have two types of data:

  * Core data with some particular semantic significance to the object,
    which has specific value to WATT & the reporting / analysis
    algorithms.

  * Metadata which is significant to the application being instrumented,
    but semantically opaque to WATT. Its primary purpose is tagging
    objects for the purposes of presentation (in the web admin view),
    and for grouping / filtering of reports / analysis.

For core data, subclassing objects to add in explicit fields does make
sense, hence HTTPRequest subclassing the generic Process class to add in
the request URL, URL parameters, etc. Metadata though would be totally
arbitrary, and you could not necessarily predict what would be stored.
Particularly if you think about a plugin architecture: an app developer
may add a context attribute representing the name of the application
corresponding to the URL, while a 3rd party authentication plugin might,
say, attach an attribute about the LDAP user authenticated for the request.
For this metadata, since you can't predict in advance what will be recorded,
and indeed may have a number of 3rd party plugins also adding data,
subclassing would not be feasible, and a (key, value) hash would be more
appropriate.
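
As a rough Perl sketch of the split (class and accessor names here are
illustrative, not a fixed WATT API):

  use strict;
  use warnings;

  # Core data lives in explicit fields on (sub)classes; plugin-supplied
  # metadata goes into an opaque key/value hash.
  package Process;
  sub new {
      my ($class, %args) = @_;
      my $self = { pid => $args{pid}, metadata => {} };
      return bless $self, $class;
  }
  sub add_metadata { my ($self, $key, $value) = @_; $self->{metadata}{$key} = $value }

  package HTTPRequest;
  our @ISA = ('Process');
  sub new {
      my ($class, %args) = @_;
      my $self = $class->SUPER::new(%args);
      $self->{url}    = $args{url};      # core data: meaningful to WATT reports
      $self->{params} = $args{params};
      return $self;
  }

  package main;
  my $req = HTTPRequest->new(pid => $$, url => '/checkout', params => {});
  # A 3rd party auth plugin could tag the request without any schema change:
  $req->add_metadata('ldap.user', 'jbloggs');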

> > Database Operation
> >  - row count. (8 bytes)
> 
> This might be tricky to get to if we want to avoid that the trace tool
> causes database queries by itself.

This is something I'd consider to be optional. If the language's DB APIs
provide such information automatically then include it, otherwise simply
leave it out. In Perl's DBI, statement handles have a simple 'rows' property
which gives you access to the row count for UPDATE, INSERT & DELETE queries.
Some DBI backends may also give it to you for SELECT queries, but it's
optional. I certainly wouldn't run another query to collect it via a count(*).
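
For reference, it's literally just this (SQLite used for illustration; any
DBD behaves much the same):

  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('dbi:SQLite:dbname=watt-demo.db', '', '', { RaiseError => 1 });
  $dbh->do('CREATE TABLE IF NOT EXISTS t (id INTEGER, name TEXT)');

  my $sth = $dbh->prepare('UPDATE t SET name = ? WHERE id = ?');
  $sth->execute('foo', 1);

  # 'rows' is reliable for UPDATE/INSERT/DELETE; for SELECT many drivers
  # return -1 (unknown) until all rows are fetched, so treat it as optional.
  my $count = $sth->rows;
  printf "row count: %s\n", $count >= 0 ? $count : 'unavailable';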

> > Messaging Operation
> > RPC Operation
> 
> Do you know whether it is possible to hook into the appropriate
> subsystem to collect the metrics similar to how the DB driver is
> instrumented ?

For Java, if you're using the JMS messaging APIs you could probably
write a JMS provider which proxies to the real JMS provider, as you
do with JDBC.  Dunno about RPC APIs, since there's no real accepted
standard. In Perl I use a number of different techniques. For DBI,
I subclass the main DBI class and programs instantiate that instead
of the regular one, while for DBus RPC I do something really nasty and
replace the Net::DBus::RemoteObject->call_method subroutine in the
interpreter's symbol table.
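
The symbol table trick looks roughly like this - shown against a
self-contained dummy package rather than the real Net::DBus class, but the
pattern is identical:

  use strict;
  use warnings;
  use Time::HiRes qw(gettimeofday tv_interval);

  # Dummy stand-in for the real remote-object class, so the example is
  # self-contained; the same trick applies to Net::DBus::RemoteObject.
  package Some::RemoteObject;
  sub new         { return bless {}, shift }
  sub call_method { my ($self, $name, @args) = @_; return "result-of-$name" }

  package main;
  {
      no strict 'refs';
      no warnings 'redefine';
      my $original = \&Some::RemoteObject::call_method;
      *{'Some::RemoteObject::call_method'} = sub {
          my ($self, $name, @args) = @_;
          my $start  = [gettimeofday];
          my @result = $original->($self, $name, @args);   # delegate to the real sub
          printf "RPC %s took %.6fs\n", $name, tv_interval($start);
          return wantarray ? @result : $result[0];
      };
  }

  my $obj = Some::RemoteObject->new;
  print $obj->call_method('ListNames'), "\n";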

> > Configuration
> > =============
> > Depending on the circumstances of the deployment, it can be
> > desirable to record different levels of detail.
> 
> Another very good point. I think there are two options: either make it
> possible to easily add/remove points of instrumentation or ignore points
> of instrumentation that are always 'on'. Your logging proposal does the
> second.

Yes, I've not really considered the former option before. Particularly
wrt stages within a process being instrumented, I can see value in
being able to turn on/off instrumentation for particular packages/methods.

> For the Java part, I have been thinking of using BC annotation to define
> stages, so that adding a stage could be done simply through stating in a
> config file something like
> 
> watt.stage.foo=com.example.SomeClass#someMethod

Using dynamic, rather than statically inserted, instrumentation would
be a much more scalable approach for stages (cf. KProbes / SystemTap
vs LinuxTraceToolkit). I imagine I might be able to do a similar trick
in Perl by hooking/replacing subroutines in the interpreter symbol table.
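
Something along these lines ought to work - the config hash and package
names below are purely illustrative:

  use strict;
  use warnings;

  # Hypothetical Perl analogue of the Java-style config above, e.g.
  #   watt.stage.foo=My::App::handle_request
  my %stage_config = ( foo => 'My::App::handle_request' );

  package My::App;
  sub handle_request { return 'ok' }

  package main;
  for my $stage (keys %stage_config) {
      my $target = $stage_config{$stage};
      no strict 'refs';
      no warnings 'redefine';
      my $original = \&{$target};
      *{$target} = sub {
          print "enter stage '$stage'\n";       # stage entered
          my @result = $original->(@_);         # NB: always called in list context here
          print "exit stage '$stage'\n";        # stage exited
          return wantarray ? @result : $result[0];
      };
  }

  print My::App::handle_request(), "\n";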

> which says 'Stage foo is entered (exited) whenever someMethod in
> com.example.SomeClass is entered (exited)'. It has the advantage that no
> code changes are needed to instrument the app, and the downside that the
> scope of a stage has to coincide with a method call.

In cases where one needed finer scope, one could still have the option of
explicit static instrumentation, or of refactoring the method. So it's not
a significant downside, given its huge upside in terms of scalability
and ease of use - particularly since it shifts the decision of what
to instrument from developers to administrators, the latter often being
the ones who discover (in production!) the bits which need instrumenting.

> > Storage mechanism
> > =================
> >  - Database - persist records to a fully normalized database. 
> >               This is high overhead on insert, and fairly space
> >               inefficient, but lends itself very well to bulk
> >               data analysis
> 
> I would assume that this is mainly useful when the data needed for a
> report would consume a significant amount of memory. It might be better
> to leave the details of database storage up to the reporting tool, and
> just focus on the plain file storage mechanism, i.e., something that
> allows the instrumented process to write its data out as fast as
> possible.

Actually the main reason for using a DB is to make it easy for the Joe Bloggs
administrator / developer to write SQL queries for data mining. If they
had to analyse the data from files directly it'd quickly become very
tedious & not at all scalable once you get into dealing with systems where
you want to analyse many tens of thousands of transactions at once. I am
open to the idea of storing to files & then batch loading into the database.

> > File Storage
> > ------------
> > When storing stats for a process, the bucket
> > is chosen pseudo-randomly, for example by taking modulus of
> > the UNIX PID wrt 'n', i.e. getpid % n. The choice of value
> > for 'n' is thus a factor of the ratio between the time for a
> > single process, and the time to store the process's stats.
> 
> Why not use a simple per-process counter, where the counter is either
> unbounded or bounded ? This would also make it easy to clean up unwanted
> stats in a FIFO manner.

For the top level I wanted to come up with a mechanism for taking a
finite number of buckets, and assigning them to processes, while at 
the same time not creating a synchronization locking bottleneck. The 
'getpid % n' calculation, while not entirely eliminating the need for 
synchronization, does reduce it to a trivial level. A process only needs 
to take out a lock on the particular bucket designated by 'getpid % n', 
and this lock can then be released the moment the process has allocated 
itself a storage file within the ring buffer of the 2nd level of 
directories. 

So the lock will be held for a tiny amount of time, and at most 1/n
of the total running processes will even contend on each bucket - and
this is in the worst case where all processes finish their transaction
and need to write to disk at the same time. Essentially one should see
no statistically significant contention at all (assuming 'n' wasn't
chosen too small relative to the total number of running processes).
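
A rough sketch of that claim/release sequence (directory layout and file
names are illustrative only):

  use strict;
  use warnings;
  use Fcntl qw(:flock);
  use File::Path qw(make_path);

  my $root   = '/tmp/watt-stats';
  my $n      = 16;              # number of top-level buckets
  my $t      = 100;             # ring-buffer slots per bucket
  my $bucket = $$ % $n;         # getpid % n
  my $dir    = "$root/bucket-$bucket";
  make_path($dir);

  # Lock only this bucket, just long enough to claim a slot in its ring buffer.
  open my $lock, '>', "$dir/lock" or die "cannot open lock: $!";
  flock $lock, LOCK_EX or die "cannot lock bucket $bucket: $!";

  # Read and bump the per-bucket counter while holding the lock.
  my $seqfile = "$dir/seq";
  my $counter = 0;
  if (open my $in, '<', $seqfile) { chomp($counter = <$in> // 0); close $in }
  my $slot = $counter % $t;     # wrap around, overwriting the oldest entry
  open my $out, '>', $seqfile or die "cannot write $seqfile: $!";
  print {$out} $counter + 1;
  close $out;

  close $lock;                  # lock released before any heavy I/O happens

  # The process now writes its stats into the claimed slot, lock-free.
  my $detail_file = "$dir/detail-$slot.xml";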

> > Within each bucket, a ring buffer storage mechanism is used. 
> > Each bucket is intended to store 't' sets of stats, so, when 
> > the 't'th entry is created, it wraps around and overwrites the 1st 
> > entry again. 
> 
> I am not sure I understand the reason for the two-level hash mechanism.
> Is this just to keep directories to a reasonable size ?

Mostly to lower synchronization contention, but keeping directories
small is another useful benefit - when one can have say 500 or more 
threads processing one transaction per second, the number of files
required quickly becomes incredibly large.

> > Index files
> > -----------
> > 
> > The index.txt file within each sub-bucket contains one line
> > per detail file. 
> 
> Shouldn't the generation of the index file be left up to the reporting
> tool ? It seems that the information that is useful in an index file
> highly depends on the report that is to be generated.

The index files are primarily intended for use by the JSP web admin
pages. They contain the minimum data required to allow one to display
the 'last 100 transactions' summary pages, without having to read each
individual transaction detail file. If one didn't want the JSP pages
for browsing stats we could easily make it possible to turn the index
files off.
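
For illustration, appending an index entry is just a short append under a
lock - the exact fields below are hypothetical:

  use strict;
  use warnings;
  use Fcntl qw(:flock);

  # One summary line per detail file, enough for the web admin pages to
  # render a 'last 100 transactions' view without opening the detail files.
  sub append_index_entry {
      my ($bucket_dir, %entry) = @_;
      open my $idx, '>>', "$bucket_dir/index.txt" or die "cannot open index: $!";
      flock $idx, LOCK_EX or die "cannot lock index: $!";
      printf {$idx} "%s|%s|%s|%.3f\n",
          $entry{detail_file}, $entry{start_time}, $entry{url}, $entry{elapsed};
      close $idx;   # releases the lock
  }

  append_index_entry('/tmp/watt-stats/bucket-3',
      detail_file => 'detail-42.xml',
      start_time  => time(),
      url         => '/checkout',
      elapsed     => 0.734,
  );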

> > Detail files
> > ------------
> 
> Sounds good. We need to put some more thought into how the XML files are
> written for long-running processes so that reporting on them can be done
> while the process is still running. 

Yeah, this is a tricky problem. One (the only?) nice thing about in-memory
storage of transactions is that it makes it trivial to display 'in-flight'
transactions - e.g. ones which have started but not yet committed/aborted. XML is
inherently unsuitable for accessing 'in-flight' data because an XML file can't
be considered well-formed / valid until the closing tag is written. Of course
you can't write said closing tag until the transaction is finished.

I've toyed with the idea of saying skip XML, and use YAML, which can easily
be read in-flight since it doesn't have closing tags to worry about. The only
thing putting me off is the desire to avoid too many code dependencies.
You can guarantee all languages have an XML parser these days, but a YAML
parser typically requires an extra download & install step.
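
The appeal is that every event can be written as a complete YAML document,
so a reader can parse whatever has been written so far. A hand-rolled sketch
of the writer side (a real reader would still need a proper YAML parser,
which is the dependency concern):

  use strict;
  use warnings;

  open my $fh, '>>', '/tmp/watt-demo.yaml' or die "cannot open trace file: $!";
  select((select($fh), $| = 1)[0]);   # autoflush, so partial data is visible

  sub emit_event {
      my (%event) = @_;
      print {$fh} "---\n";            # each event is a self-contained document
      # (a real implementation would quote values properly)
      printf {$fh} "%s: %s\n", $_, $event{$_} for sort keys %event;
  }

  emit_event(type => 'stage-enter', stage => 'checkout', time => time());
  # ... transaction still running: the file is already parseable at this point ...
  emit_event(type => 'stage-exit',  stage => 'checkout', time => time());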

Dan.
-- 
|=-            GPG key: http://www.berrange.com/~dan/gpgkey.txt       -=|
|=-       Perl modules: http://search.cpan.org/~danberr/              -=|
|=-           Projects: http://freshmeat.net/~danielpb/               -=|
|=-   address@hidden  -  Daniel Berrange  -  address@hidden    -=|


