[Help-smalltalk] the test test - an experience report


From: Stefan Schmiedl
Subject: [Help-smalltalk] the test test - an experience report
Date: Thu, 16 Jul 2009 15:33:16 +0200

You might recall my building a tool for online testing
with Iliad and GNU Smalltalk. Today was "the" day, and
here is what happened.

I had gst running on one of my webservers; students
accessed it through an Apache proxy set up to serve
static files and forward the remaining requests to
the app. Taking the test were 5 classes with about 25
students each. The test had 13 exercises
with a total of 95 questions.

The clients were about 25 PCs in a private subnet,
going through a NAT-ing DSL router with somewhere
between 200 and 500 kbit/s upstream and between
2 and 6 Mbit/s downstream (IIRC), shared with a few
other users across the rest of the school grounds.
Response time was snappy: Iliad's "ajax loader"
mostly just flashed in the corner, without a chance
of showing off its animation.

From the users' point of view, it was a nice change
of things, and they quite liked it, once we got over
the initial hurdle (below).

I watched the app through atop and paid attention to
RSIZE, i.e. the amount of physical RAM the gst-remote
process was using:
1) after startup, before any access: 19 MB
2) after displaying the registration page with a single widget
   for 25 clients: 33 MB
3) after everybody started working, i.e. with 25 full blown
   test widgets: 65 MB
4) after finishing the test: between 100 MB and 110 MB
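
To put those readings in per-student terms, here is a rough
back-of-the-envelope sketch in Python. It assumes the growth above
the 19 MB idle baseline is spread evenly over 25 clients, which is
only an approximation (the variable and function names are made up
for this illustration):

```python
# RSIZE readings from the list above, in MB.
# Stage 4 uses the lower bound of the observed 100-110 MB range.
baseline_mb = 19
clients = 25
readings_mb = {
    "registration page": 33,
    "test widgets loaded": 65,
    "test finished (lower bound)": 100,
}


def per_client_mb(total_mb, baseline=baseline_mb, n=clients):
    """Memory above the idle baseline, divided evenly across clients."""
    return (total_mb - baseline) / n


for stage, total in readings_mb.items():
    print(f"{stage}: {per_client_mb(total):.2f} MB per client")
```

That works out to roughly 0.56 MB per client on the registration
page, 1.84 MB with the full test widget, and at least 3.24 MB by the
end of the test.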

I have saved the image after every run with
  gst-remote --eval="ObjectMemory snapshot"
and will try to find out what has caused this growth.
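
For tracking the growth over time instead of reading it off atop by
hand, something like the following Python sketch might help. It
assumes a Linux server, where /proc/<pid>/status has a VmRSS line;
the function names are invented for this example and are not part of
any gst tooling:

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_mb(pid):
    """Resident set size of a running process in MB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        kb = parse_vmrss_kb(f.read())
    return None if kb is None else kb / 1024
```

Calling read_rss_mb with the pid of gst-remote every few seconds and
logging the result next to a timestamp would show whether the memory
grows steadily or in jumps at particular stages.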

CPU load on the server reached up to 90% during the initial phase
(more below) and up to 30% during test execution. The numbers are
not really reliable, as the machine is hosting a few other apps
which might have contributed. The machine is a "small" single-core
64-bit AMD64 3700+ with 1 GB RAM.

Now for the interesting problem that managed to make me
a bit nervous... and would have been a total showstopper,
had it not happened in this "experimental testing session".

The students of class 6a logged into their Windows domain accounts,
started Firefox and entered the URL for the test (stage 1 above).
Then they entered their names into the registration page (stage 2)
and clicked on the button to access the test. Shortly afterwards,
server CPU load went to 100%, with the following error message being
repeated as fast as the remote terminal could cope with:

  "Socket accept error: Error while trying to accept a socket connection"

On the client side, a one-line 500 error message was reported.

Time for pkill gst-remote ... I rebuilt the image and started the
server again. This time we staged the 25 "almost simultaneous" login
attempts into four batches of about 6 each, and things worked fine
from that point on.

After finishing the test, the students logged off and the next class, 6b ...
had the exact same experience ... and 6c and 6d, too.

For the final group I tried a different approach:
They logged on, opened the URL, and sat on their hands.
I killed gst-remote, rebuilt the image, restarted gst-remote and told them
to reload the page. They then entered their names and started clicking on
the answers and the Socket error of Doom appeared again. Kill, rebuild,
restart. Everybody loads the registration page (not staged, just 25 students
clicking when they're ready), enters their name and works on the test as
they should. No hiccup.

I am very open to any suggestions as to what could have caused this
misbehavior. I don't think Iliad is involved (besides generating ajax
requests), so Swazoo and the gst socket implementation are my next
suspects.

My vague-feeling-in-the-gut-proved-by-handwaving hypothesis is that it
might be the combination of building a fairly large widget tree *and*
creating a bunch of new socket connections at the same time that's causing
the trouble.

I'll try to build a test bed to reproduce the disaster in a controlled
setting, but it will be a few days before I can really get to this.
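
As a first idea for such a test bed, here is a minimal Python sketch
of a client that fires a burst of near-simultaneous requests,
optionally staged in batches like the workaround above. It only sends
plain GET requests, so it would exercise the "many new connections at
once" half of the hypothesis, not Iliad's session and ajax flow. The
function name, its parameters and the URL you pass in are assumptions
made up for this example:

```python
import threading
import time
import urllib.error
import urllib.request


def burst(url, clients=25, batch_size=25, pause=2.0, timeout=10):
    """Fire `clients` GET requests at `url` in batches of `batch_size`.

    Requests within a batch are released together by a barrier, so they
    start almost simultaneously. Returns one entry per request: the HTTP
    status code, or the exception class name if the request failed.
    """
    results = []
    lock = threading.Lock()

    def fetch(barrier):
        barrier.wait()  # release all threads of this batch at once
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                outcome = resp.status
        except urllib.error.HTTPError as err:
            outcome = err.code  # e.g. a 500 from the proxy or the app
        except Exception as err:
            outcome = type(err).__name__
        with lock:
            results.append(outcome)

    remaining = clients
    while remaining > 0:
        size = min(batch_size, remaining)
        barrier = threading.Barrier(size)
        threads = [
            threading.Thread(target=fetch, args=(barrier,))
            for _ in range(size)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        remaining -= size
        if remaining > 0:
            time.sleep(pause)  # staging gap between batches
    return results
```

Running it once with batch_size=25 and once with batch_size=6 against
the registration page URL would mimic the unstaged and staged logins
and show whether the staging alone makes a difference.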

Any ideas?

s.



