Re: [Help-smalltalk] the test test - an experience report
From: Stefan Schmiedl
Subject: Re: [Help-smalltalk] the test test - an experience report
Date: Wed, 22 Jul 2009 23:12:34 +0200
Hi Paolo, Nico.
Another piece of the puzzle?
On Thu, 16 Jul 2009 15:33:16 +0200
Stefan Schmiedl <address@hidden> wrote:
>
> The students of class 6a logged into their Windows domain accounts,
> started Firefox and entered the URL for the test (stage 1 above).
> Then they entered their names into the registration page (stage 2)
> and clicked on the button to access the test. Shortly after server
> CPU load went to 100% with the following error message being repeated
> as fast as the remote terminal could cope with:
>
> "Socket accept error: Error while trying to accept a socket connection"
>
> On the client side, a one-line 500 error message was reported.
>
> Time for pkill gst-remote ... I rebuilt the image and started the
> server again. This time we staged the 25 "almost simultaneous" login
> attempts into four batches of 6 each and things worked fine from that point
> on.
>
> After finishing the test, the students logged off and the next class, 6b ...
> had the exact same experience ... and 6c and 6d, too.
>
> For the final group I tried a different approach:
> They logged on, opened the URL, and sat on their hands.
> I killed gst-remote, rebuilt the image, restarted gst-remote and told them
> to reload the page. They then entered their names and started clicking on
> the answers and the Socket error of Doom appeared again. Kill, rebuild,
> restart. Everybody loads the registration page (not staged, just 25 students
> clicking when they're ready), enters their name, and works on the test as
> expected. No hiccup.
While I have not yet managed to reproduce the error message with a
Ruby mechanize script, I have noticed something suspicious:
Start the server, check sockets on the server
server # netstat -n | grep 4080
server #
Run a mechanize script performing a few requests on the client.
The script fetches the first page and the referenced css and js files.
client $ ruby mech.rb 1
client $
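For reference, here is a stdlib-only sketch of roughly what mech.rb does (hypothetical; the actual script uses the mechanize gem, and `fetch_all` is a name I made up). The relevant point is that each request opens and closes its own TCP connection, which is what leaves the client-side sockets behind:

```ruby
require 'net/http'

# Hypothetical stand-in for mech.rb (the real script uses the
# mechanize gem): fetch the first page, then the referenced assets.
# Each get_response call opens and closes its own TCP connection.
def fetch_all(host, port, paths)
  paths.map do |path|
    response = Net::HTTP.get_response(host, path, port)
    [path, response.code]
  end
end

# e.g. fetch_all('88.198.5.34', 4080, ['/', '/style.css', '/app.js'])
```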
Look at sockets on client
client $ netstat -n | grep 4080
tcp 0 0 192.168.1.5:37021 88.198.5.34:4080 FIN_WAIT2
Look at sockets on server
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT
Wait about 10 min .... (typing this text)
Look at sockets on client
client $ netstat -n | grep 4080
client $
Look at sockets on server
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT
Run mechanize script again:
client $ ruby mech.rb 1
client $
Sockets on client:
client $ netstat -n | grep 4080
tcp 0 0 192.168.1.5:57747 88.198.5.34:4080 FIN_WAIT2
Sockets on server:
server # netstat -n | grep 4080
tcp 0 0 88.198.5.34:4080 93.223.36.238:37021 CLOSE_WAIT
tcp 0 0 88.198.5.34:4080 93.223.36.238:57747 CLOSE_WAIT
So the problem described above has nothing to do with timing issues;
it is resource exhaustion due to _too many_ open sockets in CLOSE_WAIT
state.
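For anyone unfamiliar with the state: CLOSE_WAIT means the peer has closed its end of the connection and the kernel is waiting for *our* application to close its own. A minimal Ruby sketch of the mechanics (nothing to do with the Swazoo code, just plain TCP):

```ruby
require 'socket'

# Minimal demonstration of how a CLOSE_WAIT socket arises: the peer
# sends its FIN, but the accepting side never calls close.
server = TCPServer.new('127.0.0.1', 0)        # port 0: any free port
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

client.close                                  # peer closes -> FIN arrives
raise 'expected EOF' unless conn.eof?         # we see EOF immediately ...

# ... but until conn.close is called, netstat shows this connection
# in CLOSE_WAIT on the accepting side. Only closing it releases it:
conn.close
server.close
```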
Note also that the problem is heavily exacerbated when the app is accessed
through an apache proxy as was done in the test session. In this scenario,
running the same requests as above
client $ ruby mech.rb 1
results in the following server-side mess:
server # netstat -n | grep 4080
tcp 0 0 127.0.0.1:4080 127.0.0.1:57163 CLOSE_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57157 CLOSE_WAIT
tcp 0 0 127.0.0.1:57155 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57156 CLOSE_WAIT
tcp 0 0 127.0.0.1:57163 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57157 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57161 CLOSE_WAIT
tcp 0 0 127.0.0.1:57153 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57161 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57155 CLOSE_WAIT
tcp 0 0 127.0.0.1:57162 127.0.0.1:4080 TIME_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57159 CLOSE_WAIT
tcp 0 0 127.0.0.1:57154 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57158 CLOSE_WAIT
tcp 0 0 127.0.0.1:57156 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57160 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:57158 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57160 CLOSE_WAIT
tcp 0 0 127.0.0.1:57159 127.0.0.1:4080 FIN_WAIT2
tcp 0 0 127.0.0.1:4080 127.0.0.1:57153 CLOSE_WAIT
tcp 0 0 127.0.0.1:4080 127.0.0.1:57154 CLOSE_WAIT
The FIN_WAIT2 sockets will disappear after a while; the *10* CLOSE_WAIT sockets
won't.
And since the remote end has already closed them, they won't be reused
either, AFAICT.
Now look at what google found for me:
http://www.sunmanagers.org/pipermail/summaries/2006-January/007068.html
I think one of swazoo/sport/socket needs a behavioral readjustment.
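Whatever the fix ends up looking like in the Smalltalk code, the shape of it is this (a hypothetical Ruby sketch, with `serve_one` being a name I made up, not anything from Swazoo/Sport): the accepted socket must be closed no matter how request handling ends.

```ruby
require 'socket'

# Hypothetical sketch of the needed behavior, in Ruby rather than
# Smalltalk: close the accepted socket even if the handler fails,
# so no connection is left lingering in CLOSE_WAIT.
def serve_one(server)
  conn = server.accept
  begin
    conn.gets                                   # read (part of) the request
    conn.write "HTTP/1.0 200 OK\r\n\r\nok\n"
  ensure
    conn.close                                  # the crucial part
  end
end
```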
s.