
[Pan-users] Re: Pan or server problem?


From: Duncan
Subject: [Pan-users] Re: Pan or server problem?
Date: Wed, 11 Apr 2007 14:20:16 +0000 (UTC)
User-agent: Pan/0.126 (Demon Sweat)

"Rinaldi J. Montessi" <address@hidden>
posted address@hidden, excerpted below, on  Wed, 11 Apr
2007 02:22:08 -0400:

>  8  205.152.152.46 (205.152.152.46)  34.860 ms  30.335 ms  29.538 ms 
>  9  205.152.152.6 (205.152.152.6)  33.518 ms  30.661 ms  30.502 ms
> 10  * * *
> 11  * * *
> 12  * * *
> 13  * * *
> 14  * * *
> 15  * * *
> 16  * * *
> 17  * * *
> 18  * * *
> 19  * * *
> 20  * * *
> 21  * * *
> 22  * * *
> 23  * * *
> 24  * 66.21.240.205 (66.21.240.205)  30.012 ms !A * 
> 25  * * *

It doesn't specifically note what the !A is, but there are some 
interesting notes about !<letter> notation and skipped hops in the 
traceroute manpage.

BTW, tcptraceroute is useful for this sort of thing, too.  It sends TCP 
SYN packets to a selected port (80/http by default, but we'd select 
119/nntp here), raising the TTL one hop at a time.  Intermediate hops 
answer with ICMP time-exceeded, just as with regular traceroute.  The 
destination host answers the SYN itself: RESET if the port is closed, or 
SYN ACK if it's open, in which case the tracing end kernel sends the 
reset.  Due to the technique it uses, it requires a privileged user (aka 
root) to run.
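
If you just want the last-hop part of that check without root, an 
ordinary TCP connect tells you much the same thing: SYN ACK back means 
open, a reset means closed.  Here's a minimal Python sketch of that.  The 
function name and the three result strings are my own invention, and it 
is not a traceroute, it says nothing about the hops in between.

```python
import errno
import socket


def probe_port(host, port, timeout=5.0):
    """Try a plain TCP connect; return 'open', 'closed' or 'no answer'."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        # connect_ex returns an errno value instead of raising on failure
        rc = s.connect_ex((host, port))
    except OSError:
        # name lookup failures and the like still raise
        return "no answer"
    finally:
        s.close()
    if rc == 0:
        return "open"          # far end answered SYN with SYN ACK
    if rc == errno.ECONNREFUSED:
        return "closed"        # far end answered SYN with a reset
    return "no answer"         # timed out, unreachable, filtered, ...
```

Usage would be something like probe_port("newsgroups.bellsouth.net", 119).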

Here's the past-Cox route a tcptraceroute newsgroups.bellsouth.net 119 
shows from here:

10  * axr01aep-so-1-1-2.bellsouth.net (65.83.236.185) 84.210 ms  85.229 ms
11  ixc01aep-pos-4-0.bellsouth.net (65.83.237.99)  87.418 ms  86.879 ms  86.830 ms
12  205.152.152.42  85.017 ms * 86.340 ms
13  205.152.152.94  143.494 ms  85.189 ms  86.084 ms
14  * 66.21.240.205 96.508 ms  97.215 ms
15  bignews.bellsouth.net (216.77.188.18) [open]  102.067 ms  99.325 ms *

Note the [open] notation indicating the port is open on the last hop, 
which is our target.  After the two 205 IPs, it shows only a single 
further router hop before it hits the server itself.  The !A may mean 
rate limited or the like.  Obviously, your hop 24 should be hop 10, only 
it's not showing in the regular traceroutes, and bignews aka newsgroups 
should be your hop 11.
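
If you want to compare traces like these without squinting, the hop 
lines are easy enough to pick apart.  Below is a rough Python sketch, 
assuming output shaped like the two traces in this message.  The names 
parse_hops and silent_hops are made up for illustration, and the regexes 
only cover what appears here, not every traceroute variant.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")            # hop number, then the rest
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")   # dotted-quad address


def parse_hops(text):
    """Return one dict per hop line: hop number, first IP, [open], !X flags."""
    hops = []
    for line in text.splitlines():
        line = line.lstrip("> ")          # tolerate quoted ("> ") lines
        m = HOP_RE.match(line)
        if not m:
            continue                      # wrapped remainders, prose, blanks
        rest = m.group(2)
        ip = IP_RE.search(rest)
        hops.append({
            "hop": int(m.group(1)),
            "ip": ip.group(0) if ip else None,
            "open": "[open]" in rest,
            "flags": re.findall(r"![A-Z]", rest),
        })
    return hops


def silent_hops(hops):
    """Hop numbers that returned nothing but asterisks."""
    return [h["hop"] for h in hops if h["ip"] is None]
```

Fed the tcptraceroute output above, the last entry comes back with 
"open" set and the bignews address; fed the quoted traceroute, hops 10 
through 23 land in silent_hops and hop 24 carries the !A flag.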

I'm not sure what's going on, but TCP TIME_WAIT indicates a connection 
that the end showing it has already closed -- the close handshake is 
done, and the kernel is just holding the socket for a while in case the 
final ACK got lost and the other side resends its FIN.  They eventually 
time out on their own, but it can take a while.
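
To see at a glance what states your connections to the news server are 
in, you can tally the State column of netstat.  A small Python sketch, 
assuming the six-column layout Linux "netstat -tn" prints (Proto, Recv-Q, 
Send-Q, Local Address, Foreign Address, State).  The function name is 
made up, and other platforms format this differently.

```python
from collections import Counter


def count_states(netstat_text, remote_port):
    """Count TCP states for connections whose remote port matches."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # skip headers and anything that isn't a tcp/tcp6 row
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        # the port is whatever follows the last colon of the remote address
        if remote.rsplit(":", 1)[-1] == str(remote_port):
            counts[state] += 1
    return counts
```

Paste the netstat output in as a string and call count_states(text, 119). 
A pile of TIME_WAIT entries and nothing ESTABLISHED would match what 
you're describing.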

Assuming you tried that telnet session when everything from your end was 
closed (nothing in netstat), it's indicating the other end isn't allowing 
the connection, saying all ten allowed connections are in use.  It may 
not be dropping them as it should.  

Note that Tornado is a Highwinds product, and they are known for "stuck 
connection" issues in certain configurations.  Here, Cox outsourced news 
to Highwinds Media October-ish of last year, and we've been fighting this 
sort of thing since then... only they're allowing only four connections.  
In the configuration here, it's partly because they have two newsserver 
farms set up to authenticate to the same (separate) authentication server, 
and what happens is that in some cases the individual news-frontends that 
serve the actual posts will experience issues and issue a TCP RESET, 
which SHOULD say the connection's dead, and can be reestablished.  Only 
for whatever reason, the front-end doesn't send the info to the auth-
server, which then thinks the RESET connections are still alive and 
kicking, and thus won't authenticate any further connections.
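
The failure mode is easy to show with a toy model.  This Python sketch is 
purely illustrative: the class and function names are invented, and it is 
in no way Highwinds' actual code, only the bookkeeping as I understand it 
from the outside.  The auth server counts sessions against a limit, and 
a front-end that resets a connection without reporting it leaves the 
count stuck.

```python
class AuthServer:
    """Counts live sessions for one user against a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.active = 0

    def login(self):
        if self.active >= self.limit:
            return False           # "too many connections"
        self.active += 1
        return True

    def logout(self):
        if self.active > 0:
            self.active -= 1


class FrontEnd:
    """News front-end that may or may not report resets to the auth server."""

    def __init__(self, auth, reports_resets):
        self.auth = auth
        self.reports_resets = reports_resets

    def connect(self):
        return self.auth.login()

    def reset(self):
        # The connection is dead either way; the only question is
        # whether the auth server ever hears about it.
        if self.reports_resets:
            self.auth.logout()


def simulate(limit, resets, reports_resets):
    """Open and reset `resets` connections, then try one more."""
    fe = FrontEnd(AuthServer(limit), reports_resets)
    for _ in range(resets):
        if fe.connect():
            fe.reset()
    return fe.connect()
```

With a limit of four, simulate(4, 4, False) comes back False: four 
unreported resets and the user is locked out with nothing actually 
connected.  simulate(4, 4, True) comes back True.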

That you immediately get your full allotment of TIME_WAIT TCP sessions 
seems to indicate a very similar problem on bellsouth, but you shouldn't 
be the only one seeing it if so.  Not everyone will necessarily see it, 
tho.  Folks with good enough quality connections that they seldom see 
resets never seem to have the issue, or at least not to the same degree, 
and possibly there's something else involved as well.

Previous to October-ish of last year, Cox ran its own servers, using 
Highwinds software on Sun hardware.  (The big companies often go with 
Highwinds, as it's one of the few commercial products available that can 
run on the "Big Iron" necessary to serve a big ISP.  The alternative is 
apparently running certainly tens and possibly hundreds of PC server 
level machines, with open source or small commercial level products, but 
big corporations being what they are, they like to go big and 
centralized, so...)  Cox's own servers had various problems (mainly 
completion and retention), but reliable authentication wasn't one of 
them.  That aspect just worked.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




