From: Per Hedeland
Subject: Re: [Pan-users] Re: Pan 0.120 only queuing tasks
Date: Tue, 1 May 2007 13:58:16 +0200 (CEST)

Duncan <address@hidden> wrote:
>
>Per Hedeland <address@hidden> posted
>address@hidden, excerpted below, on  Mon,
>30 Apr 2007 19:12:02 +0200:
>
>Duncan wrote...
>
>>> If it's what I think it is, pan has sent ACKs for packets
>
>That BTW was a mis-type.  More accurately, it's not the app (pan in our 
>case) but the OS TCP/IP stack that will be sending the ACKs in the 
>ordinary case, as the app (pan) pulls the packets out of the receive 
>buffer thus making room for more.

Actually, the ACKs are sent as soon as the data has been received; they
don't wait for the app to read it - otherwise the sender would have to
uselessly retransmit that data whenever the app is "slow" in reading. The
shrinking receive buffer, i.e. a smaller receive window, is advertised
along with those ACKs, eventually causing the sender to stall if the app
doesn't read.
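
To make the flow-control point concrete, here is a minimal Python sketch
(my own illustration, nothing pan-specific; it assumes a POSIX system
with default socket buffer sizes): the receiving app never reads, yet
the kernel keeps ACKing and buffering until its advertised window
closes, at which point the sender's send() stalls.

    # Minimal sketch (assumptions: Python 3 on a POSIX system, default
    # socket buffer sizes). The receiver accepts but never reads; the
    # kernel still ACKs and buffers the data until its receive window
    # closes, and then the sender's send() stalls.
    import socket
    import threading
    import time

    def lazy_receiver(srv):
        conn, _ = srv.accept()
        time.sleep(60)          # hold the connection open, never recv()
        conn.close()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    t = threading.Thread(target=lazy_receiver, args=(server,), daemon=True)
    t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    client.settimeout(5)        # let the stalled send() return eventually

    sent = 0
    try:
        while True:
            sent += client.send(b"x" * 65536)
    except socket.timeout:
        # Peer's receive buffer and our send buffer are both full: the
        # advertised window is (close to) zero and the sender stalls.
        print("send() stalled after", sent, "bytes")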

>> Uh, I dare say the Internet would have been somewhat less of a success
>> if TCP was indeed as fragile as you make it out to be. Remember, this
>> stuff was designed to survive direct hits with nuclear weapons...:-) TCP
>> does indeed not work well with persistent packet loss - that is to say,
>> performance goes down the drain, but if at all possible, the bits do
>> eventually arrive where they're supposed to. Of course ACKs are
>> retransmitted as needed just like everything else, i.e. the "zombie
>> connection syndrome" that you describe simply cannot happen if the
>> communication path still exists, and packets at least "occasionally" get
>> through. There have been some pathological cases where the communication
>> path goes away (modem drops connection, computer is powered-off without
>> proper shutdown) at exactly the wrong moment, but I believe a TCP/IP
>> stack that doesn't handle those would be considered buggy today.
>
>Well, you are describing the ideal and properly implemented state.
>
>What I can say is that, regardless of the reason (which might be the 
>below at times), in practice, users have to deal with such zombie 
>connections from time to time.  It does seem to happen less on good 
>connections, and proper implementation has "the world" to do with it as 
>well, but it does unfortunately happen in real life to real users.

Well, there will always be bugs of course, but fundamental ones in
TCP/IP implementations should be rare enough these days that they
shouldn't be the first place to look for communications problems. And at
least in my reading, your previous message came across as describing
inherent flaws in TCP - there may be some of those too, but not of such
a fundamental nature.

The specific scenario you described, that the sender has sent enough
that he needs more receive window, and the ACKs (that might have
advertised a new window) have been lost, so everything grinds to a halt,
simply doesn't happen *if the communication path still exists*: Since
the sender hasn't received ACKs for the data he sent, he will retransmit
it (this doesn't require more receive window since it's "old" data), and
that retransmission will elicit new ACKs to make up for those that were
lost.

Of course if *no* ACKs are ever received by the sender, progress can't
be made - this I would describe as the communication path (at least
partially) having gone away. In this case the sender will eventually
give up and reset the connection, but that "eventually" can be a pretty
long time - the original spec says 5 minutes, but I believe there are
implementations that will try (or at least can be configured to try)
significantly longer.
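
How long that "eventually" is can be tuned on some stacks - on Linux,
for instance, there is a per-socket TCP_USER_TIMEOUT option (besides
the system-wide tcp_retries2 sysctl). A small sketch, assuming Linux
and a reasonably recent Python; the server name is just a placeholder:

    # Sketch (assumptions: Linux >= 2.6.37 and Python 3.6+, where
    # socket.TCP_USER_TIMEOUT is exposed). The option caps how long sent
    # data may remain unacknowledged before the kernel gives up and
    # aborts the connection, instead of retrying for many minutes.
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if hasattr(socket, "TCP_USER_TIMEOUT"):
        # abort if data stays unACKed for more than 30 seconds
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, 30000)
    sock.connect(("news.example.com", 119))   # placeholder NNTP server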

>Actually, if you read my previous post, that was in fact what I was 
>saying, that the highwinds software implementation has an end-user 
>reputation for being what amounts to a rather poor implementation, with 
>all sorts of issues unless it happens to be perfectly tweaked for every 
>one of perhaps dozens of factors, or unless it is run way under its rated 
>capacity.

But that software isn't a TCP/IP implementation AFAIK, just another
user-level app...

>The half-closed state was in fact what I was referring to, with close 
>packets being lost in transit, such that one end or the other doesn't 
>realize the other is closing the connection (or has successfully closed 
>it, in the case of the final ACK, thus keeping a now dead connection open).

Again I don't know of any such states that will remain "indefinitely"
unless *the local app still holds the connection open*. CLOSE_WAIT is
precisely that: a FIN has been received from the remote end, but the local
app hasn't closed the socket - this state *should* be able to remain
indefinitely by design, but as soon as the local app closes the socket
the complete shutdown is initiated (and that may fail, and need to time
out, of course).
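
This is easy to see locally, by the way - a little Python sketch
(assuming a POSIX system; watch with "ss -tan" or "netstat -tan" while
it sleeps): the socket sits in CLOSE_WAIT exactly as long as the local
app holds it open, and the rest of the shutdown starts the moment
close() is called.

    # Minimal sketch (assumptions: Python 3 on a POSIX system). The peer
    # closes its end (sending FIN); as long as the local app neither
    # reads the EOF nor closes its own socket, that socket sits in
    # CLOSE_WAIT. Calling close() kicks off the rest of the shutdown.
    import socket
    import time

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    peer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    peer.connect(listener.getsockname())
    local, _ = listener.accept()

    peer.close()       # remote end sends FIN
    time.sleep(30)     # local end holds the socket open: CLOSE_WAIT
    local.close()      # only now is the full shutdown initiated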

>There may in fact be other mechanisms at work here as well.  However, 
>this one seems to fit the available data.  At least in the Cox/Highwinds-
>media case, multiple RESETS jumbled together with continuing packets on 
>the connection (sequenced AFTER the RESET, so it's not just late 
>packets), have been documented via packet capture in the continuing 
>saga.  That sort of stuff should NOT be happening.  If the server has 
>sent a reset, it shouldn't continue sending regular packets on the same 
>connection.  It should wait, and if necessary send another reset, but it 
>shouldn't continue sending ordinary data packets with sequence numbers 
>indicating they were sent AFTER the RESET.  That's indicative of a 
>seriously screwed up implementation.

Yes, that violates the spec of course - though I would guess the most
likely cause is not a broken TCP/IP stack (far less a broken app), but
one that is fronted by a semi-confused firewall. Current firewalls
typically "listen in" on the TCP session to be able to reject "bogus"
packets, but don't have the complete picture, which is only available at
the endpoints. I've personally come across an implementation that could
(depending on configuration) send RST in response to perfectly
legitimate packets, because it incorrectly deemed them to be "out of
window" - and since the intended recipient of the original packet is
never informed of that RST, it may well keep sending further "normal"
packets with higher sequence numbers.

Of course it doesn't matter to the end user whether it is the app, the
stack, or the firewall causing the problem, as long as they're all at the
other end, but I still don't see how any of this could cause connections
to stay
around significantly longer than the local app desires. And a newsreader
app has no reason to leave a socket in CLOSE_WAIT for an extended period
of time since it can't be used for further NNTP protocol exchange - if
it reads any remaining data, it will subsequently see an EOF on the
socket, and should proceed to close it.
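
Roughly what I would expect a well-behaved client to do - just an
illustrative sketch in Python, not pan's actual code:

    # Illustrative sketch only, not pan's actual code: once the server
    # has sent its FIN, recv() returns b"" after any remaining data has
    # been drained; a well-behaved client then closes the socket rather
    # than leaving it parked in CLOSE_WAIT.
    def drain_and_close(sock):
        while True:
            chunk = sock.recv(4096)
            if not chunk:      # b"" means EOF: the peer has closed its end
                break
            # leftover data is of no further use at this point; discard it
        sock.close()           # completes the shutdown on our side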

--Per Hedeland



