Monday, March 21, 2011

TCP Zero Windows - and Why They Can Be VERY Bad...

You may know that every TCP/IP connection requires the use of "send buffers" and "receive buffers."  These are simply 'holding pens', in which the TCP/IP stack stashes data; send buffers hold data from the application until it can be dispatched on the network, and receive buffers hold data received from the network until the application can retrieve it.

During the course of a TCP/IP connection--with every packet, in fact--each endpoint tells the other how much space it has left in its receive buffer; this lets the "other side" know how much more data it can send without waiting for ACKnowledgments.  We refer to this "space available" as the TCP Window.  So, if the receive buffer starts at 64 KB, the endpoint has received 16 KB of data, and the application hasn't 'picked it up' yet, the TCP Window is 48 KB.  If the application makes a pickup, the TCP Window will shoot right back up, only to drop again when the next round of data is received from the network.  So, it's completely normal to see the TCP Windows (on both sides) ebb and flow during a conversation.  Nothing to see here, right?  Wrong.
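
To make the arithmetic concrete, here is a toy model in Python.  It is only a sketch: the ReceiveBuffer class and its method names are made up for illustration, and a real TCP stack does a good deal more (window scaling, for one).

```python
class ReceiveBuffer:
    """Toy model of a TCP receive buffer; all sizes are in bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.held = 0  # bytes received but not yet picked up by the application

    def window(self):
        """Space available, i.e. what the endpoint advertises as its TCP Window."""
        return self.capacity - self.held

    def receive(self, nbytes):
        """Data arrives from the network; it must fit in the advertised window."""
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.held += nbytes

    def pickup(self, nbytes):
        """The application reads up to nbytes out of the buffer."""
        taken = min(nbytes, self.held)
        self.held -= taken
        return taken


KB = 1024
buf = ReceiveBuffer(64 * KB)
buf.receive(16 * KB)
print(buf.window() // KB)  # 48
buf.pickup(16 * KB)
print(buf.window() // KB)  # 64
```

The window is nothing more than capacity minus whatever is still sitting in the buffer, which is why it rises and falls with every arrival and every pickup.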

What happens if an endpoint's receive buffer is filled and the application DOESN'T make a pickup in a timely fashion?  Well, the TCP Window ("space available", remember?) drops to 0 and the TCP Zero Window condition arises.  Think of this as "window closed."  The other endpoint will STOP transmitting data, and will then begin a series of probes to see if the recipient's TCP Window has opened.  (The interval depends on the sender's TCP stack, which typically backs off between probes; in captures you will usually see them settle at either 30-second or 1-minute intervals.)  If the recipient should empty its receive buffers at all (in other words, the application makes even a partial pickup), it will announce the new "space available" with a TCP Window Update, acknowledging the last packet received; if not, it will continue to return TCP Zero Window information.  So, to translate this into teenager-speak:
  • Yadda yadda yadda  (I got room for 18432 more)
    • blah blah blah (I got room for 40 more)
  • Yadda yadda yadda yadda yadda (I got room for 18418 more)
    • blah blah blah (I got room for 11 more)
  • Yadda yadda (I got room for 19235 more)
    • SHUT UP! (no more room - window is closed = Zero Window)
  • (1 minute later) Can I talk now? (I got room for 20183 more)
    • NO! SHUT UP! (no more room - window is closed = Zero Window)
  • (1 minute later) Can I talk now? (I got room for 65535 more  <- his application has been picking up!)
    • NO! SHUT UP! (no more room - window is closed = Zero Window)
    • (Some seconds later) Ok, you can talk now... (I got room for 23482 more <- application finally made a partial pickup = TCP Window Update)
  • Yadda yadda yadda...
Now, it isn't unusual to see Zero Window conditions that last for a second, or perhaps even 2 seconds; such things can be caused by traffic floods, CPU contention on the endpoint, and the like.  If, however, you see Zero Window conditions lasting longer than 1-2 seconds, or if you see Zero Window conditions affecting multiple connections at the same time, you have a problem on your hands.  Note that this is usually NOT a "network problem"; in most cases, the network is delivering data just fine.  The problem is that the application (or some intermediate actor, like a VPN client or security software) is not picking the data up from the TCP/IP stack, so the local endpoint(s) put the other end(s) "on hold" until the matter is resolved.
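
If you want to watch an application put its peer "on hold," the following Python sketch reproduces it over the loopback interface.  The function names, buffer sizes, and the 64 MB safety cap are my own choices for illustration; exactly how many bytes get through before the stall depends on your operating system's buffer tuning.  The receiver accepts the connection and then reads nothing, so its receive buffer fills and its window closes; the sender's own send buffer then backs up until a non-blocking send() is refused.

```python
import select
import socket

CHUNK = b"x" * 4096
CAP = 64 * 1024 * 1024  # safety limit so the send loop always terminates


def fill_until_stalled(sender):
    """Send until the kernel refuses more data; return the bytes it accepted."""
    sender.setblocking(False)
    sent = 0
    while sent < CAP:
        try:
            sent += sender.send(CHUNK)
        except BlockingIOError:
            break  # send buffer is full because the peer is not draining
    return sent


def drain(receiver, nbytes):
    """Play the part of the application finally making its pickup."""
    got = 0
    while got < nbytes:
        data = receiver.recv(65536)
        if not data:
            break
        got += len(data)
    return got


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# A small receive buffer (a request, not a guarantee) makes the stall come sooner.
listener.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

sender = socket.create_connection(listener.getsockname())
receiver, _ = listener.accept()
receiver.settimeout(10.0)  # fail loudly instead of hanging if something goes wrong

accepted = fill_until_stalled(sender)  # the receiver never reads during this step
stalled = accepted < CAP
print("kernel accepted", accepted, "bytes before stalling")

drained = drain(receiver, accepted)  # the application picks everything up
_, writable, _ = select.select([], [sender], [], 5.0)
print("sender writable again:", bool(writable))

for s in (sender, receiver, listener):
    s.close()
```

Notice that nothing in the network "broke" at any point in this run; the only thing that changed between the stall and the recovery was whether the receiving application read its data.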
 
I have seen Zero Window conditions that lasted for upwards of 10 minutes in production environments.  Note that this is NOT a connection killer; as long as the Zero Window Probes are answered, the connection will remain alive as far as the application(s) are concerned.  If you check the connection with netstat, it will show up as "ESTABLISHED," even though no application-layer data is flowing.  This happens because the TCP stack is agnostic when it comes to actual data flow; as long as its housekeeping packets (such as keepalives or TCP Zero Window Probes) are answered normally, it's happy to maintain the connection.  You won't "see" Zero Window conditions without capturing and analyzing the packet data.

Finding Zero Window conditions in network captures is not a difficult matter.  With the Wireshark network analyzer, you can use the generic "tcp.analysis.flags" display filter; if you have a very large capture file, you can be more specific with the "tcp.analysis.zero_window" display filter.  (You can also go to the menu and use Analyze->Expert Infos.)  Once you locate packets flagged with "[TCP Zero Window]" (as shown in the screenshot above), you can right-click on the packet and select "Follow TCP Stream" to see that particular conversation in context.  Again, your Big Red Flags for problems above the network layer are Zero Window conditions lasting longer than 1-2 seconds and/or Zero Window conditions affecting multiple connections at the same time.
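
If you would rather script the Big Red Flag check than eyeball it, here is a rough Python sketch.  It assumes you have already exported, from the capture, time-ordered rows of (seconds, stream id, advertised window) for the receiving side of each connection; that row layout, the function names, and the 2-second threshold are my assumptions, not anything Wireshark produces by default.

```python
from collections import namedtuple

Episode = namedtuple("Episode", "stream start end")


def zero_window_episodes(rows):
    """Group time-ordered (time, stream, window) rows into zero-window episodes."""
    closed_since = {}  # stream -> time its window was first seen at 0
    last_seen = {}     # stream -> time of its most recent row
    episodes = []
    for time, stream, window in rows:
        last_seen[stream] = time
        if window == 0:
            closed_since.setdefault(stream, time)
        elif stream in closed_since:
            # First non-zero window after a closure ends the episode.
            episodes.append(Episode(stream, closed_since.pop(stream), time))
    # Windows that were still closed when the capture ended.
    for stream, start in closed_since.items():
        episodes.append(Episode(stream, start, last_seen[stream]))
    return episodes


def red_flags(episodes, max_seconds=2.0):
    """Return (episodes longer than max_seconds, streams closed at the same time)."""
    long_ones = [e for e in episodes if e.end - e.start > max_seconds]
    overlapping = set()
    for i, a in enumerate(episodes):
        for b in episodes[i + 1:]:
            if a.stream != b.stream and a.start < b.end and b.start < a.end:
                overlapping.update((a.stream, b.stream))
    return long_ones, sorted(overlapping)


rows = [
    (0.0, 1, 18432),
    (0.5, 1, 0),
    (0.9, 1, 23482),   # stream 1 reopens after 0.4 s: harmless
    (10.0, 2, 0),
    (11.0, 3, 0),
    (12.5, 3, 4096),   # stream 3 reopens after 1.5 s
    (70.0, 2, 0),
    (75.0, 2, 20000),  # stream 2 stayed closed for 65 s
]
long_ones, together = red_flags(zero_window_episodes(rows))
print(long_ones)  # [Episode(stream=2, start=10.0, end=75.0)]
print(together)   # [2, 3]
```

In this made-up data, stream 2 trips the duration flag, and streams 2 and 3 trip the "multiple connections at the same time" flag even though stream 3's closure was short on its own.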


Good hunting!

Saturday, March 12, 2011

Airport Delays/Cancellations - What's in Your Survival Kit?

I recently had one of those Flights From Hell(tm), in which a 45-minute layover became a 4-hour layover.  Of course, the canny business traveler is prepared for such things, even in these days of reduced baggage and carryons, which leads to the question - what's in YOUR travel survival kit?

After some discussion with friends on Twitter, we came up with the following list.  The idea is that this is stuff that fits in one's backpack or laptop bag, so that one doesn't have to carry yet another bag.  Our list:

  • Outlet splitter/USB charger (I like http://kmrt.us/dQkytf for this purpose) to share charging stations/outlets.
  • Playing cards
  • Pocket book of crosswords/sudoku (and, obviously, pens/pencils)
  • MP3 player/headphones/earphones (if you don't automatically carry that, as I do)
  • Peanuts/trailmix/granola bars, in case the delay extends past closing time for airport shops
  • Instant coffee/cocoa and/or vitamin drinks (such as Emergen-C)
  • Collapsible drinking cup (such as http://bit.ly/f24DFb for example)

I already had most of these, but I'll be adding Emergen-C (thanks to @billmalchisky for the tip).  After being caught by an unplanned overnight stay last night (sans baggage), I took to adding an airline "amenity kit" (the ones received in business/first class) as well, since they include basic toiletries such as a toothbrush/toothpaste, earplugs, comb, eyeshades (useful if one wishes to nap in the terminal), socks...already packed in a delightfully small pouch.  All told, my "survival kit" occupies but one pocket of my traveling pack; the security is well worth the space.

What's in YOUR survival pack?