One recent case was presented as "many failed connections," and a 6-minute packet capture soon landed in my lap. Every Wireshark user has their own approach; I usually take advantage of Wireshark's display filters to get a general "feel" for the incidence of Layer 3/4 problems. With a typical capture file, I'll start with tcp.analysis.flags, which simply tells Wireshark, "hey, show me what YOU think are TCP problems." As I said, none of these tools are perfect, so take these results with a grain of salt; they're only as good as the underlying data, and it's very easy to collect inaccurate or incomplete data.

After looking at the results of this display filter, I noticed what seemed a high number of TCP retransmissions, so I used a different display filter, tcp.analysis.retransmission, to see exactly which packets were being retransmitted; it shows only those packets Wireshark believes to be TCP retransmissions. The resulting numbers were somewhat high, but I've seen worse.

The complaint, however, was very specific: new connections were failing, and no mention was made of existing connections being interrupted or terminated. So I went to Wireshark's Statistics->Conversations dialog, sorted on the "Packets" column to look for very short conversations, and found HUNDREDS of conversations that lasted only a few packets, like these:
So, the remote endpoint starts a conversation with a SYN packet and the local endpoint responds immediately, but we see the remote endpoint retransmit its SYN packet within 10ms. The local endpoint retransmits its SYN/ACK, but neither the original nor the retransmitted SYN/ACK seems to reach the remote endpoint, and the conversation attempt ultimately ends with a TCP reset (RST) packet. Back I go to Wireshark's display filters, this time to ask about a very specific type of TCP retransmission:
tcp.analysis.retransmission && tcp.flags.syn==1 && !tcp.flags.ack==1

With this display filter, I'm asking Wireshark to show me all retransmitted SYN packets; the "!tcp.flags.ack==1" clause eliminates SYN/ACK packets from the display. The results were startling: within a 6-minute period, more than 110 endpoints had retransmitted more than 170 SYN packets...and all of them had failed to complete the TCP handshake.
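For anyone working from exported packet data rather than the GUI, the two checks in this step (find retransmitted SYNs, then confirm those handshakes never completed) can be sketched in Python. This is a minimal sketch under stated assumptions: the Packet record and every function name are hypothetical, packets are assumed to be in capture order, and the retransmission test is a deliberate simplification, not Wireshark's own tcp.analysis logic, which weighs far more than a repeated sequence number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified TCP packet record (not a Wireshark/tshark type).
    src: str
    sport: int
    dst: str
    dport: int
    seq: int
    flags: frozenset  # TCP flag names, e.g. frozenset({"SYN", "ACK"})


def retransmitted_syns(packets):
    """Rough stand-in for the filter
    tcp.analysis.retransmission && tcp.flags.syn==1 && !tcp.flags.ack==1.

    Simplification (an assumption): a SYN without ACK counts as retransmitted
    when an identical SYN (same addresses, ports, sequence number) was seen earlier.
    """
    seen, retrans = set(), []
    for p in packets:
        if "SYN" in p.flags and "ACK" not in p.flags:
            key = (p.src, p.sport, p.dst, p.dport, p.seq)
            if key in seen:
                retrans.append(p)
            else:
                seen.add(key)
    return retrans


def syn_retransmission_summary(retrans):
    """Return (retransmitted SYN count, number of distinct sending hosts)."""
    return len(retrans), len({p.src for p in retrans})


def handshake_outcome(conversation):
    """Classify one conversation's handshake (packets in capture order).

    "completed": the initiator ACKed a SYN/ACK from the other side;
    "reset": an RST arrived first; "incomplete": neither happened.
    """
    initiator, saw_syn_ack = None, False
    for p in conversation:
        if "RST" in p.flags:
            return "reset"
        is_syn, is_ack = "SYN" in p.flags, "ACK" in p.flags
        if is_syn and not is_ack:
            if initiator is None:
                initiator = p.src  # first SYN sender opened the conversation
        elif is_syn and is_ack:
            if initiator is not None and p.src != initiator:
                saw_syn_ack = True
        elif is_ack and saw_syn_ack and p.src == initiator:
            return "completed"
    return "incomplete"
```

Counting distinct source addresses, rather than conversations, mirrors the "endpoints" figure above; and because any RST ends the attempt, the verdict is "reset" regardless of which side sent it.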
Well, if conditions are this bad for STARTING conversations, then there must be thousands of cases in which existing connections die before completing successfully, right? Let's go back to Wireshark's Statistics->Conversations dialog and sort on Duration to look at long-lived conversations:
If I were looking at a general network congestion issue on the local network, I'd expect all conversations to suffer equally (packets are packets, right?), but this is something different. That seeming conflict in the data prompted what proved to be the key question:
If I'm seeing HUNDREDS of new conversations fail the TCP handshake due to excessive retransmissions, why DON'T I see established conversations suffering excessive retransmissions as well?

After a few moments' thought, it occurred to me that the only network devices that usually distinguish between new and existing connections are those involved in network security. A brief conversation with the customer revealed that an intrusion prevention system (IPS) was in place and "inspecting" conversations. When we ran a test that bypassed the IPS, the incidence of failed TCP handshakes dropped by roughly 98%; our troubleshooting attention is now properly directed.
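One way to make that asymmetry concrete is to compare two rates: the share of handshake packets that are retransmitted versus the share of all other packets that are. The Python sketch below is a minimal illustration under a few assumptions: the Packet record and function name are hypothetical, each packet is assumed to arrive already tagged with a retransmission flag (for instance, taken from Wireshark's tcp.analysis.retransmission field in an export), and the presence of the SYN flag serves as a rough proxy for "handshake" traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical record; is_retransmission is assumed to come from an
    # export of Wireshark's tcp.analysis.retransmission field.
    flags: frozenset  # TCP flag names, e.g. frozenset({"SYN"})
    is_retransmission: bool


def retransmission_rates(packets):
    """Return (handshake_rate, established_rate).

    handshake_rate: fraction of SYN-bearing packets flagged as retransmissions.
    established_rate: the same fraction for every other packet.
    An empty category yields 0.0 rather than dividing by zero.
    """
    hs_total = hs_retx = est_total = est_retx = 0
    for p in packets:
        if "SYN" in p.flags:
            hs_total += 1
            hs_retx += p.is_retransmission  # bool adds as 0 or 1
        else:
            est_total += 1
            est_retx += p.is_retransmission
    hs_rate = hs_retx / hs_total if hs_total else 0.0
    est_rate = est_retx / est_total if est_total else 0.0
    return hs_rate, est_rate
```

A high handshake rate next to a low established-connection rate is the pattern described above; on its own, though, it only says that something treats new and existing connections differently, not what that something is.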
So, the moral of this story: Pay attention to the data, but pay equal attention to what isn't there.