Tuesday, January 31, 2012

A Windows Metric That Most Folks Ignore - NIC Output Queue Length

There's a Windows system statistic that can be an ironclad indicator of resource contention...but most folks never seem to consider it.

Each network interface card (NIC) in your system has an associated output queue; the TCP/IP stack (Winsock) drops packets in this queue, from which the NIC processes them and puts them out on the wire.  Obviously, this number should always be very low; Microsoft states that any value higher than 2 indicates a NIC/network bottleneck.  In recent customer situations, I've seen queue lengths as high as 8-10; a colleague claims to have seen queue lengths as high as 25!
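To make the "higher than 2" guideline concrete, here is a minimal Python sketch that checks a series of queue-length samples for a sustained bottleneck.  The threshold of 2 comes from the guideline above; the idea of requiring several consecutive samples over the threshold (and the default of 3) is purely my own assumption to filter out one-off spikes, so tune it to taste.

```python
from typing import Sequence

QUEUE_THRESHOLD = 2  # guideline quoted above: values higher than 2 suggest a bottleneck


def sustained_bottleneck(
    samples: Sequence[float],
    threshold: float = QUEUE_THRESHOLD,
    min_consecutive: int = 3,  # assumption: how many samples in a row count as "sustained"
) -> bool:
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a low one.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

With this, a series like 0, 1, 8, 9, 10 would be flagged, while isolated spikes separated by idle samples would not.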

The first step in alleviating high NIC output queue lengths is (duh) to ensure that network connectivity (e.g. the local switch/router) is not suffering from congestion.  The second step is to ensure that one has installed the 'latest and greatest' NIC drivers.

Here's where things get interesting, from a system troubleshooting point of view; the LOOPBACK adapter also has an output queue.  This is critical, for two reasons:

  1. Loopback adapters exist only as a construct in RAM, so memory contention can affect loopback adapters disproportionately, and
  2. Many applications, particularly server applications, use loopback connections for interprocess communication.

Now, there's no NIC for the loopback, so high output queue lengths on this adapter are indicative of memory contention.  I've seen cases in which high loopback output queues led to "connectivity problems" in the system/application logs, but the root cause was determined to be extremely high paging/swapping and overall memory contention among processes.  Troubleshooting problems here will require examination of overall system memory utilization, paging/swapping, et cetera.

We aren't done, however; if virtualization is involved, troubleshooting points us in a different direction.  When running in a virtual machine, what the application "sees" as its NIC is a virtual adapter; all Windows virtualization systems handle these 'pseudo-interfaces' in RAM.  So, we obviously can't go update NIC drivers; instead, high NIC output queue lengths in VM environments usually indicate one of two things: either the physical NIC of the virtual host is so overloaded that it isn't picking up the packets from the "virtual adapters" of the individual virtual machines, or (since they're all handled via RAM constructs) the virtual host is suffering from memory contention.  Keep in mind that you may not see high CPU/memory utilization inside the virtual machine running Windows, since it only 'sees' what the virtual host tells it to see; you need to look at CPU/memory statistics on the virtual host itself!

If you're running Windows systems, take a look at Output Queue Length (under the Network Interface performance object) in Perfmon.
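If you'd rather script it than click through Perfmon, the same counter can be sampled from the command line with typeperf.  Below is a rough Python sketch that runs typeperf and reports the peak queue length per adapter instance.  The function names, the sample count and the interval are my own choices for illustration, and the parser assumes the usual PDH-CSV layout (one header row, then one row per sample); treat it as a starting point, not a finished tool.

```python
import csv
import io
import subprocess

# Counter path for every adapter instance Windows exposes.
COUNTER = r"\Network Interface(*)\Output Queue Length"


def parse_typeperf_csv(text):
    """Return {counter_column: peak_value} from typeperf CSV output.

    Assumes a header row whose first cell starts with "(PDH-CSV",
    followed by rows of: timestamp, value, value, ...
    """
    header = None
    peaks = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank line
        if header is None:
            if row[0].startswith("(PDH-CSV"):
                header = row[1:]
                peaks = {name: 0.0 for name in header}
            continue
        if len(row) != len(header) + 1:
            continue  # trailing status messages, not sample rows
        for name, cell in zip(header, row[1:]):
            try:
                peaks[name] = max(peaks[name], float(cell))
            except ValueError:
                pass  # blank or non-numeric cell
    return peaks


def sample_output_queues(samples=5, interval_s=1):
    """Run typeperf (Windows only) and return peak queue length per instance."""
    result = subprocess.run(
        ["typeperf", COUNTER, "-si", str(interval_s), "-sc", str(samples)],
        capture_output=True, text=True, check=True,
    )
    return parse_typeperf_csv(result.stdout)
```

On a Windows box, something like `for name, peak in sample_output_queues().items(): print(name, peak)` will list each instance with its peak; anything consistently above 2 deserves a closer look.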
