Troubleshooting MAC-Flushes on NX-OS

An interesting client problem in one of our multi-tenant data centers came to my attention the other day. A delay-sensitive client noticed a slight increase in latency (20 ms), at irregular intervals, from his servers in our data center to specific off-net destinations. The increase in latency was localized to the pair of Nexus 7000s functioning as the core switch layer (CSW) and the Layer 3 edge for this particular data center. Beyond that, everything appeared normal on the N7K CSWs.

A tcpdump capture from a normal trunk interface attached to the N7Ks showed unicast traffic on the N7K-2 device, even though the N7K-1 device was set up to receive inbound internet traffic and forward it into the data center client VLANs. The N7Ks are set up using Cisco vPC (Virtual Port Channel).

Upon investigating what appeared to be legitimate unicast traffic, I found that the IP ARP tables held the relevant destination MAC addresses, with timers that did not indicate any recent problems. The host MAC addresses for these ARP entries, however, were absent from the CAM table. After forcing a refresh of both tables, it was obvious that the MAC address entries were not refreshing as they should.

n7k-2#clear ip arp vlan 600
n7k-2#clear mac address-table dynamic vlan 600
n7k-2# show ip arp vlan 600 | be ARP
IP ARP Table
Total number of entries: 138
Address         Age       MAC Address     Interface
xx.xx.xx.4    00:00:07  0025.9003.855e  Vlan600
xx.xx.xx.5    00:00:07  0025.9003.859a  Vlan600
xx.xx.xx.6    00:00:07  0025.9003.252c  Vlan600
xx.xx.xx.7    00:00:07  0025.9003.2548  Vlan600
xx.xx.xx.8    00:00:07  0025.9003.855e  Vlan600
xx.xx.xx.9    00:00:07  0025.9003.859a  Vlan600
--snip--
n7k-2# show mac add vlan 600 | b VLAN
   VLAN     MAC Address      Type      age     Secure NTFY Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
G 600      0026.980c.dbc2    static       -       F    F  sup-eth1(R)
n7k-2#
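
The comparison above (ARP entries present, CAM entries missing) can be scripted when there are too many hosts to check by eye. What follows is a minimal Python sketch, not a supported tool: the helper names `macs_in` and `arp_without_cam` are made up for illustration, and it assumes the two command outputs have already been captured as text in the layout shown above.

```python
import re

# Matches NX-OS style dotted MAC addresses such as 0025.9003.855e.
MAC_RE = re.compile(r"\b[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}\b", re.IGNORECASE)


def macs_in(text):
    """Return the set of dotted-hex MAC addresses found in a block of CLI output."""
    return {m.lower() for m in MAC_RE.findall(text)}


def arp_without_cam(arp_output, mac_table_output):
    """MACs that ARP has resolved but that are missing from the MAC address table."""
    return sorted(macs_in(arp_output) - macs_in(mac_table_output))


# Sample text copied from the outputs above (hypothetical capture variables).
arp_output = """
xx.xx.xx.4    00:00:07  0025.9003.855e  Vlan600
xx.xx.xx.5    00:00:07  0025.9003.859a  Vlan600
xx.xx.xx.6    00:00:07  0025.9003.252c  Vlan600
xx.xx.xx.7    00:00:07  0025.9003.2548  Vlan600
"""
mac_table_output = """
G 600      0026.980c.dbc2    static       -       F    F  sup-eth1(R)
"""

missing = arp_without_cam(arp_output, mac_table_output)
print(missing)
```

Every MAC address the script prints is a destination that the switch can resolve at Layer 3 but cannot forward to directly at Layer 2.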

By this stage I had my suspicions about the problem but not yet the exact cause.

NX-OS has a range of very useful (yet poorly documented) internal system commands that offer a great deal more information than the usual show commands. Inspecting the L2FM (Layer2 Feature Manager) state for a given MAC address could verify my suspicions.

The command below showed a brief historical event log of the Layer 2 MAC Database.

n7k-2# show system internal l2fm l2dbg macdb address 0025.9003.2548 vlan 600
Legend
------
Db:  0-MACDB, 1-GWMACDB, 2-SMACDB, 3-RMDB,    4-SECMACDB
Src: 0-UNKNOWN, 1-L2FM, 2-PEER, 3-LC, 4-HSRP
     5-GLBP, 6-VRRP, 7-STP, 8-DOTX, 9-PSEC 10-CLI 11-PVLAN
     12-ETHPM, 13-ALW_LRN, 14-Non_PI_MOD, 15-MCT_DOWN, 16 - SDB
     17-OTV
Slot:0 based for LCS 19-MCEC 20-OTV/ORIB

 VLAN: 600 MAC: 0025.9003.2548
    Time                     If/swid    Db           Op           Src Slot
    Wed Jan  9 11:56:54 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:54 2013 0x160002ef 0  INSERT 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  FLUSH 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  DELETE 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  FLUSH 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  DELETE 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 2   0    15

The output above indicated why the MAC addresses were not seen in the CAM table. They were continually flushed.
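
To put a number on "continually flushed", the history rows can be tallied per operation. The Python sketch below assumes the row layout shown above (timestamp, If/swid, Db, Op); internal command output is undocumented and may differ between releases, so the regular expression is an assumption, and `parse_history` and `summarise` are illustrative names.

```python
import re
from collections import Counter
from datetime import datetime

# One history row: timestamp, interface/SWID in hex, database id, operation.
# The trailing numeric columns are ignored in this sketch.
ROW_RE = re.compile(
    r"^\s*(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+([A-Z_]+)\b"
)


def parse_history(text):
    """Return (timestamp, if_swid, db, op) tuples for every row that matches."""
    events = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if not m:
            continue
        # Collapse the padding that precedes single-digit days ("Jan  9").
        stamp = " ".join(m.group(1).split())
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        events.append((when, m.group(2), int(m.group(3)), m.group(4)))
    return events


def summarise(events):
    """Count operations and measure the time span covered by the events."""
    ops = Counter(op for _, _, _, op in events)
    span = (events[-1][0] - events[0][0]).total_seconds() if events else 0.0
    return ops, span


history = """
    Wed Jan  9 11:56:54 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:54 2013 0x160002ef 0  INSERT 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  FLUSH 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  DELETE 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  FLUSH 2   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  DELETE 0   0    15
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 3   19   0
    Wed Jan  9 11:56:56 2013 0x160002ef 0  INSERT 2   0    15
"""

events = parse_history(history)
ops, span = summarise(events)
print(dict(ops), span)
```

For the sample above this counts two FLUSH and two DELETE operations against a single MAC address within a two-second window, which is far from a normal aging pattern.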

This explained the rogue unicast traffic. What happens to unicast traffic with valid IP ARP entries when no usable MAC address entries are available for forwarding? The frames are flooded out of every port in the VLAN, a mechanism known as unknown unicast flooding.
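
The forwarding decision behind unknown unicast flooding can be sketched in a few lines of Python. This is a deliberately simplified model of a Layer 2 lookup, not how the N7K hardware implements it; the function name `egress_ports` and the port names are hypothetical.

```python
def egress_ports(mac_table, vlan_ports, vlan, dst_mac, ingress_port):
    """Return the ports a unicast frame leaves on in a simplified L2 switch.

    mac_table maps (vlan, mac) to a port; vlan_ports maps a vlan to its member ports.
    """
    port = mac_table.get((vlan, dst_mac))
    if port is not None:
        # Known unicast: forward out of exactly one port (never back out the ingress port).
        return [port] if port != ingress_port else []
    # Unknown unicast: flood to every port in the VLAN except the one it arrived on.
    return [p for p in vlan_ports[vlan] if p != ingress_port]


vlan_ports = {600: ["Po1", "Po2", "Po3"]}  # hypothetical VLAN membership

# With a CAM entry the frame goes to one port only.
known = egress_ports({(600, "0025.9003.2548"): "Po2"}, vlan_ports, 600, "0025.9003.2548", "Po1")

# After a flush the entry is gone and the same frame is flooded.
flooded = egress_ports({}, vlan_ports, 600, "0025.9003.2548", "Po1")

print(known, flooded)
```

The second call is the situation seen on the N7Ks: the ARP entry still supplies a destination MAC address, but with no CAM entry the frame is copied to every other port in the VLAN.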

With the problem described originally, the MAC flushes also explained the latency spikes, as one of the VLANs in question belonged to a content provider carrying large amounts of traffic. Every time the CDN hit a specific volume of traffic, the unicast flooding increased the queue depths on certain N7K trunk links to customers. Given the large volumes of traffic, this was enough to increase the latency for some customers.

To isolate the cause of the MAC flushes, either the following system internal command can be used:

show spanning-tree internal event-history all brief

or the normal “show spanning-tree detail” command. The latter showed the cause of the MAC flushes:

n7k-2# show spann detail | inc exec|from|occurr
 MST0000 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 18167 last change occurred 0:00:06 ago
          from port-channel753
 MST0001 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 17726 last change occurred 0:00:06 ago
          from port-channel753
 MST0002 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 18390 last change occurred 0:01:04 ago
          from port-channel750
--snip--
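
With many MST instances, it helps to rank them by topology change count and list the ports the changes came from. Here is a small Python sketch that does this for the filtered output above; the regular expressions assume exactly that three-line layout, and `parse_topology_changes` is an illustrative name. Note that the "from" line only reports the port of the most recent change.

```python
import re

INSTANCE_RE = re.compile(r"^\s*(\S+) is executing")
CHANGES_RE = re.compile(r"Number of topology changes (\d+) last change occurred (\S+) ago")
FROM_RE = re.compile(r"^\s*from (\S+)")


def parse_topology_changes(text):
    """Return one dict per spanning-tree instance found in the filtered output."""
    rows = []
    current = None
    for line in text.splitlines():
        m = INSTANCE_RE.match(line)
        if m:
            current = {"instance": m.group(1), "changes": 0, "last": None, "from": None}
            rows.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first instance header
        m = CHANGES_RE.search(line)
        if m:
            current["changes"] = int(m.group(1))
            current["last"] = m.group(2)
            continue
        m = FROM_RE.match(line)
        if m:
            current["from"] = m.group(1)
    return rows


output = """
 MST0000 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 18167 last change occurred 0:00:06 ago
          from port-channel753
 MST0001 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 17726 last change occurred 0:00:06 ago
          from port-channel753
 MST0002 is executing the mstp compatible Spanning Tree protocol
  Number of topology changes 18390 last change occurred 0:01:04 ago
          from port-channel750
"""

rows = parse_topology_changes(output)
ranked = sorted(rows, key=lambda r: r["changes"], reverse=True)
for r in ranked:
    print(r["instance"], r["changes"], r["last"], r["from"])
```

Running the same parse twice a few minutes apart and comparing the counters shows which instances are still changing, and the "from" port points at the neighbour to investigate.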

When an MST switch receives a TCN (Topology Change Notification), the associated MAC addresses in the CAM table are flushed. This is done to allow quicker convergence than the traditional STP implementation, but on the flip side, continual TCNs have a negative effect, as seen here. In this case the TCNs were generated by an incorrectly configured switch.
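
The interaction between topology changes and flooding can be illustrated with a toy event replay in Python. This is a sketch under loud assumptions: a real switch scopes the flush to the affected instance and ports rather than clearing everything, and the event names (`learn`, `tc`, `frame_to`) are invented for this example.

```python
def count_floods(events):
    """Replay events against a toy MAC table and count flooded frames.

    Events are tuples:
      ("learn", mac, port)  the switch sees a frame sourced from mac on port
      ("tc",)               a topology change arrives; the toy table is flushed
      ("frame_to", mac)     a unicast frame destined to mac needs forwarding
    """
    table = {}
    floods = 0
    for event in events:
        kind = event[0]
        if kind == "learn":
            table[event[1]] = event[2]
        elif kind == "tc":
            table.clear()  # simplification: flush everything
        elif kind == "frame_to":
            if event[1] not in table:
                floods += 1  # no entry, so the frame is flooded
    return floods


host = "0025.9003.2548"

# Stable network: the entry is learned once and every frame is forwarded normally.
quiet = [("learn", host, "Po2")] + [("frame_to", host)] * 4

# A topology change wipes the entry; frames flood until the host is heard from again.
noisy = [
    ("learn", host, "Po2"),
    ("frame_to", host),
    ("tc",),
    ("frame_to", host),
    ("frame_to", host),
    ("learn", host, "Po2"),
    ("frame_to", host),
]

print(count_floods(quiet), count_floods(noisy))
```

When topology changes arrive every few seconds, as in the counters above, the window between flush and relearn never really closes for hosts that mostly receive traffic.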

– – – –
For more information about how STP and MST operate, be sure to go through
the switching chapter in the Routing-Bits RS Handbook.

2 thoughts on “Troubleshooting MAC-Flushes on NX-OS”

  1. Love your blog!
    Very interesting read. Do you have a blogpost with these ‘commands’ revealed as they are poorly documented ?
    Thanks!
