Troubleshooting MAC-Flushes on NX-OS

An interesting client problem in one of our multi-tenant data centers came to my attention the other day. A delay sensitive client noticed a slight increase in latency (20 ms) at very intermittent intervals from his servers in our data center to specific off-net destinations. The increase in latency was localized to the pair of Nexus 7000’s functioning as the core switch layer (CSW) and the layer3 edge for this particular data center. Beyond that all appeared normal on the N7K CSWs.

A TCP dump from a normal trunk interface attached to the N7Ks, showed unicast traffic on the N7K-2 device when the N7K-1 device was setup to receive internet traffic inbound and forward it into the data center client VLANs.  The N7Ks are setup using the Cisco VPC (Virtual Port Channels).

Continue reading “Troubleshooting MAC-Flushes on NX-OS”

Advertisements

Detecting Layer2 Loops

We all too familiar with the devastating impact a talented layer 2 loop could have on a data center lacking sufficient controls and processes. If you are using Cisco Nexus switches in your data center, you would be happy to know that NX-OS offers an interesting new tool you should add to your loop detection list. The somewhat undocumented feature is known as (for the lack of a better name)  FWM-Loop Detection. FWM refers to the NX-OS Forwarding Manager. In Syslog it is seen as:

%FWM-2-STM_LOOP_DETECT

Continue reading “Detecting Layer2 Loops”

The Fabric ERA

“Fabric” is a loosely used term, which today creates more confusion instead of offering direction.

What exactly is a Fabric ? What is a Switch Fabric?

Greg Ferro did a post here explaining how Ethernet helped the layer 2 switch fabric evolve. Sadly the use of switch fabric did not stop there. And this is the part where the confusion trickles in.

The term fabric has been butchered (mostly by marketing people) to incorporate just about any function these days. The term ‘switch fabric’ today (in the networking industry) is broadly used to describe among others the following:

  • The structure of an ASIC, e.g., the cross bar silicon fabric.
  • The hardware forwarding architecture used within layer2 bridges or switches.
  • The hardware forwarding architecture used with routers, e.g., the Cisco CRS and its 3-stage Benes switch fabric.
  • Storage topologies like the fabric-A and fabric-B SAN architecture.
  • Holistic Ethernet technologies like TRILL, Fabric-Path, Short-Path Bridging, Q-Fabric, etc.
  • A port extender device that is marketed as a fabric extender (a.k.a. FEX) namely the Cisco Nexus 2000 series.

In short, a switch fabric is basically the interconnection of points with the purpose to transport data from one point to another. These points, as evolved with time, could represent anything from an ASIC, to a port, to a device, to an entire architecture.

Cisco added a whole new dimension to this by marketing a Port Extender device as a Fabric Extender and doing so with different FEX architectures namely VM-FEX and Adapter FEX…. More on that in the next post. :)

BGP between Cisco Nexus and Fortigate

It is not uncommon to find that different vendors have slightly different implementations when it comes to standards technologies that should work seamless.

I recently came across a BGP capability negotiation problem between a Nexus 7000 and a client Fortigate. Today’s post is not teaching about any new technologies, but instead showing the troubleshooting methodology I used to find the problem.

The setup is simple. A Nexus 7000 and a Fortigate connected via nexus layer2 hosting infrastructure, to peer with BGP.
At face value the eBGP session between Nexus 7000 and the Fortigate never came up:

N7K# sh ip bgp summary | i 10.5.0.20
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.5.0.20   4 65123     190     190        0    0    0 0:12:30  Idle

The first steps should verify the obvious.

  •  Configuration! This check should included checking the ASNs, the peering IP addresses, source-interfaces and passwords matching.

Continue reading “BGP between Cisco Nexus and Fortigate”

Cisco Nexus User Roles using TacPlus

I previously wrote a post about the Nexus Roles and how they integrate with a TACACS server.

Cisco Documentation shows the following format to issue multiple roles from a TACACS/RADIUS server.:

shell:roles="network-admin vdc-admin"

We are using Shrubbery TACPLUS, instead of the Cisco ACS software. Last week I noticed that only one role was assigned when multiples should be assigned. Multiple roles are required when using one TACACS server to issue roles for VDC and non-VDC Nexus switches since they need different default User-Roles.

This was tested on a Nexus 5000, a Nexus 7000 and VDC on the same Nexus 7000. Different codes were tried. This was not a NX-OS bug.

Upon further investigation it was obvious, that the syntax above as provided by Cisco was specific their TACACS software, being the ACS software. But I still required multiple Roles to be assigned for my single TACACS configuration to work across multiple Nexus devices. First attempt was the lazy method. Ask uncle Google for any such encounters with a solution. That yielded no practical results. I then contacting Shrubbery for the solution, after that it became clear that possibly nobody else have experienced this problem before.

So the hunt began to find out exactly what was so different in the AAA response from the Cisco ACS software to the TACPLUS software that it did not yield the required results.

Continue reading “Cisco Nexus User Roles using TacPlus”

Low Memory Handling

Memory problems on routers is nothing new. It is generally less of a problem in current day, but is still seen from time to time.

BGP is capable of handling large amount of routes and in comparison to other routing protocols, BGP can be a big memory hog. BGP peering devices, especially full internet peering devices, require larger amounts of memory to store all the BGP routes. Thus it’s not uncommon to see a BGP router run out of memory when a certain route count limit is exceeded.

A router running out of memory, commonly called Low Memory, is always a bad thing. The result of low memory problems may vary from the router crashing, to routing processes being shut down or if you lucky enough erratic behavior causing route flaps and instability in your network. None which is desired.

Low memory can be caused by any of the following:

  •     Partial physical memory failure.
  •     Software memory bugs.
  •     Applications not releasing used memory chunks.
  •     Incorrect configuration.
  •     Insufficient memory allocation to a Nexus VDC.

Continue reading “Low Memory Handling”