Troubleshooting MAC-Flushes on NX-OS

An interesting client problem in one of our multi-tenant data centers came to my attention the other day. A delay-sensitive client noticed a slight increase in latency (20 ms) at very intermittent intervals from his servers in our data center to specific off-net destinations. The increase in latency was localized to the pair of Nexus 7000s functioning as the core switch layer (CSW) and the Layer 3 edge for this particular data center. Beyond that, all appeared normal on the N7K CSWs.

A TCP dump from a normal trunk interface attached to the N7Ks showed unicast traffic on the N7K-2 device while the N7K-1 device was set up to receive internet traffic inbound and forward it into the data center client VLANs. The N7Ks are set up using Cisco vPC (Virtual Port Channels).
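
To confirm that such unicast traffic is in fact being flooded, a capture on a host NIC can exclude multicast (which also covers broadcast) and frames legitimately addressed to that NIC; anything left over was flooded by the switch. A minimal sketch, assuming a Linux host where eth1 and aa:bb:cc:dd:ee:ff are placeholders for the capture interface and its own MAC address:

# Print link-level headers (-e); keep only unicast frames that are
# not addressed to this host's own (placeholder) MAC.
tcpdump -eni eth1 'not ether multicast and not ether dst aa:bb:cc:dd:ee:ff'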

Continue reading “Troubleshooting MAC-Flushes on NX-OS”

Detecting Layer2 Loops

We are all too familiar with the devastating impact a Layer 2 loop can have on a data center lacking sufficient controls and processes. If you are using Cisco Nexus switches in your data center, you will be happy to know that NX-OS offers an interesting tool you should add to your loop-detection list. The somewhat undocumented feature is known (for lack of a better name) as FWM Loop Detection, where FWM refers to the NX-OS Forwarding Manager. In syslog it is seen as:

%FWM-2-STM_LOOP_DETECT
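
A full message typically looks something like the following; the ports, VLAN, and exact wording here are illustrative and vary between NX-OS releases:

%FWM-2-STM_LOOP_DETECT: Loops detected in the network among ports Eth1/5 and Eth1/7 vlan 100 - Disabling dynamic learn notifications for a period between 120 and 240 seconds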

Continue reading “Detecting Layer2 Loops”

FEX Architectures

Here is an old post I never finished. With the benefits of the Nexus 2000 and the FEX architecture (covered in an earlier post), namely scalability, simplified management, and flexibility, Cisco extended its use further into the servers, all the way up to the virtual hosts. This allows much greater control and flexibility. After all, network guys should look after all aspects of networking, while server guys look after the servers and, these days, the virtual hosts.

A summary of the different FEX implementations:
Continue reading “FEX Architectures”

The Fabric ERA

“Fabric” is a loosely used term, one that today creates more confusion than it offers direction.

What exactly is a Fabric? What is a Switch Fabric?

Greg Ferro did a post here explaining how Ethernet helped the Layer 2 switch fabric evolve. Sadly, the use of the term “switch fabric” did not stop there, and this is where the confusion trickles in.

The term fabric has been butchered (mostly by marketing people) to incorporate just about any function these days. In the networking industry, “switch fabric” is today broadly used to describe, among others, the following:

  • The structure of an ASIC, e.g., the crossbar silicon fabric.
  • The hardware forwarding architecture used within Layer 2 bridges or switches.
  • The hardware forwarding architecture used within routers, e.g., the Cisco CRS and its 3-stage Benes switch fabric.
  • Storage topologies like the fabric-A and fabric-B SAN architecture.
  • Holistic Ethernet technologies like TRILL, FabricPath, Shortest Path Bridging, QFabric, etc.
  • A port extender device that is marketed as a fabric extender (a.k.a. FEX), namely the Cisco Nexus 2000 series.

In short, a switch fabric is basically an interconnection of points with the purpose of transporting data from one point to another. These points, as the term evolved over time, could represent anything from an ASIC, to a port, to a device, to an entire architecture.

Cisco added a whole new dimension to this by marketing a port extender device as a Fabric Extender, and doing so with different FEX architectures, namely VM-FEX and Adapter FEX… More on that in the next post. :)

What is a Fabric Extender

In this post I would like to cover the basics of what you need to know about the Cisco Fabric Extender, which ships today as the Nexus 2000 series hardware.

The Modular Switch

The concept is easy to understand by referencing existing knowledge. Everybody is familiar with the distributed switch architecture commonly called a modular switch:

Consider the typical components:

  • Supervisor module/s are responsible for the control plane and management plane functions.
  • Linecards or I/O modules offer physical port termination, taking care of the forwarding plane.
  • Connections between the supervisors and linecards to transport frames, e.g., fabric cards or backplane circuitry.
  • An encapsulation mechanism to identify frames that travel between the different components.
  • A control protocol used to manage the linecards, e.g., SCP on the Catalyst 6500.

Most linecards nowadays have dedicated ASICs to make local hardware forwarding decisions, e.g., the Catalyst 6500 DFCs (Distributed Forwarding Cards). Cisco took this concept, removed the linecards from the modular switch, and boxed them in standalone enclosures. These linecards can then be installed in different locations and connected back to the supervisor modules using standard Ethernet. These remote linecards are called Fabric Extenders (a.k.a. FEXs). Three really big benefits are gained by doing this (a quick verification example follows the list):

  1. The number of managed devices in a given network segment is reduced, since these remote linecards are still managed by the supervisor modules.
  2. The STP footprint is reduced, since STP is unaware that the linecards now sit in different cabinets.
  3. Another benefit is the cabling reduction to the distribution switches. I’ll cover this in a later post. Really awesome for migrations.
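
Once a FEX is online, it really does show up on its parent switch much like a remote linecard. A hedged sketch of what that looks like (the FEX number, model, and serial below are purely illustrative):

N5K# show fex
  FEX        FEX              FEX              FEX
Number    Description        State            Model            Serial
----------------------------------------------------------------------
100        FEX0100           Online     N2K-C2248TP-1GE     JAF1418AARL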

Let’s take a deeper look at how this is done.

Continue reading “What is a Fabric Extender”

N5K Stuck in Boot Mode

Another trivial post. The posts following this one will take a more in-depth look at the Nexus technologies.

So you do a non-ISSU NX-OS upgrade on a Nexus 5000 switch and something goes wrong. After the reload you get the following prompt:

...Loader Version pr-1.3
loader>

The switch did not successfully boot from the images it was supposed to. How do you go about restoring it?
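
The short version of the recovery is to boot the kickstart image manually from the loader, then load the system image from the boot prompt. A minimal sketch, assuming the images are still present on bootflash (the filenames here are placeholders for whatever release you run):

loader> boot bootflash:n5000-uk9-kickstart.5.2.1.N1.1.bin
...
switch(boot)# load bootflash:n5000-uk9.5.2.1.N1.1.bin

If bootflash no longer holds usable images, the loader can also fetch a kickstart image over the network before the same procedure applies.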

Continue reading “N5K Stuck in Boot Mode”

Load-Sharing across ASICs

Port-channels have become an accepted solution in data centers, both to mitigate STP footprints and to extend physical interface limits.

One of the biggest drawbacks with port-channels is the single point of failure.

Scenario 1 – Failure of an ASIC on one switch, which could bring the whole port-channel down if all member interfaces were connected to that one ASIC.

Scenario 2 – Failure of one switch on either side. The obvious solution available today is multi-chassis port-channels, which address the problem about 95% of the way.

Consider the following topology:

Even with a multi-chassis port-channel there is still the possibility of an ASIC failure. Although not as detrimental as Scenario 1, there will still be some impact (depending on the traffic load) if both interfaces on one switch happen to connect to the same ASIC.

Thus it only makes sense that the ports used on the same switch use different ASICs. How would you confirm this on the Nexus 5000 and Nexus 7000?
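
As a teaser, the port-to-ASIC mapping is exposed through platform-internal show commands, which differ per platform and ASIC generation. A hedged sketch of the kind of commands involved (availability and exact syntax depend on the hardware and NX-OS release):

! Nexus 7000: attach to the module, then dump its port-to-ASIC map
N7K# attach module 1
module-1# show hardware internal dev-port-map

! Nexus 5500 (Carmel ASIC): list ports with their ASIC instances
N5K# show hardware internal carmel all-ports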

Continue reading “Load-Sharing across ASICs”

Nexus load intervals

This is an interesting but trivial post. Everybody knows about the interface command “load-interval”, which changes the time period over which the interface packet-rate and throughput statistics are averaged.

I discovered an addition to this command on the Nexus the other day while poking around. NX-OS allows multiple counter intervals to be configured on the same interface, so differently sampled intervals can be listed at the same time.

The configuration is easy:

interface Ethernet1/19
  load-interval counter 1 40
  load-interval counter 2 60
  load-interval counter 3 180
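
The different sample intervals then show up side by side in the interface statistics, roughly like this (output trimmed, rates illustrative):

N7K# show interface Ethernet1/19
  ...
  Load-Interval #1: 40 seconds
    40 seconds input rate 1528 bits/sec, 2 packets/sec
    40 seconds output rate 1096 bits/sec, 1 packets/sec
  Load-Interval #2: 60 seconds
    input rate 1.42 Kbps, 2 pps; output rate 1.01 Kbps, 1 pps
  Load-Interval #3: 180 seconds
    input rate 1.38 Kbps, 2 pps; output rate 0.98 Kbps, 1 pps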

Continue reading “Nexus load intervals”

Cisco Nexus 7000 upgrade to 8Gb

When upgrading a Nexus 7000 to NX-OS version 5.2 (when using more than one VDC) or to NX-OS v6 and later, Cisco claims the need to upgrade the system memory to 8 GB.

Note: I have run v5.2 using only 4 GB per SUP with two VDCs and it worked just fine, but I should mention that the box was not under heavy load.

See how much memory your N7K has on a SUP by using the following command:

N7K# show system resources
Load average:   1 minute: 0.47   5 minutes: 0.24   15 minutes: 0.15
Processes   :   959 total, 1 running
CPU states  :   3.0% user,   3.5% kernel,   93.5% idle
Memory usage:   4115776K total,   2793428K used,   1322348K free

The upgrade per SUP needs the Cisco bundle upgrade package (product code: N7K-SUP1-8GBUPG=). One package contains one 4 GB module (see picture below). If you have two SUPs, you will need two bundles. Notice the 8GB sticker on the module in the red block.

Continue reading “Cisco Nexus 7000 upgrade to 8Gb”

Cisco and their inconsistencies

Cisco is known for inconsistencies between platforms and different IOS versions. I came across another one that was rather annoying, this time between linecards.

Trying to configure the following standard sub-interface Ethernet AToM tunnel on a Cisco 7606 with an ES+ linecard:

pseudowire-class CISCO
 encapsulation mpls
!
interface Te2/2.2
 encapsulation dot1Q 2
 no ip redirects
 no ip directed-broadcast
 no ip proxy-arp
 xconnect 10.5.0.99 12345 encap mpls pw-class CISCO

Yields the following misleading error…

7606(config)#int te2/2.2
7606(config-subif)# xconnect 10.5.0.99 12345 encap mpls pw-class CISCO
MPLS encap is not supported on this circuit
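
Without spoiling the whole post: the ES+ linecards are EVC-based, so on these cards the EoMPLS cross-connect generally has to live under an Ethernet service instance rather than on a plain dot1q sub-interface. A hedged sketch of the equivalent EVC-style configuration (same VC ID and peer as above; verify against your IOS release):

interface TenGigabitEthernet2/2
 service instance 2 ethernet
  encapsulation dot1q 2
  xconnect 10.5.0.99 12345 encapsulation mpls pw-class CISCO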

Continue reading “Cisco and their inconsistencies”

BGP between Cisco Nexus and Fortigate

It is not uncommon to find that different vendors have slightly different implementations when it comes to standards-based technologies that should work seamlessly.

I recently came across a BGP capability negotiation problem between a Nexus 7000 and a client Fortigate. Today’s post does not teach any new technologies, but instead shows the troubleshooting methodology I used to find the problem.

The setup is simple: a Nexus 7000 and a Fortigate, connected via a Nexus Layer 2 hosting infrastructure, peering with BGP.
At face value, the eBGP session between the Nexus 7000 and the Fortigate never came up:

N7K# sh ip bgp summary | i 10.5.0.20
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.5.0.20   4 65123     190     190        0    0    0 0:12:30  Idle

The first steps should verify the obvious.

  •  Configuration! This check should include verifying the ASNs, the peering IP addresses, the source interfaces, and that passwords match.
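
When the configuration checks out, the next step is to look at the capability negotiation on the wire. NX-OS conveniently ships with a built-in capture tool for control-plane traffic; a minimal sketch (the frame limit is chosen arbitrarily):

N7K# ethanalyzer local interface inband capture-filter "tcp port 179" limit-captured-frames 50 detail

The detailed decode of the BGP OPEN messages is where a capability mismatch between two implementations would show up.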

Continue reading “BGP between Cisco Nexus and Fortigate”

Cisco Nexus User Roles using TacPlus

I previously wrote a post about the Nexus Roles and how they integrate with a TACACS server.

Cisco documentation shows the following format to issue multiple roles from a TACACS/RADIUS server:

shell:roles="network-admin vdc-admin"

We are using Shrubbery tac_plus instead of the Cisco ACS software. Last week I noticed that only one role was assigned when multiple roles should have been. Multiple roles are required when using one TACACS server to issue roles for both VDC and non-VDC Nexus switches, since they need different default user roles.
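
For context, the naive way to hand out that attribute from a tac_plus configuration looks roughly like the following sketch (the username is a placeholder, and this is the form that turned out to deliver only a single role):

user = jdoe {
    default service = permit
    service = exec {
        shell:roles = "network-admin vdc-admin"
    }
}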

This was tested on a Nexus 5000, a Nexus 7000, and a VDC on the same Nexus 7000, across different code versions. This was not an NX-OS bug.

Upon further investigation it became obvious that the syntax above, as provided by Cisco, was specific to their own TACACS software, namely ACS. But I still required multiple roles to be assigned for my single TACACS configuration to work across multiple Nexus devices. The first attempt was the lazy method: ask uncle Google for any such encounters with a solution. That yielded no practical results. I then contacted Shrubbery for a solution, after which it became clear that possibly nobody else had experienced this problem before.

So the hunt began to find out exactly what was so different between the AAA responses of the Cisco ACS software and the tac_plus software that it did not yield the required results.

Continue reading “Cisco Nexus User Roles using TacPlus”

Low Memory Handling

Memory problems on routers are nothing new. They are generally less of an issue these days, but are still seen from time to time.

BGP is capable of handling large amounts of routes, and in comparison to other routing protocols, BGP can be a big memory hog. BGP peering devices, especially those carrying full internet tables, require larger amounts of memory to store all the BGP routes. Thus it is not uncommon to see a BGP router run out of memory when a certain route count is exceeded.

A router running out of memory, commonly called Low Memory, is always a bad thing. The results of low memory problems vary from the router crashing, to routing processes being shut down, or, if you are lucky enough, erratic behavior causing route flaps and instability in your network. None of which is desired.

Low memory can be caused by any of the following (a common BGP-side safeguard is sketched after the list):

  • Partial physical memory failure.
  • Software memory bugs.
  • Applications not releasing used memory chunks.
  • Incorrect configuration.
  • Insufficient memory allocation to a Nexus VDC.
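
On the BGP side, one common safeguard against a peer pushing a router past its memory budget is a per-neighbor prefix limit. A minimal NX-OS sketch (the neighbor, AS numbers, and limit are illustrative):

router bgp 65000
  neighbor 10.5.0.20 remote-as 65123
    address-family ipv4 unicast
      ! Warn at 90% of 50000 prefixes; warning-only logs instead of
      ! tearing the session down when the limit is crossed.
      maximum-prefix 50000 90 warning-only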

Continue reading “Low Memory Handling”