Troubleshooting MAC-Flushes on NX-OS

An interesting client problem in one of our multi-tenant data centers came to my attention the other day. A delay sensitive client noticed a slight increase in latency (20 ms) at very intermittent intervals from his servers in our data center to specific off-net destinations. The increase in latency was localized to the pair of Nexus 7000’s functioning as the core switch layer (CSW) and the layer3 edge for this particular data center. Beyond that all appeared normal on the N7K CSWs.

A TCP dump from a normal trunk interface attached to the N7Ks, showed unicast traffic on the N7K-2 device when the N7K-1 device was setup to receive internet traffic inbound and forward it into the data center client VLANs.  The N7Ks are setup using the Cisco VPC (Virtual Port Channels).

Continue reading “Troubleshooting MAC-Flushes on NX-OS”

Advertisement

Detecting Layer2 Loops

We all too familiar with the devastating impact a talented layer 2 loop could have on a data center lacking sufficient controls and processes. If you are using Cisco Nexus switches in your data center, you would be happy to know that NX-OS offers an interesting new tool you should add to your loop detection list. The somewhat undocumented feature is known as (for the lack of a better name)  FWM-Loop Detection. FWM refers to the NX-OS Forwarding Manager. In Syslog it is seen as:

%FWM-2-STM_LOOP_DETECT

Continue reading “Detecting Layer2 Loops”

N5K Stuck in Boot Mode

Another trivial post. The upcoming posts following this one will take a more in-depth look at the Nexus technologies.

So you do an non-ISSU NX-OS upgrade on a Nexus 5000 switch and something goes wrong. After reload you get the following prompt:

...Loader Version pr-1.3
loader>

The switch did not successfully boot from the images it was suppose to. How to go about restoring it?

Continue reading “N5K Stuck in Boot Mode”

Nexus load intervals

This is a interesting but a trivial post. Everybody know about the interface command “load-interval” that changes the time period over which the interface packet-rate and throughput statistics are averaged.

I discovered an addition to this command on the Nexus the other day while poking around. NX-OS allows multiple counter intervals to be configured on the same interface. This allows different sampled intervals to be listed at the same time.

The configuration is easy:

#interface Ethernet1/19
  load-interval counter 1 40
  load-interval counter 2 60
  load-interval counter 3 180

Continue reading “Nexus load intervals”

Cisco Nexus 7000 upgrade to 8Gb

When upgrading a Nexus 7000 to NX-OS version 5.2 (using more than 1 VDC) or to NX-OS v6+, Cisco claims the need to upgrade the system memory to 8Gb.

Note I have run on v5.2 using only 4Gb per SUP using 2 VDCs and it has worked just fine, but I should mention that the box was not under heavy load.

See how much memory your N7K has on a SUP by using the following command:

N7K# show system resources
Load average:   1 minute: 0.47   5 minutes: 0.24   15 minutes: 0.15
Processes   :   959 total, 1 running
CPU states  :   3.0% user,   3.5% kernel,   93.5% idle
Memory usage:   4115776K total,   2793428K used,   1322348K free

The upgrade per SUP would need the Cisco Bundle upgrade package (Product code: N7K-SUP1-8GBUPG=). One package has one 4Gb module. (see picture below) If you have two SUPs you would need two bundles. Notice the 8Gb sticker on module in the red block.

Continue reading “Cisco Nexus 7000 upgrade to 8Gb”

Cisco Nexus User Roles using TacPlus

I previously wrote a post about the Nexus Roles and how they integrate with a TACACS server.

Cisco Documentation shows the following format to issue multiple roles from a TACACS/RADIUS server.:

shell:roles="network-admin vdc-admin"

We are using Shrubbery TACPLUS, instead of the Cisco ACS software. Last week I noticed that only one role was assigned when multiples should be assigned. Multiple roles are required when using one TACACS server to issue roles for VDC and non-VDC Nexus switches since they need different default User-Roles.

This was tested on a Nexus 5000, a Nexus 7000 and VDC on the same Nexus 7000. Different codes were tried. This was not a NX-OS bug.

Upon further investigation it was obvious, that the syntax above as provided by Cisco was specific their TACACS software, being the ACS software. But I still required multiple Roles to be assigned for my single TACACS configuration to work across multiple Nexus devices. First attempt was the lazy method. Ask uncle Google for any such encounters with a solution. That yielded no practical results. I then contacting Shrubbery for the solution, after that it became clear that possibly nobody else have experienced this problem before.

So the hunt began to find out exactly what was so different in the AAA response from the Cisco ACS software to the TACPLUS software that it did not yield the required results.

Continue reading “Cisco Nexus User Roles using TacPlus”

Low Memory Handling

Memory problems on routers is nothing new. It is generally less of a problem in current day, but is still seen from time to time.

BGP is capable of handling large amount of routes and in comparison to other routing protocols, BGP can be a big memory hog. BGP peering devices, especially full internet peering devices, require larger amounts of memory to store all the BGP routes. Thus it’s not uncommon to see a BGP router run out of memory when a certain route count limit is exceeded.

A router running out of memory, commonly called Low Memory, is always a bad thing. The result of low memory problems may vary from the router crashing, to routing processes being shut down or if you lucky enough erratic behavior causing route flaps and instability in your network. None which is desired.

Low memory can be caused by any of the following:

  •     Partial physical memory failure.
  •     Software memory bugs.
  •     Applications not releasing used memory chunks.
  •     Incorrect configuration.
  •     Insufficient memory allocation to a Nexus VDC.

Continue reading “Low Memory Handling”

Nexus’ improved CLI

The Cisco Nexus Series platform has some good things going. Having spent much of my time recently using them, I have come to appreciate some very neat improvements NX-OS is offering over standard IOS. For the most part driving NX-OS is very similar to IOS, but it’s been greatly improved.

One such example is the output from the most used IOS command “show ip int brief”, which on NX-OS only shows ‘IP’ (being layer 3) interfaces. To see the brief state of all types of interfaces use “sh int brief” instead.

N5K-2(config)# sh ip int brief
IP Interface Status for VRF "default"(1)
Interface            IP Address      Interface Status
Vlan19               10.1.19.6       protocol-up/link-up/admin-up
Vlan22               10.1.22.6       protocol-up/link-up/admin-up

N5K-2(config)# sh int brief
--------------------------------------------------------------------------------
Ethernet      VLAN   Type Mode   Status  Reason                   Speed     Port
Interface                                                                   Ch #
--------------------------------------------------------------------------------
Eth1/1        1      eth  trunk  up      none                       1000(D) 51
Eth1/2        22     eth  access up      none                        10G(D) -
Eth1/3        1      eth  trunk  down    SFP not inserted            10G(D) 50
Eth1/4        1      eth  trunk  down    SFP not inserted            10G(D) 50
Eth1/5        1      eth  trunk  down    SFP not inserted            10G(D) -
Eth1/6        19     eth  access down    SFP not inserted            10G(D) -
Eth1/7        1      eth  trunk  down    Link not connected          10G(D) 5
Eth1/8        1      eth  trunk  down    Link not connected          10G(D) 5
Eth1/9        1      eth  fabric down    Administratively down       10G(D) 9
Eth1/10       1      eth  fabric down    FEX identity mismatch       10G(D) 7
Eth1/11       1      eth  fabric down    vpc peerlink is down        10G(D) 34
Eth1/12       1      eth  fabric down    SFP not inserted            10G(D) 12
Eth1/13       1      eth  fabric up      none                        10G(D) 15
Eth1/14       1      eth  fabric down    Administratively down       10G(D) 9

Continue reading “Nexus’ improved CLI”

Nexus defaults to PAP authentication

Ever configured a Nexus switch to use AAA to query a Tacacs+ server? Had some troubles applying standard IOS config to NX-OS?

Possibly if your Tacacs+ server is configured to only allow PAM (Password Authentication Manager) authentication for the users. See when a NX-OS switch sends a AAA authentication packet, by default it is encapsulated using PAP encoding. This is in contrast to normal IOS devices, that use PAM encoding by default.

To illustrate I used the following config:

ip tacacs source-interface mgmt0
tacacs-server host 10.5.0.82 key password
!
aaa group server tacacs+ TAC
server 10.5.0.82
use-vrf management
source-interface mgmt0
!
aaa authentication login default group TAC
aaa authorization config-commands default group TAC
aaa authorization commands default group TAC
aaa accounting default group TAC

Continue reading “Nexus defaults to PAP authentication”

Troubleshooting random Nexus reboots

November last year, a pair of Cisco Nexus 5010 switches, suddenly started rebooting randomly without user intervention.  Since these boxes were a front to a VM environment, stability were of urgent concern. But in order to stabilize the environment, the root cause of the reboots had to be isolated, and quickly.

The Cisco Nexus platform might not be as mature as many would like, but it is quickly becoming a very needed switch in Next-Generation datacenters. Of the things I like most about the Nexus boxes are the readily available local reporting and intuitive system checks.  Obviously there are many other features which is making the platform so popular. I’ll cover some of these in time.

Coming back to the rebooting issue. Unlike IOS devices that looses all local logging info, unless a crash dump was saved to NVRAM, the Nexus writes most of its log information to disk. Thus even after the reboot, you have all the information.
Continue reading “Troubleshooting random Nexus reboots”