Detecting Layer2 Loops

We are all too familiar with the devastating impact a layer 2 loop can have on a data center that lacks sufficient controls and processes. If you are using Cisco Nexus switches in your data center, you will be happy to know that NX-OS offers an interesting new tool to add to your loop detection list. This somewhat undocumented feature is known (for lack of a better name) as FWM-Loop Detection. FWM refers to the NX-OS Forwarding Manager. In syslog it is seen as:




The Fabric ERA

“Fabric” is a loosely used term that today creates more confusion than direction.

What exactly is a Fabric? What is a Switch Fabric?

Greg Ferro did a post here explaining how Ethernet helped the layer 2 switch fabric evolve. Sadly the use of switch fabric did not stop there. And this is the part where the confusion trickles in.

The term fabric has been butchered (mostly by marketing people) to incorporate just about any function these days. Today, the term ‘switch fabric’ is broadly used in the networking industry to describe, among others, the following:

  • The structure of an ASIC, e.g., the cross bar silicon fabric.
  • The hardware forwarding architecture used within layer2 bridges or switches.
  • The hardware forwarding architecture used with routers, e.g., the Cisco CRS and its 3-stage Benes switch fabric.
  • Storage topologies like the fabric-A and fabric-B SAN architecture.
  • Holistic Ethernet technologies like TRILL, FabricPath, Shortest Path Bridging, QFabric, etc.
  • A port extender device that is marketed as a fabric extender (a.k.a. FEX), namely the Cisco Nexus 2000 series.

In short, a switch fabric is basically the interconnection of points for the purpose of transporting data from one point to another. These points, as they evolved over time, could represent anything from an ASIC, to a port, to a device, to an entire architecture.

Cisco added a whole new dimension to this by marketing a Port Extender device as a Fabric Extender, and doing so with different FEX architectures, namely VM-FEX and Adapter FEX. More on that in the next post. :)

What is a Fabric Extender

In this post I would like to cover the basics of what you need to know about the Cisco Fabric Extender that ships today as the Nexus 2000 series hardware.

The Modular Switch

The concept is easy to understand by referencing existing knowledge. Everybody is familiar with the distributed switch architecture commonly called a modular switch:

Consider the typical components:

  • Supervisor module/s are responsible for the control and management plane functions.
  • Linecards or I/O modules offer physical port termination and take care of the forwarding plane.
  • Connections between the supervisors and linecards to transport frames, e.g., fabric cards or a backplane.
  • An encapsulation mechanism to identify frames that travel between the different components.
  • A control protocol used to manage the linecards, e.g., MTS on the Catalyst 6500.

Most linecards nowadays have dedicated ASICs to make local hardware forwarding decisions, e.g., Catalyst 6500 DFCs (Distributed Forwarding Cards). Cisco took the concept a step further by removing the linecards from the modular switch and boxing them in standalone enclosures. These linecards can then be installed in different locations and connected back to the supervisor modules using standard Ethernet. These remote linecards are called Fabric Extenders (a.k.a. FEXs). Three really big benefits are gained by doing this.

  1. The number of managed devices in a given network segment is reduced, since these remote linecards are still managed by the supervisor modules.
  2. The STP footprint is reduced, since STP is unaware that the linecards are located in different cabinets.
  3. Cabling to the distribution switches is reduced. I’ll cover this in a later post. Really awesome for migrations.

Let’s take a deeper look at how this is done.


N5K Stuck in Boot Mode

Another trivial post. The upcoming posts following this one will take a more in-depth look at the Nexus technologies.

So you do a non-ISSU NX-OS upgrade on a Nexus 5000 switch and something goes wrong. After the reload you get the following prompt:

...Loader Version pr-1.3

The switch did not successfully boot from the images it was supposed to. How do you go about restoring it?


Load-Sharing across ASICs

Port-channels have become an acceptable solution in data centers to both mitigate STP footprints and extend physical interface limits.

One of the biggest drawbacks with port-channels is the single point of failure.

Scenario 1- Failure of an ASIC on one switch, which could potentially bring the port-channel down, if all member interfaces were connected on one ASIC.

Scenario 2- Failure of one switch on either side. The obvious solution available today is multi-chassis port-channels, which address 95% of the problem.

Consider the following topology:

Even with multi-chassis port-channels there is still the possibility of an ASIC failure. Although not as detrimental as Scenario-1, there will still be some impact (depending on the traffic load) if both interfaces on one switch happen to connect to the same ASIC.

Thus it only makes sense that the ports used on the same switch use different ASICs. How would you confirm this on the Nexus 5000 and Nexus 7000?
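Once you know which ASIC each front-panel port belongs to, the check itself is simple. The Python sketch below is only an illustration of that check: the `PORT_TO_ASIC` table is a made-up mapping, not the real port layout of any Nexus model, and would have to be filled in from the switch's own output.

```python
from collections import defaultdict

# HYPOTHETICAL port-to-ASIC mapping, for illustration only.
# The real mapping differs per platform and must come from the switch itself.
PORT_TO_ASIC = {
    "Eth1/1": 0, "Eth1/2": 0,
    "Eth1/3": 1, "Eth1/4": 1,
}

def shared_asics(members, port_to_asic):
    """Return {asic: [ports]} for every ASIC that carries more than one member."""
    by_asic = defaultdict(list)
    for port in members:
        by_asic[port_to_asic[port]].append(port)
    return {asic: ports for asic, ports in by_asic.items() if len(ports) > 1}

# Both members behind one ASIC: a single ASIC failure takes out both links.
print(shared_asics(["Eth1/1", "Eth1/2"], PORT_TO_ASIC))
# Members spread across ASICs: nothing is reported.
print(shared_asics(["Eth1/1", "Eth1/3"], PORT_TO_ASIC))
```

An empty result means no two local member ports of the port-channel share an ASIC in the mapping you supplied.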


Smart Port-Channels

Consider the following output.

How is this possible, when no AAA or Privilege Profiles are configured? Have a look at the interface configuration:

Is this a bug, a feature, or an annoyance? Depending on the platform, this is a feature. This test-interface is part of a port-channel. This is a common operational mistake. How many times has it happened in one of your data centers that an engineer accidentally made a change to an interface that was a member of a port-channel, only to bring down the port-channel and possibly any customer traffic that traversed the link?


Cisco OTV (Part III)

This is the final follow-on post from OTV (Part I) and OTV (Part II).

In this final post I will go through the configuration steps, some outputs and FHRP isolation.

OTV Lab Setup

I set up a mini lab using two Nexus 7000 switches, each with four VDCs, two Nexus 5000 switches and a Catalyst 3750 switch.
I emulated two data center sites, each with two core switches for typical layer3 breakout, each with two switches dedicated for OTV and each with one access switch to test connectivity. Site1 includes switches 11-14 (four VDCs on N7K-1) and switch 15 (N5K), whereas Site2 includes switches 21-24 (four VDCs on N7K-2) and switch 32 (3750).

To focus on OTV, I removed the complexity from the transport network by using OTV on dedicated VDCs (four of them for redundancy), connected as inline OTV appliances and by connecting the OTV Join interfaces on a single multi-access network.

This is the topology:

Before configuring OTV, the decision must be made how OTV will be integrated as part of the data center design.

Recall the OTV/SVI co-existence limitation. If core switches are in place that are not Nexus 7000 switches, OTV may be implemented natively on the new Nexus 7000 switch/es or using a VDC. If the Nexus 7000 switches are providing the core switch functionality, then separate VDCs are required for OTV.


Cisco OTV (Part II)

This is a follow-on post from OTV (Part I).

STP Separation

Edge Devices do take part in STP by sending and receiving BPDUs on their internal interface as would any other layer2 switch.

But an OTV Edge Device will not originate or forward BPDUs on the overlay network. OTV thus limits the STP domain to the boundaries of each site. This means an STP problem in the control plane of a given site would not produce any effect on the remote data centers. This is one of the biggest benefits of OTV in comparison to other DCI technologies. It is made possible because MAC reachability information is advertised and learned via the control plane protocol instead of being learned through typical MAC flooding behavior.

With the STP separation between sites, OTV makes it possible for different sites to use different STP technologies. For example, one site can run MSTP while another runs RSTP. In the real world this is a nifty enhancement.



OTV allows multiple Edge Devices to co-exist in the same site for load-sharing purposes. (With NX-OS 5.1 that is limited to 2 OTV Edge Devices per site.)

With multiple OTV Edge Devices per site and no STP across the overlay to shut down redundant links, the possibility of an end-to-end loop between sites is created. The absence of STP between sites holds valuable benefits, but a loop prevention mechanism is still required, so an alternative method is used. The boys who wrote OTV decided on electing a master device responsible for traffic forwarding (similar to some non-STP protocols).

With OTV this master elected device is called an AED (Authoritative Edge Device).

An AED is an Edge Device that is responsible for forwarding the extended VLAN frames in and out of a site, from and to the overlay network. It is very important to understand this before carrying on. Only the AED will forward traffic out of the site onto the overlay. With optimal traffic replication in a transport network, a site’s broadcast and multicast traffic will reach every Edge Device in the remote site. Only the AED in the remote site will forward traffic from the overlay into the remote site. The AED thus ensures that traffic crossing the site-overlay boundary does not get duplicated or create loops when a site is multi-homed.
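To make the AED rule concrete, here is a small Python sketch that counts how many copies of one broadcast frame end up inside the remote site. It is a toy model only: the device names and the single `is_aed` flag per device are assumptions for illustration and say nothing about how the AED is actually elected.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeDevice:
    name: str      # hypothetical device name
    site: str
    is_aed: bool   # assumed to be the outcome of the AED election

def copies_delivered(devices, src_site, dst_site):
    """Count copies of one broadcast frame that enter dst_site from the overlay."""
    # Only the source site's AED puts the frame onto the overlay.
    senders = [d for d in devices if d.site == src_site and d.is_aed]
    # The transport replicates the frame to every Edge Device in the remote site.
    receivers = [d for d in devices if d.site == dst_site]
    # Only the remote AED forwards the frame from the overlay into its site.
    return sum(1 for _ in senders for r in receivers if r.is_aed)

with_aed = [
    EdgeDevice("site1-ed1", "site1", True), EdgeDevice("site1-ed2", "site1", False),
    EdgeDevice("site2-ed1", "site2", True), EdgeDevice("site2-ed2", "site2", False),
]
# What would happen if every Edge Device forwarded, i.e. no AED role at all.
without_aed = [EdgeDevice(d.name, d.site, True) for d in with_aed]

print(copies_delivered(with_aed, "site1", "site2"))     # a single copie
print(copies_delivered(without_aed, "site1", "site2"))  # duplicated frames
```

With one AED per site the frame crosses the site-overlay boundary once in each direction; with every device forwarding, each of the two senders reaches each of the two receivers and the remote site sees duplicates.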


Cisco OTV (Part I)

OTV (Overlay Transport Virtualization) is a technology that provides layer2 extension capabilities between different data centers. In its simplest form OTV is a new DCI (Data Center Interconnect) technology that routes MAC-based information by encapsulating traffic in normal IP packets for transit.

Cisco has submitted an IETF draft (draft-hasmit-otv-01), but it is not finalized yet.

OTV Overview

Traditional L2VPN technologies, like EoMPLS and VPLS, rely heavily on tunnels. Rather than creating stateful tunnels, OTV encapsulates layer2 traffic with an IP header and does not create any fixed tunnels.

OTV only requires IP connectivity between remote data center sites, which allows the transport infrastructure to be layer2 based, layer3 based, or even label switched. IP connectivity is the base requirement, along with some additional connectivity requirements that will be covered in this post.

OTV requires no changes to existing data centers to work, but it is currently only supported on the Nexus 7000 series switches with M1-Series linecards.

A big enhancement OTV brings to the DCI realm is its control plane functionality of advertising MAC reachability information instead of relying on the traditional data plane learning of MAC flooding. OTV refers to this concept as MAC routing, a.k.a. MAC-in-IP routing. The MAC-in-IP routing is done by encapsulating an Ethernet frame in an IP packet before it is forwarded across the transport IP network. The action of encapsulating the traffic between the OTV devices creates what is called an overlay between the data center sites. Think of an overlay as a logical multipoint bridged network between the sites.
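The idea of MAC routing can be sketched in a few lines of Python. This is a conceptual toy, not the OTV control protocol or packet format: the class, the advertisement dictionary, the interface name and the 192.0.2.x addresses are all invented for illustration.

```python
class EdgeDevice:
    """Toy Edge Device that keeps a MAC 'routing' table (illustrative only)."""

    def __init__(self, join_ip):
        self.join_ip = join_ip
        # MAC -> ("local", interface) or ("overlay", remote Edge Device IP)
        self.mac_table = {}

    def learn_local(self, mac, interface):
        """Learn a MAC on an internal interface and build an advertisement for it."""
        self.mac_table[mac] = ("local", interface)
        return {"mac": mac, "next_hop": self.join_ip}

    def receive_advertisement(self, advert):
        """Install a remote MAC learned via the control plane, not by flooding."""
        self.mac_table[advert["mac"]] = ("overlay", advert["next_hop"])

    def forward(self, dst_mac, frame):
        """Describe what happens to a frame destined to dst_mac."""
        kind, target = self.mac_table.get(dst_mac, (None, None))
        if kind == "local":
            return {"action": "switch", "interface": target, "frame": frame}
        if kind == "overlay":
            # MAC-in-IP: the whole Ethernet frame becomes the payload of an IP packet.
            return {"action": "encapsulate", "outer_src": self.join_ip,
                    "outer_dst": target, "payload": frame}
        return {"action": "no-route"}  # this sketch does nothing with unknown MACs

site1 = EdgeDevice("192.0.2.1")
site2 = EdgeDevice("192.0.2.2")
advert = site1.learn_local("0000.0c00.000a", "Eth1/1")
site2.receive_advertisement(advert)
print(site2.forward("0000.0c00.000a", b"ethernet-frame"))
```

The point of the sketch is that `site2` knows where the MAC lives before it ever sees a frame for it, because the reachability information arrived as an advertisement rather than through data plane flooding.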

OTV is deployed on devices at the edge of the data center sites, called OTV Edge Devices. These Edge Devices perform typical layer-2 learning and forwarding functions on their site facing interfaces (the Internal Interfaces) and perform IP-based virtualization functions on their core facing interface (the Join Interface) for traffic that is destined via the logical bridge interface between DC sites (the Overlay Interface).

Each Edge Device must have an IP address which is significant in the core/provider network for reachability, but is not required to have any IGP relationship with the core. This allows OTV to be inserted into any type of network in a much simpler fashion.

Let’s look at some OTV terminology.


OTV Terminology


Cisco 6500 Cosmetic bugs

Ever had this error before on a Cisco Catalyst 6500?

6500#  sh module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1    5  Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL-------
  2   48  48-port 10/100/1000 RJ45 EtherModule   WS-X6148A-GE-TX    SAL---------
  3   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL----------

Mod MAC addresses                       Hw    Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1  001d.45e1.ed48 to 001d.45e1.ed4f   2.0   8.5(2)       12.2(33)SXH1 Ok
  2  001f.9ec6.7d70 to 001f.9ec6.7d9f   1.6   8.4(1)       8.7(0.22)BUB Ok
  3  001b.d4ec.ab60 to 001b.d4ec.ab8f   1.12  12.2(14r)S5  12.2(33)SXH1 Ok

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Policy Feature Card 3       VS-F6K-PFC3C       SAL----------  1.0    Ok
  1  MSFC3 Daughterboard         VS-F6K-MSFC3       SAL----------  1.0    Ok
  3  Centralized Forwarding Card WS-F6700-CFC        SAL----------  3.1    Ok

Mod  Online Diag Status
---- -------------------
  1  Minor Error
  2  Pass
  3  Pass



Really sad when you have to reboot a production switch that’s been up for this long. I suppose another question is why the switch was never upgraded. Until now it was not needed.  :)

bry-asw1>show version
Cisco Internetwork Operating System Software
IOS (tm) C2950 Software (C2950-I6Q4L2-M), Version 12.1(9)EA1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2002 by cisco Systems, Inc.
Compiled Wed 24-Apr-02 06:57 by antonino
Image text-base: 0x80010000, data-base: 0x804E8000
ROM: Bootstrap program is CALHOUN boot loader
bry-asw1 uptime is 7 years, 48 weeks, 6 days, 6 hours, 19 minutes
System returned to ROM by power-on
System restarted at 12:00:24 SAST Thu Feb 13 2003
System image file is "flash:/c2950-i6q4l2-mz.121-9.EA1.bin"
cisco WS-C2950G-24-EI (RC32300) processor (revision D0) with 20815K bytes of memory.
Processor board ID FOC0633Y2T5

What is the longest your production devices have been up for?

Troubleshooting random Nexus reboots

In November last year, a pair of Cisco Nexus 5010 switches suddenly started rebooting randomly without user intervention. Since these boxes fronted a VM environment, stability was of urgent concern. But in order to stabilize the environment, the root cause of the reboots had to be isolated, and quickly.

The Cisco Nexus platform might not be as mature as many would like, but it is quickly becoming a much-needed switch in next-generation data centers. Among the things I like most about the Nexus boxes are the readily available local reporting and intuitive system checks. Obviously there are many other features that are making the platform so popular. I’ll cover some of these in time.

Coming back to the rebooting issue. Unlike IOS devices, which lose all local logging info unless a crash dump was saved to NVRAM, the Nexus writes most of its log information to disk. Thus even after the reboot, you have all the information.

Troubleshooting a Cisco 6500 crash

I was asked recently to share some knowledge about the support of the Cisco 6500 switches as the information available on the DOC-CD could be fairly overwhelming.

As it happens, a client’s Cisco-6509 switch fell over yesterday. I was called out to address the issue of the Cisco-6509 that decided it was tired of life by rebooting itself. I’ll go through some of the steps I took to find the root cause. Note that the steps listed here will not find the cause of every possible issue with a 6500 switch, but they can be used as a guideline.

Usually the first thing I would do is to see the reason for the reboot with a “sh version”. Look at the highlighted lines.

ndcbbnpendc0103#sh ver
Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)
Technical Support:
Copyright (c) 1986-2006 by cisco Systems, Inc.
Compiled Mon 18-Sep-06 23:32 by tinhuang
Image text-base: 0x40101040, data-base: 0x42D90000

ROM: System Bootstrap, Version 12.2(17r)SX5, RELEASE SOFTWARE (fc1)
BOOTLDR: s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)

ndcbbnpendc0103 uptime is 3 hours, 23 minutes
Time since ndcbbnpendc0103 switched to active is 3 hours, 22 minutes
System returned to ROM by s/w reset at 00:14:27 PDT Wed Sep 20 2006 (SP by bus error at PC 0x402DC89C, address 0x0)
System restarted at 09:13:44 ZA Wed Mar 10 2010
System image file is "disk0:s72033-adventerprisek9_wan-mz.122-18.SXF6.bin"

It is clear that the switch did a software reset caused by ‘bus error at PC 0x402DC89C, address 0x0’.
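If you have to check many switches, pulling the reload reason out of the “sh version” output can be scripted. The Python sketch below works on the text of the output shown above; the function names are my own, and the regular expressions are written only against this one sample, so other IOS releases may need adjusting.

```python
import re

# Three lines copied from the "sh ver" output above.
SHOW_VERSION = """\
ndcbbnpendc0103 uptime is 3 hours, 23 minutes
System returned to ROM by s/w reset at 00:14:27 PDT Wed Sep 20 2006 (SP by bus error at PC 0x402DC89C, address 0x0)
System restarted at 09:13:44 ZA Wed Mar 10 2010
"""

def reload_reason(show_version_text):
    """Return the text after 'System returned to ROM by', or None if absent."""
    match = re.search(r"^System returned to ROM by (.+)$",
                      show_version_text, re.MULTILINE)
    return match.group(1) if match else None

def bus_error(reason):
    """Return (program counter, address) if the reason mentions a bus error."""
    match = re.search(r"bus error at PC (0x[0-9A-Fa-f]+), address (0x[0-9A-Fa-f]+)",
                      reason or "")
    return match.groups() if match else None

reason = reload_reason(SHOW_VERSION)
print(reason)
print(bus_error(reason))
```

The program counter and address are the two values you would normally hand over when searching for a matching bug or opening a support case.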


R&S Quick Notes – Switching

With the insane amount of theory to go through before the big day comes, it is only normal for a couple of items to get lost in the masses. On top of that, regardless of the material you used to study, you are bound to come across a couple of small things that you have not seen before. Apart from my 400 pages of everything there is to know for the R&S, I took the time to compile, format and index a book of my CCIE R&S short notes. While compiling all my notes, labbing, and reading the Cisco DOC and other blogs, I made a shorter list of the most important tidbits and any big gotchas to look out for on the big day.

Hope these help some of you :)

Switching Notes

  • If two switches have different VTP domain names, you can’t use DTP between them. You must use manual trunking.
  • When configuring 802.1x, DO NOT forget to add “aaa authentication login default none”, else you might lock the switch and forfeit any points related to that switch.
  • Always confirm that your MD5 digest is the same on both switches when configuring VTP passwords, with “sh vtp status”.
  • To enable WCCP on a 3550, you have to change the SDM template to ‘extended-match’.
  • STP Timers question-1: Change the STP timers so that a port that initially comes up takes 44 sec to start forwarding. Answer: Blocking is always 20 sec, so (44 - 20) / 2 = 12: the listening and learning timers should each be configured at 12 sec.
  • STP Timers question-2: Change the STP timers so that, in the event of convergence, the delay is no more than 20 sec. Answer: 20 / 2 = 10: the listening and learning timers should each be configured at 10 sec.
  • MAC-ACL’s will only match NON-IP traffic. 3560 sees IPv6 traffic as IP-traffic, but 3550 sees IPv6 traffic as NON-IP-traffic, so a 3550 can use a MAC-ACL for IPv6 traffic.
  • Ethertypes used with MAC-ACL’s not on DOC-CD/CMD-Help :

– 0x0806 : IP ARP
– 0x0800 : IPv4
– 0x86DD : IPv6
– 0x4242 : CST (Common Spanning Tree)
– 0xAAAA : All Cisco proprietary (VTP, STP, CDP, DTP, UDLD, PAgP)
– 0xFFFF : all NON-IP

  • VLAN-ACL’s: ONLY an ACL-Permit performs the “forward”/”drop” function in the access-map. An ACL-deny will be ignored. So to deny traffic with VLAN ACL’s, permit the traffic and use a “drop” action in the access-map.
  • Storm-Control: Multicast amount must be equal to or greater than the broadcast amount.
  • UplinkFast – used when a direct link failure is detected.
  • BackboneFast – used to determine indirect link failure.
  • Root Bridge Election: 1-Lowest Bridge-ID (Priority [32768] + Sys-Id-Ext [=VLAN]) & 2-Lowest MAC
  • Root Port Election: 1-Lowest cost to Root, 2-Lowest upstream Bridge-ID, 3-Lowest Port-ID (Port Priority + Port Number)
  • Influencing local Root Port election – change the Port Cost.
  • Influencing the Root Port of directly connected downstream switch – change the Port Priority.
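The timer arithmetic and the election orders in the notes above can be written out as a short Python sketch. It follows the notes as given (20 sec of blocking, then listening and learning of equal length); the switch names, MAC addresses and dictionary keys are hypothetical.

```python
MAX_AGE = 20  # seconds; the "blocking is always 20 sec" figure from the notes

def forward_delay(target_seconds, includes_max_age):
    """Forward delay (listening == learning) that meets a total delay target."""
    remaining = target_seconds - MAX_AGE if includes_max_age else target_seconds
    return remaining // 2  # listening and learning each last one forward delay

def bridge_id(priority, vlan, mac):
    """Comparable Bridge-ID: (priority + sys-id-ext, MAC as an integer)."""
    return (priority + vlan, int(mac.replace(".", ""), 16))

def elect_root(bridges):
    """Lowest Bridge-ID wins; the MAC only breaks a priority tie."""
    return min(bridges, key=lambda b: bridge_id(b["priority"], b["vlan"], b["mac"]))

def elect_root_port(candidates):
    """Lowest (cost to root, upstream Bridge-ID, Port-ID) wins, in that order."""
    return min(candidates,
               key=lambda p: (p["cost"], p["upstream_bridge_id"], p["port_id"]))

# Hypothetical switches, for illustration only.
switches = [
    {"name": "SW1", "priority": 32768, "vlan": 10, "mac": "001d.45e1.ed48"},
    {"name": "SW2", "priority": 32768, "vlan": 10, "mac": "001b.d4ec.ab60"},
]

print(forward_delay(44, includes_max_age=True))   # STP Timers question-1
print(forward_delay(20, includes_max_age=False))  # STP Timers question-2
print(elect_root(switches)["name"])               # equal priority, lowest MAC wins
```

Because Python compares tuples element by element, the tie-breakers apply in exactly the order they appear in the notes.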

IP OSPF mtu-ignore alternative

I came across a command I think would make a great CCIE lab question.

Assume you are busy with the lab, and previously a task in the switching section required you to do a dot1q tunnel where you had to change the SYSTEM-MTU on SW1 to 1504.  No beegy.
But you are now at the OSPF section, where you have to set up OSPF between R1 and SW1, BUT with the following restriction:
(you are not allowed to use the mtu-ignore command)

The usual fix on R1’s interface is prohibited
#interface Fa0/0
#ip ospf mtu-ignore

Hmmm, now what? R1 won’t form an adjacency with SW1 due to an MTU mismatch. We obviously can’t change the SYSTEM-MTU on SW1, because that would break a previous question.

Typical behaviour when you have an OSPF MTU mismatch is that the neighbor state machine gets to EXSTART, keeps retrying and eventually gives up.
We can see this on R1 if we do a “debug ip ospf adj”
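Why the adjacency sticks in EXSTART can be shown with a small Python sketch of the MTU check done on Database Description (DBD) packets. It assumes the RFC 2328 rule that a DBD is rejected when the interface MTU it advertises is larger than what the receiving interface can accept, and it assumes R1’s Fa0/0 is at a default MTU of 1500; the function name is my own.

```python
def accepts_dbd(local_ip_mtu, advertised_mtu, mtu_ignore=False):
    """Return True if a received DBD packet passes the interface MTU check."""
    if mtu_ignore:
        return True  # "ip ospf mtu-ignore" skips the comparison altogether
    # Reject a DBD that advertises a larger MTU than this interface can accept.
    return advertised_mtu <= local_ip_mtu

R1_MTU = 1500   # ASSUMPTION: default MTU on R1's Fa0/0
SW1_MTU = 1504  # SYSTEM-MTU set on SW1 for the dot1q tunnel task

print(accepts_dbd(R1_MTU, SW1_MTU))                   # R1 checking SW1's DBD
print(accepts_dbd(SW1_MTU, R1_MTU))                   # SW1 checking R1's DBD
print(accepts_dbd(R1_MTU, SW1_MTU, mtu_ignore=True))  # the prohibited fix
```

Only one side has to reject the other’s DBD packets for the exchange to stall, which is why the neighbor never gets past EXSTART on R1.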

