Cisco OTV (Part II)

This is a follow on post from OTV (Part I).

STP Separation

Edge Devices do take part in STP by sending and receiving BPDUs on their internal interface as would any other layer2 switch.

But an OTV Edge Device will not originate or forward BPDUs on the overlay network. OTV thus limits the STP domain to the boundaries of each site. This means a STP problem in the control plane of a given site would not produce any effect on the remote data centers. This is one of the biggest benefits of OTV in comparison to other DCI technologies. This is made possible because MAC reachability information is advertised and learned via the control plane protocol instead of learned using typical MAC flooding behavior.

With the STP separation between sites, the ability for different sites to use different STP technologies is made possible with OTV. I.e., one site can run MSTP while another runs RSTP. In the real world this is a nifty enhancement.

.

Multi-Homing

OTV allows multiple Edge Devices to co-exist in the same site for load-sharing purposes. (With NX-OS 5.1 that is limited to 2 OTV Edge Devices per site.)

With multiple OTV Edge Devices per site and no STP across the overlay to shut down redundant links, the possibility of an end-to-end site loops are created. The absence of STP between sites holds valuable benefits, but a loop prevention mechanism is still required, so an alternative method was used. The boys who wrote OTV, decided on electing a master device responsible for traffic forwarding (similar to some non-STP protocols).

With OTV this master elected device is called an AED (Authoritative Edge Device).

An AED is an Edge Device that is responsible for forwarding the extended VLAN frames in and out of a site, from and to the overlay network. It is a very important to understand this before carrying on. Only the AED will forward traffic out of the site onto the overlay. With optimal traffic replication in a transport network, a site’s broadcast and multicast traffic will reach every Edge Device in the remote site. Only the AED in the remote site will forward traffic from the overlay into the remote site. The AED thus ensures that traffic crossing the site-overlay boundary does not get duplicated or create loops when a site is multi-homed.

Continue reading “Cisco OTV (Part II)”

Advertisements

Cisco OTV (Part I)

OTV(Overlay Transport Virtualization) is a technology that provide layer2 extension capabilities between different data centers. In its most simplest form OTV is a new DCI (Data Center Interconnect) technology that routes MAC-based information by encapsulating traffic in normal IP packets for transit.

Cisco has submitted the IETF draft but it is not finalized yet. draft-hasmit-otv-01

OTV Overview

Traditional L2VPN technologies, like EoMPLS and VPLS, rely heavily on tunnels. Rather than creating stateful tunnels, OTV encapsulates layer2 traffic with an IP header and does not create any fixed tunnels.

OTV only requires IP connectivity between remote data center sites, which allows for the transport infrastructures to be layer2 based, layer3 based, or even label switched. IP connectivity as the base requirement along some additional connectivity requirements that will be covered in this post.

OTV requires no changes to existing data centers to work, but it is currently only supported on the Nexus 7000 series switches with M1-Series linecards.

A big enhancement OTV brings to the DCI realm, is its control plane functionality of advertising MAC reachability information instead of relying on the traditional data plane learning of MAC flooding. OTV refers to this concept as MAC routing, aka, MAC-in-IP routinig. The MAC-in-IP routing is done by encapsulating an ethernet frame in an IP packet before forwarded across the transport IP network. The action of encapsulating the traffic between the OTV devices, creates what is called an overlay between the data center sites. Think of an overlay as a logical multipoint bridged network between the sites.

OTV is deployed on devices at the edge of the data center sites, called OTV Edge Devices. These Edge Devices perform typical layer-2 learning and forwarding functions on their site facing interfaces (the Internal Interfaces) and perform IP-based virtualization functions on their core facing interface (the Join Interface) for traffic that is destined via the logical bridge interface between DC sites (the Overlay Interface).

Each Edge Device must have an IP address which is significant in the core/provider network for reachability, but is not required to have any IGP relationship with the core. This allows OTV to be inserted into any type of network in a much simpler fashion.

Lets look at some OTV terminology.

.

OTV Terminology

Continue reading “Cisco OTV (Part I)”

Cisco 6500 Cosmetic bugs

Ever had this error before on a Cisco 6500 catalyst?

6500#  sh module
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1    5  Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL-------
  2   48  48-port 10/100/1000 RJ45 EtherModule   WS-X6148A-GE-TX    SAL---------
  3   48  CEF720 48 port 1000mb SFP              WS-X6748-SFP       SAL----------

Mod MAC addresses                       Hw    Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1  001d.45e1.ed48 to 001d.45e1.ed4f   2.0   8.5(2)       12.2(33)SXH1 Ok
  2  001f.9ec6.7d70 to 001f.9ec6.7d9f   1.6   8.4(1)       8.7(0.22)BUB Ok
  3  001b.d4ec.ab60 to 001b.d4ec.ab8f   1.12  12.2(14r)S5  12.2(33)SXH1 Ok

Mod  Sub-Module                  Model              Serial       Hw     Status
---- --------------------------- ------------------ ----------- ------- -------
  1  Policy Feature Card 3       VS-F6K-PFC3C       SAL----------  1.0    Ok
  1  MSFC3 Daughterboard         VS-F6K-MSFC3       SAL----------  1.0    Ok
  3  Centralized Forwarding Card WS-F6700-CFC        SAL----------  3.1    Ok

Mod  Online Diag Status
---- -------------------
  1  Minor Error
  2  Pass
  3  Pass

Continue reading “Cisco 6500 Cosmetic bugs”

Uptime

Really sad when you have to reboot a production switch that’s been up for this long. Suppose another question is why was the switch never upgraded? Until now not needed.  :)

bry-asw1>show version
Cisco Internetwork Operating System Software
IOS (tm) C2950 Software (C2950-I6Q4L2-M), Version 12.1(9)EA1, RELEASE SOFTWARE (fc1)
Copyright (c) 1986-2002 by cisco Systems, Inc.
Compiled Wed 24-Apr-02 06:57 by antonino
Image text-base: 0x80010000, data-base: 0x804E8000
ROM: Bootstrap program is CALHOUN boot loader
bry-asw1 uptime is 7 years, 48 weeks, 6 days, 6 hours, 19 minutes
System returned to ROM by power-on
System restarted at 12:00:24 SAST Thu Feb 13 2003
System image file is "flash:/c2950-i6q4l2-mz.121-9.EA1.bin"
cisco WS-C2950G-24-EI (RC32300) processor (revision D0) with 20815K bytes of memory.
Processor board ID FOC0633Y2T5
....
 

What is the longest your production devices have been up for?

Troubleshooting random Nexus reboots

November last year, a pair of Cisco Nexus 5010 switches, suddenly started rebooting randomly without user intervention.  Since these boxes were a front to a VM environment, stability were of urgent concern. But in order to stabilize the environment, the root cause of the reboots had to be isolated, and quickly.

The Cisco Nexus platform might not be as mature as many would like, but it is quickly becoming a very needed switch in Next-Generation datacenters. Of the things I like most about the Nexus boxes are the readily available local reporting and intuitive system checks.  Obviously there are many other features which is making the platform so popular. I’ll cover some of these in time.

Coming back to the rebooting issue. Unlike IOS devices that looses all local logging info, unless a crash dump was saved to NVRAM, the Nexus writes most of its log information to disk. Thus even after the reboot, you have all the information.
Continue reading “Troubleshooting random Nexus reboots”

Troubleshooting a Cisco 6500 crash

I was asked recently to share some knowledge about the support of the Cisco 6500 switches as the information available on the DOC-CD could be fairly overwhelming.

As it happens a clients Cisco-6509 switch fell over yesterday. I was called out to address the issue of the Cisco-6509 that decided it was tired of life by rebooting itself.  I’ll go through some of the steps I did to find the root cause. Obviously note the steps listed here will not find the cause of every possible issue with a 6500 switch, but can be used as a guideline.

Usually the first thing I would do is to see the reason for the reboot with a “sh version”. Look at the highlighted lines.

ndcbbnpendc0103#sh ver
Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by cisco Systems, Inc.
Compiled Mon 18-Sep-06 23:32 by tinhuang
Image text-base: 0x40101040, data-base: 0x42D90000

ROM: System Bootstrap, Version 12.2(17r)SX5, RELEASE SOFTWARE (fc1)
BOOTLDR: s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(18)SXF6, RELEASE SOFTWARE (fc1)

ndcbbnpendc0103 uptime is 3 hours, 23 minutes
Time since ndcbbnpendc0103 switched to active is 3 hours, 22 minutes
System returned to ROM by s/w reset at 00:14:27 PDT Wed Sep 20 2006 (SP by bus error at PC 0x402DC89C, address 0x0)
System restarted at 09:13:44 ZA Wed Mar 10 2010
System image file is "disk0:s72033-adventerprisek9_wan-mz.122-18.SXF6.bin"

Obviously it is clear that the switch did a software reset caused by ‘bus error at PC 0x402DC89C, address 0x0‘.

Continue reading “Troubleshooting a Cisco 6500 crash”

R&S Quick Notes – Switching

With the insane amount of theory to go through before the big day comes, it is only normal for a couple of items to get lost in the masses. On top of that, regardless of the material you used to study, you are bound to come across a couple small things that you have not seen before. Apart from my 400 pages of everything there is to know for the R&S, I took the time to compile, format and index a book of my CCIE R&S short notes. While compiling all my notes,  labbing,  and reading the Cisco DOC and other blogs, that I made shorter list of the most important tid-bits and any beeg gothas to look out for on the big day.

Hope these help some of you :)

Switching Notes

  • If different VTP domain names between 2 switches, you cant use DTP. Must use manual trunking.
  • When configuring 802.1x, DO NOT forget to add “aaa authentication login default none”, else you might lock the switch and forfeit any points related to that switch.
  • Always confirm your MD5 to be same when configuring VTP PASSWORDS, with “sh vtp status”
  • To enable WCCP on a 3550, you have to change the SDM template to ‘extended-match’
  • STP Timers question-1: Change the STP timers when a port initially comes up to 44 sec.  Answer: Blocking is always 20 sec, (44-20 = 24/2) each listening and learning timers should be configured at 12 sec.
  • STP Timers question-2: Change the STP timers, that in the event of convergence, delay should be no more than 20 sec. Answer: (20/2) each listening and learning timers should be configured at 10 sec.
  • MAC-ACL’s will only match NON-IP traffic. 3560 sees IPv6 traffic as IP-traffic, but 3550 sees IPv6 traffic as NON-IP-traffic, so a 3550 can use a MAC-ACL for IPv6 traffic.
  • Ethertypes used with MAC-ACL’s not on DOC-CD/CMD-Help :

– 0x0806 : IP ARP
– 0x0800 : IPv4
– 0x86DD : IPv6
– 0x4242 : CST (Common Spanning Tree)
– 0xAAAA : All Cisco proprietary (VTP, STP, CDP, DTP, UDLD, PAgP)
– 0xFFFF : all NON-IP

  • VLAN-ACL’s: ONLY a ACL-Permit performs the “forward”/”drop” function in the access-map. A ACL-deny will be ignored. So to deny traffic with VLAN ACL’s, permit the traffic and use a “drop” action in the access-map.
  • Storm-Control: Multicast amount must be equal or greater that the broadcast amount.
  • Uplinkfast used when a direct link failure is detected.
  • Backbonefast – used to determine indirect link failure.
  • Root Bridge Election: 1-Lowest Bridge-ID (Priority [32768 ] + Sys-Id-Ext[=vlan]) & 2-Lowest MAC
  • Root Port Election: 1-Lowest cost to Root, 2-Lowest upstream Bridge-ID, 3-Lowest Port-ID (Port Priority + Port Number)
  • Influencing local Root Port election – change the Port Cost.
  • Influencing the Root Port of directly connected downstream switch – change the Port Priority.