It is not uncommon to find that different vendors have slightly different implementations of standards-based technologies that should work together seamlessly.
I recently came across a BGP capability negotiation problem between a Nexus 7000 and a client Fortigate. Today’s post does not teach any new technology; instead it shows the troubleshooting methodology I used to find the problem.
The setup is simple: a Nexus 7000 and a Fortigate, connected across a Nexus layer 2 hosting infrastructure, peering with eBGP.
At face value the eBGP session between Nexus 7000 and the Fortigate never came up:
N7K# sh ip bgp summary | i 10.5.0.20
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.5.0.20 4 65123 190 190 0 0 0 0:12:30 Idle
The first steps should verify the obvious.
- Configuration! This check should include confirming that the ASNs, the peering IP addresses, the source interfaces and the passwords all match.
The configuration used was:
Nexus 7000#
router bgp 65100
  neighbor 10.5.0.20 remote-as 65123
    password routing-bits
    ebgp-multihop 3
    update-source Vlan631
    address-family ipv4 unicast
      default-originate

Fortigate#
config router bgp
    set as 65123
    config neighbor
        edit "10.5.0.18"
            set ebgp-enforce-multihop enable
            set interface "port5"
            set remote-as 65100
            set password "routing-bits"
    end
    config network
        edit 1
            set prefix 172.16.1.0 255.255.252.0
The second step would be to confirm reachability between the devices and the number of IP hops required for multihop. A ping and a traceroute should generally provide the necessary information.
Optionally, it is always good to confirm that nothing between the devices is blocking TCP port 179.
This can be tested with Telnet, but in the case of a Fortigate it will not work by default unless explicitly allowed.
N7K# telnet 10.5.0.20 179 source 10.5.0.18
Trying 10.5.0.20...
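Where a Linux host sits in the path, the same port check can be scripted. The following is a minimal sketch, not something used in the original troubleshooting: the helper name `tcp_port_open` is hypothetical, and it only tells you whether a TCP handshake completes. A peer that refuses unsolicited connections, or that enforces a BGP password at the TCP layer, will make it return False even though the path itself is fine.

```python
import socket


def tcp_port_open(host: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to host:port completes."""
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `tcp_port_open("10.5.0.20")` from a host in the peering subnet would be the rough equivalent of the Telnet test above.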
If that all seems fine, it is best to start digging a bit deeper to understand where this is failing.
Does the “show logging” have anything of interest? In this case it didn’t.
A “debug ip tcp transactions” can be used next, but I generally steer away from this on production devices. Additionally, in this case it is not yet available on the Nexus 7000. A “diagnose sniffer packet” on the Fortigate could be used here, but it would have shown little more than we already know.
0.865117 10.5.0.18.179 -> 10.5.0.20.15894: rst 3896836347 ack 4155795791
0.870861 10.5.0.20.17188 -> 10.5.0.18.179: rst 3556829865
A RST (reset) for BGP sessions is never good, but we know already that the session does not want to come up. We need to understand why.
The next step should involve looking at the BGP events between the neighbors to determine where this is breaking.
The command to use is “debug ip bgp events”. The Nexus 7000 allows the debug output to be filtered for only one BGP session.
N7K# debug-filter bgp neighbor 10.5.0.20
N7K# debug ip bgp events
N7K#
EVT: Starting periodic BRIB processing
EVT: Connection request from peer 10.5.0.20 port 17429 fd 53, vrf default
EVT: 10.5.0.20 went from Idle to Connect (Passive setup)
EVT: 10.5.0.20 sending OPEN message to peer
EVT: 10.5.0.20 went from Connect to OpenSent (Passive setup)
EVT: 10.5.0.20 Wait (30 sec) for session setup response
EVT: 10.5.0.20 Process OPEN message from peer
EVT: 10.5.0.20 went from OpenSent to OpenConfirm (Passive setup)
EVT: 10.5.0.20 Wait (30 sec) for session setup response
EVT: Received NOTIFICATION unspecified OPEN error from 10.5.0.20
EVT: Reset by peer (unspecified OPEN error) in session to 10.5.0.20, value 0, state was OpenConfirm
EVT: [IPv4 Unicast] 10.5.0.20 GR state: none, saved flags: 0
EVT: 10.5.0.20 cleaning up passive peer setup, thread id 0x0
From the above output it is clear that the BGP negotiation is unsuccessful. Two neighbors must agree on the OPEN message parameters.
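To make the “unspecified OPEN error” a little more concrete, here is a small sketch of how a NOTIFICATION message decodes on the wire. The message types, error codes and OPEN subcodes are the ones defined in RFC 4271 (subcode 7 comes from RFC 5492). The sample bytes are hand-built for illustration and are not a capture from this session; I am assuming the “value 0” in the debug corresponds to subcode 0.

```python
import struct

# Message types and error codes from RFC 4271; OPEN subcode 7 is from RFC 5492.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}
OPEN_ERROR_SUBCODES = {
    0: "Unspecific",
    1: "Unsupported Version Number",
    2: "Bad Peer AS",
    3: "Bad BGP Identifier",
    4: "Unsupported Optional Parameter",
    5: "Authentication Failure (deprecated)",
    6: "Unacceptable Hold Time",
    7: "Unsupported Capability",
}


def decode_notification(message: bytes) -> tuple[str, str]:
    """Return (error name, subcode description) for a raw NOTIFICATION."""
    # Fixed header: 16-byte marker, 2-byte length, 1-byte message type.
    _marker, _length, msg_type = struct.unpack("!16sHB", message[:19])
    if MESSAGE_TYPES.get(msg_type) != "NOTIFICATION":
        raise ValueError(f"not a NOTIFICATION (type {msg_type})")
    error_code, subcode = struct.unpack("!BB", message[19:21])
    error = ERROR_CODES.get(error_code, f"unknown error {error_code}")
    if error_code == 2:
        detail = OPEN_ERROR_SUBCODES.get(subcode, f"unknown subcode {subcode}")
    else:
        detail = f"subcode {subcode}"
    return error, detail


# Hand-built example (assumption): OPEN Message Error with subcode 0.
sample = b"\xff" * 16 + struct.pack("!HBBB", 21, 3, 2, 0)
print(decode_notification(sample))
```

A subcode of 0 is the least helpful answer a peer can give: it says the OPEN was rejected without saying which field caused it, which is why the rest of this post has to work through the fields one by one.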
Before continuing, consider the information carried in the OPEN message that could be causing the negotiation to fail:
- BGP Version : Indicates the BGP version of the sender. Almost all current implementations are BGP-4.
- Local ASN : Autonomous System Number of the sender.
- Hold Time : The maximum number of seconds a BGP peer will allow the connection to stay silent between received BGP messages. A BGP device may refuse a connection if it doesn’t like the value that its peer is suggesting; usually, however, the two devices agree to use the smaller of the values suggested by each device.
- BGP Identifier : IP address identifying the sending neighbor.
- Optional Parameters : Fields used to specify extra parameters supported by a neighbor.
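The fields listed above sit at fixed offsets in the OPEN message body, so they are easy to pull apart. The sketch below is illustrative only: `OpenFields`, `parse_open` and `negotiated_hold_time` are names I made up, the sample is built from the values the Nexus reports sending (version 4, AS 65100, hold-time 180, router-id 10.1.1.188) rather than captured, and the peer hold time of 90 is a hypothetical value since the Fortigate's is not shown in this post.

```python
import ipaddress
import struct
from typing import NamedTuple


class OpenFields(NamedTuple):
    version: int
    my_as: int
    hold_time: int
    bgp_identifier: str
    optional_params: bytes


def parse_open(body: bytes) -> OpenFields:
    """Parse an OPEN message body (the bytes after the 19-byte BGP header)."""
    # Layout: version (1), my AS (2), hold time (2), BGP identifier (4),
    # optional parameters length (1), then the optional parameters.
    version, my_as, hold_time, ident, opt_len = struct.unpack(
        "!BHH4sB", body[:10]
    )
    return OpenFields(
        version,
        my_as,
        hold_time,
        str(ipaddress.IPv4Address(ident)),
        body[10:10 + opt_len],
    )


def negotiated_hold_time(local: int, remote: int) -> int:
    """Both sides use the smaller of the two proposed hold times."""
    return min(local, remote)


# Rebuilt from the values in the Nexus debug output, with no optional
# parameters attached (assumption, for brevity).
sample_body = struct.pack(
    "!BHH4sB", 4, 65100, 180, ipaddress.IPv4Address("10.1.1.188").packed, 0
)
print(parse_open(sample_body))
print(negotiated_hold_time(180, 90))  # 90 is a hypothetical peer value
```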
So far we have confirmed that it is not the first four items, putting all focus on the Optional Parameters which includes the following two:
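- Authentication Information : The optional parameter that carried authentication data in the original BGP-4 specification. RFC 4271 deprecates it, and the session password configured above is enforced with the TCP MD5 signature option instead. Either way, the passwords were already checked.
- Capabilities : An optional parameter, added later, that lets a neighbor advertise the BGP extensions it supports.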
If authentication were the problem, the “debug ip bgp events” output would have indicated an authentication failure, and a syslog message in the logging buffer would have indicated the same.
So at this stage it is reasonable to assume that the problem is related to the capabilities. But which one is the problem child? For further reading please refer to RFC 5492 (Capabilities Advertisement with BGP-4). The current assignment of the capability codes is available here: http://www.iana.org/assignments/capability-codes/capability-codes.xml
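As a rough illustration of what RFC 5492 describes, the sketch below walks the Optional Parameters field of an OPEN message and extracts every capability. Each optional parameter is a type/length/value triple, parameter type 2 means Capabilities, and its value is itself a series of code/length/value triples. The function name is hypothetical, and the sample bytes are my own reconstruction of four parameters a peer might send; the 4-byte AS value of 65123 inside capability 65 is an assumption.

```python
import struct

CAPABILITIES_PARAM_TYPE = 2  # Optional parameter type for Capabilities


def parse_capabilities(optional_params: bytes) -> list[tuple[int, bytes]]:
    """Return (capability code, value) pairs found in the optional parameters."""
    capabilities = []
    i = 0
    while i + 2 <= len(optional_params):
        param_type, param_len = optional_params[i], optional_params[i + 1]
        value = optional_params[i + 2:i + 2 + param_len]
        i += 2 + param_len
        if param_type != CAPABILITIES_PARAM_TYPE:
            continue  # skip anything that is not a Capabilities parameter
        j = 0
        while j + 2 <= len(value):
            code, cap_len = value[j], value[j + 1]
            capabilities.append((code, value[j + 2:j + 2 + cap_len]))
            j += 2 + cap_len
    return capabilities


# Reconstructed sample (assumption): one capability per optional parameter.
sample_params = (
    bytes([2, 6, 1, 4]) + struct.pack("!HBB", 1, 0, 1)  # 1: AFI 1, SAFI 1
    + bytes([2, 2, 128, 0])                             # 128: old route refresh
    + bytes([2, 2, 2, 0])                               # 2: route refresh
    + bytes([2, 6, 65, 4]) + struct.pack("!I", 65123)   # 65: 4-byte AS
)

for code, value in parse_capabilities(sample_params):
    print(code, value.hex())
```

Note that a sender may pack several capabilities into one optional parameter or send one parameter per capability; the nested loop handles both.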
Keep in mind that the RFC allows a speaker to send a NOTIFICATION and terminate the peering when its peer does not support a capability it needs. Hmmm, sounds like our problem? So now let’s find the culprit by looking at the output of “debug ip bgp keepalives”:
N7K# debug ip bgp keepalives
(default) ADJ: 10.5.0.20 Sending OPEN, version 4, AS 65100, hold-time 180, router-id 10.1.1.188
(default) ADJ: 10.5.0.20 sending (old) dynamic capability (66/0) to peer
(default) ADJ: 10.5.0.20 sending dynamic capability (67/3) to peer
(default) ADJ: 10.5.0.20 sending (old) route refresh capability (128/0) to peer
(default) ADJ: 10.5.0.20 sending route refresh capability (2/0) to peer
(default) ADJ: 10.5.0.20 sending [IPv4 Unicast] capability (1/4) to peer
(default) ADJ: my restart time 120 restart state 0
(default) ADJ: 10.5.0.20 sending graceful restart capability (64/6) to peer
(default) ADJ: 10.5.0.20 sending 4-byte AS capability (65/4) to peer
(default) ADJ: 10.5.0.20 received open message with optional params 2, len 6, from peer
(default) ADJ: 10.5.0.20 received capability/len 1/4 from peer
(default) ADJ: 10.5.0.20 peer is [IPv4 Unicast] capable
(default) ADJ: 10.5.0.20 received open message with optional params 2, len 6, from peer
(default) ADJ: 10.5.0.20 received capability/len 1/4 from peer
(default) ADJ: 10.5.0.20 peer is [IPv6 Unicast] capable
(default) ADJ: 10.5.0.20 received open message with optional params 2, len 2, from peer
(default) ADJ: 10.5.0.20 received capability/len 128/0 from peer
(default) ADJ: 10.5.0.20 received (old) route refresh capability from peer
(default) ADJ: 10.5.0.20 received open message with optional params 2, len 2, from peer
(default) ADJ: 10.5.0.20 received capability/len 2/0 from peer
(default) ADJ: 10.5.0.20 received (new) route refresh capability from peer
(default) ADJ: 10.5.0.20 received open message with optional params 2, len 6, from peer
(default) ADJ: 10.5.0.20 received capability/len 65/4 from peer
(default) ADJ: 10.5.0.20 received 4-byte AS capability from peer
(default) ADJ: 10.5.0.20 Ignoring 4-byte AS capability received from peer
(default) ADJ: 10.5.0.20 sending first KEEPALIVE to peer
(default) ADJ: Expected first keepalive from 10.5.0.20, got msg type 3
Let’s go through the output above. The ‘(default)’ indicates the default VRF table on the Nexus 7000.
The first line shows the OPEN message going out with the mandatory fields as discussed above.
Then the capability negotiation is started on the Nexus 7000 by sending its supported capabilities.
The Nexus advertises support for the following:
1 – Multiprotocol support (IPv4 Unicast only, per the debug output)
2 – route refresh capability
64 – graceful restart capability
65 – 4-byte AS capability
66 – (old) dynamic capability
67 – dynamic capability
128 – Cisco original version of route refresh capability
The Fortigate responded with the following:
1 – Multiprotocol support for IPv4 and IPv6
2 – route refresh capability
65 – 4-byte AS capability
128 – Cisco original version of route refresh capability
The Fortigate acknowledged support for 4 of the 7 sent by the Nexus. Yet the Nexus only accepted 3 of the 4 capability responses from the Fortigate.
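The comparison of the two lists can be done mechanically from the debug output. This is a sketch that assumes the log format shown in this post; the regular expressions and the function name are my own and are not part of any NX-OS tooling. Only the lines that carry capability codes are reproduced in the sample.

```python
import re

DEBUG_OUTPUT = """\
(default) ADJ: 10.5.0.20 sending (old) dynamic capability (66/0) to peer
(default) ADJ: 10.5.0.20 sending dynamic capability (67/3) to peer
(default) ADJ: 10.5.0.20 sending (old) route refresh capability (128/0) to peer
(default) ADJ: 10.5.0.20 sending route refresh capability (2/0) to peer
(default) ADJ: 10.5.0.20 sending [IPv4 Unicast] capability (1/4) to peer
(default) ADJ: 10.5.0.20 sending graceful restart capability (64/6) to peer
(default) ADJ: 10.5.0.20 sending 4-byte AS capability (65/4) to peer
(default) ADJ: 10.5.0.20 received capability/len 1/4 from peer
(default) ADJ: 10.5.0.20 received capability/len 1/4 from peer
(default) ADJ: 10.5.0.20 received capability/len 128/0 from peer
(default) ADJ: 10.5.0.20 received capability/len 2/0 from peer
(default) ADJ: 10.5.0.20 received capability/len 65/4 from peer
"""

# "sending ... capability (code/len) to peer"
SENT_RE = re.compile(r"sending .*capability \((\d+)/\d+\) to peer")
# "received capability/len code/len from peer"
RECEIVED_RE = re.compile(r"received capability/len (\d+)/\d+ from peer")


def capability_codes(text: str) -> tuple[set[int], set[int]]:
    """Return the sets of capability codes sent and received in a debug log."""
    sent, received = set(), set()
    for line in text.splitlines():
        if match := SENT_RE.search(line):
            sent.add(int(match.group(1)))
        elif match := RECEIVED_RE.search(line):
            received.add(int(match.group(1)))
    return sent, received


sent, received = capability_codes(DEBUG_OUTPUT)
print("sent only:", sorted(sent - received))
print("common:   ", sorted(sent & received))
```

The codes that appear only on the sending side are the ones the Fortigate did not return, which narrows down where to look next.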
OK, it’s a mouthful. First consider what the Fortigate ignored: the dynamic capability. The Fortigate either does not support it or requires manual configuration; I can’t tell you which. But this is easy to correct on the Nexus.
Secondly, the 4-byte ASN capability received on the Nexus was ignored. I found this rather interesting, since both the Fortigate and the Nexus support 4-byte ASNs. Either one of the two vendors is not complying with the standard, or there is a bug on one side.
Now you can waste more time at this point figuring out which is the guilty party, or you can put the required measures in place to bring the BGP session up.
The BGP session came up after the following updated config was applied to the Nexus 7000:
N7K(config-router-neighbor)#
router bgp 65100
  neighbor 10.5.0.20
    no dynamic-capability
    capability suppress 4-byte-as
N7K# sh ip bgp summ | i 10.5.0.20
10.5.0.20 4 65123 2418 2420 870801 0 0 00:07:02 1
I hope that was informative.
Thanks, Really informative and nice methodology in Tshooting
Nice post!
Thanks Heaps! I had exactly the same issue and spent a long time looking at logs unable to find the root cause..You saved my day!
Nice one …informative
Fixed my exabgp issues, thanks!