Memory problems on routers are nothing new. They are less common these days, but still crop up from time to time.
BGP can handle a large number of routes and, compared to other routing protocols, it can be a big memory hog. BGP peering devices, especially those carrying full internet tables, need larger amounts of memory to store all the BGP routes. It is therefore not uncommon to see a BGP router run out of memory once a certain route count is exceeded.
A router running out of memory, commonly called Low Memory, is always a bad thing. The results range from the router crashing, to routing processes being shut down, or, if you are lucky, erratic behavior that causes route flaps and instability in your network. None of which is desirable.
Low memory can be caused by any of the following:
- Partial physical memory failure.
- Software memory bugs.
- Applications not releasing used memory chunks.
- Incorrect configuration.
- Insufficient memory allocation to a Nexus VDC.
I discovered this neat little feature last week on the Cisco Nexus 7000. The feature is surprisingly called Low Memory Handling. Instead of allowing unpredictable behavior, like a process crashing and potentially losing all BGP sessions, the Nexus 7000 introduces some intelligence into the handling of low memory problems.
How does it work?
By default, BGP on a Nexus will react in the following way to low memory conditions:
- Minor alert (85%) – BGP will not establish any new eBGP sessions, but will continue to establish new iBGP sessions and confederate sessions. Established sessions will remain up, but reset sessions will not be re-established.
- Severe alert (90%) – BGP shuts down select established eBGP sessions every two minutes until the memory alert becomes minor. For each eBGP peer, BGP calculates the ratio of the total number of paths received to the number of paths selected as best paths. For example, if 60 paths were received and 40 were selected as best, the ratio is 3:2. The peers with the highest ratio are selected to be shut down to reduce memory usage. In other words, those advertising many prefixes with the fewest preferred paths are shut down first.
- Critical alert (95%) – BGP will gracefully shut down all the established sessions.
Once a session has been shut down because of a severe or critical memory state, it must be manually cleared before it can be brought up again.
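The three default thresholds above can be pictured as a simple classification step. The following is only an illustrative Python sketch, not NX-OS code: the names `Alert`, `classify` and `BGP_REACTION` are made up for this example, and treating a threshold as reached when usage is equal to or above it is an assumption.

```python
from enum import Enum


class Alert(Enum):
    NONE = "none"
    MINOR = "minor"
    SEVERE = "severe"
    CRITICAL = "critical"


def classify(used_pct, minor=85.0, severe=90.0, critical=95.0):
    """Map a memory-usage percentage to an alert level.

    Defaults are the 85/90/95 values quoted in the text. The inclusive
    comparison (>=) is an assumption made for this sketch.
    """
    if used_pct >= critical:
        return Alert.CRITICAL
    if used_pct >= severe:
        return Alert.SEVERE
    if used_pct >= minor:
        return Alert.MINOR
    return Alert.NONE


# Summary of the default BGP reaction per level, paraphrased from the list above.
BGP_REACTION = {
    Alert.NONE: "normal operation",
    Alert.MINOR: "no new eBGP sessions; iBGP and confederate sessions still allowed",
    Alert.SEVERE: "shut down selected eBGP sessions every two minutes",
    Alert.CRITICAL: "gracefully shut down all established sessions",
}

if __name__ == "__main__":
    for pct in (70.0, 86.5, 92.0, 97.0):
        level = classify(pct)
        print(f"{pct:5.1f}% -> {level.value}: {BGP_REACTION[level]}")
```

Keeping the thresholds as keyword arguments mirrors the fact that they can be changed on the device.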
This is really good considering that it is arguably better to lose two or three BGP sessions instead of risking all sessions. But what caught my attention was the ability to exclude important BGP sessions from the Low Memory Check.
For the Severe alert phase you can exempt important eBGP peers (like upstream transits) from this selection process, so that less important eBGP peers are shut down first when memory runs low.
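To make the selection logic concrete, here is a rough Python sketch of how peers could be ranked during a severe alert, with exempt peers skipped. It is a model of the behavior described above, not the actual NX-OS implementation. The `Peer` class, the `shutdown_order` function, the sample addresses in 192.0.2.0/24 and all path counts are hypothetical, as is the choice to rank a peer with zero best paths first.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    address: str
    paths_received: int
    best_paths: int
    exempt: bool = False  # models a neighbor configured as exempt


def ratio(peer):
    """Paths received divided by paths selected as best."""
    if peer.best_paths == 0:
        # Assumption: a peer contributing no best paths ranks highest.
        return float("inf")
    return peer.paths_received / peer.best_paths


def shutdown_order(peers):
    """Return non-exempt peers, highest ratio (first to be shut down) first."""
    candidates = [p for p in peers if not p.exempt]
    return sorted(candidates, key=ratio, reverse=True)


# Hypothetical sample data; only the 60/40 figures come from the text.
peers = [
    Peer("10.1.5.99", paths_received=60, best_paths=40),           # ratio 1.5
    Peer("192.0.2.1", paths_received=1000, best_paths=100),        # ratio 10.0
    Peer("192.0.2.2", paths_received=5000, best_paths=200, exempt=True),
]

if __name__ == "__main__":
    for p in shutdown_order(peers):
        print(p.address, ratio(p))
```

In this sample the exempt peer has the highest ratio of all, yet it never appears in the candidate list, which is exactly the point of exempting an upstream transit.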
The memory alert thresholds indicated above in brackets can be changed, and the Critical alert action can be disabled.
Verification
Under strain conditions the following syslog events would be seen at the different memory stages:
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : MINOR
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : SEVERE
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : CRITICAL
Upon recovery the syslog events would be:
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : MINOR ALERT RECOVERED
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : SEVERE ALERT RECOVERED
%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : CRITICAL ALERT RECOVERED
When a particular BGP session is shut down in preservation of others the following event will pop up:
%BGP-2-PEERSHALTED: bgp-100 [5440] BGP 10.1.5.99 shutdown due to no memory condition (Severe Alert)
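If these messages end up on a syslog server, a small script can pick them out. The sketch below is a hedged example that assumes the messages look exactly like the samples above. The function name `parse_event` and the dictionary keys it returns are my own invention, and real log lines will normally carry a timestamp and hostname prefix, which is why the patterns use `search` rather than matching from the start of the line.

```python
import re

# Patterns are based only on the sample messages shown in this post.
MEMORY_ALERT = re.compile(
    r"%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : "
    r"(?P<level>MINOR|SEVERE|CRITICAL)(?P<recovered> ALERT RECOVERED)?"
)
PEER_HALTED = re.compile(
    r"%BGP-2-PEERSHALTED: \S+ \[\d+\] BGP (?P<peer>\S+) "
    r"shutdown due to no memory condition"
)


def parse_event(line):
    """Return a dict describing a low-memory event, or None if no match."""
    m = MEMORY_ALERT.search(line)
    if m:
        return {
            "type": "memory_alert",
            "level": m.group("level"),
            "recovered": m.group("recovered") is not None,
        }
    m = PEER_HALTED.search(line)
    if m:
        return {"type": "peer_halted", "peer": m.group("peer")}
    return None


if __name__ == "__main__":
    samples = [
        "%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : SEVERE",
        "%PLATFORM-2-MEMORY_ALERT: Memory Status Alert : SEVERE ALERT RECOVERED",
        "%BGP-2-PEERSHALTED: bgp-100 [5440] BGP 10.1.5.99 shutdown due to "
        "no memory condition (Severe Alert)",
    ]
    for line in samples:
        print(parse_event(line))
```

Remember that a halted peer stays down until it is manually cleared, so a PEERSHALTED event is worth alerting on even after the memory alert has recovered.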
When looking at the BGP summary table, it should be clear which neighbor was shut down:
N7K# sh ip bgp summary | i 10.1.5.99
10.1.5.99 4 300 10 652 0 0 0 00:00:26 Shut (NoMem)
A show command to verify the memory status:
N7K# show system internal memory-status
MemStatus: Severe Alert
Other useful show commands:
N7K# show system resources
Load average: 1 minute: 0.23 5 minutes: 0.22 15 minutes: 0.20
Processes : 980 total, 1 running
CPU states : 3.0% user, 2.5% kernel, 94.5% idle
Memory usage: 4115812K total, 2885968K used, 1229844K free
N7K# show routing memory estimate
Shared memory estimates:
  Current max 32 MB; 27495 routes with 16 nhs
  in-use 1 MB; 67 routes with 1 nhs (average)
  Configured max 32 MB; 27495 routes with 16 nhs
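To get a feel for how close a box is to the thresholds, the "Memory usage" line can be turned into a percentage. This is a minimal Python sketch with a hypothetical helper name, `memory_used_percent`. It assumes the line format shown above, and it also assumes that used divided by total is a fair approximation; the platform's own alert calculation may well differ.

```python
import re

# Matches the "Memory usage" line as shown in the sample output above.
MEMORY_LINE = re.compile(
    r"Memory usage:\s+(\d+)K total,\s+(\d+)K used,\s+(\d+)K free"
)


def memory_used_percent(output):
    """Return used memory as a percentage of total from 'show system resources'."""
    m = MEMORY_LINE.search(output)
    if m is None:
        raise ValueError("no 'Memory usage' line found")
    total, used, _free = (int(g) for g in m.groups())
    return 100.0 * used / total


SAMPLE = "Memory usage: 4115812K total, 2885968K used, 1229844K free"

if __name__ == "__main__":
    print(f"{memory_used_percent(SAMPLE):.1f}% used")
```

For the sample output this works out to roughly 70.1% used, comfortably below the default 85% minor threshold.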
Configuration
To exclude important BGP sessions from the Low Memory check use the following configuration:
N7K(config)# router bgp 100
N7K(config-router)# neighbor 10.1.5.99 remote-as 300
N7K(config-router-neighbor)# low-memory exempt
The memory alert thresholds can be changed using the following command:
N7K(config)#system memory-thresholds minor {%} severe {%} critical {%}
The action at the critical stage can be disabled, but I would not recommend it.
N7K(config)#system memory-thresholds threshold critical no-process-kill