NSF and GR are two features in Layer 3 network elements (NEs) that allow two adjacent elements to work together when one of them undergoes a control plane switchover or control plane restart.
The benefit is that when a control plane switchover/restart occurs, the impact to network traffic is kept to a minimum and, in most cases, to zero.
- Non-Stop Forwarding
  - When a control plane protocol such as BGP, OSPF, or EIGRP restarts and neighbors/adjacencies are reset, NSF will allow the data plane to hold onto the routes that were learned via that control plane protocol and continue to forward traffic while the neighbors/adjacencies are re-established.
  - Control plane restarts occur when you have a router or switch with dual route processors or supervisor engines and there is a switchover from the active to the hot standby. When the newly active RP/sup takes over, it has to re-establish neighbors/adjacencies because that information is not part of the synchronization that occurs between the two RPs/sups.
  - NSF keeps traffic moving, without the need to reroute, while the switchover is happening.
  - NSF happens locally, all within the network element where the switchover is happening.
- Graceful Restart
  - GR is the procedure that two neighbors will follow when one of them undergoes a control plane restart.
  - The NE where the control plane is restarting will have to re-establish neighbors/adjacencies (as stated above) when the hot standby RP/sup becomes active. When the neighbors/adjacencies are being signalled, the NE will tell its neighbors, “Hey, look, I just had to restart BGP/OSPF/whatever. I don’t want you to flush your routes because my data plane is fine; I’m still forwarding traffic. We do need to re-establish our neighborship though. Cool?” Yes, routers really are that informal with each other 😜.
  - The neighbor NE receives this message and, if it supports GR, will take special steps to ensure its data plane forwarding entries are not modified or flushed while the control plane is re-establishing neighborships.
  - Once the control plane is re-established, routing information is exchanged and, if necessary, data plane forwarding entries are then updated so that everything is consistent.
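To make the helper side of GR concrete, here is a minimal Python sketch of the behavior described above. It is a toy model, not how any router actually implements this: the class name `HelperNeighbor`, its methods, and the dictionary used as a stand-in for the forwarding table are all hypothetical, and the grace period timer that real protocols use is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class HelperNeighbor:
    """Toy model of the neighbor of a restarting NE (all names are hypothetical)."""
    gr_supported: bool = True
    # Stand-in for data plane forwarding entries: prefix -> next hop
    fib: dict = field(default_factory=dict)
    # Peers this NE is currently assisting through a restart
    helping: set = field(default_factory=set)

    def on_peer_restart(self, peer: str, peer_signalled_gr: bool) -> None:
        if self.gr_supported and peer_signalled_gr:
            # GR path: leave forwarding entries alone while the peer recovers
            self.helping.add(peer)
            return
        # No GR: flush everything learned via the peer, forcing reconvergence
        self.fib = {p: nh for p, nh in self.fib.items() if nh != peer}

    def on_adjacency_reestablished(self, peer: str, fresh_prefixes: list) -> None:
        # Routing information has been exchanged again: drop stale entries
        # via the peer and install whatever it advertises now
        self.fib = {p: nh for p, nh in self.fib.items() if nh != peer}
        for prefix in fresh_prefixes:
            self.fib[prefix] = peer
        self.helping.discard(peer)


helper = HelperNeighbor(fib={"10.48.172.0/24": "10.2.60.97"})
helper.on_peer_restart("10.2.60.97", peer_signalled_gr=True)
print(helper.fib)  # entry is still present while the peer restarts
```

The point of the sketch is the branch in `on_peer_restart`: the forwarding entries only survive when the restarting NE signals GR and the neighbor supports it.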
NSF and GR are separate yet closely related software features that need to work together in order for any benefit to be realized.
The Nexus 5000 has some strict requirements when it comes to NSF and GR:
- NSF is not supported — if the control plane on an n5k restarts, the data plane forwarding entries will be flushed and there will be a Layer 3 reconvergence event.
- GR is supported — the n5k can support a neighbor/peer that is undergoing a restart.
- The only GR method that is supported is the IETF method.
Since this is networking after all, there are of course multiple specifications for NSF and GR.
As stated above, the n5k only supports the IETF spec. Unintuitively, this actually means you configure the neighbor for IETF mode:
    router <protocol>
      nsf ietf   ! enable IETF mode on the neighbor NE
Also unintuitively, the command is `nsf` and not something like `graceful-restart`. NSF and GR are so closely related that the command syntax overlaps in this case.
Unfortunately I do not have a good reference to point you to that says the n5k only supports IETF mode. I’m going to ask you to trust me on this for two reasons: 1) I verified this with the n5k product team and 2) I verified it myself in the lab.
To verify that the n5k is indeed assisting a neighbor going through a graceful restart, turn some debugs on and look for these types of messages during a switchover (my example is specific to OSPF):
    debug ip ospf adjacency terse
    debug ip ospf graceful-restart detail
    debug-filter ip ospf vrf <vrf>   ! if OSPF bound to a VRF
    2016 Jun 8 15:45:11.713389 ospf: 10  Received grace LSA on interface Vlan2999
    2016 Jun 8 15:45:11.713412 ospf: 10  (VRF1) Enabling flooding on all the active physical interfaces.
    2016 Jun 8 15:45:11.713434 ospf: 10  (VRF1) Transition nbr 10.2.60.129 into helper mode(Reason: 3, GP: 117)
    [...]
    2016 Jun 8 15:45:38.564685 ospf: 10  (VRF1) Terminating hitless helper mode for nbr 10.2.60.129
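If you capture a lot of debug output, it can help to pull out just the helper mode window per neighbor. The following is a rough Python sketch that assumes the log lines look exactly like the ones above (same timestamp layout, same message wording, English month abbreviations); the function name `helper_mode_durations` and the regular expressions are my own illustration, not part of any Cisco tooling.

```python
import re
from datetime import datetime

# Timestamp layout assumed from the sample output: "2016 Jun 8 15:45:11.713389"
TS = r"(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{1,6})"
ENTER_RE = re.compile(TS + r" ospf: .*Transition nbr (?P<nbr>\S+) into helper mode")
EXIT_RE = re.compile(TS + r" ospf: .*Terminating hitless helper mode for nbr (?P<nbr>\S+)")


def parse_ts(text):
    # Collapse repeated spaces so single-digit days parse cleanly
    return datetime.strptime(" ".join(text.split()), "%Y %b %d %H:%M:%S.%f")


def helper_mode_durations(log_lines):
    """Return {neighbor: seconds spent in helper mode} for completed windows."""
    entered = {}
    durations = {}
    for line in log_lines:
        m = ENTER_RE.search(line)
        if m:
            entered[m["nbr"]] = parse_ts(m["ts"])
            continue
        m = EXIT_RE.search(line)
        if m and m["nbr"] in entered:
            delta = parse_ts(m["ts"]) - entered.pop(m["nbr"])
            durations[m["nbr"]] = delta.total_seconds()
    return durations


SAMPLE = [
    "2016 Jun 8 15:45:11.713389 ospf: 10  Received grace LSA on interface Vlan2999",
    "2016 Jun 8 15:45:11.713434 ospf: 10  (VRF1) Transition nbr 10.2.60.129 into helper mode(Reason: 3, GP: 117)",
    "2016 Jun 8 15:45:38.564685 ospf: 10  (VRF1) Terminating hitless helper mode for nbr 10.2.60.129",
]
print(helper_mode_durations(SAMPLE))
```

A neighbor that shows up with a duration entered and left helper mode cleanly; a neighbor that never appears either did not signal a graceful restart or the messages were worded differently on your software release.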
Another quicker but less explicit way to check is to look at the routing table with `show ip route` and check the “last update time” of routes learned from the n5k’s neighbor. If the update time has not reset to zero, then GR was successful.
    10.48.172.0/24, ubest/mbest: 1/0
        *via 10.2.60.97, Vlan2998, [110/520], 01:02:28, ospf-10, type-1
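The same check can be scripted against saved `show ip route` output. This is only a sketch under a few assumptions: the output matches the layout shown above, route ages appear as `hh:mm:ss` (other age formats are simply ignored here), and "not reset" is interpreted as "older than the time elapsed since the switchover". The names `route_ages` and `survived_switchover` are hypothetical.

```python
import re
from datetime import timedelta

# Matches lines like: *via 10.2.60.97, Vlan2998, [110/520], 01:02:28, ospf-10, type-1
VIA_RE = re.compile(
    r"\*via (?P<nh>[\d.]+), (?P<intf>[^,]+), \[\d+/\d+\], (?P<age>\d{2}:\d{2}:\d{2}),"
)


def route_ages(show_ip_route_text, next_hop):
    """Collect the ages of all routes whose next hop is the given neighbor."""
    ages = []
    for m in VIA_RE.finditer(show_ip_route_text):
        if m["nh"] == next_hop:
            hours, minutes, seconds = map(int, m["age"].split(":"))
            ages.append(timedelta(hours=hours, minutes=minutes, seconds=seconds))
    return ages


def survived_switchover(ages, since_switchover):
    # Routes that were flushed and relearned would be younger than the switchover
    return bool(ages) and all(age > since_switchover for age in ages)


sample = """10.48.172.0/24, ubest/mbest: 1/0
    *via 10.2.60.97, Vlan2998, [110/520], 01:02:28, ospf-10, type-1
"""
ages = route_ages(sample, "10.2.60.97")
print(survived_switchover(ages, timedelta(minutes=5)))
```

With the sample route above and a switchover five minutes ago, the route is over an hour old, so it was not relearned during the switchover.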