An Introduction to Layer 3 Traffic Isolation

All network engineers should be familiar with the standard method for virtualizing the network at Layer 2: the VLAN. VLANs virtualize the bridging table of Layer 2 switches and create virtual switching topologies that overlay the physical network. Traffic traveling in one topology (i.e., one VLAN) cannot bleed through into another topology. In this way, traffic from one group of users or devices can be kept isolated from other users or devices.

Figure: Traffic Isolation Using VLANs

VLANs work great in a Layer 2 switched network, but what happens when you need to maintain this traffic separation across a Layer 3 boundary such as a router or firewall? Typically, if you have two VLANs that each terminate on a router, and the router has an IP address in each VLAN, the devices in those VLANs are free to talk to each other by passing traffic through the router. The traffic isolation gained by putting the devices in different VLANs is now lost. In fact, because some network engineers are familiar with traffic isolation only at Layer 2 and not at Layer 3, the overall network design is often compromised so that VLANs can span the network end-to-end and keep traffic separated. This, of course, requires bridging everywhere in the network, which can lead to serious issues such as large failure domains, spanning-tree instability, and broadcast storms that reach every switch the VLAN touches.

There is a way to maintain traffic isolation across Layer 3 devices: Virtual Routing and Forwarding, or VRF. VRF allows the routing table in a Layer 3 switch or router to be virtualized. Each virtual routing table contains its own unique set of forwarding entries. Every interface is associated with one VRF; traffic that enters a router is forwarded using the routing table of the VRF its ingress interface belongs to, and is sent out an egress interface associated with that same VRF. Much like VLANs, VRFs ensure logical isolation of traffic as it crosses a common physical network infrastructure.
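To make this concrete, here is a minimal sketch of creating two VRFs on a Cisco IOS or IOS-XE device, assuming the newer `vrf definition` syntax (older releases use `ip vrf NAME` instead). The VRF names EMPLOYEE and GUEST and the route distinguisher values are hypothetical, chosen only for illustration, and exact syntax varies by platform and release. Each VRF gets its own routing table, which can be inspected separately with the `show` commands at the end.

```
! Hypothetical VRF names and route distinguishers (IOS/IOS-XE style)
vrf definition EMPLOYEE
 rd 65000:101
 ! Enable IPv4 routing inside this VRF
 address-family ipv4
 exit-address-family
!
vrf definition GUEST
 rd 65000:201
 address-family ipv4
 exit-address-family
!
! Verification from exec mode: each VRF has its own, separate table
switch# show vrf
switch# show ip route vrf EMPLOYEE
switch# show ip route vrf GUEST
```

Note that `show ip route` with no VRF keyword shows only the global table; routes that exist in EMPLOYEE or GUEST do not appear there.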

There are three general concepts behind VRFs:

1. Access Control
2. Path Isolation
3. Shared Services

The next sections look at each of these in turn.

Access Control

Access control refers to how end devices are identified and segmented at the network edge (the access layer). Devices need to be segmented before their traffic enters the network so that the network knows which virtual network to associate that traffic with. Access control must account for both wired and wireless access methods.

Two of the most common methods for segmenting wired end devices are static VLAN assignment and 802.1X port-based network access control. With static VLAN assignment, a VLAN is configured on an edge port, and that VLAN does not change regardless of who or what is plugged in.

switch(config-if)# switchport mode access
switch(config-if)# switchport access vlan 101

This method is simple and inexpensive to implement but costly to maintain. Every time a different kind of device is plugged into the port, the VLAN might have to be changed. If the port is in someone’s office, this might not happen very often, but if the port is in a meeting room, it can be a nightmare when a mixture of employees and guests plug in. Additionally, if the port is left on the employee VLAN and a guest plugs in, that guest is now on the employee network and can access the same network resources as employees, which is an obvious security risk. On the plus side, static VLAN assignment requires no additional equipment, tools, or training.

The more advanced alternative to static VLAN assignment is to run 802.1X on all edge ports. 802.1X is a standard for authenticating end devices to the network. Based on the result of the authentication and the policies in place, the network can automatically assign a VLAN to a port (among other things). Presumably, devices used by employees would authenticate successfully and be placed on the employee VLAN, while devices owned by guests would fail authentication and be placed into the guest VLAN. Implementing 802.1X, however, is rather complex. It requires additional equipment, typically a RADIUS server, to perform the authentication and run the policy engine that determines things like VLAN assignment. The upside is that it removes the burden of manually tweaking VLAN assignments and ensures that end devices are always placed into the correct VLAN, eliminating the risk of guests landing on the employee VLAN.
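A minimal sketch of an 802.1X edge port follows, assuming classic Cisco IOS syntax; newer IOS-XE releases may express the same thing with IBNS 2.0 policy maps, and commands differ by platform. The RADIUS server name, address, shared secret, interface, and VLAN numbers (101 for employees, 201 for guests) are all hypothetical. The switch only relays the authentication; the RADIUS server decides, and on success it can return the standard tunnel attributes (Tunnel-Type = VLAN, Tunnel-Medium-Type = 802, Tunnel-Private-Group-ID = the VLAN) to place the port in a VLAN dynamically.

```
aaa new-model
! Send 802.1X authentication to RADIUS, and let RADIUS authorize the port (e.g. assign a VLAN)
aaa authentication dot1x default group radius
aaa authorization network default group radius
! Enable 802.1X globally
dot1x system-auth-control
!
radius server AUTH-SERVER-1
 address ipv4 192.0.2.10 auth-port 1812 acct-port 1813
 key ExampleSharedSecret
!
interface GigabitEthernet1/0/10
 switchport mode access
 ! VLAN used after successful authentication if RADIUS does not return one
 switchport access vlan 101
 ! Guest VLAN for devices that fail authentication
 authentication event fail action authorize vlan 201
 ! Guest VLAN for devices that do not run an 802.1X supplicant at all
 authentication event no-response action authorize vlan 201
 ! Keep the port unauthorized until authentication completes
 authentication port-control auto
 dot1x pae authenticator
```

Compared with the two-line static configuration, the per-port configuration is similar in size; the real cost is the RADIUS infrastructure and policy design behind it.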

Figure: Access Control on Edge Ports

On the wireless side, end devices can be segmented by using separate SSIDs for different groups of users. For example, one SSID each could be created for employees, guests, and contractors, each bound to its own VLAN on the uplink side of the wireless controller. Mechanisms such as 802.1X can also be used on wireless connections to place an end device in a particular VLAN when it associates with the wireless network.

Figure: Wireless Access Control

In the end, all of these solutions segregate end devices by placing them in the appropriate VLAN. VLAN assignment is only the first step in segregating end-device traffic. Traffic entering the network on that VLAN will eventually reach a Layer 3 device, where it is forwarded using the routing table of the VRF to which the VLAN's Layer 3 interface is bound.
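Assuming VRFs named EMPLOYEE and GUEST already exist (hypothetical names), binding VLANs to VRFs on a Layer 3 switch amounts to placing each VLAN's SVI in a VRF. This sketch uses IOS-style syntax with made-up VLAN numbers and addresses. It also shows that the mapping does not have to be one-to-one: two VLANs share the EMPLOYEE VRF and can route to each other, while the guest VLAN sits alone in GUEST.

```
! Employee users: VLAN 101 is routed in the EMPLOYEE VRF
interface Vlan101
 description Employee users
 ! Bind to the VRF before adding the address (binding removes any existing IP address)
 vrf forwarding EMPLOYEE
 ip address 10.1.101.1 255.255.255.0
!
! A second VLAN in the same VRF: routes freely to VLAN 101
interface Vlan102
 description Employee printers
 vrf forwarding EMPLOYEE
 ip address 10.1.102.1 255.255.255.0
!
! Guest users: VLAN 201 is routed in the GUEST VRF, with no path to the employee subnets
interface Vlan201
 description Guest users
 vrf forwarding GUEST
 ip address 10.2.201.1 255.255.255.0
```

With only this configuration, `show ip route vrf GUEST` would be expected to list just the guest subnet, so a guest pointing at its default gateway has no route to 10.1.101.0/24 or 10.1.102.0/24.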

Figure: Traffic Hitting a VRF

Path Isolation

Path isolation refers to the method used within the core of the network to keep each VRF’s traffic isolated. As stated earlier, once traffic reaches a Layer 3 device, it would normally be routed freely between that device's interfaces, which allows traffic to pass between VLANs. Each of the path isolation methods below keeps traffic inside its assigned VRF as it travels between Layer 3 devices.

Figure: Routing Between VLANs/Subnets

The hop-by-hop method creates switched virtual interfaces (SVIs) on top of 802.1Q tags between each pair of connected Layer 3 devices in the network. For each pair of connected devices, one SVI is created per device, per VRF. Unlike a Layer 2 network, where VLAN tags are bridged end-to-end, each tag is used on only one interconnect, and each device acts as a Layer 3 hop in the traffic path. When everything is fully provisioned, you end up with a path through the network that is strung together by SVIs. This can be really cumbersome to manage, since for every VRF you have to configure multiple interfaces from edge to edge and manage all of that extra IP addressing too. If there are multiple potential paths through the network from edge to edge, the SVI string needs to be provisioned on the alternate paths as well. The upside is that SVIs are relatively easy to understand and are very well supported across hardware platforms and software versions. Because of the large management overhead, though, this method should be used only on very small networks.
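The following is a hedged sketch of one side of a single hop-by-hop interconnect, in IOS-style syntax. It assumes two hypothetical VRFs, EMPLOYEE and GUEST, are already defined, that this switch has a trunk to a neighboring Layer 3 switch, and that a separate OSPF process runs per VRF. The interface name, VLAN tags (1001 and 1002), and addresses are invented for illustration.

```
ip routing
!
! Trunk to the neighboring Layer 3 switch, carrying one transit tag per VRF
! (some Catalyst platforms also need "switchport trunk encapsulation dot1q" first)
interface TenGigabitEthernet1/1/1
 description Trunk to neighbor switch
 switchport mode trunk
 switchport trunk allowed vlan 1001,1002
!
! Transit SVI for the EMPLOYEE VRF on this link only
interface Vlan1001
 vrf forwarding EMPLOYEE
 ip address 10.255.1.1 255.255.255.252
!
! Transit SVI for the GUEST VRF on this link only
interface Vlan1002
 vrf forwarding GUEST
 ip address 10.255.2.1 255.255.255.252
!
! One routing process per VRF; capability vrf-lite relaxes the PE-specific OSPF checks
router ospf 101 vrf EMPLOYEE
 capability vrf-lite
 network 10.1.0.0 0.0.255.255 area 0
 network 10.255.1.0 0.0.0.3 area 0
!
router ospf 102 vrf GUEST
 capability vrf-lite
 network 10.2.0.0 0.0.255.255 area 0
 network 10.255.2.0 0.0.0.3 area 0
```

The neighbor would mirror this with the .2 addresses, and its next interconnect would use different tags and subnets again. Multiply that by every link, every VRF, and every alternate path, and the management overhead becomes obvious.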

Figure: Hop-by-Hop SVI Interfaces

A more scalable alternative to hop-by-hop is to encapsulate each VRF’s traffic inside a tunnel, such as GRE. Since a tunnel can be provisioned directly between two edge routers, nothing needs to be touched in the core of the network. In fact, the VRFs don’t even need to be provisioned in the core (assuming no edge devices are connected to the core). This simplifies provisioning paths through the network and eliminates the risk of a mistake being made on a core router during provisioning. If provisioned correctly, a tunnel also provides built-in path redundancy (unlike hop-by-hop, where you have to account for it manually). Assume an active tunnel follows the path A->B->C through the network and router B fails. As long as there is an alternate path through the network between the tunnel endpoint addresses on A and C, the tunnel will reroute and the VRF traffic inside it will continue to flow. One caveat with tunneling is that not all devices perform tunneling in hardware, and some don’t support protocols such as GRE at all. Your tunnel endpoints need to support GRE and, depending on your traffic load, should perform GRE encapsulation in silicon.
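Here is a minimal sketch of one end of a GRE tunnel carrying a single VRF, assuming IOS-style syntax and that routers A and C from the example above are the tunnel endpoints. The VRF name EMPLOYEE, interface numbers, and addresses are hypothetical. The important detail is that the tunnel interface itself belongs to the VRF, while its source and destination addresses live in the global routing table, so the core only needs ordinary routes to the two loopbacks.

```
! Router A. Loopback1 is in the global table and is advertised by the core IGP.
interface Loopback1
 ip address 192.0.2.1 255.255.255.255
!
! The tunnel interface is in the EMPLOYEE VRF; GRE is the default tunnel mode
interface Tunnel101
 vrf forwarding EMPLOYEE
 ip address 10.254.101.1 255.255.255.252
 ! Leave room for the 24-byte GRE/IP overhead
 ip mtu 1476
 ip tcp adjust-mss 1436
 ! Source and destination are resolved in the global table, not in the VRF
 tunnel source Loopback1
 tunnel destination 192.0.2.11
!
! Run the VRF's routing protocol across the tunnel
router ospf 101 vrf EMPLOYEE
 capability vrf-lite
 network 10.1.0.0 0.0.255.255 area 0
 network 10.254.101.0 0.0.0.3 area 0
```

Router C would mirror this with source and destination swapped and 10.254.101.2 on the tunnel. Each additional VRF needs its own tunnel, and platforms typically expect each GRE tunnel to have a distinct source/destination pair (or a tunnel key), so giving each VRF's tunnel its own loopback is one common approach.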

Figure: Edge-to-Edge Tunnels

The final and most scalable method is full-blown MPLS. MPLS dynamically creates paths from edge router to edge router for transporting VRF traffic through the network. Although MPLS is the most scalable method, it is also the most complex. An MPLS VPN network typically requires MP-BGP (to carry each VRF's routes between edge routers) and LDP (Label Distribution Protocol, to build the label-switched paths), both of which must be well understood by the network provisioning and operations teams. MPLS also has the strictest hardware requirements of any of the path isolation methods. Edge routers must be capable of running BGP, every label-switching device must run LDP, all of them must have enough memory to hold the entries in the routing and label forwarding tables, and they should be capable of label switching in silicon for best performance.
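For comparison, a hedged IOS-style sketch of one provider edge (PE) router in an MPLS VPN is below. It assumes a hypothetical VRF named EMPLOYEE, BGP AS 65000, a remote PE at 192.0.2.3, and an IGP (not shown) that already carries the loopback addresses across the core. Route targets control which PEs import the VRF's routes, LDP builds the label-switched paths, and MP-BGP (address-family vpnv4) carries the VRF routes between PEs.

```
! PE router A (hypothetical names and addresses)
ip cef
mpls label protocol ldp
mpls ldp router-id Loopback0 force
!
vrf definition EMPLOYEE
 rd 65000:101
 address-family ipv4
  ! Tag exported routes with this RT; import routes carrying it from other PEs
  route-target export 65000:101
  route-target import 65000:101
 exit-address-family
!
interface Loopback0
 ip address 192.0.2.1 255.255.255.255
!
! Core-facing link: global table, runs LDP
interface TenGigabitEthernet1/1/1
 ip address 10.0.0.1 255.255.255.252
 mpls ip
!
! Edge-facing interface in the VRF
interface GigabitEthernet0/2
 vrf forwarding EMPLOYEE
 ip address 10.1.101.1 255.255.255.0
!
router bgp 65000
 no bgp default ipv4-unicast
 neighbor 192.0.2.3 remote-as 65000
 neighbor 192.0.2.3 update-source Loopback0
 !
 ! MP-BGP session that carries VRF (VPNv4) routes to the remote PE
 address-family vpnv4
  neighbor 192.0.2.3 activate
  neighbor 192.0.2.3 send-community extended
 exit-address-family
 !
 ! Advertise the VRF's connected subnets into MP-BGP
 address-family ipv4 vrf EMPLOYEE
  redistribute connected
 exit-address-family
```

Only the PE routers run BGP in this sketch; core (P) routers just need the IGP plus `mpls ip` on their core-facing interfaces, and they hold no VRF state at all.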

The hop-by-hop and tunnel methods are what’s known as “VRF lite”: you’re using VRFs but without full-blown MPLS to tie everything together. VRF lite is often found in enterprise networks where the number of VRFs in play is still manageable with these manual path isolation methods.

Shared Services

Shared services are things like DNS, DHCP, and Internet access that are typically common to all VRFs. Rather than running a set of DNS servers and a set of DHCP servers for each virtual network, you stand up one set of servers that serves everyone. Internet access works the same way: running a separate Internet connection for each VRF is costly and time-consuming, so it’s usually shared among all VRFs.

Shared services are typically located in their own small module that hangs off the edge of the network. This module is one of the trickiest parts of a VRF-enabled network, because it’s really easy to accidentally let traffic leak between VRFs if proper care is not taken. Since the servers and Internet edge devices in the shared services module need to talk to end devices in all the VRFs, this module needs to contain routes for all of the VRFs. It would be really easy to accidentally advertise routes from VRF A through the shared services module into VRF B (and vice versa), allowing devices in A and B to communicate freely.
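One common way to build this kind of selective route leaking, sketched here in IOS-style syntax, is with route-target import and export statements. The VRF names (SHARED, EMPLOYEE, GUEST), AS number, and route-target values are hypothetical. Each user VRF imports the shared-services routes, and the shared-services VRF imports routes from both user VRFs, but the user VRFs never import each other's route targets. On IOS, moving routes between VRFs this way is handled by BGP, so a BGP process with an address family per VRF is included even on a single router.

```
! Shared services: export its own routes, import both user VRFs
vrf definition SHARED
 rd 65000:999
 address-family ipv4
  route-target export 65000:999
  route-target import 65000:101
  route-target import 65000:201
 exit-address-family
!
! Employee users: import shared services, but NOT guest routes (65000:201)
vrf definition EMPLOYEE
 rd 65000:101
 address-family ipv4
  route-target export 65000:101
  route-target import 65000:101
  route-target import 65000:999
 exit-address-family
!
! Guest users: import shared services, but NOT employee routes (65000:101)
vrf definition GUEST
 rd 65000:201
 address-family ipv4
  route-target export 65000:201
  route-target import 65000:201
  route-target import 65000:999
 exit-address-family
!
! BGP performs the import/export between the VRFs
router bgp 65000
 address-family ipv4 vrf SHARED
  redistribute connected
 exit-address-family
 address-family ipv4 vrf EMPLOYEE
  redistribute connected
 exit-address-family
 address-family ipv4 vrf GUEST
  redistribute connected
 exit-address-family
```

A single extra line such as `route-target import 65000:101` under GUEST would be exactly the kind of accidental leak described above: employee prefixes would show up in the guest routing table. This approach also assumes the VRFs do not use overlapping address space.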

Figure: Shared Services

Another big challenge with shared services is that VRFs can have overlapping IP address space. It becomes difficult to provide services like DNS and DHCP from a single server for overlapping IP networks. In this case, it may be necessary to have multiple servers that each serve a subset of VRFs, or even a single VRF. The “shared” in “shared services” then refers to the shared infrastructure that connects these servers to the rest of the network, rather than to the servers themselves.
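The overlap problem is easy to reproduce. In the hypothetical IOS-style sketch below, both VRFs reuse 10.1.1.0/24, and the router accepts it because each subnet lives in a separate routing table. A single DHCP or DNS server in the shared services module sits in one routing table, though, so it cannot hold two different routes back to 10.1.1.0/24, and relayed DHCP requests from both VLANs carry the same relay agent address (giaddr) of 10.1.1.1.

```
! Same subnet, same gateway address, two different VRFs (allowed: separate tables)
interface Vlan101
 vrf forwarding EMPLOYEE
 ip address 10.1.1.1 255.255.255.0
!
interface Vlan201
 vrf forwarding GUEST
 ip address 10.1.1.1 255.255.255.0
```

Configuring the same two addresses without the `vrf forwarding` lines would be rejected as overlapping, because both subnets would then land in the global table.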

Final Word

This is the first post in what I hope to turn into a short series on Virtual Routing and Forwarding. Future posts will discuss how to configure VRFs, practical applications for VRFs in an enterprise network, and go into more detail on shared services.

33 thoughts on “An Introduction to Layer 3 Traffic Isolation”

  1. Very well explained, right from the basics and building up from there.
    I really enjoyed going through it as a security guy.

    1. Hi,

      The best examples I know of are in the Cisco Validated Design doc on campus network virtualization.

      Unprotected Shared Services:

      Protected Shared Services:

      Centralized/shared services are also discussed in the MPLS & VPN Architectures book:

  2. Just coming out of a meeting with our core packet planning folks, I was psyched to learn what VRF was :) That post has been very helpful, thank you.

  3. This is a great explanation of how Layer 2 is used to connect VRFs together. I believe that is the missing piece in a lot of these VRF articles: they assume Layer 2 just works, but you need something to transport those frames. I am lab’ing the GRE method, which seems to be the easiest way to implement VRF lite. The SVI method, correct me if I’m wrong, requires end-to-end trunks? I would like to lab a “provider” network in GNS3, but I run into the issue of how to transport the different customer VRFs across the core without creating another VLAN on the core’s WAN link.

    1. Hi Jim. Thanks for the feedback!

      You got it right. You’d need trunk links between all of the provider (P) switches. I’ve attempted to illustrate how that looks here:

      Notice that a different tag is used on each P switch. The reason is that you’re creating _routed_ links between the switches. If you stick the same tag on two trunks of the same switch, the switch will (of course) bridge the frames straight through. By using different tags, you’re forcing a routing action as the frames transit the switch.

      1. OK, cool. I’ve read about another way using VRF route leaking. Seems pretty complex. I am trying to find a way to have a single customer talk between two locations where it has a presence, but through the provider network. VRF route leaking sounds promising because the only global route the customer would need to see and traverse is the route from each location’s PE. It’s the same customer, just in two different locations on the provider’s network. Again, using GRE would be easier.

      2. I have a VSS-enabled topology with trunks from the spine to 7 leaves (spine/leaf design). These trunks also carry employee Wi-Fi and guest Wi-Fi in two different VLANs. I want to convert these trunk links into physical Layer 3 links between the spine and leaves, but before that I need a solution for segregating guest and employee Wi-Fi over Layer 3. Would VRF lite solve my issue? I have deployed Cat4500-X switches in the spine and leaf layers.

  4. Well done…simple way to explain traffic isolation at layer two and three.
    I like to review a protocol a month and this was fun to read.

    thanks for taking the time

  5. Hi. Appreciate your time putting this together (albeit some time ago). Regarding your very last point (needing separate shared services if you have CE address space overlap): it should be possible to avoid that by implementing something like whole-subnet static NAT at the CE, right before that traffic ingresses the VRF, MPLS tunnel. Do you agree?

    1. Hi Paul, thanks for the feedback!

      Yeah NAT could work in this situation but like most things, “it depends”. In the case of DHCP it certainly wouldn’t because we can’t NAT DHCP packets. With DNS it might actually work. You really have to understand the services/applications you’re running in the shared services area and whether it matters that the clients will be in overlapping address spaces. Like most complex things, proper planning, engineering and testing will yield the best results.

  6. Joel, great post, really helpful. What I am really trying to discover here and elsewhere, though, is why or even when I would want to implement VRF Lite in a network, i.e., what the advantages would be compared with just running the usual L3 routing (OSPF) among 6500s with a standard (extended VLAN) deployment. Many thanks in advance.

    1. Hi Will and thank you :)

      Consider this scenario: you are in VLAN 10 and I’m in VLAN 20. We’re not supposed to talk with each other. However, we do both need to talk to servers that are over in the data center. We each have a default gateway in our VLAN, to which we send traffic to reach the servers.

      From a Layer 2 perspective, you and I cannot talk because we’re in separate VLANs. Cool. But, from a Layer 3 perspective, I can reach your IP address by sending traffic to my default gateway which will happily route it to your subnet. You reply back to me by sending traffic to your default gateway and the traffic comes back to me.

      What VRF Lite does is keep the default gateway and all other routers from passing traffic between our two networks. In other words, VRF Lite builds the same segmentation at Layer 3 that we get with VLANs at Layer 2.

      However, the mapping of VLANs to VRFs does not have to be 1:1. Your VLAN 10 could be in a VRF with 30 other VLANs and I might be in a VRF all by myself. Or maybe with 1000 VLANs. The mapping is all configurable.

      Hopefully this is helpful. Let me know!

  7. Joel, really a nice article.

    Could you please advise whether a VRF is recommended for backups, especially when multiple backup VLANs already exist?

    1. Hi Kraju,

      I’m not totally certain what you mean by backup vlan, but I’ll guess you mean a vlan where servers send and receive data backups.

      Most backup vlans I see are not routed, they’re just a flat L2 network between the servers. In that case a VRF does not make much sense since VRFs are a L3 construct. You could put the vlan in a VRF, but it wouldn’t provide any benefit.

      If you are routing in this vlan, then yeah sure, a VRF might make sense to isolate that traffic from the rest of the environment.

    1. Hi Bob,

      The short answer is no, you cannot.

      The examples above (“Hop-by-Hop SVI Interfaces” and “Edge-to-Edge Tunnels”) are actually referring to L3 switches. Without VRFs, you end up with the same situation depicted in the “Routing Between VLANs/Subnets” diagram.

  8. Just want to say thank you. The topic is explained very well and systematically, with very nice graphs and plain English. Unlike many tutorials (which are good in their own way) that rely too much on config files to progress, this article helps newbies grasp the idea much faster by visualizing it and connecting the dots between subtopics (such as MPLS, VRF lite, SVI, and hop-by-hop VRF). Thank you, and please keep up the great work! By the way, the discussion section has a lot of good info as well!

  9. Late comment, but one more way of connecting the VRFs in different routers while still achieving path isolation, is to use routed subinterfaces (with VLAN tagging). The advantage is that you don’t need to allocate a unique VLAN id for each link/VRF combination, but can instead use the same id for all links.

    You can then also get away with using IP unnumbered, and don’t have to allocate IP addresses for each link. (At least Junos doesn’t allow me to use IP unnumbered on IRB interfaces [their name for SVI interfaces].)

    This scales better than using SVI interfaces, but not as well as tunnels or MPLS; on the other hand, you don’t need to set up GRE or MPLS.

    And then of course there is the option of using separate physical links for different VRFs… (This can be useful e.g. in case one user group is paying for expensive high-speed interfaces, and you don’t want other user groups to use up their bandwidth.)

    1. Thanks for commenting, Thomas. Have you done ip unnumbered in Cisco IOS? Did you have to put both the ip unnumbered interface and the “surrogate” interface in the same VRF?

      1. Borrowing the address from another routing instance? No, I’ve never felt the inclination of doing such a perverted thing. :-) That would go against the idea of having separate address and routing spaces, I feel, and would actually expect it to fail.

        We got rid of our last Catalysts more than a year ago (good riddance!), so I don’t have anything running IOS anymore to test on. When I try in Junos, it complains that the Donor and Borrower interfaces are in different routing instances, like I expected, and refuses to commit. HP Procurve supports neither VRFs nor unnumbered interfaces, and HP Comware only supports unnumbered on tunnel interfaces, and I don’t have any tunnelling on my Comware machines, so I can’t test there.
