Thursday, December 31, 2015

Running a BGP Free Core - Quick and dirty

So it's the end of the year, and I wanted to do a quick video/post to round off the blog for 2015. A concept that's relatively new to me (sadly) is running a BGP free core, and it's relatively easy to set up. That being said... I know it's kind of a lazy post. Hopefully one that will spark some interesting conversation and inspire new posts in 2016!!

With that out of the way, let's talk about the concept, and the problem it's trying to address. BGP tables for IPv4 are $@#%@ massive: over 500,000 prefixes, which is obviously a lot. Traditional networking would call for every device in your core to have these routes (or summaries of them) in order to ensure reachability. However, what if I told you there was a way to only have edge routers store these massive BGP tables and to keep the core running light and efficient?!

Of course you want to know. MPLS. *Mic drop*. With MPLS the core doesn't have to know anything about the outside world. All it needs is labels for reaching the edge routers. But why?! So I have a working environment pictured here:

Ok, to keep things simple I shut down P3 and P4. Let's see what happens when R1 tries to send traffic to R2's loopback 0 interface.

R1#show ip cef
  nexthop GigabitEthernet0/1
R1#show ip route
Routing entry for
  Known via "bgp 1", distance 20, metric 0
  Tag 1122, type external
  Last update from 20:55:28 ago
  Routing Descriptor Blocks:
  *, from, 20:55:28 ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 1122
      MPLS label: none 

Alright, nothing fancy yet, we see a next hop of PE1's Gi0/3. Moving right along, let's go over to PE1 and see what's happening.

PE1#show ip cef
  nexthop GigabitEthernet0/1 label 26
PE1#show ip route
Routing entry for
  Known via "bgp 1122", distance 200, metric 0
  Tag 2, type internal
  Last update from 20:53:59 ago
  Routing Descriptor Blocks:
  *, from, 20:53:59 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none

Now things are getting a bit more interesting. Check out our 'show ip cef' output. Label 26... put a pin in that, we're coming right back to it. Ok, so we have a next hop of PE2's loopback 0, but cef shows that we're forwarding this traffic out Gi0/1 towards P1. Let's go to P1 and issue the same commands (show ip cef and show ip route for 172.16.2.2).

P1#show ip cef
  no route
P1#show ip route
% Network not in table

You're reading that right, no route. I can smell your fear from my monitor. Don't panic, allow me to explain. Go back to the output from PE1. Remember that label 26? Let's follow the label brick road, and see where that takes us.

PE1#show mpls forwarding-table | i 26
28         26   0             Gi0/1
P1#show mpls forwarding-table | i 26
26         26   9772          Gi0/1
P2#show mpls forwarding-table | i 26
26         Pop Label   73            Gi0/4 
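To make the label brick road a bit more concrete, here's a minimal Python sketch of those three entries. The label values and interfaces are copied from the output above, and the neighbor on each interface (PE1 to P1, P1 to P2, P2 to PE2) is taken from this post's topology. The dictionary layout and the `walk_lsp` helper are purely illustrative names of mine, not anything IOS exposes.

```python
# router -> {local label: (outgoing label, outgoing interface, neighbor)}
# An outgoing label of None stands for "Pop Label".
LFIB = {
    "PE1": {28: (26, "Gi0/1", "P1")},
    "P1": {26: (26, "Gi0/1", "P2")},
    "P2": {26: (None, "Gi0/4", "PE2")},
}


def walk_lsp(ingress, local_label):
    """Follow the labels hop by hop until one of the routers pops the label."""
    # The ingress router imposes the outgoing label from its own entry.
    label, _interface, router = LFIB[ingress][local_label]
    hops = [(ingress, "push", label)]
    while label is not None:
        out_label, _interface, neighbor = LFIB[router][label]
        action = "pop" if out_label is None else "swap"
        hops.append((router, action, out_label))
        router, label = neighbor, out_label
    # 'router' is now the device that receives the unlabeled packet.
    return hops, router


hops, egress = walk_lsp("PE1", 28)
for hop in hops:
    print(hop)
print("unlabeled packet arrives at", egress)
```

Note that nothing in the loop ever looks at an IP destination. Each core router only needs an entry for the label it receives.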

Alright, so PE1 has an LSP (label switched path) to reach the loopback of PE2. For brevity, just take my word that the reverse is true. On the second to last label hop, P2 performs a pop operation (penultimate hop popping, standard MPLS behavior) and forwards the traffic on to PE2. Alright, to put a bow on this thing let's tie together the final pieces of the pie by answering a question I feel like some of you must be asking yourselves. Why does it matter whether or not the PEs have an LSP between their loopbacks? For that matter, how is it remotely related to R1 and R2 communication? To answer that, first let's look at the show bgp ipv4 unicast output and a show run | sec router bgp on PE1.

PE1#show bgp ipv4 unicast | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
 *>           0             0 1 i
 *>i              0    100      0 2 i
PE1#show run | sec router bgp        
router bgp 1122
 bgp log-neighbor-changes
 neighbor remote-as 1122
 neighbor update-source Loopback0
 neighbor next-hop-self
 neighbor remote-as 1

Very similar output can be observed on PE2. The magic that's happening here is that PE1 knows to reach R2's loopback via PE2's loopback 0. PE1 also knows that it has an outgoing label of 26 to reach PE2's loopback 0. So instead of just forwarding the traffic as plain IP, or requiring a label specifically for that prefix, it just encapsulates the traffic in MPLS with the label for PE2's loopback 0 interface, effectively tunneling this communication through the core network to PE2. So in short... magic. Finally, as promised, some Wireshark output from R1 pinging R2.

So what you'll see in the following screenshots is:

1. R1 forwards traffic to PE1, stock standard IP.
2. PE1 forwards traffic to P1 using label 26 (to reach PE2).
3. P1 forwards traffic to P2 also using label 26 (just a coincidence they share the same label for PE2's loopback).
4. P2 does a pop operation and forwards the original IP packet from R1 to PE2 with no label. Then obviously PE2 forwards the ICMP packet to R2.
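
If it helps to see why the core gets away with no BGP routes, here's a rough Python sketch of the decision PE1 makes in step 2 and what P1 does in step 3. The names `R2_LOOPBACK` and `PE2_LOOPBACK` are placeholders I made up to stand in for the real addresses, and the tables are hand-written toy data, not router output. I'm also assuming P1 learns PE2's loopback from the IGP, which isn't shown in this post.

```python
# Placeholder names standing in for the real addresses (assumption).
R2_LOOPBACK = "R2-loopback0"
PE2_LOOPBACK = "PE2-loopback0"

# PE1: BGP says R2's loopback is reachable via PE2's loopback,
# and there is an outgoing label of 26 out Gi0/1 for that loopback.
PE1_BGP_NEXT_HOP = {R2_LOOPBACK: PE2_LOOPBACK}
PE1_LABEL_FOR = {PE2_LOOPBACK: (26, "Gi0/1")}

# P1: no BGP at all. It only knows the PE loopback (assumed via the IGP)
# and what to do with incoming label 26.
P1_IP_ROUTES = {PE2_LOOPBACK}
P1_LFIB = {26: (26, "Gi0/1")}


def pe1_forward(dest):
    """Resolve the BGP next hop, then impose the label for that next hop."""
    next_hop = PE1_BGP_NEXT_HOP[dest]
    label, interface = PE1_LABEL_FOR[next_hop]
    return {"label": label, "out": interface, "ip_dest": dest}


def p1_has_ip_route(dest):
    """What 'show ip route' would tell us on P1 for this destination."""
    return dest in P1_IP_ROUTES


def p1_forward(packet):
    """P1 switches on the label only and never inspects ip_dest."""
    out_label, interface = P1_LFIB[packet["label"]]
    return {"label": out_label, "out": interface, "ip_dest": packet["ip_dest"]}


packet = pe1_forward(R2_LOOPBACK)
print(packet)
print("P1 has an IP route to R2:", p1_has_ip_route(R2_LOOPBACK))
print(p1_forward(packet))
```

The point of the sketch is the asymmetry: PE1 needs the BGP route to pick the label, while P1 happily forwards a packet whose destination it could never route by IP.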

Link from R1<>PE1

Link from PE1<>P1

Link from P1<>P2

Finally, P2<>PE2

Well, that's it for this one guys and gals. Happy New Year, and I'll see you in 2016!!

Video here