Switching a VPN to dynamic routing

16 June, 2020 at 10:00 -0400

Background

For a number of reasons, my "internal" network is physically spread over two locations.

One physical location has a pfSense-based x86 router/firewall combo, which was initially set up years ago. The more powerful router is a good fit at this location, as it is where we’ve located most of the development infrastructure. We make use of VLANs for the different zones, Snort for IDS/IPS, Squid for HTTP caching (on some VLANs), and a few other services.
The other location has a much simpler setup, with an EdgeRouter X as the head of network.

Historically, I had a simple policy-based IPsec tunnel between the two devices, with a phase 2 tunnel for each pair of networks that needed to communicate. It wasn’t ideal, since any change required intervention on both ends, but it worked.

Until a few weeks ago.

For some reason, the phase 1 tunnel would establish itself and the phase 2 tunnels would report as established, yet I could observe no IKE traffic on the WAN. On some days, traffic would arrive but not leave, or vice versa.

Having discussed it in the past, I figured it was the perfect time to configure a point-to-point tunnel and switch to dynamic routing instead.
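To put a number on why the policy-based tunnel described above was a chore to maintain, here is a minimal sketch of how phase 2 entries add up. It assumes, for simplicity, that every network at one site needs to reach every network at the other (the worst case); the subnets are made up, and `phase2_entries` and `mirror` are hypothetical helpers, not part of pfSense or EdgeOS.

```python
from itertools import product

# Made-up subnets for illustration; not the real networks at either site.
SITE_A = ["10.0.10.0/24", "10.0.20.0/24", "10.0.30.0/24"]
SITE_B = ["192.168.1.0/24", "192.168.2.0/24"]


def phase2_entries(local, remote):
    """One (local, remote) phase 2 selector per pair of networks that must talk.

    Assumes every local network needs every remote one (the worst case).
    """
    return list(product(local, remote))


def mirror(entries):
    """The peer needs the same selectors with local and remote swapped."""
    return [(remote, local) for local, remote in entries]


entries = phase2_entries(SITE_A, SITE_B)
print(len(entries))  # 6 on site A, plus 6 mirrored entries on site B

# Adding one network at site A means new entries on *both* routers.
grown = phase2_entries(SITE_A + ["10.0.40.0/24"], SITE_B)
print(len(grown) - len(entries))  # 2
```

With routes distributed dynamically instead, advertising a new network becomes a change on one router rather than a matched pair of changes.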

Goal

My goal was to have a point-to-point link between the two routers, with dynamic route distribution. That would allow us to eventually install a backup link between the locations should the primary be disconnected for any reason. It would also allow a seamless transition between tunnel technologies, e.g. from an OpenVPN site-to-site link to a WireGuard or IPsec tunnel.

Implementing the VPN

The first attempts were a somewhat horrible failure.

The first difficulty was configuration. pfSense, EdgeOS (EdgeRouter), and OpenVPN all use slightly different names or configuration patterns for the same concepts. Reconciling the idiosyncrasies of each implementation took the better part of a day, and finally resulted in a working point-to-point (site-to-site) VPN over a layer 3 tun device. One interesting detail: on one end, the tunnel is configured as a /30, with two usable IP addresses, while on the other end it is configured as a /32, with a manual route to reach the /32 of the other endpoint.
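The /30 versus /32 asymmetry can be checked with Python’s standard `ipaddress` module. This is a minimal sketch with made-up tunnel addresses (the real ones aren’t given here): it shows that a /30 holds two usable hosts, and that from an end configured as a /32 the peer is not inside the local subnet, which is why that end needs a manual host route to the other endpoint.

```python
import ipaddress

# Made-up tunnel addresses; the real ones aren't given in the post.
END_A = ipaddress.ip_interface("10.255.0.1/30")  # this end configured as a /30
END_B = ipaddress.ip_interface("10.255.0.2/32")  # this end configured as a /32


def peer_is_on_link(local, peer_ip):
    """True if the peer address falls inside the local interface's own subnet."""
    return peer_ip in local.network


# A /30 has 4 addresses: network, two usable hosts, broadcast.
usable = [str(h) for h in END_A.network.hosts()]
print(END_A.network.num_addresses, usable)  # 4 ['10.255.0.1', '10.255.0.2']

# A /32 contains only the interface's own address.
print(END_B.network.num_addresses)  # 1

# From the /30 end the peer is on-link; from the /32 end it is not.
a_sees_b = peer_is_on_link(END_A, END_B.ip)
b_sees_a = peer_is_on_link(END_B, END_A.ip)

# So the /32 end needs an explicit host route to the peer's /32.
manual_route = None if b_sees_a else ipaddress.ip_network(f"{END_A.ip}/32")
print(a_sees_b, b_sees_a, manual_route)  # True False 10.255.0.1/32
```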

Implementing the routing

We settled on OSPF, as I didn’t want to set up BGP and have to start managing internal AS numbers. EdgeOS includes the Quagga suite of routing daemons, while pfSense made the switch to FRR, a fork of Quagga.

However, their configurations are wildly different: EdgeOS uses Ubiquiti’s own distribution of Vyatta, while pfSense has its own web UI for configuration.

In the first iteration, the two OSPF routers were configured without an "Area 0". The assumption was that area IDs do not matter. However, in OSPF, area 0 (0.0.0.0 in dot-decimal notation) has a special meaning: it is the backbone area, through which routing information is exchanged between all other areas. It seems that without an area 0 defined somewhere in the OSPF network, routes are not properly distributed.
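OSPF area IDs are 32-bit values that are usually written in dot-decimal notation, so "area 0" and "area 0.0.0.0" name the same backbone area, and daemons such as Quagga and FRR accept either form. A minimal sketch of that equivalence, plus a check for whether any configured area is the backbone; the area IDs for the first iteration are made up, and `has_backbone` only looks for the presence of area 0, it doesn’t model how routes are actually distributed.

```python
import ipaddress


def area_id(value):
    """Normalize an OSPF area ID given as an int or a dot-decimal string."""
    return ipaddress.IPv4Address(value)


BACKBONE = area_id(0)  # area 0 and area 0.0.0.0 are the same thing


def has_backbone(areas):
    """True if any of the configured area IDs is the backbone."""
    return any(area_id(a) == BACKBONE for a in areas)


# Made-up area IDs: the first iteration had no area 0 on either router.
first_try = [1, "0.0.0.2"]
fixed = ["0.0.0.0", 1, "0.0.0.2"]

print(area_id(0) == area_id("0.0.0.0"))  # True
print(str(area_id(1)), str(area_id(256)))  # 0.0.0.1 0.0.1.0
print(has_backbone(first_try), has_backbone(fixed))  # False True
```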

With that figured out, the next step was getting the networks that needed to be advertised to show up at either end. We didn’t want to add all the networks to a single area, as that would increase the load on that area. It took a while to come up with a valid configuration, but the final solution was to create an OSPF area for every network and have the routers advertise each network in its area. This way there’s no fiddling with stub areas or anything like that, and it allows us to deploy the backup links as backup routers anywhere within the existing network, not necessarily at the head of network.
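The one-area-per-network layout leans on a standard OSPF rule: every non-backbone area must be attached to area 0 through an area border router (a router with interfaces in both), unless virtual links are used. Assuming the tunnel itself sits in area 0, both head-of-network routers become the border routers for their local areas. The sketch below models only that attachment rule; the subnets, area IDs, router names, and the `tun0` interface name are all made up, and this is not a pfSense or EdgeOS configuration.

```python
import ipaddress

BACKBONE = "0.0.0.0"


def per_network_areas(networks, first_id):
    """Give each advertised network its own area ID, starting at first_id."""
    if first_id <= 0:
        raise ValueError("area 0 is reserved for the backbone")
    return {net: str(ipaddress.IPv4Address(first_id + i))
            for i, net in enumerate(networks)}


def router_areas(lan_plan, tunnel_if="tun0"):
    """Areas on a head-of-network router: tunnel in area 0, one area per LAN."""
    areas = {BACKBONE: [tunnel_if]}  # assumption: the tunnel sits in area 0
    for net, area in lan_plan.items():
        areas.setdefault(area, []).append(net)
    return areas


def unattached_areas(routers):
    """Non-backbone areas that no router connects to area 0 (i.e. no ABR)."""
    all_areas = {area for areas in routers.values() for area in areas}
    attached = {BACKBONE}
    for areas in routers.values():
        if BACKBONE in areas:
            attached.update(areas)
    return sorted(all_areas - attached)


# Made-up subnets and area IDs for the two sites.
site_a = per_network_areas(["10.0.10.0/24", "10.0.20.0/24"], first_id=10)
site_b = per_network_areas(["192.168.1.0/24"], first_id=30)
routers = {"pfsense": router_areas(site_a), "edgerouter": router_areas(site_b)}

print(site_a)  # {'10.0.10.0/24': '0.0.0.10', '10.0.20.0/24': '0.0.0.11'}
print(unattached_areas(routers))  # []

# A hypothetical router carrying a LAN area but no backbone interface is cut off.
routers["isolated"] = {"0.0.0.40": ["10.0.40.0/24"]}
print(unattached_areas(routers))  # ['0.0.0.40']
```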

At this point, we’ve had the site-to-site tunnel up for a little over two weeks, and we’ve successfully tested a manual tunnel swap (set up an alternate tunnel, give the original a higher OSPF cost than the alternate, then remove the original)!
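The tunnel swap works because OSPF forwards over the lowest-cost path it knows of, and can spread traffic across equal-cost paths. Here is a minimal sketch of the three steps as a lowest-cost selection over a single hop; the tunnel names and cost values are made up (including the assumption that the alternate starts out more expensive), and `best_paths` is a hypothetical stand-in for the router’s shortest-path calculation.

```python
def best_paths(tunnels):
    """Return the lowest-cost tunnel(s); a tie means equal-cost multipath."""
    if not tunnels:
        return []
    lowest = min(tunnels.values())
    return sorted(name for name, cost in tunnels.items() if cost == lowest)


# Made-up costs; only their relative order matters.
tunnels = {"original": 10}
before = best_paths(tunnels)  # ['original']

# 1. Set up the alternate (assumed to start out more expensive).
tunnels["alternate"] = 20
step1 = best_paths(tunnels)  # ['original']

# 2. Make the original more expensive than the alternate.
tunnels["original"] = 30
step2 = best_paths(tunnels)  # ['alternate']

# 3. Remove the original; traffic has already moved.
del tunnels["original"]
step3 = best_paths(tunnels)  # ['alternate']

print(before, step1, step2, step3)
```

Because the alternate is already up and known to OSPF before the original’s cost is raised, traffic should move as soon as the routers rerun their path calculation, and removing the original afterwards no longer affects forwarding.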