Home 🏠 is where the MC-LAG is — multi-vendor EVPN adventures with #SRLinux
A recent blog post reminded me of network migrations and of vendor-proprietary versus open, standards-based architectures. While the author does a decent job of walking through the mechanics of a migration strategy, I was left wondering: Isn’t there a better, multi-vendor way?
After all, virtual Port-Channel (vPC) is still a vendor-proprietary mechanism; moving from FabricPath to BGP EVPN with VXLAN while keeping vPCs carries that dependency along instead of solving it. As Ivan likes to point out, it depends on the problem you are trying to solve: if your business is centered around a given vendor, eliminating the need for their special features is not your highest priority. I totally get that. Still, the article does mention EVPN Multi-Homing as a standards-based alternative, and it so happens there is a NOS for that.
Delivering a hands-on experience 👏
Blogs are great, and solution guides are useful — but nothing quite gets a new technology across like a hands-on lab experience. The Nokia SR Linux Advanced Solutions Guide has an entire section about various multi-homing deployment options, and I decided to pick one as a first proof-of-concept: Using multi-homing as all-active MLAG for non-EVPN layer-2 BDs
My auto-config agent already performed most of the groundwork, so I decided to extend it to support L2-only EVPN on the leaves: https://github.com/jbemmel/srl-self-organizing/tree/main/labs/evpn-mh-as-mc-lag
Full transparency: This did open what one might call a bit of a can 🥫 of worms 🐛:
- The additional multi-threaded gNMI provisioning crossed a threshold and started to throw errors due to concurrent access; I had to modify pyGNMI to ensure correctness by serialization through locking
- LAGs require the provisioning of a port member speed, and different types of routers have different speeds on different ports. I settled on “all ports are 100G” for now, but obviously YPSMV (Your Port Speeds May Vary)
- The provisioning of leaf pairs required the extension of MC-LAG discovery to the “V-shaped” topology illustrated above, including the case of 2 (or more, up to 4) local links per LAG, correlated to the same construct on the paired switch.
- Ethernet Segment Auto-Discovery as described in RFC 7432
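To make the first item above a bit more concrete, here is a minimal sketch of what "serialization through locking" can look like in Python. It is not the actual pyGNMI change: `FakeGnmiClient`, `SerializedClient` and `provision` are hypothetical names invented for this illustration, and the path strings are placeholders only. The idea is simply that many worker threads share one client, and a single lock ensures only one request touches it at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeGnmiClient:
    """Hypothetical stand-in for a gNMI client that is not thread-safe."""

    def __init__(self):
        self.in_flight = 0       # calls currently inside set()
        self.max_in_flight = 0   # highest concurrency ever observed
        self.applied = []        # (path, value) pairs, in arrival order

    def set(self, path, value):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        self.applied.append((path, value))
        self.in_flight -= 1


class SerializedClient:
    """Wraps a client so that concurrent callers take turns (assumed design)."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def set(self, path, value):
        # Only one thread at a time may talk to the underlying client.
        with self._lock:
            return self._client.set(path, value)


def provision(client, updates, workers=8):
    """Push all (path, value) updates through a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(lambda update: client.set(*update), updates))


if __name__ == "__main__":
    raw = FakeGnmiClient()
    updates = [(f"/illustrative/lag{i}/admin-state", "enable") for i in range(100)]
    provision(SerializedClient(raw), updates)
    print(len(raw.applied), raw.max_in_flight)
```

The trade-off is the obvious one: the lock gives up parallelism on the shared session in exchange for correctness, which is fine as long as the individual requests are short.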
Multi-vendor interop 🧩
Even when using the same baseline specifications, vendors may still find ways to put their unique spin on a given technology. For example, Cumulus (NVIDIA) uses Type-3 ESI values and makes various implementation choices that made for an interesting exercise of “find that knob”. Still, the goal was achieved: undeniable working proof that EVPN Multi-Homing is a standards-based technology that can work across vendors.
The ones that are #truly open, at least…
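For readers wondering what a "Type-3 ESI" looks like on the wire: RFC 7432 defines the Ethernet Segment Identifier as 10 octets, where the first octet is the type and, for Type 3, the remaining nine are a 6-octet system MAC followed by a 3-octet local discriminator. The helper below is a small sketch of that layout; the function name `esi_type3` and the example MAC are made up for illustration and are not taken from either vendor's implementation.

```python
def esi_type3(system_mac: str, local_discriminator: int) -> str:
    """Build a MAC-based (Type 3) ESI per the RFC 7432 layout.

    Layout: 1 octet type (0x03) + 6 octets system MAC + 3 octets discriminator.
    """
    mac = bytes.fromhex(system_mac.replace(":", "").replace("-", ""))
    if len(mac) != 6:
        raise ValueError("system MAC must be exactly 6 octets")
    if not 0 <= local_discriminator < 2**24:
        raise ValueError("local discriminator must fit in 3 octets")
    esi = bytes([0x03]) + mac + local_discriminator.to_bytes(3, "big")
    return ":".join(f"{octet:02x}" for octet in esi)


if __name__ == "__main__":
    # Hypothetical values: both leaves of a pair must derive the same ESI
    # for the same LAG, or they will not see it as one Ethernet Segment.
    print(esi_type3("00:11:22:33:44:55", 1))  # 03:00:11:22:33:44:55:00:00:01
```

Whichever vendor sits on each side, both peers have to end up with the same 10 octets for a shared segment, which is where the knob hunting comes in.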