Troubleshooting VXLAN MTU issues with SR Linux

Giving your network engineers a fighting chance with the industry’s most truly open NOS

Jeroen Van Bemmel
3 min read · Oct 10, 2022
VXLAN over UDP adds 50 bytes to every packet

A recent blog post reminds us that VXLAN overlay networks can be tricky to troubleshoot: a typical encapsulation header adds 50 bytes per packet, and RFC 7348 simply assumes that the underlay network is configured to carry such larger packets. And unlike ordinary routed IP traffic, where a router either fragments an oversized packet or returns an ICMP error, typical VTEPs will simply discard frames that are too large, without informing the sender.
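To make the 50-byte figure concrete, here is a minimal sketch of the arithmetic. It assumes an IPv4 underlay, no VLAN tag on the inner frame and no IP options, and it compares IP MTUs on both sides (so the inner Ethernet header counts as overhead). The function names are made up for this illustration.

```python
# Header sizes in bytes for VXLAN over IPv4/UDP.
# Assumptions: no inner VLAN tag, no IPv4 options.
INNER_ETHERNET = 14  # inner MAC header carried inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20

VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def required_underlay_mtu(overlay_ip_mtu: int) -> int:
    """Return the IP MTU the underlay needs to carry a full-size overlay packet."""
    return overlay_ip_mtu + VXLAN_OVERHEAD


def max_overlay_mtu(underlay_ip_mtu: int) -> int:
    """Return the largest overlay IP MTU that fits in the underlay unfragmented."""
    return underlay_ip_mtu - VXLAN_OVERHEAD


if __name__ == "__main__":
    print(VXLAN_OVERHEAD)               # 50
    print(required_underlay_mtu(1500))  # 1550
    print(max_overlay_mtu(1500))        # 1450
```

In other words, hosts using the usual 1500-byte MTU need an underlay that carries 1550-byte IP packets. If the underlay is stuck at 1500, the overlay has to be limited to 1450.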

Tailored troubleshooting commands in Linux

A common VXLAN underlay network is an IP fabric with IPv4 VTEPs at the leaves, using multiple spines with ECMP for redundancy and capacity. In Netlab such a topology might look like this:

netlab up -d srlinux -p clab vxlan-bridging-leaf-spine.yml

Spine-leaf VXLAN topology with ECMP

To troubleshoot suspected MTU issues in this context — say when a ping from h1 to h2 fails — one would verify the MTU on all paths between the VXLAN VTEPs l1/l2, i.e. l1->s1->l2 and l1->s2->l2. To simplify this task on SR Linux, we can add a custom /tools command for traceroute:

/tools vxlan-traceroute mac-vrf vlan1000

/tools vxlan-traceroute output confirming MTU 1550

As can be seen from the above screenshot, this custom vxlan-traceroute command takes a single parameter: the target L2 mac-vrf (VLAN) to check. Given a mac-vrf, which corresponds to a particular 24-bit VXLAN Network Identifier (VNI), it lists all the VTEP IPs that participate in this service (as signaled by the EVPN control plane using BGP). It then uses the standard Linux ‘traceroute’ command to check the MTU towards each VTEP, using the current leaf VTEP IP as source. Like VXLAN, traceroute uses UDP packets to validate the data path, and it can discover the maximum usable MTU. It also discovers multiple ECMP paths (10.1.0.2 and 10.1.0.6 respectively, via each spine).
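The traceroute step described above can be sketched roughly as follows. This is not the actual script: the function names are hypothetical, and the list of remote VTEP IPs is passed in as a parameter rather than read from the YANG data model. The sketch also assumes the process already runs in a context where the source VTEP address is usable. It relies on the `--mtu`, `-s` and `-n` options of the standard Linux traceroute, which prints a newly discovered MTU as `F=NUM` on the hop that requires it.

```python
import re
import subprocess
from typing import Dict, List, Optional


def build_traceroute_cmd(source_ip: str, vtep_ip: str) -> List[str]:
    """Build a Linux traceroute command that probes the path MTU to one VTEP."""
    # -n: numeric output, -s: source address, --mtu: discover MTU along the path
    return ["traceroute", "-n", "--mtu", "-s", source_ip, vtep_ip]


def parse_path_mtu(output: str) -> Optional[int]:
    """Return the smallest MTU reported as F=NUM, or None if none was printed."""
    values = [int(m) for m in re.findall(r"\bF=(\d+)", output)]
    return min(values) if values else None


def check_vteps(source_ip: str, vtep_ips: List[str]) -> Dict[str, Optional[int]]:
    """Run traceroute towards each remote VTEP and collect the path MTUs."""
    results: Dict[str, Optional[int]] = {}
    for vtep_ip in vtep_ips:
        proc = subprocess.run(
            build_traceroute_cmd(source_ip, vtep_ip),
            capture_output=True,
            text=True,
            check=False,  # unreachable VTEPs should not abort the whole check
        )
        results[vtep_ip] = parse_path_mtu(proc.stdout)
    return results
```

Keeping the command builder and the output parser separate from the `subprocess` call makes both easy to test without sending any probes.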

None of the above is rocket science — we’re talking about a ~100 line Python script that queries a YANG data model and invokes a Linux CLI command — but it illustrates the simple power you get from having a NOS that allows you to customize things this way.

Note that all the software used in this blog post is freely available for download, without registration or licensing requirements. So if you want to experience how this works for yourself, or have a better idea for the perfect troubleshooting tool: go for it!

Netlab installation instructions
Containerlab resources


Jeroen Van Bemmel

Sustainable digital transformation at Webscale — real life stories about our discoveries in the world of networking. Views represented are my own.