“😮pen” networking thoughts
I have written about “open” in the context of networking quite a bit; it is a theme that keeps coming back. For example, today we’re having an open discussion about how best to organize and evolve the configuration of devices in Netsim-Tools, an open source networking project.
In this blog post I’d like to elaborate on my thinking and invite feedback, so that we can improve our collective efforts.
OpenConfig and model-driven configuration
My PR to add support for SR OS and SR Linux was recently merged into the development branch; it adds support for gNMI-based provisioning using a mix of OpenConfig and platform-native YANG models, with Containerlab as the virtualization provider.
OpenConfig has been around for a while, championed by the likes of Google, who needed to configure a multi-vendor network at scale. It provides a generic configuration model for common features and settings required for networking (think interfaces, IP addresses, BGP parameters, and the like). It does not cover every possible situation, partly because vendors need to differentiate their platforms, but it should cover 80–90% or so.
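To make “generic configuration model” a bit more concrete, here is a minimal Python sketch that builds a JSON document shaped like the openconfig-interfaces model with the openconfig-if-ip augmentation. The function name, interface name and addresses are my own illustrative assumptions, not taken from the PR; what a given platform accepts depends on which OpenConfig model versions it supports.

```python
import json


def openconfig_interface(name, description, ipv4, prefix_length):
    """Build an openconfig-interfaces style dict for one routed interface.

    Hypothetical helper for illustration only. Note how list keys
    (name, index, ip) are repeated inside each 'config' container.
    """
    return {
        "openconfig-interfaces:interfaces": {
            "interface": [
                {
                    "name": name,
                    "config": {
                        "name": name,
                        "description": description,
                        "enabled": True,
                    },
                    "subinterfaces": {
                        "subinterface": [
                            {
                                "index": 0,
                                "config": {"index": 0},
                                "openconfig-if-ip:ipv4": {
                                    "addresses": {
                                        "address": [
                                            {
                                                "ip": ipv4,
                                                "config": {
                                                    "ip": ipv4,
                                                    "prefix-length": prefix_length,
                                                },
                                            }
                                        ]
                                    }
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # Example values are made up for this sketch.
    doc = openconfig_interface("ethernet-1/1", "to spine", "10.0.0.1", 31)
    print(json.dumps(doc, indent=2))
```

The same document could, in principle, be sent to any device that implements these models, which is the whole point of a vendor-neutral model.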
Today, OpenConfig remains less commonly used (to my knowledge), for several reasons: you can do everything you need (and more) with each vendor’s proprietary configuration models and methods; OpenConfig itself is somewhat cumbersome to use and verbose in its representation of configuration parameters; and most vendors prefer to promote and teach their own proprietary ways.
In my PR I illustrate how to use OpenConfig for SR OS, starting with interfaces and basic BGP configuration. In theory, it could be used for other vendors’ devices as well; for example, Junos OS and IOS XR support it (see also this elaborate SR example by Anton, from 2019). As Ivan explains here, getting rid of “snowflakes” and moving towards a uniform multi-vendor service data model leads to a better, simplified network that is easier to configure and test. Less variation means more stability, less change.
Consistency through incremental transactional changes
Using the Nokia gRPC driver for Ansible, OpenConfig or native device configuration becomes a list of (C)RUD operations (replace, update, delete). For each invocation, the driver first collects a snapshot of the config tree(s) being targeted, so that it can evaluate whether the input command(s) represent a change. Then, it generates a single gNMI SET transaction to apply the requested changes atomically (i.e., either all changes are applied successfully, or none of them are).
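The snapshot, compare, then apply-all-or-nothing flow described above can be sketched in a few lines of Python. This is not the driver’s actual code: the flat path-to-value dictionary, the function names and the validate callback are assumptions I made to keep the idea visible without a real gNMI session.

```python
import copy

_MISSING = object()  # sentinel: path absent from the snapshot


def effective_changes(snapshot, ops):
    """Keep only the operations that would actually change the snapshot.

    snapshot: dict mapping a path string to a leaf value (a simplification).
    ops: list of (op, path, value) with op in {"update", "replace", "delete"}.
    With flat leaf values, update and replace behave the same here; in real
    gNMI, replace also removes children that are not specified.
    """
    changes = []
    for op, path, value in ops:
        if op == "delete":
            if path in snapshot:
                changes.append((op, path, value))
        elif snapshot.get(path, _MISSING) != value:
            changes.append((op, path, value))
    return changes


def apply_atomically(snapshot, changes, validate):
    """Apply every change to a copy; commit only if validation passes.

    Returns the new state, or raises ValueError and leaves the
    original snapshot untouched (all or nothing).
    """
    candidate = copy.deepcopy(snapshot)
    for op, path, value in changes:
        if op == "delete":
            candidate.pop(path, None)
        else:
            candidate[path] = value
    if not validate(candidate):
        raise ValueError("transaction rejected; no changes applied")
    return candidate
```

An empty result from effective_changes means the task can report “no change”, which is what makes repeated runs idempotent.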
YANG data model consistency checks
YANG data models contain functionality to express and enforce consistency of configuration. For example, BGP peering is layer 3 functionality; it cannot be applied to layer 2 services. Hence, the SR Linux YANG model for BGP uses a “must” statement to check that the network-instance type is not “mac-vrf”.
This is but a simple example of the benefits of using a model-driven approach to configuration.
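For readers less familiar with YANG, the effect of such a constraint can be mimicked in plain Python. The sketch below is not the SR Linux model itself; the dictionary layout, function name and error text are assumptions, chosen only to show the rule “no BGP on a mac-vrf” being enforced before a config is accepted.

```python
def check_bgp_constraints(network_instances):
    """Return a list of violations of the 'no BGP on mac-vrf' rule.

    network_instances: dict of name -> {"type": str, "protocols": {...}}
    (a hypothetical, simplified layout for this illustration).
    """
    errors = []
    for name, instance in network_instances.items():
        has_bgp = "bgp" in instance.get("protocols", {})
        if has_bgp and instance.get("type") == "mac-vrf":
            errors.append(
                f"network-instance {name}: BGP cannot be configured on type mac-vrf"
            )
    return errors


if __name__ == "__main__":
    config = {
        "default": {"type": "default", "protocols": {"bgp": {"autonomous-system": 65000}}},
        "l2-service": {"type": "mac-vrf", "protocols": {"bgp": {"autonomous-system": 65000}}},
    }
    for problem in check_bgp_constraints(config):
        print(problem)
```

With a model-driven approach this check lives in the data model and is enforced by the device, rather than being re-implemented in every tool.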
Change control and initial state
All networks evolve over time: new connections or devices are added, software is upgraded, customers come and go, requirements change. Changes happen at various time scales, planned or unplanned; even a link failure can be considered a (temporary, unintended) change.
A recent example of a change in SR Linux: release 21.6.3 introduced support for MPLS LDP. LDP uses TCP/UDP port 646, so the default system CPM ACLs were modified to allow this protocol through. This caught some Containerlab users off guard, because Containerlab was providing a fixed startup config based on an earlier platform release (with ACLs that blocked LDP). Subsequently, a more dynamic way to update the existing base platform config was introduced, to try to avoid such issues going forward.
Dilemma: The evolving scope of initial state and technical debt
The CPM ACLs on SR Linux are but one example of the complex, evolving system state that needs to be captured and defined. Having a tool like Netsim-Tools generate a ‘startup-config’ that (tries to) completely define a given platform’s state introduces technical debt, as the scope of the initial config is sure to change across releases.
On the other hand, it is impossible to avoid the dependency of the correctness of a sub-system’s state on the behavior of an over-arching system; there has to be some starting point. And while it is possible to take the entire contents of a virtual file system as ‘initial state’, it is much cleaner and more manageable to start from a single file as a baseline.
There is no need to standardize on a single approach here — every platform module can implement its own preferred approach to this ‘initial state dilemma’. We can leave it open.
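One possible approach to the ‘initial state dilemma’, sketched under assumptions: keep whatever baseline the platform release ships with, and layer only the lab-specific settings on top of it. The dictionary layout and key names below are made up for illustration, and this is not how Containerlab or Netsim-Tools actually implement it.

```python
import copy


def overlay(baseline, changes):
    """Recursively merge 'changes' onto a copy of 'baseline'.

    Keys absent from 'changes' keep the baseline value, so defaults that a
    newer platform release adds (such as an ACL entry) survive the merge.
    """
    merged = copy.deepcopy(baseline)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical baseline as shipped by a platform release.
    release_baseline = {"acl": {"cpm-filter": {"ldp": "accept"}}, "interfaces": {}}
    # Hypothetical lab-specific additions generated by a tool.
    lab_settings = {"interfaces": {"ethernet-1/1": {"description": "to spine"}}}
    print(overlay(release_baseline, lab_settings))
```

The trade-off is that the tool no longer fully defines the device state; it only guarantees the parts it explicitly sets.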
A choice between what is fixed, and what is variable?
A former professor of mine, who taught object-oriented modeling using design patterns, often framed things in terms of “what is fixed versus what is variable”. Over the years, I have learnt that everything is variable; some things are just more variable than others.
Open (also) means “accessible” and interchangeable
As explained here, part of my motivation to contribute to Netsim-Tools was a desire to help more people learn about networking more easily. The original “BGP Add Path” example uses Cisco vIOS devices, which presents a hurdle in terms of resource requirements and licensing constraints. I tried to address that using an alternative based on SR OS and SR Linux; this reduces the resource requirements, but still requires a license.
In fairness and full transparency, I have since discovered that the shortest path to “open” in this case may actually be Cumulus Linux; its open source FRR package supports BGP Add Path by default. However, given my previous experience working with these virtual devices, I feel less inclined to spend cycles exploring how to make that work. I’ll admit I may be a little biased, but the net result is that it’s probably better for someone else to drive that.
The broader question for me is this: How can we enable mix-and-match topologies in Netsim-Tools, such that users can evaluate and compare different options? I’d like for us to reach a point where there is just a single ‘Add Path’ example, and one can use flags to pick different devices for different roles.
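To illustrate what “flags to pick different devices for different roles” could look like, here is a small Python sketch. Everything in it is hypothetical: the --device flag, the role names, the default device choices and the three-node topology are my own assumptions, not existing Netsim-Tools options.

```python
import argparse

# Hypothetical defaults: which device type plays which role.
DEFAULT_ROLES = {"rr": "srlinux", "pe": "sros"}

# Hypothetical single 'Add Path' topology: node name -> role.
NODES = {"rr1": "rr", "pe1": "pe", "pe2": "pe"}


def parse_roles(argv):
    """Parse repeated --device ROLE=DEVICE flags into a role -> device map."""
    parser = argparse.ArgumentParser(description="Pick a device per role")
    parser.add_argument(
        "--device",
        action="append",
        default=[],
        metavar="ROLE=DEVICE",
        help="override the device used for a role, e.g. pe=cumulus",
    )
    args = parser.parse_args(argv)
    roles = dict(DEFAULT_ROLES)
    for item in args.device:
        role, sep, device = item.partition("=")
        if not sep or not role or not device:
            parser.error(f"expected ROLE=DEVICE, got '{item}'")
        if role not in roles:
            parser.error(f"unknown role '{role}'")
        roles[role] = device
    return roles


def build_topology(roles):
    """Expand the role map into per-node device assignments."""
    return {node: {"device": roles[role]} for node, role in NODES.items()}


if __name__ == "__main__":
    import sys

    print(build_topology(parse_roles(sys.argv[1:])))
```

Running the script with --device pe=cumulus would keep the same topology and example, and only swap the device type behind the “pe” role.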