On finding 🔍 balloons 🎈 in data center networks 🖧: micro-detection using adaptive ➰ feedback loops

Using #SRLinux for flexible decentralized DDoS attack detection #SecDevOps

Jeroen Van Bemmel
Mar 16, 2022
A distributed problem (“attack”) may require a distributed solution (source)

Distributed Denial of Service (DDoS) attacks continue to be a major problem for many network operators and their customers. As in most networking problems, the key issue is scale: attackers can mount an amplified attack using many (N) sources to send large (M) payloads to a single (1) target server, causing link and CPU saturation and system overload.

N*M >> 1 ~> overloaded system(s) and unhappy customers
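
To put rough numbers on that inequality, here is a minimal sketch. The 5000 sources and the 25 Gbps link echo figures used elsewhere in this post; the 10 Mbps per source is purely a hypothetical value chosen for illustration, and `overload_factor` is a made-up helper name.

```python
def overload_factor(n_sources: int, per_source_bps: float, link_bps: float) -> float:
    """Ratio of aggregate attack traffic (N * M) to the capacity of the target's link."""
    return (n_sources * per_source_bps) / link_bps


# Hypothetical example: 5000 bots, each sending 10 Mbps, towards a 25 Gbps server link
factor = overload_factor(n_sources=5000, per_source_bps=10e6, link_bps=25e9)
print(f"Offered load is {factor:.1f}x the link capacity")
```

Anything above 1.0 means the link saturates before the server's CPU even gets a say.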

Much like security in general, solving DDoS attacks is a continuous process, not a one-time product or solution deployment. While most operators have deployed DDoS mitigation solutions, there will, unfortunately, always come a time when the current solution falls short and something else, or something more, is needed.

#DevSecOps: Shortening feedback loops

Feedback loops (credit: Peter Phaal / Tim Cochran)

Back in 2011 Peter Phaal wrote a blog post about “Delay and stability”. More than a decade later, triggers like AWS outages have not changed, and these points remain relevant and valid today:
✅ Measurement (observability) plays a critical role in data centers; it is the foundation for automation (more on this topic)
✅ Adaptive systems require low measurement delays and short feedback loops to achieve good overall stability of the system (network + its customers/users)

Tim Cochran wrote a related story in 2021 about “Maximizing Developer Effectiveness”. He highlights that:
✅ People underestimate the cost of small inefficiencies
✅ Adoption of #DevSecOps principles is a cultural change, more than a technological one; it is about removing friction

“They also might want to improve their agile and product techniques to respond faster to feedback and signals in the market.” DDoS attacks are one such signal.

The underlying point I’m trying to make is that dealing with DDoS attacks involves much more than just the small technical parts discussed here. I am well aware of that, and this is certainly not the only or “best” solution you would ever use; far from it. Rather, think of it as a potential piece 🧩 in a larger puzzle.

DDoS attack mitigation

DDoS attacks have been around since the beginning of networking, and recent events and geopolitical conflicts seem to have made matters worse.

Example DDoS attack from December 2021

To some extent, defending against a past DDoS attack risks putting the cart before the horse: by the time a solution gets deployed, things may already be different. Still, we can only start from a (recent) historic perspective:

Anatomy of a recent botnet DDoS attack (source: Nokia Deepfield State of DDoS 2021)

Note the focused targeting of a small set of target IPs, outnumbered by the attack sources at a 5000:1 ratio. Whether this is considered 1 attack or 3 separate attacks, the nature of the attack (a TCP botnet using valid client IPs and indistinguishable packet headers) makes it hard to deal with.

DDoS attack mitigation stages

Conceptual model of DDoS mitigation stages, using SR Linux nodes for decentralized detection

Divide and conquer DDoS — how distributed eats centralized for breakfast

“A fully distributed approach is the only way to go”: the only scalable way to deal with a distributed attack is to distribute (spread out) the responding processing resources. In the context of a leaf-spine Clos fabric, this means moving the detection points down towards the servers, and (potentially) increasing the number of ECMP paths and inserting (some might say “service chaining”) additional resources to handle the incoming flood.

sFlow based DDoS detection

Response by Peter; recommended settings are here

One mechanism that can be leveraged is software- or hardware-based sFlow sampling. Different choices of sampling rate and interval represent different trade-offs between accuracy and responsiveness of detection on the one hand, and resource usage on the other.
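
To make that trade-off a bit more tangible, the sketch below tabulates how many samples a single flow would be expected to produce for a few sampling rates and intervals, together with the commonly cited sFlow accuracy approximation (error ≤ 196·√(1/c) percent at 95% confidence, where c is the number of samples). The 200,000 packets-per-second flow and the specific rates and intervals are hypothetical values picked for illustration, not recommended settings.

```python
import math


def expected_samples(flow_pps: float, sampling_rate: int, interval_s: float) -> float:
    """Expected number of sampled packets from one flow in one interval (1-in-N sampling)."""
    return flow_pps * interval_s / sampling_rate


def relative_error_pct(samples: float) -> float:
    """Approximate upper bound on the estimation error (%) at 95% confidence."""
    return 196.0 * math.sqrt(1.0 / samples)


FLOW_PPS = 200_000  # hypothetical attack flow, in packets per second

for sampling_rate in (1_000, 5_000, 25_000):
    for interval_s in (1, 5, 20):
        c = expected_samples(FLOW_PPS, sampling_rate, interval_s)
        print(f"1:{sampling_rate:<6} {interval_s:>2}s -> "
              f"{c:7.0f} samples, error <= {relative_error_pct(c):5.1f}%")
```

More aggressive sampling or longer intervals yield more samples (better accuracy) at the cost of more processing or slower detection; which corner to pick depends on where in the network the detection runs.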

DDoS detection at various layers in the data center network

As illustrated above, DDoS attack detection can conceivably be performed at multiple points in the network, ranging from the edge routers / DC Gateways (1) down to the leaf ports facing the servers (5), or even on the servers themselves. In the case of the Nokia 7750, the edge routers have dedicated silicon to handle high traffic volumes; the features and available ‘logic’ are relatively fixed. Down towards the first spine layer, the Nokia 7250 IXR platform supports hardware-based sFlow (using Broadcom silicon); this sample stream could be sent to an external analyzer, or to a local agent. Finally, the last layer of Nokia 7220 IXR D2(L)/D3(L) nodes has software-based sFlow support; this is more flexible, but less performant.

BGP Flowspec

Many DDoS solutions use BGP Flowspec (RFC 8955) to signal filtering and rate limiting rules to upstream routers. Open source projects like GoBGP support these standards, and the Nokia 7750 Edge Routers can apply these dynamic instructions to assist in attack mitigation.
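
As a small illustration of what such a signal looks like on the wire, RFC 8955 defines the rate limiting action as a "traffic-rate-bytes" extended community: 8 bytes consisting of type 0x80, sub-type 0x06, a 2-byte AS number and a 4-byte IEEE float carrying the rate in bytes per second, where a rate of 0 means discard. The sketch below only encodes that one action; it is not how GoBGP or the 7750 implement it, and `traffic_rate_community` is a hypothetical helper name.

```python
import struct


def traffic_rate_community(rate_bytes_per_sec: float, asn: int = 0) -> bytes:
    """Encode an RFC 8955 traffic-rate-bytes extended community (8 bytes).

    Layout: type 0x80, sub-type 0x06, 2-byte AS number, 4-byte float (bytes/second).
    """
    return struct.pack("!BBHf", 0x80, 0x06, asn, rate_bytes_per_sec)


# A rate of 0 asks the router to discard all traffic matching the flow
print(traffic_rate_community(0.0).hex())

# Rate limit matching traffic to 1 MB/s (illustrative value, private AS number)
print(traffic_rate_community(1_000_000.0, asn=65000).hex())
```

The match part of a Flowspec rule (destination prefix, protocol, ports and so on) travels separately as NLRI; the community above only expresses what to do with matching traffic.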

Prototype

Over the next couple of weeks we will be working on a prototype to explore and validate these concepts, to create a broad spectrum of usable options:

✅ Enhanced Subscriber Management (ESM) for dynamic rate limiting

✅ GoBGP signaling BGP Flowspec rules towards the 7750s, triggered by detected attacks

✅ DDoS attack detection using sFlow and/or other performance metrics (e.g. 1:25000 sampling with 20 s intervals to detect flows that use 10% of 25 Gbps)
+ Note that sFlow works both ways, and can also see server-initiated DDoS
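
To give a feel for the sFlow example in the list above: at 1:25000 sampling with 20 s intervals, a flow using 10% of 25 Gbps (2.5 Gbps) made up of 1500-byte frames would produce roughly 167 samples per interval. The following is a minimal sketch of threshold-based detection on such samples, not the prototype itself; the `(destination, frame_bytes)` sample format, the function name and the per-destination aggregation are all assumptions made for illustration.

```python
from collections import Counter

SAMPLING_RATE = 25_000   # 1-in-25000 packets, from the example above
INTERVAL_S = 20          # seconds per measurement interval
LINK_BPS = 25e9          # 25 Gbps link
THRESHOLD = 0.10         # flag destinations receiving >= 10% of the link


def detect_heavy_destinations(samples, sampling_rate=SAMPLING_RATE,
                              interval_s=INTERVAL_S, link_bps=LINK_BPS,
                              threshold=THRESHOLD):
    """Return {destination: estimated_bps} for destinations above the threshold.

    `samples` is an iterable of (destination, frame_bytes) tuples collected
    during one interval (a hypothetical, simplified view of sFlow samples).
    """
    sampled_bytes = Counter()
    for destination, frame_bytes in samples:
        sampled_bytes[destination] += frame_bytes

    limit_bps = threshold * link_bps
    heavy = {}
    for destination, total in sampled_bytes.items():
        # Scale the sampled bytes back up to an estimated bit rate
        estimated_bps = total * 8 * sampling_rate / interval_s
        if estimated_bps >= limit_bps:
            heavy[destination] = estimated_bps
    return heavy


# Hypothetical interval: one heavily targeted server, one with normal traffic
samples = [("10.0.0.1", 1500)] * 167 + [("10.0.0.2", 1500)] * 10
for destination, bps in detect_heavy_destinations(samples).items():
    print(f"{destination}: ~{bps / 1e9:.2f} Gbps")
```

Keying on the source instead of the destination would apply the same idea to server-initiated traffic.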

Stay tuned for more technical details!

Jeroen Van Bemmel

Sustainable digital transformation at Webscale — real life stories about our discoveries in the world of networking. Views represented are my own.