The consumable data center network – bon appétit!
Data center and cloud networking have entered an interesting phase.
Private clouds still play an important role, yet many application workloads that used to run solely in on-premises enterprise data centers are being shifted to the public cloud. That public cloud space is, in turn, seeing a new breed of upstart webscalers challenging the hyperscalers for a piece of this lucrative action. Add to this the enterprises that want to simplify their data center and cloud networking environments by adopting the NetOps approaches used by the hyperscalers.
Whether implementing private or public clouds, if data center operations teams are going to succeed, they need to make their networks as consumable as those of the hyperscalers.
Automation of Day 2+ operations
Making the network consumable has several dimensions. The most important is automation, which is increasing across all parts of the data center. We may never be able to automate racking a switch, plugging in cables, and powering it on, but almost everything else is on the table – from fabric design to bootstrap and workload deployment.
It’s Day 2+ operations where I see automation heating up the fastest. Central to the idea of consumability is making the network fabric more event-driven than ever. Rather than configuring services out of band, the deployment of workloads on the surrounding compute stacks will trigger service configuration automatically. This will allow the fabric to be consumed through the same interface used to deploy workloads.
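To make this concrete, here is a minimal sketch of event-driven provisioning, assuming the official Kubernetes Python client; push_fabric_config() is a hypothetical stand-in for whatever fabric automation API is actually in use.

```python
# Minimal sketch: watch Kubernetes pod events and trigger fabric configuration
# from the same interface used to deploy the workload (hypothetical example).
from kubernetes import client, config, watch

def push_fabric_config(node: str, vlan: str) -> None:
    # Hypothetical call into the fabric automation stack (REST, gNMI, etc.).
    print(f"Provisioning VLAN {vlan} on the leaf facing compute node {node}")

def main() -> None:
    config.load_kube_config()  # use load_incluster_config() when running in-cluster
    v1 = client.CoreV1Api()
    for event in watch.Watch().stream(v1.list_namespaced_pod, namespace="default"):
        pod = event["object"]
        if event["type"] == "ADDED" and pod.spec.node_name:
            # The network intent rides along with the workload as a label,
            # instead of being configured out of band.
            vlan = (pod.metadata.labels or {}).get("fabric.example.com/vlan")
            if vlan:
                push_fabric_config(pod.spec.node_name, vlan)

if __name__ == "__main__":
    main()
```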
Increased competition and the push for operational efficiency will drive cost reductions – and thus automation. Operational tasks such as upgrade projects, which are notably expensive, will be prime targets. Operators have traditionally avoided them where they can, staying on a specific software release to maintain stability and sidestep the expense. This will have to change. Expect to see operators follow the application world, where upgrades are smaller, more frequent and – you guessed it – more automated. Look for GitOps to trend and for CI/CD to finally be embraced.
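As a rough illustration of the GitOps pattern, the sketch below reconciles the NOS release declared in a Git repository against what each switch reports as running; running_release() and schedule_upgrade() are hypothetical placeholders for vendor-specific calls.

```python
# Rough GitOps-style reconcile loop: the desired NOS release lives in Git,
# and any drift on the fabric triggers an automated, small-step upgrade.
# running_release() and schedule_upgrade() are hypothetical placeholders.
import json
import subprocess

def desired_release(repo_path: str) -> str:
    # The declared state is just a file under version control.
    with open(f"{repo_path}/fabric/release.json") as f:
        return json.load(f)["nos_release"]

def running_release(switch: str) -> str:
    # Placeholder: in practice this would come from gNMI, SNMP, or the CLI.
    return "23.7.1"

def schedule_upgrade(switch: str, release: str) -> None:
    # Placeholder: hand off to the upgrade pipeline (canary first, then the rest).
    print(f"Scheduling {switch} for upgrade to {release}")

def reconcile(repo_path: str, switches: list[str]) -> None:
    subprocess.run(["git", "-C", repo_path, "pull", "--ff-only"], check=True)
    target = desired_release(repo_path)
    for switch in switches:
        if running_release(switch) != target:
            schedule_upgrade(switch, target)

if __name__ == "__main__":
    reconcile("/var/lib/fabric-gitops", ["leaf1", "leaf2", "spine1"])
```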
The corollary of more frequent upgrades is more frequent testing. Test automation will be easier with more frequent upgrades because the blast radius is smaller. In other words, the scope of each change is smaller, which leaves a smaller delta in functionality that needs to be tested.
Troubleshooting the network and mitigating outages will also see changes. Following the DevOps world, pipelines will be used to introduce changes, automatically remediate outages, and handle the low-hanging fruit of event management. Machine learning and AI will play a key role here, with models trained to mitigate some classes of events automatically.
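A hedged sketch of the low-hanging fruit end of that spectrum: map well-understood alarm types to pre-approved remediation playbooks and escalate everything else to a human. The alarm names and playbook functions are illustrative rather than taken from any particular product.

```python
# Illustrative event-to-remediation mapping: well-understood alarms get a
# pre-approved automated playbook; anything unrecognized is escalated.
from typing import Callable

def bounce_port(alarm: dict) -> None:
    print(f"Flapping {alarm['port']} on {alarm['node']} to clear a stuck interface")

def drain_and_ticket(alarm: dict) -> None:
    print(f"Draining traffic from {alarm['node']} and opening an incident ticket")

PLAYBOOKS: dict[str, Callable[[dict], None]] = {
    "INTERFACE_STUCK": bounce_port,
    "OPTIC_DEGRADED": drain_and_ticket,
}

def handle_alarm(alarm: dict) -> None:
    playbook = PLAYBOOKS.get(alarm["type"])
    if playbook:
        playbook(alarm)  # low-hanging fruit: remediate automatically
    else:
        print(f"Escalating unrecognized alarm to on-call: {alarm}")

if __name__ == "__main__":
    handle_alarm({"type": "INTERFACE_STUCK", "node": "leaf3", "port": "ethernet-1/12"})
```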
Embracing openness and extensibility
Having workloads on the surrounding compute stacks trigger network service configuration will require stringing together new functionality and integrations. We are already seeing the deployment of small quality-of-life functions and workflow automations. To go further, the network operating system (NOS) and the automation stack need to be extensible in a manageable way.
YANG model augmentations and Custom Resource Definitions (CRDs) in Kubernetes will provide the needed schema and API extensibility to ensure that extensions are managed at both layers. For the NOS specifically, schema normalizations such as OpenConfig are finally getting the traction needed to become viable in the wider market. Support is no longer bolted on, and streaming telemetry now works with the config and state normalization layers. This kind of manageable extensibility will lower the barrier to entry for new vendors and help promote healthy competition in the ecosystem.
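For instance, a fabric service could be exposed as a Kubernetes Custom Resource Definition, so that network intent is handled by the same API machinery as any other workload. The group and kind names below (fabric.example.com, FabricService) are purely illustrative, and the example assumes the official Kubernetes Python client.

```python
# Illustrative CRD registration: expose a fabric service as a first-class
# Kubernetes API object. Group and kind names are hypothetical.
from kubernetes import client, config

FABRIC_SERVICE_CRD = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "fabricservices.fabric.example.com"},
    "spec": {
        "group": "fabric.example.com",
        "scope": "Namespaced",
        "names": {"plural": "fabricservices", "singular": "fabricservice",
                  "kind": "FabricService"},
        "versions": [{
            "name": "v1alpha1",
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "properties": {
                        "vni": {"type": "integer"},
                        "attachments": {"type": "array",
                                        "items": {"type": "string"}},
                    },
                }},
            }},
        }],
    },
}

if __name__ == "__main__":
    config.load_kube_config()
    # Register the extension so FabricService objects can be created and watched.
    client.ApiextensionsV1Api().create_custom_resource_definition(FABRIC_SERVICE_CRD)
```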
Digital twins gaining mindshare
How many times have outages occurred because the test lab didn’t match the production network? Even with perfect alignment, if the network is in a constant state of flux, there is always risk. Hence the growing need for a digital twin, or sandbox, in which to test topologies, configurations, and the general state of the network. With the adoption of CI/CD in the data center network, digital twins will be on the minds of operators, much as they are on ours.
Digital twin technology is maturing as it is developed in other industries. The cost of compute and storage resources also continues to drop, making twinning more affordable. And digital twins can be readily adapted to the network space with the arrival of containerized NOSs. Kubernetes plays a big part in making digital twins accessible by providing the needed orchestration layer: it does a very good job orchestrating containers and stitching them together in topologies that reflect the actual state of the network. Because networks are more dynamic than typical applications, it is critical that the digital twin stays as close as possible to what the real network looks like.
Twins will enable operators to run continuous integration before continuous deployment automatically pushes changes into production – perhaps upgrading a pair of switches first as a canary and letting them soak for a week before upgrading the rest.
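A minimal sketch of that canary logic is below, with the soak interval shortened for readability; upgrade() and healthy() are hypothetical hooks into the upgrade pipeline and the telemetry system.

```python
# Minimal canary-upgrade sketch: upgrade a small canary set first, let it soak,
# and only roll out to the rest of the fabric if telemetry stays healthy.
# upgrade() and healthy() are hypothetical hooks; names are illustrative.
import time

SOAK_SECONDS = 60  # in production this might be days or a week

def upgrade(switch: str, release: str) -> None:
    print(f"Upgrading {switch} to {release}")

def healthy(switch: str) -> bool:
    # Placeholder for telemetry checks: drops, protocol adjacencies, crash logs.
    return True

def canary_rollout(canaries: list[str], rest: list[str], release: str) -> None:
    for switch in canaries:
        upgrade(switch, release)
    time.sleep(SOAK_SECONDS)  # soak period before judging the canaries
    if all(healthy(s) for s in canaries):
        for switch in rest:
            upgrade(switch, release)
    else:
        print("Canary soak failed; halting rollout for human review")

if __name__ == "__main__":
    canary_rollout(["leaf1", "leaf2"], ["leaf3", "leaf4", "spine1", "spine2"], "24.3.2")
```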
Automating the edge cloud
Ultra-low latency, especially in industrial applications, has been a key theme of 5G and reflects the increasing importance of machine-to-machine communications – thus the current focus on edge computing and the edge cloud. The use of edge compute resources is highly task-dependent, which often means that the relationship between far-edge clouds and the data center must be dynamic, with services rapidly stitched from one to the other. At one moment there might be two containers supporting the service, and the next there might be four – a degree of dynamicity never seen before in networks.
Given the number of different orchestrators at the far edge, the central data center, and the data center interconnect (DCI), orchestration must be multi-domain and seamless, which is a challenge. Initially, automations may be able to manage this at the click of a button, but it will eventually have to happen automatically. Vendors and operators will need to innovate new workflows, so that spinning up additional capacity at the far edge automatically creates all the corresponding pieces in the fabric – a truly self-driving, consumable network.
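A hedged sketch of that kind of workflow: when the edge orchestrator scales a service out, the same event also creates the corresponding fabric attachment, with no button to click. Both the event shape and attach_to_fabric() are illustrative.

```python
# Illustrative cross-domain workflow: a scale-out event at the far edge
# automatically creates the matching attachment in the data center fabric.
# The event shape and attach_to_fabric() are hypothetical.
def attach_to_fabric(site: str, service: str, replica: int) -> None:
    # Placeholder for the fabric-side automation, e.g. extending a service
    # overlay to the leaf that now hosts the new replica.
    print(f"Attaching {service} replica {replica} at {site} to the fabric")

def on_scale_event(event: dict) -> None:
    # Create one fabric attachment per newly added replica.
    for replica in range(event["old_replicas"], event["new_replicas"]):
        attach_to_fabric(event["site"], event["service"], replica)

if __name__ == "__main__":
    # The edge orchestrator scales a latency-sensitive service from 2 to 4 replicas.
    on_scale_event({"site": "far-edge-17", "service": "vision-inference",
                    "old_replicas": 2, "new_replicas": 4})
```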
This is an enormous paradigm shift. We will go from changing the configuration maybe once a week to changing it thousands of times per day, or even per hour, as new services are deployed and new attachments are made to those services. Moving services closer to users and applications to decrease latency will also increase edge locations from a handful today to hundreds and even thousands over time.
This shift will create a very heavy dependency on automation when building and deploying fabrics. As workloads expand in capacity, the network must automatically connect them to where they need to be in a fashion that is invisible to the consumer or latency-sensitive industrial process. These are unique problems for both vendors and operators to solve. As operations teams continue to support evolving needs, especially for hosting enterprise workloads, they will have to innovate solutions that make the network rapidly consumable and, ultimately, invisible, constantly moving in lockstep with both consumer and enterprise applications.
Learn more
The Nokia next-generation Data Center Fabric solution helps network operations teams to meet these needs. Driven by SR Linux – a uniquely open, extensible and consumable Network Operating System (NOS) – the solution includes the Fabric Services System, which provides an intent-based NetOps toolkit to automate all phases of the data center fabric lifecycle.
Please refer to the Nokia Data Center Fabric solution page for insights into our unique design, architecture, and extensive network automation capabilities.