Human error zero: The path to reliable data center networks
Behind every truly great "what" is a really important "why."
It's hardly controversial at this point to suggest that a company's underlying mission is part and parcel of the products and services it delivers. I'd go one step further: If forced to choose between assessing a company's portfolio and its purpose, it's purpose that will be a better indicator of long-term trajectory.
So what is driving Nokia's pursuits in data center networking? Three words: human error zero.
Networks versus networking
Those are three simple words, but there is a lot to unpack. Let's start with the foundation. Networks are fragile because networking is hard. The distinction between networks and networking is subtle but crucial. Networks are defined by the hardware, the software, the protocols and the technologies that enable connectivity. Networking, on the other hand, is defined by the people, the process, the tools and the techniques.
Our industry is good at building networks. Data centers are built with fabrics. These fabrics are composed of predominantly merchant-silicon-based devices that run a standard set of protocols. And while there are differences in how these devices and protocols are implemented, the best practices that underpin data center architectures are mostly well established and accepted across the industry. It should be no surprise that the Nokia data center portfolio includes these devices and capabilities.
Networking, however, is a different story entirely.
Networking is a practice dominated by specialists well trained in stitching together the devices and software necessary to build a fabric. It's an extremely hard proposition. You have to get literally thousands of commands distributed across dozens or hundreds of devices set just right. Then, and only then, do you have a functional network. And the larger the network and the more diverse the requirements, the more complex this becomes.
And complexity is a killer.
We know this, of course. The problem is bad enough that we implement draconian change controls. Change windows and holiday freezes are artifacts of this complexity. And I dare you to ask your teams if they want to push changes to production on a Friday afternoon.
Speed isn’t everything
Talk to almost any IT executive about their strategic priorities, and they will mention agility or speed or perhaps even automation. But how can you automate anything when just making it work requires herculean efforts?
The short answer is that you can't.
And yet our industry is full of companies with missions that are all about delivering agility or allowing enterprises to move at the speed of business. They are obsessed with speed, and their products promise faster this or more efficient that. We see this in the proliferation of workflow engines and APIs and all manner of things required to build a bigger automation engine.
The fastest way to break things at scale
But what if our industry's collective challenges in solving for operations are anchored to something deeper? What if we have been pursuing the wrong "why" all along?
Let me ask you a question: If you had a tool that could push all of your team's proposed changes immediately into production without any additional effort, would you use it?
The right answer here is unquestionably no, because we know that when we change things, our fragile networks don't always survive. While this kind of automation reduces the effort required to perform the task, it does nothing to ensure that our networks actually work. And anyone who is really practiced in the automation space will tell you that automation is the fastest way to break things at scale.
Predictable automation can eradicate human error
Don't get me wrong—I am not down on automation. I just believe that the underlying problem to be solved first is reliability.
We have to eradicate human error.
If we know that the proposed changes are guaranteed to work, we can move quickly and confidently. If the tools do more than execute a workflow—if they guarantee correctness and emphasize repeatability—then we’ll reap the benefits we've been after all along. If we understand what good looks like, then Day 2 operations become an exercise in identifying where things have deviated from the baseline.
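To make the idea of "deviating from the baseline" concrete, here is a minimal sketch in Python. It assumes the intended state and the observed state of each device can be represented as simple key-value settings. The function name `find_drift`, the device name and the setting names are hypothetical and chosen only for illustration; this is not a description of any particular product or tool.

```python
from typing import Dict, List

# Hypothetical representation: device name -> {setting name -> value}.
Config = Dict[str, Dict[str, str]]


def find_drift(baseline: Config, observed: Config) -> List[str]:
    """Return one human-readable finding per setting that differs from the baseline."""
    findings: List[str] = []
    # Walk every device that appears in either the baseline or the observed state.
    for device in sorted(baseline.keys() | observed.keys()):
        want = baseline.get(device, {})
        have = observed.get(device, {})
        # Compare every setting known to either side; a missing value shows up as None.
        for key in sorted(want.keys() | have.keys()):
            if want.get(key) != have.get(key):
                findings.append(
                    f"{device}: {key} expected {want.get(key)!r}, found {have.get(key)!r}"
                )
    return findings


if __name__ == "__main__":
    # Illustrative values only.
    baseline = {"leaf1": {"mtu": "9214", "bgp_asn": "65001"}}
    observed = {"leaf1": {"mtu": "1500", "bgp_asn": "65001"}}
    for finding in find_drift(baseline, observed):
        print(finding)
```

An empty result means the network matches what good looks like; anything else is a specific, reviewable deviation rather than a hunt through thousands of commands.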
It's not that speed isn't good. But the path starts with a different purpose. Indeed, the byproducts of driving human error to zero are speed and efficiency combined. And Nokia is delivering on that promise with predictable automation that delivers reliability across data center networks and networking.