Autonomous Operations - is it time for a new Network Operating System?
Introduction: the Quest for Network Nirvana
Imagine a network that can sense, think, and act. Imagine a network that seamlessly orchestrates its resources to deliver optimal performance and customer experience. Imagine a network that automatically delivers what your customers want, all the time.
You might call this state of telecoms enlightenment ‘Network Nirvana’ – a system that can fulfill zero wait, zero touch, and zero trouble experiences for your customers. That means delivering services without delay, without the need for human intervention, and with flawless performance. At Nokia we are working towards this vision, building what we call AVA Autonomous Operations. Our AVA software is both an operating system (OS) and a set of apps designed to boost CSP productivity. A good analogy would be Microsoft Windows (an OS), which provides a common set of reusable capabilities, and Microsoft 365 (a suite of apps formerly known as Office 365).
Are we there yet?
The TM Forum has proposed a 5-step framework to chart the journey of CSPs towards Network Nirvana (AKA Level 5, full Autonomous Operations!).
So how close are CSPs to Level 5, full Autonomous Operations? Nowhere near, according to a 2022 TM Forum survey. Despite investing heavily in automation and AI for several decades, CSPs remain a long way short of their ultimate destination. The survey revealed that nearly 85% of CSPs were stuck at Level 1 (Assisted Operations) or Level 2 (Partial Autonomous Operations).
This stark picture raises the question: why has progress been so slow? The reality is that the journey towards Autonomous Operations has been hindered by a Triad of Troubles: Cloud Complexity, Data Diversity, and Automation Adversity. We need to overcome these challenges to achieve Network Nirvana – that is, a network that can sense, think and act.
Cloud Complexity – the need for sensing
Of course, telecoms has always been complicated – but this has been exacerbated by the move towards cloud-native architectures and the resulting disaggregation of networks. Network functions are now split into multiple microservices running in distributed containers. Appledore Research posited that the adoption of cloud-native technology would result in a 100-fold increase in network events. This raises many questions. How can CSPs cut through the noise to gain a real-time view of network performance? How can CSPs understand the dependencies between different cloud infrastructures, platforms and applications – and in turn how those impact the delivery of services? How can they troubleshoot these complex cloud networks to maintain the Quality of Service that customers expect?
Digital twins and observability are already key components of the new network OS, enabling CSPs to ‘sense’ and better understand their environment. Digital twins play a leading role in unravelling cloud complexity, making it easier to visualize the complex relationships between cloud infrastructures, platforms and applications. Observability provides enhanced contextual awareness and understanding, using technologies such as eBPF. For example, picture a scenario where memory usage is growing excessively in a 5G network function. Without urgent action, the Kubernetes pods hosting the network function could run out of memory, with negative impacts on service quality and customer experience. Observability would provide a ‘full stack’ view including memory usage trends, dependent services and impacted KPIs. In this scenario, the root cause would be attributed to increasing network traffic, and the recommendation would be to scale up the capacity of the pods to assure service availability.
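To make the memory scenario a little more concrete, here is a minimal sketch in Python of how a usage trend could be turned into a scale-up recommendation. It is an illustration only, not how AVA or any observability product works: the function names, the straight-line trend fit, the one-minute sampling interval and the 30-minute horizon are all my own assumptions.

```python
def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


def minutes_until_limit(samples_mib, limit_mib, interval_min=1.0):
    """Estimate minutes until memory usage reaches the pod's limit.

    samples_mib: memory readings in MiB, oldest first, evenly spaced
                 (hypothetical input; at least two readings needed).
    Returns None when usage is flat or falling.
    """
    if len(samples_mib) < 2:
        raise ValueError("need at least two samples to fit a trend")
    times = [i * interval_min for i in range(len(samples_mib))]
    slope = fit_slope(times, samples_mib)  # MiB per minute
    if slope <= 0:
        return None
    # Project forward from the most recent reading; never report a negative time.
    return max(0.0, (limit_mib - samples_mib[-1]) / slope)


def recommend(samples_mib, limit_mib, horizon_min=30.0):
    """Return "scale-up" if the limit is projected to be hit within the horizon."""
    eta = minutes_until_limit(samples_mib, limit_mib)
    if eta is not None and eta <= horizon_min:
        return "scale-up"
    return "no-action"


# Usage: usage grows 10 MiB per minute towards a 600 MiB limit.
print(recommend([500, 510, 520, 530, 540], limit_mib=600))
```

A real system would combine a trend like this with the dependency and KPI context described above before recommending any action; the sketch covers only the trend part.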
Data Diversity – a case for clear thinking
Effective AI is central to Autonomous Operations, enabling networks to ‘think’ and make intelligent decisions – for example on the optimal allocation of resources. But even the most sophisticated AI algorithms are useless without a continual stream of high-quality data to fuel them. According to a recent survey by Analysys Mason, 56% of CSPs view data preparation as a significant challenge.
The extreme diversity of telecoms data compounds the challenge, leaving CSPs to grapple with an unprecedented variety of data sources and vendor proprietary formats that make effective governance a nightmare. For example, a simple software upgrade implemented on base stations in just one part of the network can easily break a finely-tuned AI model.
So, what's the remedy? Well, a data mesh architecture is part of our proposed Network OS to help tame data diversity. Data meshes replace hard-to-manage data lakes and generate carefully-curated data products. These data products are exposed to a catalogue via APIs, allowing data scientists to create new AI use cases in a matter of weeks, not months. Better quality data also increases the accuracy of ‘what if?’ simulations used by digital twins. For instance, a CSP may want to model the cost and environmental impact before rolling out a new cluster of base stations. A digital twin would empower a CSP to make better decisions by simulating alternative traffic profiles and forecasting energy consumption for different designs and hardware configurations.
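As a rough illustration of the ‘what if?’ idea, the Python sketch below compares daily energy consumption for alternative hardware configurations and traffic profiles. Everything in it is an assumption made for the example: the class and function names, the simple model in which power rises linearly from an idle value to a maximum as load increases, and the wattage figures, which are made up rather than taken from any real equipment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareConfig:
    """Hypothetical base station hardware option."""
    name: str
    idle_watts: float  # power drawn at zero load
    max_watts: float   # power drawn at full load


def daily_energy_kwh(config, hourly_load, sites):
    """Daily energy in kWh for `sites` identical base stations.

    hourly_load: 24 load factors between 0 and 1, one per hour.
    Assumes power scales linearly between idle and maximum.
    """
    if len(hourly_load) != 24:
        raise ValueError("hourly_load must contain 24 values")
    watt_hours = sum(
        config.idle_watts + (config.max_watts - config.idle_watts) * load
        for load in hourly_load
    )
    return sites * watt_hours / 1000.0


def compare_designs(configs, profiles, sites):
    """Energy for every (hardware, traffic profile) combination."""
    return {
        (config.name, profile_name): daily_energy_kwh(config, load, sites)
        for config in configs
        for profile_name, load in profiles.items()
    }


# Usage with made-up numbers: two hardware options, two traffic profiles.
configs = [
    HardwareConfig("option-a", idle_watts=100.0, max_watts=300.0),
    HardwareConfig("option-b", idle_watts=60.0, max_watts=350.0),
]
profiles = {
    "steady": [0.5] * 24,
    "busy-evening": [0.2] * 18 + [0.9] * 6,
}
for key, kwh in compare_designs(configs, profiles, sites=10).items():
    print(key, round(kwh, 1))
```

The quality of any such simulation depends on the quality of the traffic and power data fed into it, which is where curated data products come in.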
Automation Adversity – joined up action required
To realize Autonomous Operations we must act efficiently on the insights provided by AI – and for that we need ubiquitous automation to deliver zero wait, zero touch and zero trouble services at scale. This requires us to overcome the challenge of automation adversity, which includes siloed tools, legacy processes and the ‘change fatigue’ that has built up over the years. Indeed, automation adversity is one of the toughest challenges for CSPs to overcome, with only 6% so far achieving zero touch automation according to the Analysys Mason survey.
I believe that a single, unified inventory system could be one way to help close this automation gap. Automation needs to be built on a solid foundation and ensuring that you have an accurate and up-to-date view of all network resources, topologies and services seems like a good starting point! Cross-domain orchestration would be another key capability in our Network OS ‘toolbox’. For example, in Telenor’s 5G VINNI network, cross-domain orchestration automates the slicing lifecycle for more than 30 different use cases. Zero touch automation is used to design, deploy and assure network slices – and this is done end-to-end, across Telenor’s multi-vendor network.
Sensing, thinking, and acting in harmony: a network security scenario
How might this all work together and enrich an application used by a CSP? Let me give you the example of a security analyst who wants to preserve the integrity of the network. Firstly, a machine-learning model builds up a baseline of what ‘normal’ looks like, assigning dynamic threat scores to every single network function. Subsequently, an anomaly is detected and flagged to the analyst: an unknown entity is trying to establish a TCP/IP session. Now an eBPF program is deployed to investigate further, and reports that a potential hacker is trying to access sensitive data from a suspicious IP address. The analyst requests a change to the firewall settings, and this is sent to the firewall policy manager via an API. Within seconds a new policy is applied to block the rogue IP address and the threat is mitigated.
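The sense, think, act loop in this scenario can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not a description of any product: the baseline is a plain mean and standard deviation rather than a trained machine-learning model, the threshold of 3 is arbitrary, and the JSON payload is a hypothetical shape, since the firewall policy manager's real API is not described here.

```python
import ipaddress
import json
from statistics import mean, pstdev

THREAT_THRESHOLD = 3.0  # assumed cut-off, in standard deviations


def threat_score(baseline, observed):
    """Score how far an observation sits from 'normal' (a z-score).

    baseline: historical values for one network function, for example
              new sessions per minute (hypothetical metric).
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def build_block_request(ip, reason):
    """Build a JSON body asking a policy manager to deny one source IP.

    The field names are invented for this sketch. An invalid address
    raises ValueError before anything is sent.
    """
    address = ipaddress.ip_address(ip)
    return json.dumps(
        {"action": "deny", "source_ip": str(address), "reason": reason}
    )


# Usage: a sudden spike in sessions from a documentation-range address.
baseline = [10, 12, 11, 9, 8]
score = threat_score(baseline, observed=40)
if score > THREAT_THRESHOLD:
    print(build_block_request("203.0.113.7", "anomalous session attempts"))
```

In the scenario above the analyst stays in the loop and approves the change; the sketch only shows how a score and a policy request might be produced.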
Conclusion: time for a new Network Operating System
Autonomous Operations promises zero wait, zero touch, and zero trouble services. However, the path ahead is not without obstacles. CSPs must tackle cloud complexity, data diversity, and automation adversity. This will require a new Network Operating System whose key components and capabilities should include observability, digital twins, data mesh, AI, unified inventory and cross-domain automation.
At Nokia, we've used our expertise to develop AVA Autonomous Operations software that helps you harness the exponential potential of your network. I hope that you are as excited as I am about the possibilities of fully Autonomous Operations. Take a look at this short video to see how our AVA software is already enabling networks to sense with observability, think with AI and machine learning, and act with closed loop automation.