Detecting and Diagnosing Anomalous Behavior in Large Systems with Change Detection Algorithms

01 October 2019

Large telecommunications networks are designed to achieve high reliability through hardware and software redundancy, managed by complex fault-tolerant mechanisms for error detection and recovery. Because of these mechanisms, errors that do occur do not always cause failures, which makes it difficult to detect anomalous system behavior and to determine its root cause. In this paper, we apply multivariate change detection algorithms and visual analytics methods to sequential system performance data in order to detect and diagnose anomalous behavior in telecommunications systems with low latency. Coupled with domain knowledge, these methods detect and diagnose anomalies more efficiently and effectively than log analysis. We demonstrate our methods with real data from a large system.