A fast and compact method for unveiling significant patterns in high speed network
06 May 2007
The identification of significant patterns in the network traffic such as IPs or flows that contribute large volume (heavy hitters) or introduce large changes (heavy changers) has many applications such as accounting and network anomaly detection. As the network speed and the number of flows grow rapidly, tracking per-ip or per-flow statistics becomes infeasible due to both the computation and memory requirements. In this paper, we propose a novel sequential hashing scheme that requires only $O(Hlog N)$ both in memory and computation that are close to being optimal, where $N$ is the the number of all possible keys (e.g., flows, IPs) and $H$ is the maximum number of heavy keys respectively. Moreover, the generalized sequential hashing scheme makes it possible to tradeoff among memory, update cost, and detection cost in a large range that can be utilized by different computer architectures for optimizing the overall performance. In addition, we also propose statistically efficient algorithms for estimating the values of heavy hitters and heavy changers. Using both theoretical analysis and experimental studies of Internet traces, we demonstrate that our approach can achieve the same accuracy as the existing methods do but using much less memory and computation.