An O(NlogN) algorithm for clustering mixture of two Gaussians

01 April 2019

Mixtures of Gaussians are ubiquitous for modeling high-dimensional data. In this paper we tackle the problem of clustering, that is, recovering the assignment of points to the different Gaussians. We show that for a mixture of two Gaussians, a very efficient algorithm can cluster the data in 1D, using only a handful of random projections to recover the clusters with high accuracy. We provide a rigorous probabilistic analysis, with guarantees on the error achieved by our algorithm and on the number of projections required. Our analysis is validated by empirical results in which the algorithm performs within the provided bounds, with running time that is linear in the dimension and the number of data points.
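The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea it describes (project onto a few random directions, cluster each 1D projection, keep the best one), not the paper's actual method. The function names `best_1d_split` and `cluster_two_gaussians`, the sorted-threshold split that minimizes within-cluster variance, and the "fraction of variance explained" score used to choose among projections are all assumptions made for illustration.

```python
import numpy as np


def best_1d_split(z):
    """Split 1D values z into two groups at the threshold that minimizes
    the within-cluster sum of squares. Returns (labels, score).
    Assumed illustrative criterion, not taken from the paper."""
    order = np.argsort(z)                  # O(N log N) sort dominates the cost
    s = z[order]
    n = s.size
    k = np.arange(1, n)                    # candidate sizes of the left group
    csum = np.cumsum(s)
    csq = np.cumsum(s * s)
    # Within-cluster sum of squares for every prefix/suffix split.
    left = csq[:-1] - csum[:-1] ** 2 / k
    right = (csq[-1] - csq[:-1]) - (csum[-1] - csum[:-1]) ** 2 / (n - k)
    wss = left + right
    i = int(np.argmin(wss))
    labels = np.empty(n, dtype=int)
    labels[order[: i + 1]] = 0
    labels[order[i + 1:]] = 1
    total = csq[-1] - csum[-1] ** 2 / n    # total sum of squares
    score = 1.0 - wss[i] / total           # fraction of variance explained
    return labels, score


def cluster_two_gaussians(X, n_projections=10, seed=0):
    """Try a handful of random unit directions and keep the 1D split
    with the highest score (hypothetical selection rule)."""
    rng = np.random.default_rng(seed)
    best_labels, best_score = None, -np.inf
    for _ in range(n_projections):
        w = rng.standard_normal(X.shape[1])
        w /= np.linalg.norm(w)             # random direction on the unit sphere
        labels, score = best_1d_split(X @ w)   # projection costs O(N d)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([
        rng.standard_normal((200, 2)) + [5, 0],
        rng.standard_normal((200, 2)) - [5, 0],
    ])
    labels = cluster_two_gaussians(X, n_projections=20)
    print(np.bincount(labels))
```

In this sketch each projection costs O(Nd) for the matrix-vector product plus O(N log N) for the sort, so the total is a small constant number of such passes. The labels are only defined up to swapping 0 and 1.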