A Nonparametric Bayesian model for the Multiple Annotators problem
02 October 2014
We propose a generative classification algorithm for the multiple-annotators problem, in which each training example has been labeled by a set of imperfect annotators. The algorithm infers the characteristics (sensitivity and specificity) of each annotator, estimates the ground truth of the training set, and builds a classifier for test examples. In addition, we consider that the performance of an annotator can be inhomogeneous across the instance space due to factors such as the annotator's past experience with similar examples. To capture this behavior, our algorithm uses a Dirichlet Process Mixture Model to divide the instance space into regions within which the annotators are consistent, and resorts to variational inference to approximate the posterior of the model parameters. Experiments on synthetic and real databases show that the method achieves high accuracy and outperforms state-of-the-art algorithms. Moreover, the method offers an interpretable solution: it provides an estimate of the performance of each annotator in each mixture component, which helps to better understand the decision process followed by the annotators.
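The idea of region-dependent annotator performance can be illustrated with a minimal sketch. It is not the proposed algorithm: it assumes that the region of each example and its true label are known, whereas the method described above infers both with a Dirichlet Process Mixture Model and variational inference. The function names `simulate_annotations` and `estimate_rates`, the binary labels, and the uniform sampling of regions and true labels are assumptions made only for this illustration.

```python
import random


def simulate_annotations(n_examples, annotators, seed=0):
    """Simulate binary labels from imperfect annotators.

    `annotators[a][r]` is a (sensitivity, specificity) pair for
    annotator `a` in region `r`. Regions and true labels are drawn
    uniformly at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n_regions = len(annotators[0])
    data = []
    for _ in range(n_examples):
        region = rng.randrange(n_regions)
        truth = rng.randint(0, 1)
        labels = []
        for per_region in annotators:
            sens, spec = per_region[region]
            if truth == 1:
                # A positive example is labeled 1 with probability `sens`.
                label = 1 if rng.random() < sens else 0
            else:
                # A negative example is labeled 0 with probability `spec`.
                label = 0 if rng.random() < spec else 1
            labels.append(label)
        data.append((region, truth, labels))
    return data


def estimate_rates(data, n_annotators, n_regions):
    """Estimate (sensitivity, specificity) per annotator and region.

    Uses the known true labels, which the real problem does not provide.
    """
    # counts[a][r] = [true positives, positives, true negatives, negatives]
    counts = [[[0, 0, 0, 0] for _ in range(n_regions)]
              for _ in range(n_annotators)]
    for region, truth, labels in data:
        for a, label in enumerate(labels):
            c = counts[a][region]
            if truth == 1:
                c[1] += 1
                c[0] += int(label == 1)
            else:
                c[3] += 1
                c[2] += int(label == 0)
    nan = float("nan")
    return [[(c[0] / c[1] if c[1] else nan, c[2] / c[3] if c[3] else nan)
             for c in row] for row in counts]


if __name__ == "__main__":
    # Annotator 0 is reliable in region 0 but weaker in region 1;
    # annotator 1 behaves the same in both regions.
    annotators = [
        [(0.9, 0.8), (0.6, 0.95)],
        [(0.7, 0.7), (0.7, 0.7)],
    ]
    data = simulate_annotations(20000, annotators, seed=0)
    for a, row in enumerate(estimate_rates(data, 2, 2)):
        for r, (sens, spec) in enumerate(row):
            print(f"annotator {a}, region {r}: "
                  f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With enough examples, the per-region estimates approach the values used in the simulation, which is the kind of per-component annotator profile the abstract refers to as interpretable output.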