An Approach for Call Logs Anonymization Using Machine Learning
27 June 2014
Today it is difficult, and in most cases impossible, for a data owner to release call logs without any risk that individuals will be re-identified. Yet call log analysis is very important for research and for marketing applications. Moreover, since the arrival of Web 2.0, the external information available on the Internet has significantly increased the risk that adversaries can re-identify individuals in anonymized data. Methods used to anonymize call logs for release to a third party usually combine naive anonymization with noise addition or data modification in order to be robust to a certain type of attack. For contexts in which attacks can be very diverse and the measures to be preserved on the data are complex, we proposed in a previous paper a generic anonymization method using machine learning techniques, which was applied successfully to simple graphs. We now propose an anonymization approach based on two different machine learning techniques, applied successfully to graphs with directed, multiple, timestamped, labeled edges. Our solution aims to automatically learn a parameterized anonymization function in a given context and allows a safer release of call logs originating from social networks or telecommunication networks.
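To make the data model concrete, the following minimal Python sketch represents a call log as a list of directed, timestamped, labeled edges (several edges may connect the same pair of nodes) and applies naive anonymization, meaning that each identifier is replaced by a random pseudonym. It is an illustration only and not the learned anonymization function proposed here; the names `CallEdge` and `naive_anonymize`, the field layout, and the sample records are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEdge:
    """One directed edge of the call graph (hypothetical layout)."""
    caller: str      # source node
    callee: str      # target node
    timestamp: int   # seconds since the Unix epoch
    label: str       # e.g. "voice" or "sms"


def naive_anonymize(edges):
    """Replace every identifier with a random pseudonym.

    The graph structure, timestamps, and labels are left untouched.
    Returns the anonymized edges and the identifier-to-pseudonym mapping.
    """
    pseudonyms = {}
    used = set()

    def pseudo(node):
        if node not in pseudonyms:
            token = secrets.token_hex(8)
            while token in used:  # regenerate on the (unlikely) collision
                token = secrets.token_hex(8)
            used.add(token)
            pseudonyms[node] = token
        return pseudonyms[node]

    anonymized = [
        CallEdge(pseudo(e.caller), pseudo(e.callee), e.timestamp, e.label)
        for e in edges
    ]
    return anonymized, pseudonyms


# Made-up sample log: two edges between the same pair (a multigraph).
log = [
    CallEdge("alice", "bob", 1403860000, "voice"),
    CallEdge("alice", "bob", 1403863600, "sms"),
    CallEdge("bob", "carol", 1403870000, "voice"),
]
anonymized, mapping = naive_anonymize(log)
```

Because only the identifiers change, the edge structure and timestamps of the released log are identical to the original, which is what leaves naive anonymization open to re-identification through external information.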