Balancing Data Protection and Model Accuracy: An Investigation of Protection Methods on Machine Learning Model Performance for a Bank Marketing Dataset
- Master Thesis 
The practice of sharing customer data among companies for marketing purposes is becoming increasingly common. However, sharing customer-level data poses serious risks for businesses, such as substantial declines in brand value, erosion of customer trust, loss of competitive advantage, and legal penalties (Schneider et al. 2017). These can ultimately lead to financial losses and reputational damage for the companies involved. With growing awareness of the value of personal information, more companies and customers are concerned about protecting data privacy. In this thesis, we used marketing data from a Portuguese bank to explore how various machine learning and data privacy techniques can balance prediction accuracy against customer data privacy. The dataset contains observations of 45,211 respondents collected between May 2008 and November 2010. Our goal was to find a method that enables third parties to share data with the bank while safeguarding customer privacy and maintaining accuracy in predicting customer behaviour. We first tested several machine learning models on the original data (Logistic Regression, Random Forest, and a feedforward Neural Network) and selected Random Forest, which gave the best prediction performance, for further analysis. After applying two data privacy methods (Sampling and Random Noise) to the original data, we found that the Random Forest model achieved accuracy levels very close to those obtained before the privacy methods were applied. This demonstrates a way for companies to protect customer data privacy without substantially sacrificing predictive accuracy. The results of this study have practical implications for companies that seek to share customer data while maintaining high levels of both privacy and accuracy.
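The abstract does not specify how the two privacy methods were parameterised. As a minimal sketch, assuming that "Sampling" means releasing a simple random subset of customer records and that "Random Noise" means adding zero-mean Gaussian noise to numeric attributes, the two transformations could look like the following. The function names `sample_rows` and `add_random_noise`, the `fraction` and `relative_scale` parameters, the Gaussian distribution, and the example column names are all illustrative assumptions, not the thesis's actual implementation.

```python
import random
import statistics
from typing import Dict, List, Sequence

Row = Dict[str, float]


def sample_rows(rows: Sequence[Row], fraction: float, seed: int = 0) -> List[Row]:
    """Simple random sampling without replacement.

    Assumption: this is one plausible reading of the 'Sampling' method;
    the thesis may have used a different sampling scheme or fraction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))  # number of records to release
    return rng.sample(list(rows), k)


def add_random_noise(rows: Sequence[Row], columns: Sequence[str],
                     relative_scale: float = 0.1, seed: int = 0) -> List[Row]:
    """Add zero-mean Gaussian noise to the given numeric columns.

    Assumption: Gaussian noise whose standard deviation is relative_scale
    times each column's sample standard deviation, so columns on different
    scales (e.g. age vs. account balance) are perturbed proportionally.
    Columns not listed (such as the target label) are copied unchanged.
    """
    rng = random.Random(seed)
    sds = {c: statistics.stdev([r[c] for r in rows]) for c in columns}
    noisy = []
    for r in rows:
        new = dict(r)  # copy so the original records are left untouched
        for c in columns:
            new[c] = r[c] + rng.gauss(0.0, relative_scale * sds[c])
        noisy.append(new)
    return noisy


if __name__ == "__main__":
    # Hypothetical records; "age", "balance", "duration" and "y" are
    # illustrative column names, not taken from the thesis.
    rows = [
        {"age": 30.0, "balance": 1200.0, "duration": 180.0, "y": 0},
        {"age": 45.0, "balance": 300.0, "duration": 420.0, "y": 1},
        {"age": 52.0, "balance": 5000.0, "duration": 95.0, "y": 0},
        {"age": 27.0, "balance": 0.0, "duration": 610.0, "y": 1},
    ]
    released = add_random_noise(sample_rows(rows, fraction=0.5, seed=1),
                                ["age", "balance", "duration"],
                                relative_scale=0.1, seed=2)
    for r in released:
        print(r)
```

In the workflow the abstract describes, the perturbed records would then be used to train and evaluate the Random Forest model, and its accuracy compared with that of the model trained on the original data. Scaling the noise by each column's spread is one design choice among several; a fixed absolute noise level would distort small-range attributes such as age far more than large-range ones such as balance.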