Prediction results from complex machine learning models can be challenging to interpret. Understanding these models is essential when their results are trusted in decision-making. In this master's thesis, we use Shapley values to explain individual predictions from a complex machine learning algorithm. Our aim is to explain why prediction models produce the results they do, so that decision-makers can interpret them better.
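For reference, the Shapley value of a feature j is the standard quantity from cooperative game theory; the notation below is ours and not taken from the thesis:

\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \bigl( v(S \cup \{j\}) - v(S) \bigr)

where \mathcal{M} is the set of M features and v(S) denotes the expected model prediction given the feature values in S, so \phi_j is feature j's average marginal contribution to the prediction over all possible feature orderings.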
The chosen case is based on the thesis "Predicting Financial Distress in Norway" by Zhang and Ye (2019), in which logistic regression and random forest models were used to predict whether a company enters financial distress within the next two years. In this thesis, we instead take advantage of the powerful xgboost (extreme gradient boosting) algorithm. To illustrate the benefits of a complex model over a simple one, we also present a decision tree as a baseline.
Our analysis shows that the Shapley value framework yields clear and intuitive explanations of individual xgboost predictions. Calculating Shapley values for a larger group of predictions deepens understanding of the model by revealing which feature values increase or decrease the predicted probability of distress. The framework also enables detection of possible model bias that could lead to discrimination. We conclude that Shapley values, used as an explanatory framework, allow decision-makers to continue using complex machine learning models. This is important, as we find that the framework satisfies regulations requiring that decisions made by automated systems be explainable upon request.
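A minimal sketch of the workflow summarized above, using the xgboost and shap Python libraries, is shown below. The synthetic data, feature names, and hyperparameters are illustrative assumptions of ours, not the thesis's actual setup.

# Minimal, self-contained sketch: fit an xgboost classifier and explain
# its predictions with Shapley values. Data is synthetic; feature names
# and hyperparameters are placeholders, not the thesis's configuration.
import pandas as pd
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the financial-distress data set.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X = pd.DataFrame(X, columns=[f"ratio_{i}" for i in range(8)])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the xgboost classifier (hyperparameters are illustrative).
model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# TreeExplainer computes Shapley values efficiently for tree ensembles;
# each row gives the per-feature contributions to one prediction.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Aggregate over many predictions to see which feature values push the
# predicted probability of distress up or down.
shap.summary_plot(shap_values, X_test)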