Interpreting machine learning models: An overview with applications to German real estate data
Abstract
Machine learning models have demonstrated substantial improvements in capturing complex patterns, which allows them to make accurate predictions on unseen data. While the accuracy of these models has increased over time, so has their complexity, which makes them extremely difficult to interpret.
In many problems, accuracy is the main focus of machine learning applications, but some cases also require model interpretability. This thesis presents and applies some of the most prominent methods in the relatively new field of interpretable machine learning. In our application, we use these methods to interpret a random forest model that predicts monthly rent in a German real estate dataset. Through this interpretation, we find that methods such as Permutation Feature Importance, Partial Dependence Plots, and ALE Plots visualize the mechanisms of the random forest in an easily understandable way. We also analyze individual predictions with the LIME algorithm and Shapley Values and find that they can provide interpretable explanations of how those predictions were produced. However, while experimenting with LIME, we noticed that the algorithm produces somewhat unstable results. We therefore propose a solution to this problem: using K-Nearest Neighbours as the sampling method for LIME instead of its own random perturbation technique for sampling observations.
In summary, based on our findings, we conclude that interpretable machine learning methods can provide comprehensible explanations of a model's mechanisms, but they still have limitations when it comes to explaining the more complicated processes within the model.
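
The abstract only names the proposed modification to LIME, so the following is a minimal sketch of the general idea, not the thesis' exact implementation. It assumes a hypothetical helper knn_lime_explanation, uses the California housing data as a stand-in for the German rent dataset, and approximates LIME's proximity weighting with an exponential kernel: the black-box model is explained locally by fitting a weighted linear surrogate on the K nearest training neighbours of the instance, rather than on randomly perturbed samples.

    # Sketch: KNN-based sampling for a LIME-style local surrogate (illustrative only).
    import numpy as np
    from sklearn.datasets import fetch_california_housing  # stand-in for the German rent data
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neighbors import NearestNeighbors

    X, y = fetch_california_housing(return_X_y=True)
    black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    def knn_lime_explanation(x, X_train, model, k=200, kernel_width=None):
        """Explain model(x) with a linear surrogate fitted on x's K nearest neighbours."""
        # Sample real observations from the training data instead of random perturbations.
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        dist, idx = nn.kneighbors(x.reshape(1, -1))
        neighbours = X_train[idx[0]]
        # Exponential kernel weights: closer neighbours get more influence,
        # mirroring the proximity weighting used in LIME.
        if kernel_width is None:
            kernel_width = 0.75 * np.sqrt(X_train.shape[1])
        weights = np.exp(-(dist[0] ** 2) / (kernel_width ** 2))
        # Weighted linear surrogate; its coefficients serve as the local explanation.
        surrogate = Ridge(alpha=1.0)
        surrogate.fit(neighbours, model.predict(neighbours), sample_weight=weights)
        return surrogate.coef_, surrogate.intercept_

    coefs, intercept = knn_lime_explanation(X[0], X, black_box)
    print("Local feature effects:", np.round(coefs, 3))

Because the neighbours are drawn deterministically from the training data, repeated explanations of the same instance do not vary with the random seed, which is the intuition behind using KNN sampling to stabilize LIME's results.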