Predicting defaults in the automotive credit industry: an empirical study using machine learning techniques
Abstract
This master's thesis explores the potential of machine learning techniques for predicting default among
vehicle loan applicants. Banks and other financial institutions usually rely on the Logistic
Regression algorithm to support their decision-making process; however, more advanced
methods have been shown to improve the classification of defaults. The data set used in
this study was collected from several institutions and contains contract information, historical credit
information and status, and demographic information for more than 240 000 applicants who were
granted a loan.
The results from four machine learning techniques (Random Forest, Gradient Boosting
Machines, Support Vector Machines and Neural Networks) were compared to the benchmark
model, Logistic Regression. In this study, the Neural Network was found to be marginally better
than the Logistic Regression. Notably, all models were trained and tested on the same data,
which was separated into three data sets with the same features for fitting, validation and testing.
However, due to time and computational constraints, the models were not fully exploited in
terms of hyperparameter tuning.
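
The separation into fitting, validation and testing sets described above can be illustrated with a minimal sketch. The 60/20/20 proportions, the fixed seed and the function name `three_way_split` are assumptions made for illustration only; the abstract does not state how the data was actually partitioned.

```python
import random

def three_way_split(rows, fit_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into fitting, validation and testing sets.

    The fractions are hypothetical defaults, not values from the thesis.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    n_fit = round(len(rows) * fit_frac)
    n_val = round(len(rows) * val_frac)
    fit_set = rows[:n_fit]
    val_set = rows[n_fit:n_fit + n_val]
    test_set = rows[n_fit + n_val:]  # whatever remains after the first two cuts
    return fit_set, val_set, test_set

# Toy stand-in for loan records: integer IDs instead of real applicants.
fit_set, val_set, test_set = three_way_split(range(1000))
print(len(fit_set), len(val_set), len(test_set))
```

Every model is then fitted on the first set, tuned on the second and scored once on the third, so that all models are compared on exactly the same held-out records.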
The best-performing model, the Neural Network, achieved an AUC of 0.6349, followed closely by the
Logistic Regression with an AUC of 0.6325. Based on the performance and knowledge of the
models, the conclusion is that the Logistic Regression is the best choice; however, the Neural Network has
the greatest potential for future research, depending on data quality.
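
To make the AUC figures above easier to interpret, the following minimal sketch computes the metric from its pairwise definition: the probability that a randomly chosen defaulter receives a higher predicted score than a randomly chosen non-defaulter. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the thesis data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # scores of defaulters
    neg = [s for y, s in zip(labels, scores) if y == 0]  # scores of non-defaulters
    if not pos or not neg:
        raise ValueError("AUC needs at least one default and one non-default")
    # A pair counts 1 if the defaulter is ranked higher, 0.5 if tied, 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = default, 0 = no default.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
print(auc(labels, scores))
```

Under this definition an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of 0.6349 and 0.6325 in context. The quadratic pairwise loop is chosen for clarity; on a data set of this size a rank-based implementation would be preferable.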