Modelling probability of default with machine learning: how well does machine learning perform and can it replace the standard methods?

Jæger-Pedersen, Ruben

Master thesis

Permanent link

https://hdl.handle.net/11250/2739010

Publication date

2020

Abstract

In this master thesis, we apply machine learning techniques to a dataset of credit card clients in Taiwan to model the Probability of Default (PD). The methods used are Logistic Regression, Decision Tree, Random Forest, XGBoost, K-Nearest Neighbor (KNN) and Neural Network. We use the Receiver Operating Characteristic Area Under the Curve (ROC AUC) and the Confusion Matrix to assess the performance of each model, with ROC AUC as our main performance measure.
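
As an illustration of the two performance measures named above, the following is a minimal sketch in plain Python, not the code used in the thesis. The function names, the 0.5 threshold and the toy labels and scores are assumptions made for this example only; ROC AUC is computed here through its pairwise-ranking interpretation rather than by integrating the curve.

```python
from typing import Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the share of (defaulter, non-defaulter) pairs in which
    the defaulter receives the higher predicted PD; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_matrix(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp), predicting default when score >= threshold."""
    tn = fp = fn = tp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical toy data: 1 = default, 0 = no default.
y_true = [0, 0, 1, 1, 0, 1]
pd_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

auc = roc_auc(y_true, pd_scores)
cm = confusion_matrix(y_true, pd_scores)
print(f"ROC AUC: {auc:.3f}")
print(f"(tn, fp, fn, tp): {cm}")
```

ROC AUC does not depend on a classification threshold, whereas the confusion matrix does, which is one reason a threshold-free measure is convenient as the main basis for comparing models.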

We also review the standard methods of credit assessment and how the General Data Protection Regulation (GDPR) affects the use of machine learning now and in the future.

Random Forest performed best, followed by XGBoost and Neural Network. The difference in ROC AUC score between the top four models was only 0.023, while the worst performers, KNN and Decision Tree, were far behind.