The Impact of Machine Learning and Aggregated Data on Corporate Insurance Modelling: An Empirical Study on the Prospective Gains of Machine Learning Techniques Using New Data Sources In the Insurance Industry

Hellestol, Tonje; Eriksen, Petter

dc.contributor.advisor	Andersson, Lars Jonas
dc.contributor.author	Hellestol, Tonje
dc.contributor.author	Eriksen, Petter
dc.date.accessioned	2022-09-06T10:38:07Z
dc.date.available	2022-09-06T10:38:07Z
dc.date.issued	2022
dc.identifier.uri	https://hdl.handle.net/11250/3015966
dc.description.abstract	This thesis investigates the potential applicability of machine learning techniques in predictive modelling on corporate insurance customers. The focus is on predicting a binary classification of claim occurrences and a customer's total claim size. Additionally, to illustrate practical usage, the respective best performing models were combined in an experimental setting to predict total expected cost and to identify good customers. The data set is supplied by Frende Forsikring and consists of aggregated customer data. The aggregated data summarizes a company's characteristics, total premiums, number of claims, claim sizes and the policies they hold. Prior to data preprocessing, the data consists of 26 293 different companies totaling 116 219 observations and 436 variables. The study is split in two. First, the machine learning techniques CART, Random Forest, XGBoost and Neural Networks are compared with a benchmark GLM. Secondly, the thesis explores the predictive gain of aggregated data by using three input groups: the premium only, the initial aggregated data, and the aggregated data with feature-engineered time variables. The results show that all machine learning models outperformed GLM when classifying claim occurrences. Additionally, all models showed an increase in predictive capabilities when including aggregated data, but little to no gain from including time variables. XGBoost was the best performing model with an ROC-AUC of 0.8457. Resampling techniques did not contribute significantly to the performance of any of the models. In terms of predicting total claim size, no models produced satisfactory results. XGBoost performed best with an RMSE of 271725. The majority of the models performed best with premium as the only feature, indicating that the usage of aggregated data is not suited for predicting the response. Overall, this study shows that machine learning can increase the predictive performance compared to GLMs. The results also indicate that aggregated data have potential for predicting claim occurrences and can be used as a supplement in the actuarial world of risk assessment.	en_US
dc.language.iso	eng	en_US
dc.subject	financial economics	en_US
dc.subject	business analytics	en_US
dc.title	The Impact of Machine Learning and Aggregated Data on Corporate Insurance Modelling: An Empirical Study on the Prospective Gains of Machine Learning Techniques Using New Data Sources In the Insurance Industry	en_US
dc.type	Master thesis	en_US
dc.description.localcode	nhhmas	en_US

Associated file(s)

Filename: masterthesis.pdf
Size: 3.407Mb
Format: PDF

This item appears in the following collection(s)

Master Thesis [4372]
