Predictive modelling of customer claims across multiple insurance policies : an empirical study of how individual customer insurance data can be used to assess customer risk across multiple insurance products by employing machine learning and advanced ensemble techniques

Høysæter, David; Larsplass, Endre

Master thesis

Permanent link

https://hdl.handle.net/11250/2679811

Publication date

2020

Abstract

In this master thesis, we have analysed how individual insurance customer data can be used to assess customer risk across multiple insurance policies. Our dataset contains 63 variables about the characteristics of each customer and five associated response variables provided by Frende Forsikring. We have modelled the responses for claim propensity, claim frequency, and total claim size for each customer. To evaluate the value of this customer data, we have used multiple machine learning algorithms. These include XGBoost, LightGBM, random forest, GLM and deep neural networks. We have also used different ensemble techniques to gain further performance improvements from these models.
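To illustrate the general idea of an ensemble, the following is a minimal sketch that averages the predicted claim probabilities of two models and compares Brier scores. It is not the thesis's method: simple averaging stands in for the ensemble techniques actually used, and the labels, the two prediction lists, and the function names are hypothetical toy values chosen so that the effect is visible.

```python
# Toy illustration only: simple averaging of two hypothetical models.
# All numbers below are made up and are not results from the thesis.

def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)

def average_ensemble(*prediction_lists):
    """Average the predictions of several models, customer by customer."""
    return [sum(preds) / len(preds) for preds in zip(*prediction_lists)]

# 1 = customer made at least one claim, 0 = no claim (hypothetical).
labels = [1, 0, 1, 0]

# Predicted claim probabilities from two hypothetical models.
model_a = [0.9, 0.4, 0.4, 0.1]
model_b = [0.5, 0.1, 0.9, 0.5]

ensemble = average_ensemble(model_a, model_b)

score_a = brier_score(labels, model_a)
score_b = brier_score(labels, model_b)
score_ensemble = brier_score(labels, ensemble)

print(f"model A: {score_a:.5f}, model B: {score_b:.5f}, "
      f"ensemble: {score_ensemble:.5f}")
```

On this toy data the averaged predictions score better than either model alone because the two models make their errors on different customers. Averaging is not guaranteed to beat the best single model in general.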

By comparing results achieved using customer insurance premium as the only explanatory variable to the results achieved using all the additional customer characteristics, we could observe a considerable increase in predictive performance. Our findings show that gradient boosting techniques can increase performance compared to generalized linear models. We also observed that using multiple models in ensembles can increase performance compared to any single model when assessing customer claim propensity and frequency. Although we found stacked ensembles using multiple underlying models to provide increased performance when used on claim propensity and frequency, we found a strong case for the use of generalized linear models when modelling total claim size. Our thesis proposes a novel three-step ensemble model that uses claim propensity and claim frequency to determine the total claim size of a customer, which may improve performance of total claim predictions.
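The abstract does not specify how the three steps are combined, so the following is only one possible reading, sketched under stated assumptions. It uses a classical frequency-severity style decomposition, in which the expected total claim is the product of claim propensity, claim frequency among claimants, and average cost per claim. This decomposition is a stand-in, not the thesis's model. The `risk_band` grouping, the toy records, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (risk_band, number_of_claims, total_claim_size).
# These values are invented for illustration and are not thesis data.
customers = [
    ("low", 0, 0.0), ("low", 0, 0.0), ("low", 1, 1000.0), ("low", 0, 0.0),
    ("high", 2, 6000.0), ("high", 0, 0.0), ("high", 1, 2000.0), ("high", 1, 4000.0),
]

def fit_three_step(records):
    """Estimate propensity, frequency and cost per claim for each risk band."""
    by_band = defaultdict(list)
    for band, n_claims, total in records:
        by_band[band].append((n_claims, total))

    model = {}
    for band, rows in by_band.items():
        claimants = [(n, t) for n, t in rows if n > 0]
        n_claims_total = sum(n for n, _ in claimants)
        # Step 1: claim propensity = share of customers with at least one claim.
        propensity = len(claimants) / len(rows)
        # Step 2: claim frequency = mean number of claims among claimants.
        frequency = n_claims_total / len(claimants) if claimants else 0.0
        # Step 3: average cost per claim, used to scale up to total claim size.
        cost_per_claim = (
            sum(t for _, t in claimants) / n_claims_total if n_claims_total else 0.0
        )
        model[band] = (propensity, frequency, cost_per_claim)
    return model

def predict_total_claim(model, band):
    """Expected total claim size = propensity * frequency * cost per claim."""
    propensity, frequency, cost_per_claim = model[band]
    return propensity * frequency * cost_per_claim

model = fit_three_step(customers)
for band in ("low", "high"):
    print(band, round(predict_total_claim(model, band), 2))
```

With group-level averages as the "models", the product reproduces the mean total claim size in each band. In practice each step would be a fitted model on customer characteristics, and the way the steps feed into each other is a design choice that the abstract leaves open.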

Overall, our results show promise in using individual customer data to supplement the traditional individual policy risk assessments. The results also underline the potential of advanced ensembles to increase predictive performance on the individual customer data. The results accentuate the importance of selecting the appropriate models and suitable error metrics to achieve good predictive performance across different response variables. Our findings illustrate the transparency issues associated with using highly flexible statistical learning tools when compared to generalized linear models.