Method for Fusing Predictor Levels with Application to Insurance Data
- Master Thesis 
The aim of this thesis is to determine whether the prediction accuracy of a model can be improved by using a data-driven method to bin continuous variables and group the levels of categorical variables. We perform our analysis on data on the policyholders of one of Gjensidige's insurance products, and specifically aim to improve Gjensidige's Poisson regression model for predicting claim frequency, whose predictors are currently binned and grouped manually. We analyze the effect of a regularization framework that combines the Lasso with generalizations of the Lasso adapted to nominal and ordinal predictors. These generalizations constrain the coefficients and the differences between them, effectively fusing and selecting predictor levels. By optimizing the resulting objective function in R with the newly developed smurf package (Reynkens, Devriendt & Antonio, 2018), we estimate a penalized Poisson regression model. To reduce the bias of the estimates, we then reestimate a Poisson regression model using the selected and fused predictor levels as input. The resulting model is compared with the model Gjensidige currently uses for predicting claim frequency to determine the effect of the data-driven approach. We validate the prediction models using the mean squared error (MSE) and the Akaike information criterion (AIC) as performance measures and find that our reestimated model performs slightly better in terms of prediction accuracy while also using fewer parameters. We conclude that regularization can be used as a data-driven method of binning and grouping predictor levels to improve prediction accuracy.
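
The kind of objective function described in the abstract can be sketched as follows. This is an illustration in assumed notation, not the exact formulation of the thesis: the symbols (the predictor index j, the level index k, the weights w, and a single tuning parameter lambda shared by all penalties) are assumptions chosen for the sketch, and the penalty forms are the standard Lasso, Fused Lasso and Generalized Fused Lasso.

```latex
% Sketch in assumed notation; the symbols are not taken from the thesis.
% Penalized Poisson regression: negative log-likelihood plus one penalty per predictor j.
\[
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \; -\frac{1}{n}\sum_{i=1}^{n}\Bigl( y_i\,\eta_i - e^{\eta_i} - \log y_i! \Bigr)
    \; + \; \lambda \sum_{j=1}^{J} P_j(\boldsymbol{\beta}_j),
\qquad \eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}.
\]

% Lasso: shrinks individual coefficients to zero (selection).
\[
P_j^{\text{Lasso}}(\boldsymbol{\beta}_j) = \sum_{k} w_{j,k}\,\lvert \beta_{j,k} \rvert
\]

% Fused Lasso for an ordinal predictor: penalizes differences between
% adjacent levels, so that neighbouring levels (bins) can be fused.
\[
P_j^{\text{fLasso}}(\boldsymbol{\beta}_j) = \sum_{k \ge 2} w_{j,k}\,\lvert \beta_{j,k} - \beta_{j,k-1} \rvert
\]

% Generalized Fused Lasso for a nominal predictor: penalizes all pairwise
% differences, so that any two levels can be grouped.
\[
P_j^{\text{gfLasso}}(\boldsymbol{\beta}_j) = \sum_{k > l} w_{j,kl}\,\lvert \beta_{j,k} - \beta_{j,l} \rvert
\]
```

In this sketch, a difference that is shrunk exactly to zero means the two levels share one coefficient and can be merged into a single level, while a coefficient shrunk to zero merges that level with the reference level. The merged levels then serve as input to the reestimated Poisson regression model.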