Machine Learning in Application-Based Case Management: A study on using machine learning to predict decision making in case management processes
Master thesis
Permanent link
https://hdl.handle.net/11250/3026864
Date of publication
2022
Collections
- Master Thesis [4379]
Abstract
This thesis studies whether machine learning can predict the outcome of applications processed by the Regional Committees for Medical and Health Research Ethics (REK) in Norway. More specifically, the aim is to predict rejections of medical research applications. Four supervised prediction methods are used: logistic regression, Naive Bayes, Random Forest, and XGBoost. Before the models are trained, a Latent Dirichlet Allocation (LDA) topic model extracts structured features from the textual project descriptions, making the text usable by the supervised prediction models. The prediction models are evaluated and compared using metrics derived from the confusion matrix, namely accuracy, ROC AUC, and Cohen’s Kappa. The results show that the methods are suitable for predicting application outcomes, and XGBoost has the best overall performance on the selected metrics. Moreover, the topic variables from the LDA model prove influential for the predictions.
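The three evaluation metrics named above can be illustrated with a small worked example. The following is a minimal sketch in plain Python, not the thesis's own evaluation code: the labels and scores are made-up toy data, the 0.5 decision threshold is an assumption, and all function names are hypothetical. Accuracy and Cohen's Kappa are computed from a single confusion matrix, while ROC AUC is computed from the predicted scores and so summarises performance across all thresholds.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 (= rejected) as the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(1, 1)], pairs[(0, 1)], pairs[(1, 0)], pairs[(0, 0)]


def accuracy(y_true, y_pred):
    """Share of applications whose predicted outcome matches the actual one."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def cohens_kappa(y_true, y_pred):
    """Agreement between prediction and outcome, corrected for chance agreement.

    Assumes the labels and predictions are not all in one class
    (otherwise the denominator is zero).
    """
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement implied by the row and column totals of the matrix.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = rejected, 0 = approved.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(accuracy(y_true, y_pred), 3))
print(round(cohens_kappa(y_true, y_pred), 3))
print(round(roc_auc(y_true, scores), 3))
```

Because rejections are typically the minority class, accuracy alone can look high for a model that rarely predicts a rejection, which is one reason to report a chance-corrected measure such as Cohen's Kappa alongside it.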
Based on the results, the thesis discusses use cases for the XGBoost model, first investigating the possibility of flagging applications that the model predicts will be rejected. Such an implementation aims to help case officers quickly identify applications that should likely be rejected, simplifying the initial assessment. The thesis finds this feasible but discusses some implementation challenges. It then discusses the possibility of using the methodology to reject applications automatically. This is a more radical intervention in the case management system, and further clarification with REK is essential before real-world implementation.
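The flagging use case can be sketched as a simple thresholding step on the model's predicted rejection probabilities. This is an illustrative sketch only: the `ScoredApplication` type, the `flag_for_review` function, the application identifiers, and the 0.5 default threshold are all hypothetical and are not taken from the thesis or from REK's case management system.

```python
from dataclasses import dataclass


@dataclass
class ScoredApplication:
    application_id: str  # hypothetical identifier
    p_reject: float      # predicted probability of rejection from some model


def flag_for_review(applications, threshold=0.5):
    """Return applications at or above the threshold, most likely rejections first.

    The flag only prioritises a case officer's attention; the decision
    itself stays with the officer.
    """
    flagged = [a for a in applications if a.p_reject >= threshold]
    return sorted(flagged, key=lambda a: a.p_reject, reverse=True)


# Invented example scores.
scored = [
    ScoredApplication("A-001", 0.12),
    ScoredApplication("A-002", 0.81),
    ScoredApplication("A-003", 0.55),
    ScoredApplication("A-004", 0.49),
]

for app in flag_for_review(scored):
    print(app.application_id, app.p_reject)
```

Lowering the threshold flags more applications and misses fewer likely rejections, at the cost of more false flags for case officers to review. Automatic rejection would remove the human check that this sketch keeps in place.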
Furthermore, the thesis examines the weaknesses of the results. It discusses the model’s limited ability to adapt to rapid changes in the environment, an inevitable issue when predicting the future from historical data. In addition, the thesis examines which variables are ethically sound to include as predictors of application rejections, and advises reflecting on this issue before real-world implementation.