The use of textual data analysis and machine learning in bankruptcy prediction: evaluating the predictive power of sentiment scores and ratios from news articles for bankruptcy prediction in the Norwegian market using machine learning
Master thesis
Permanent link
https://hdl.handle.net/11250/2738638
Publication date
2020
Abstract
In this thesis, we investigate whether sentiment scores and ratios derived from
news articles have predictive power for bankruptcy prediction among Norwegian
private limited companies. Our analysis is based on Norwegian news articles and annual
accounts from the Brønnøysund Register Centre. We derive sentiment scores and ratios
by performing lexicon-based sentiment analysis on the news articles. The sentiment scores
and ratios are averaged over four different observation periods and are then matched
to the companies they concern. Furthermore, we use Altman’s five financial ratios as
our financial variables. Models that include both Altman’s financial ratios and
sentiment variables are compared to a reference model that includes only the
financial ratios.
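The lexicon-based scoring and averaging step described above can be illustrated with a minimal sketch. The word lists, the function names, and the exact definitions of "score" (positive minus negative hits) and "ratio" (net hits divided by total hits) are assumptions made for illustration only; the abstract does not specify the lexicon or the formulas used in the thesis.

```python
import re

# Hypothetical toy lexicon; the lexicon used in the thesis is not given here.
POSITIVE = {"growth", "profit", "record", "strong"}
NEGATIVE = {"loss", "layoffs", "debt", "weak"}


def sentiment(text):
    """Return (score, ratio) for one article.

    Assumed definitions (not taken from the thesis):
      score = positive hits - negative hits
      ratio = (positive - negative) / (positive + negative), 0.0 if no hits
    """
    tokens = re.findall(r"[a-zæøå]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    ratio = (pos - neg) / total if total else 0.0
    return pos - neg, ratio


def average_sentiment(articles):
    """Average score and ratio over all articles in one observation period."""
    if not articles:
        return 0.0, 0.0
    results = [sentiment(a) for a in articles]
    n = len(results)
    return sum(s for s, _ in results) / n, sum(r for _, r in results) / n
```

Calling `average_sentiment` once per company and observation period would yield one pair of sentiment variables per period, which could then be joined to that company's financial ratios.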
To address this question, we develop models using two different techniques:
Generalized Linear Modelling and xgboost. Our emphasis is on comparing models with
sentiment variables to reference models without sentiment variables in order to examine
the potential predictive power of sentiment. We assess different model configurations,
taking into account both different news observation periods and bankruptcy prediction
horizons. The scores and ratios from the news observations are included at different time
lags, ranging from 1 to 12 months prior to the announcement of annual accounts. The
performance of the models is measured by AUC and balanced accuracy. In addition, we
examine the average marginal effects in the developed GLMs and variable importance in
the xgboost models.
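For readers unfamiliar with the two performance measures, the following is a small from-scratch sketch of how AUC and balanced accuracy can be computed for a binary bankruptcy label (1 = bankrupt, 0 = not bankrupt). The function names and the label encoding are assumptions for illustration; the abstract does not say which implementation the thesis uses.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the share of (bankrupt, non-bankrupt) pairs in which the
    bankrupt company receives the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = sum(1 for y in y_true if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("balanced accuracy needs both classes")
    return (tp / n_pos + tn / n_neg) / 2
```

Balanced accuracy weights both classes equally, which is generally useful when one class (here, presumably bankruptcies) is much rarer than the other and plain accuracy would be dominated by the majority class.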
The results of the applied methodology indicate that there is no significant improvement
when including sentiment variables. The reference models utilizing only financial ratios
tend to perform better than the models including sentiment variables in terms of AUC and
balanced accuracy. In terms of marginal effects and variable importance, the financial
ratios also tend to outperform the sentiment variables. Furthermore, we provide a nuanced
discussion based on the presented approach and results, and point to further research
approaches that we find promising.
Keywords – Bankruptcy Prediction, Textual Data Analysis, Sentiment Analysis,
Predictive Analytics, Machine Learning, Big Data, xgboost, GLM