dc.description.abstract | This thesis studies the impact of sentiment on the prediction of volatility for 100 of the largest
stocks in the S&P500 index. The purpose is to find out if sentiment can improve the forecast
of day-ahead volatility, where volatility is measured as the realized volatility of intraday
returns.
The textual data has been gathered from three different sources: Eikon, Twitter, and Reddit.
The data consists of 397 564 headlines from Eikon, 35 811 098 tweets from Twitter, and
4 109 008 comments from Reddit. These numbers describe the raw data before
filtering. The data was collected for the period from 01.08.2021 to 31.08.2022.
Sentiment is calculated by the FinBERT model, an NLP model created by further pre-training
of the BERT model on financial text. To predict volatility with the sentiment from FinBERT,
three deep learning models are applied: a feedforward neural network, a
recurrent neural network, and a long short-term memory model. They are used to solve both
regression and classification problems.
The inference analysis shows significant effects from the computed sentiment variables and
indicates a correlation between the number of text items and volatility. This
is in line with previous literature on sentiment and volatility. The results from the deep
learning models show that including sentiment improves the prediction of volatility, both in
terms of lower MSE and MAE for the regression problem and higher accuracy for the
classification problem.
Moreover, this thesis looks at potential weaknesses that could influence the validity of the
results. Potential weaknesses include how sentiment is represented, noise in the data, and the
fact that the FinBERT model is not trained on finance-oriented text from social media. | en_US |