Prediction of Stock Market Volatility Utilizing Sentiment from News and Social Media Texts : A study on the practical implementation of sentiment analysis and deep learning models for predicting day-ahead volatility
- Master Thesis 
This thesis studies the impact of sentiment on the prediction of volatility for 100 of the largest stocks in the S&P 500 index. The purpose is to determine whether sentiment can improve forecasts of day-ahead volatility, where volatility is measured as the realized volatility of intraday returns. The textual data has been gathered from three sources: Eikon, Twitter, and Reddit. It consists of 397 564 headlines from Eikon, 35 811 098 tweets, and 4 109 008 comments from Reddit; these counts refer to the uncleaned data before filtering. The data has been collected for the period between 01.08.2021 and 31.08.2022. Sentiment is calculated by the FinBERT model, an NLP model created by further pre-training the BERT model on financial text. To predict volatility with the sentiment from FinBERT, three deep learning models have been applied: a feed-forward neural network, a recurrent neural network, and a long short-term memory model. They are used to solve both regression and classification problems. The inference analysis shows significant effects from the computed sentiment variables and implies a correlation between the number of text items and volatility, in line with previous literature on sentiment and volatility. The results from the deep learning models show that sentiment has an impact on the prediction of volatility, both in terms of lower MSE and MAE for the regression problem and higher accuracy for the classification problem. Moreover, this thesis looks at potential weaknesses that could influence the validity of the results. These include how sentiment is represented, noise in the data, and the fact that the FinBERT model is not trained on financially oriented text from social media.
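The abstract defines the prediction target as the realized volatility of intraday returns but does not state the sampling frequency or any scaling. As a minimal sketch, assuming the common textbook definition (the square root of the sum of squared intraday log returns), the daily target could be computed as follows. The function name `realized_volatility` and the sample prices are hypothetical and are not taken from the thesis.

```python
import math


def realized_volatility(prices):
    """Realized volatility for one trading day.

    Assumes the textbook definition: the square root of the sum of
    squared log returns between consecutive intraday prices. The
    thesis may use a different sampling interval or scaling.
    """
    if len(prices) < 2:
        raise ValueError("need at least two intraday prices")
    # Log return between each pair of consecutive observations.
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return math.sqrt(sum(r * r for r in log_returns))


# Hypothetical intraday prices for a single stock on a single day.
intraday_prices = [100.0, 100.5, 99.8, 100.2, 100.9]
print(f"{realized_volatility(intraday_prices):.6f}")
```

Under this assumption, one value per stock and trading day would serve as the regression target, and a thresholded version of it could serve as the classification label.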