Portfolio Modelling: Man vs Machine: An empirical study of machine learning efficiency and variable selection in portfolio modelling on the Oslo Stock Exchange

2022

We investigated and compared the performance of machine learning methods in the context

of empirical asset pricing. We used seven different algorithms and 83 firm characteristics,

comparing the models’ monthly predictive accuracy and variable importance on Norwegian

stock and accounting data. Additionally, we investigated the models’ ability to generate

excess returns in monthly-rebalanced, long-short and long-only portfolios.

We found that the XGBoost algorithm has the highest prediction accuracy, at 53.16%,

and that it more heavily weights momentum variables. Furthermore, we found excess

risk-adjusted returns when constructing portfolios free of market frictions. A long-only

portfolio with predictions from the XGBoost model outperformed the index, on average,

by around 0.5% each month in the out-of-sample period. When accounting for market

frictions an institutional investor might encounter, the returns diminished to the point

of significantly underperforming the index. When presenting a strategy that a retail investor could

implement, we found excess returns. The XGBoost model’s net returns outperformed the

index by 0.16% and 0.67% over the period, after excluding the largest 25% and 50% of firms,

respectively. Upon investigating the explanation for this possible market inefficiency, we

found that the returns are largely driven by highly illiquid stocks. We suggest that these

returns are likely unattainable because of the high degree of illiquidity, and therefore

could be impossible to arbitrage away in the way we would expect the market to do when

it discovers an inefficiency. We call this phenomenon "rainbow-returns", as they are likely

observable but unattainable.

Our findings support the efficient market hypothesis, in that one cannot beat the market

using publicly available information, and add to the existing literature in the emerging field of

empirical asset pricing through machine learning.
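
As an illustration of the frictionless portfolio comparison described above, the following is a minimal sketch and not the study's actual implementation. It assumes equal weighting within each leg and a fixed number of stocks per leg; the function names, the input layout, and both of those choices are hypothetical and do not come from the text. Transaction costs and liquidity constraints are deliberately left out.

```python
from statistics import mean


def monthly_portfolio_returns(stocks, n_per_leg):
    """Return (long_only, long_short) realized returns for one month.

    stocks: list of (ticker, predicted_score, realized_return) tuples, where
    predicted_score is a model's estimate that the stock will rise.
    Assumption: both legs are equal-weighted and hold n_per_leg stocks.
    """
    if not 0 < n_per_leg <= len(stocks) // 2:
        raise ValueError("n_per_leg must fit twice into the cross-section")
    # Rank the cross-section from highest to lowest predicted score.
    ranked = sorted(stocks, key=lambda row: row[1], reverse=True)
    long_leg = mean(ret for _, _, ret in ranked[:n_per_leg])
    short_leg = mean(ret for _, _, ret in ranked[-n_per_leg:])
    # Long-short return: buy the top-ranked names, short the bottom-ranked.
    return long_leg, long_leg - short_leg


def average_excess_return(portfolio_returns, index_returns):
    """Mean monthly difference between portfolio and index returns."""
    return mean(p - i for p, i in zip(portfolio_returns, index_returns))
```

Calling `monthly_portfolio_returns` once per month on fresh predictions corresponds to monthly rebalancing, and passing the resulting long-only series to `average_excess_return` together with the index series gives the kind of average monthly outperformance figure quoted above, before any market frictions are applied.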