Predicting Private Equity Fund Performance with Machine Learning
Master thesis
Permanent link
https://hdl.handle.net/11250/3055171
Publication date
2022
Abstract
This paper applies machine learning models to predict the
performance of private equity funds, with the aim of enabling more effective fund selection by investors in
the private markets. Prior research has mainly focused on estimating the probability that a private
equity fund exceeds a pre-defined rate of return, or on examining factors that influence
the returns of said funds. We instead use the factors previously found to influence
private equity fund returns to train machine learning algorithms that predict the returns investors
can expect from the moment they make a primary investment in the fund until the
fund’s liquidation. We select the Net Internal Rate of Return (NIRR) as our measure of return
because it is the measure of choice for both general partners (GPs) and limited partners (LPs)
in the private equity industry. We mainly source our data from PitchBook, which allows us
to form a more extensive set of predictor variables, and supplement this data with
macroeconomic variables collected from public sources. To estimate predictive models, we
apply machine learning methods including linear regression methods, such as stepwise
selection by the Akaike Information Criterion (AIC) and Ridge regression, as well as more
advanced methods, namely Support Vector Machines and Bayesian Regularized Neural Networks.
The latter enables us to add flexibility to our models by accounting for interaction effects
between predictor variables.
Our models show favorable results. The Support Vector Machine performs best on in-sample
data, with a mean squared error (MSE) of 0.0072. This comes at the expense of weaker
out-of-sample performance: the model achieves an MSE of 0.0538 on the test set, which likely
indicates that it overfits the training data. By contrast, the linear Akaike Information
Criterion model performs strongly on the out-of-sample data, with an MSE of 0.0370.
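
To make the in-sample versus out-of-sample comparison concrete, the following is a minimal sketch of how an MSE is computed and how the reported figures relate to one another. It is not the thesis's code: the function name `mse`, the variable names, and the toy return values are assumptions introduced for illustration, and only the three MSE figures are taken from the abstract.

```python
def mse(actual, predicted):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy NIRR values (as fractions); NOT data from the thesis.
toy_actual = [0.10, 0.20, -0.05]
toy_predicted = [0.12, 0.15, 0.00]
toy_mse = mse(toy_actual, toy_predicted)

# MSE figures as reported in the abstract.
svm_train_mse = 0.0072  # Support Vector Machine, in-sample
svm_test_mse = 0.0538   # Support Vector Machine, out-of-sample (test set)
aic_test_mse = 0.0370   # linear AIC model, out-of-sample

# A test MSE far above the training MSE is a common symptom of overfitting.
svm_gap_ratio = svm_test_mse / svm_train_mse

# The model with the lower test MSE generalizes better on this split.
test_scores = {"SVM": svm_test_mse, "AIC": aic_test_mse}
best_out_of_sample = min(test_scores, key=test_scores.get)

print(f"toy MSE: {toy_mse:.4f}")
print(f"SVM test/train MSE ratio: {svm_gap_ratio:.2f}")
print(f"lower test MSE: {best_out_of_sample}")
```

On the reported figures, the Support Vector Machine's test MSE is roughly 7.5 times its training MSE, which is consistent with the overfitting interpretation given above. The abstract does not report an in-sample MSE for the AIC model, so no corresponding ratio is computed for it.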