Predicting Private Equity Fund Performance with Machine Learning

2022

This paper applies machine learning models to predict the
performance of private equity funds, with the aim of enabling more effective fund selection by investors in
the private markets. Prior research has mainly focused on estimating the probability that a private
equity fund exceeds a pre-defined rate of return, or on examining the factors that influence
fund returns. We instead use the factors previously found to influence
private equity fund returns to train machine learning algorithms that predict the returns investors
can expect to receive from the moment of making a primary investment in the fund until the
fund’s liquidation. Because it is the measure of choice for both general partners (GPs) and
limited partners (LPs) in the private equity industry, we select the Net Internal Rate of Return
(NIRR) as our measure of return. We source our data mainly from PitchBook, which allows us
to form a more extensive set of predictor variables, and supplement it with
macroeconomic variables collected from public sources. To estimate the predictive models, we
apply machine learning methodologies including linear methods (stepwise regression based on the
Akaike Information Criterion, and Ridge regression) as well as more advanced methods
(Support Vector Machine and Bayesian Regularized Neural Networks). The latter two allow us to
add flexibility to our models by capturing interaction effects between predictor variables.
Our models show favorable results. The Support Vector Machine gives the strongest
performance on in-sample data, with a mean squared error (MSE) of 0.0072. This
comes, however, at the expense of weaker out-of-sample performance: the
model achieves an MSE of 0.0538 on the test set, which suggests that it overfits the
training data. By contrast, the linear
Akaike Information Criterion model performs strongly out of sample,
with an MSE of 0.0370.
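
The comparison of in-sample and out-of-sample MSE described above can be illustrated with a minimal sketch. It does not reproduce the paper's models or data: the data are synthetic, the helper names (`mse`, `fit_line`, `fit_nearest_neighbour`, `make_data`) are hypothetical, and a 1-nearest-neighbour regressor stands in for a highly flexible model, with an ordinary least squares line standing in for a simple linear one.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predict function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor: it memorises the training set."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Return the target of the training point closest to x.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def make_data(n, rng):
    """Synthetic data (an assumption): linear signal plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [0.05 + 0.2 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x_train, y_train = make_data(500, rng)
x_test, y_test = make_data(1000, rng)

results = {}
for name, fit in [("linear", fit_line),
                  ("nearest_neighbour", fit_nearest_neighbour)]:
    predict = fit(x_train, y_train)
    results[name] = {
        "train_mse": mse(y_train, [predict(x) for x in x_train]),
        "test_mse": mse(y_test, [predict(x) for x in x_test]),
    }

for name, scores in results.items():
    print(f"{name}: train MSE = {scores['train_mse']:.4f}, "
          f"test MSE = {scores['test_mse']:.4f}")
```

In this toy setting the memorising model fits the training set perfectly but generalises worse than the straight line. Qualitatively, this is the pattern the abstract reports for the Support Vector Machine relative to the linear Akaike Information Criterion model.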