Machine Learning for Predicting Voluntary Audits: How do loan and risk factors influence small private commercial U.S. banks’ decision to get no audit?
Abstract
Commercial banks and other financial institutions are essential to the modern economy,
and government agencies and regulators strive to identify and counteract risks in banking
institutions. With an emphasis on loan- and risk-based factors, this thesis explores what
influences small private commercial U.S. banks' decision to get no voluntary external
audit. Using bank regulatory data spanning 10 years from 2010 to 2020, we predict
audit choice using four machine learning algorithms for classification: logistic regression,
LASSO, random forest, and LightGBM. The models make use of 16 specially selected
independent features. This thesis analyzes the machine learning algorithms based on
various performance metrics (accuracy, specificity, precision, recall, and F1) and studies
the feature importance measured by each model. To verify the results, the thesis uses two
methods of feature selection: ANOVA and mutual information.
Our findings suggest that the proportion of agricultural loans to the total sum of loans
is an important factor in predicting audit choice. Bank size and asset quality are also
important factors in the banks' audit decisions. The best models are the tree-based
models, with random forest judged the best. Random forest predicts with a
high level of accuracy, and its results suggest that the relationship between audit choice
and the bank data is nonlinear.