Machine Learning for Predicting Voluntary Audits: How do loan and risk factors influence small private commercial U.S. banks’ decision to get no audit?

2022

Commercial banks and other financial institutions are essential to the modern economy,

and government agencies and regulators strive to identify and counteract risks in banking

institutions. With an emphasis on loan- and risk-based factors, this thesis explores what

influences small private commercial U.S. banks' decision to get no voluntary external

audit. Using bank regulatory data spanning 10 years from 2010 to 2020, we predict

audit choice using four machine learning algorithms for classification: logistic regression,

LASSO, random forest, and LightGBM. The models make use of 16 specially selected

independent features. This thesis analyzes the machine learning algorithms based on

various performance metrics (accuracy, specificity, precision, recall, and F1) and studies

the feature importance measured by each model. To verify the results, the thesis uses two

methods of feature selection: ANOVA and Mutual Information.
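
As an illustration of the five performance metrics named above, the following minimal Python sketch derives them from the counts of a binary confusion matrix. It is not the thesis's own code: the function name `classification_metrics`, the label encoding (1 = the bank gets no audit, 0 = the bank is audited), and the example labels are all assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, specificity, precision, recall and F1 for binary labels.

    Assumed encoding (hypothetical): 1 = no audit (positive class), 0 = audited.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up labels for eight hypothetical banks, not data from the thesis.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity alongside recall matters when one audit choice is much rarer than the other, since accuracy alone can look high for a model that mostly predicts the majority class.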

Our findings suggest that the proportion of agricultural loans to the total sum of loans

is an important factor in predicting audit choice. Bank size and asset quality are also

important factors in the banks' audit decisions. The best models are the tree-based

models, with random forest performing best. Random forest predicts with a

high level of accuracy, which suggests that the relationship between audit choice and the bank

data is nonlinear.