Predicting patent litigation: a comprehensive comparison of machine learning algorithm performance in predicting patent litigation
Master thesis
Permanent link
https://hdl.handle.net/11250/2679398
Publication date
2020
Collections
- Master Thesis [4380]
Abstract
Patents are designed to act as an incentive for innovation by awarding exclusive property
rights to the inventor. As such, patents are one of the main driving forces behind
innovation, and ultimately economic growth (Lanjouw and Schankerman, 2004). Patent
litigation, the legal process associated with disputes over patent rights, is hard
to predict, surrounded by uncertainty, can be ruinously expensive, and is very difficult to
insure against. Previous research has shown that there is potential for predicting patent
litigation, albeit based on limited data and limited algorithmic sophistication.
The purpose of this thesis is to evaluate the extent to which patent litigation can
be predicted, which machine learning method is most appropriate, and which
characteristics are important for the classifier. The goal is to help reduce the
uncertainty that threatens the incentives for innovation by introducing more information
through better patent litigation prediction. In particular, we focus on the patent litigation
insurance market as the most direct application of our research.
This thesis is inspired by the work of Lanjouw and Schankerman (2001), which forms the
basis of our research. Building on their work, more data and characteristics are added to
the analysis, before other, more sophisticated machine learning algorithms are employed
and compared. The work relates to anomaly detection and faces the challenges particular
to that area of research.
We find that patent litigation can to a large extent be predicted. Furthermore, adding
more characteristics and information increases the predictive power. The largest gains in
predictive power stem from the use of appropriate algorithms. Using the right algorithm
is much more important than using a more advanced or newer algorithm. The Random
Forest classifier is found to be the preferred method of predicting patent litigation on our
data, as it yields models with high levels of predictive power. We find patent family
size, whether or not the patent is owned by a US company, and the number of backward
citations to be the most important characteristics driving the prediction of litigation.
Keywords – NHH, Master Thesis, Patent Litigation Data, Patent Litigation Prediction,
Predictive Analysis, Logit, Random Forest, XGBoost, SVM