Predicting patent litigation : a comprehensive comparison of machine learning algorithm performance in predicting patent litigation

Follesø, Henrik Størksen; Kaminski, Maria

dc.contributor.advisor	Juranek, Steffen
dc.contributor.author	Follesø, Henrik Størksen
dc.contributor.author	Kaminski, Maria
dc.date.accessioned	2020-09-24T08:42:51Z
dc.date.available	2020-09-24T08:42:51Z
dc.date.issued	2020
dc.identifier.uri	https://hdl.handle.net/11250/2679398
dc.description.abstract	Patents are designed to act as an incentive for innovation by awarding exclusive property rights to the inventor. As such, patents are one of the main driving forces behind innovation and, ultimately, economic growth (Lanjouw and Schankerman, 2004). Patent litigation, the legal process associated with disputes over patent rights, is hard to predict, surrounded by uncertainty, potentially ruinously expensive, and very difficult to insure against. Previous research has shown that there is potential for predicting patent litigation, but it has relied on limited data and relatively unsophisticated algorithms. The purpose of this thesis is to evaluate the extent to which patent litigation can be predicted, which machine learning method is most appropriate, and which characteristics are important for the classifier. The goal is to help reduce the uncertainty that threatens the incentives for innovation by introducing more information through better patent litigation prediction. In particular, we focus on the patent litigation insurance market as the most direct application of our research. This thesis is inspired by the work of Lanjouw and Schankerman (2001), which forms the basis of our research. Building on their work, we add more data and characteristics to the analysis, and then employ and compare more sophisticated machine learning algorithms. The work relates to anomaly detection and faces similar challenges unique to this area of research. We find that patent litigation can to a large extent be predicted. Furthermore, adding more characteristics and information increases the predictive power. The largest gains in predictive power stem from the use of appropriate algorithms. Using the right algorithm is much more important than using a more advanced or newer algorithm. 
The Random Forest classifier is found to be the preferred method for predicting patent litigation on our data, as it yields models with high levels of predictive power. We find patent family size, whether or not the patent is owned by a US company, and the number of backward citations to be the most important characteristics driving the prediction of litigation. Keywords – NHH, Master Thesis, Patent Litigation Data, Patent Litigation Prediction, Predictive Analysis, Logit, Random Forest, XGBoost, SVM	en_US
dc.language.iso	eng	en_US
dc.subject	business analytics	en_US
dc.title	Predicting patent litigation : a comprehensive comparison of machine learning algorithm performance in predicting patent litigation	en_US
dc.type	Master thesis	en_US
dc.description.localcode	nhhmas	en_US

Files in this item

Name: masterthesis.pdf
Size: 2.051Mb
Format: PDF


This item appears in the following Collection(s)

Master Thesis [4209]
