Unsupervised machine learning for conference scheduling: a natural language processing approach based on latent Dirichlet allocation

2020

Academic conference scheduling is the task of organizing large-scale conferences
around submitted academic papers, each of which its author will present as a talk.
Traditionally, each speaker is placed into a session alongside other talks on a
similar theme, so building an appropriate schedule means grouping talks by thematic
similarity. To do this, conference organizers must read the abstracts or extended
abstracts of the submissions to work out how the papers fit together cohesively. In
very large conferences, where submissions may number several hundred or more, this
is a demanding task that requires considerable time and effort from organizers.
To help automate this process, this thesis uses a form of topic modeling from
natural language processing called latent Dirichlet allocation. Latent Dirichlet
allocation is an unsupervised machine learning algorithm that analyzes the
underlying thematic content of a collection of documents and can assign those
documents to topics. Applied correctly, it can be a highly beneficial tool for
conference organizers, reducing the effort needed to plan a conference while
requiring minimal human intervention. To examine how this method of topic modeling
can be applied to conference scheduling, three different conferences are examined
using the text of the papers submitted to them.

The goal of creating these topic models is to understand how latent Dirichlet
allocation can reduce the required effort, and how data set attributes and model
parameters affect the creation of topics and the allocation of documents to those
topics. The method produced clear cohesion among the documents placed into each
topic for data sets with higher average word counts. Improvements to these models
exist that could separate documents more cohesively still. Latent Dirichlet
allocation proves to be a useful tool in conference scheduling, as it lets
schedulers create a baseline conference with considerable speed and minimal effort.
With this baseline in place, schedulers can then build on the results to create the
full conference schedule.

Keywords: natural language processing, conference scheduling, machine learning, latent Dirichlet allocation
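
As a rough illustration of the idea summarized above, the following minimal sketch fits a small latent Dirichlet allocation model and groups documents into candidate sessions by their dominant topic. It is not the implementation used in this thesis: the inference method (collapsed Gibbs sampling), the function names lda_gibbs and group_into_sessions, the hyperparameter values, and the toy documents are all assumptions made purely for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (hypothetical helper).

    docs is a list of token lists. Returns theta, where theta[d][k] is the
    estimated proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                              # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


def group_into_sessions(theta):
    """Place each document in the session of its highest-weighted topic."""
    sessions = {}
    for d, proportions in enumerate(theta):
        best = max(range(len(proportions)), key=proportions.__getitem__)
        sessions.setdefault(best, []).append(d)
    return sessions


# Invented toy "abstracts", already tokenized; not data from the thesis.
toy_docs = [
    "vehicle routing delivery route optimization".split(),
    "route planning vehicle fleet delivery".split(),
    "patient hospital scheduling nurse staffing".split(),
    "hospital nurse patient care staffing".split(),
]

theta = lda_gibbs(toy_docs, num_topics=2)
sessions = group_into_sessions(theta)
print(sessions)
```

Assigning each document to its single dominant topic is only one possible rule; a scheduler could instead inspect the full rows of theta to spot talks that straddle two themes. On real submissions, tokenization, stop-word removal, and the choice of the number of topics would all need attention.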