Unsupervised machine learning for conference scheduling: a natural language processing approach based on latent Dirichlet allocation
Master thesis
Permanent link
https://hdl.handle.net/11250/2679248
Publication date
2020
Collections
- Master Thesis [4490]
Abstract
Academic conference scheduling is the task of organizing large-scale conferences
around submitted academic papers, each of which its author will present in a talk.
Traditionally, each speaker is placed into a session alongside other similarly
themed talks. To create an appropriate conference schedule, these
talks should be organized by thematic similarity. This requires conference organizers
to read through abstracts or extended abstracts of submissions to understand how
to place these papers together in a cohesive manner. In very large conferences,
where the number of submissions may exceed several hundred, this proves to be a
demanding task, as it requires considerable time and effort on the part of organizers.
To help automate this process, this thesis uses latent Dirichlet allocation, a form
of topic modeling from the field of natural language processing. Latent Dirichlet
allocation is an unsupervised machine learning algorithm that analyzes the
underlying thematic content of documents and assigns those documents to topics.
This can be a highly beneficial tool for conference organizers: executed correctly,
it reduces the effort required to plan conferences and needs minimal human
intervention. To examine how this method of topic modeling can be applied to
conference scheduling, three different conferences are examined using textual data
from the papers submitted to them.
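The idea of assigning each submission to its dominant topic can be illustrated with a minimal sketch. This is not the thesis's implementation; it is a toy collapsed Gibbs sampler for latent Dirichlet allocation written with the Python standard library only. The function names (`fit_lda`, `assign_sessions`), the hyperparameter values, and the four short "abstracts" are hypothetical and chosen purely for illustration.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta) / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta

def assign_sessions(theta):
    """Place each document in the session of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]

# Hypothetical, already-tokenized abstracts (illustration only).
abstracts = [
    "vehicle routing delivery fleet routing trucks".split(),
    "fleet delivery trucks vehicle scheduling routing".split(),
    "portfolio risk asset pricing market returns".split(),
    "market returns risk portfolio asset volatility".split(),
]
theta = fit_lda(abstracts, n_topics=2)
sessions = assign_sessions(theta)
for doc, session in zip(abstracts, sessions):
    print(session, " ".join(doc[:3]))
```

In practice a library implementation and proper text preprocessing (tokenization, stop-word removal) would replace this toy sampler, and the number of topics would be tuned to the number of sessions the organizers need.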
The goal of creating these topic models is to understand how latent Dirichlet
allocation can be used to reduce the required effort, and to see how data set
attributes and model parameters affect the creation of topics and the allocation of
documents to them. For data sets with higher average word counts, this method
produced clear cohesion among the documents placed into each topic. Improvements
to these models exist that can further sharpen the separation of documents into
cohesive groups. Latent Dirichlet allocation proves to be a useful tool in conference
scheduling, as it can help schedulers create a baseline conference schedule quickly
and with minimal effort. With this baseline in place, schedulers can then build on
the results to create the full conference schedule.
Keywords: natural language processing, conference scheduling, machine
learning, latent Dirichlet allocation