In Lab 5a, we learned about topic modeling using the Topic Modeling Tool, a Java-based graphical interface for MALLET, the well-known topic modeling toolkit. At the start of the lab, we were asked to try the tool on the mini corpus we created for a previous lab, so I used my mini corpus of eight Children’s Fiction plain text files.
This screenshot shows my interaction with the Topic Modeling Tool using my Children’s Fiction mini corpus. For the input, I selected the folder containing all eight plain text files. For the output, I selected my ENGL 4590 folder, which is where I save all of my work for this course. I set the number of topics to 15.
This screenshot shows the list of 15 topics from my mini corpus generated via the Topic Modeling Tool.
Clicking the first topic in the “List of Topics” shown in the previous screenshot took me to this screen, which ranks the plain text files in my mini corpus from highest to lowest by how prominently Topic 1 appears in each text.
This is a screenshot of the TopicsInDocs spreadsheet generated by the Topic Modeling Tool for my mini corpus, opened in Excel. It shows the weight of each topic in each document, that is, which topics matter most in each text.
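To make the idea of topic weights more concrete, here is a minimal Python sketch of how a table of document-topic weights could be used to rank documents for one topic and to find the heaviest topic in each document. The file names, weights, and column layout are all invented for illustration. They are not my actual results, and the real TopicsInDocs file may be organized differently. The function names are also my own, not part of the Topic Modeling Tool.

```python
import csv
import io

# Hypothetical document-topic weights, invented for illustration only.
# Assumed layout: one row per document, one column per topic.
# The real TopicsInDocs file may be laid out differently.
SAMPLE = """filename,topic_0,topic_1,topic_2
book_a.txt,0.60,0.30,0.10
book_b.txt,0.20,0.50,0.30
book_c.txt,0.10,0.15,0.75
"""


def load_weights(text):
    """Return {filename: [weight for each topic]} from CSV text."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: [float(w) for w in row[1:]] for row in reader}


def rank_documents(weights, topic):
    """Sort filenames from highest to lowest weight for one topic index."""
    return sorted(weights, key=lambda name: weights[name][topic], reverse=True)


def dominant_topic(weights):
    """Map each filename to the index of its most heavily weighted topic."""
    return {
        name: max(range(len(w)), key=w.__getitem__)
        for name, w in weights.items()
    }


weights = load_weights(SAMPLE)
print(rank_documents(weights, 0))  # documents ordered by weight for topic index 0
print(dominant_topic(weights))     # heaviest topic index for each document
```

Note that this sketch counts topics from 0, so topic index 0 here plays the role of the first topic in the list.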