For this lab, I used the mini-corpus I created in Lab 3, called Humor (Bookshelf).
- To get started, I loaded the plain-text files from that mini-corpus into the Topic Modeling Tool. I clicked “Select Input File or Dir,” navigated to the Lab 3 folder on my desktop, which holds the plain-text files from the Humor bookshelf, and selected each of the eight files individually.
- After loading the files, I clicked “Advanced” to configure specific settings. I changed “No. of topic words printed” from 10 to 15 so that each topic would be displayed with a wider array of its associated words.
- With the settings configured, I selected “Learn Topics.” After processing my plain-text files, the Topic Modeling Tool returned 10 topics for my mini-corpus, each listed on its own line with the 15 words associated with it. By studying these topics and words, I could get a sense of the more prominent subjects and descriptors in my mini-corpus.
- To make the topics easier to study, the Topic Modeling Tool also creates an HTML file that lists the topics as links and can be opened in a browser such as Google Chrome.
- Clicking on any topic opens another page that ranks the documents in the mini-corpus by their relevance to that topic (measured by the number of words in each document assigned to the topic).
- Clicking on any of these documents goes a step further and shows the top three topics in that document, giving each topic a percentage based on the number of words in the document assigned to it.
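
To make the ranking and percentage steps above more concrete, here is a minimal Python sketch of the arithmetic as I understand it from the tool's output pages. It is not the Topic Modeling Tool's own code. The file names, the topic numbers, and the word-to-topic assignments are all hypothetical toy data invented for illustration, not results from my lab run, and the function names `rank_documents` and `top_topics` are my own.

```python
from collections import Counter

# HYPOTHETICAL toy data: for each document, the topic number assigned to
# each of its words. These values are invented for illustration only.
assignments = {
    "doc_a.txt": [0, 0, 0, 0, 2, 2, 1, 1, 1, 3],
    "doc_b.txt": [1, 1, 1, 1, 1, 2, 2, 0, 0, 0],
    "doc_c.txt": [2, 2, 2, 2, 2, 0, 0, 3, 3, 3],
}


def rank_documents(assignments, topic):
    """Rank documents by how many of their words are assigned to `topic`."""
    counts = {doc: Counter(words)[topic] for doc, words in assignments.items()}
    # Sort from the largest count to the smallest.
    return sorted(counts.items(), key=lambda pair: pair[1], reverse=True)


def top_topics(words, n=3):
    """Return the n most common topics in one document, each with the
    percentage of the document's words assigned to it."""
    total = len(words)
    return [(topic, 100.0 * count / total)
            for topic, count in Counter(words).most_common(n)]


# Documents ranked by relevance to topic 0.
ranking = rank_documents(assignments, topic=0)
print(ranking)

# Top three topics in one document, with percentages.
shares = top_topics(assignments["doc_a.txt"])
print(shares)
```

In this toy example, `doc_a.txt` ranks first for topic 0 because four of its ten words are assigned to that topic, which is also why topic 0 gets a 40% share on that document's page.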
The Topic Modeling Tool is useful for anyone who wants not only to identify the most important topics of a corpus, but also to understand the words associated with those topics and how prominent the topics and words are in different documents.