For lab 5a, we studied more methods of topic modeling. Specifically, we used the Topic Modeling Tool, a Java-based graphical interface to the well-known MALLET topic modeling toolkit. For this lab I used a mini-corpus of plain-text files from Project Gutenberg that I had created previously.


This is a screenshot of the Topic Modeling Tool I used. I set the tool to find 10 topics across my mini-corpus, which consisted of fantasy texts from Project Gutenberg.



This is a screenshot of the list of topics the Topic Modeling Tool found. They are interesting because some are topics one would expect to find in fantasy fiction, while others probably are not.



More details about topic #1 can be seen in this screenshot. The tool lets its users see which documents (plain-text versions of the fantasy works) rank highest for this topic and which rank lowest.
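
To make the ranking idea concrete, here is a minimal sketch of sorting documents by how strongly they express one topic. The filenames and weights are hypothetical values I made up for illustration; they are not taken from my corpus or from the tool's output.

```python
# Hypothetical weights: filename -> share of the document assigned to topic 1.
# These numbers are invented for illustration, not real tool output.
topic1_weights = {
    "book_a.txt": 0.05,
    "book_b.txt": 0.31,
    "book_c.txt": 0.12,
}


def rank_documents(weights):
    """Return filenames sorted from the highest topic weight to the lowest."""
    return sorted(weights, key=weights.get, reverse=True)


ranking = rank_documents(topic1_weights)
print("highest:", ranking[0])
print("lowest:", ranking[-1])
```

The first entry in `ranking` plays the role of the top-ranked document in the screenshot, and the last entry is the lowest-ranked one.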



“TopicsInDocs” is a spreadsheet produced by the Topic Modeling Tool that can be used to analyze the topics in the various texts. In this particular screenshot, you can see that the tool labels the plain-text literary documents with ID numbers and then lists the highest-ranked topics for each.
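
A spreadsheet like this can also be read with a short script. The sketch below assumes a CSV layout with a document ID, a filename, and then alternating topic and weight columns. That layout and the sample rows are my assumptions for illustration, so the column positions may need adjusting for the real file.

```python
import csv
import io

# Hypothetical sample data; the real TopicsInDocs file may use different columns.
SAMPLE = """docId,filename,topic,weight,topic,weight
0,book_a.txt,3,0.42,7,0.21
1,book_b.txt,1,0.55,3,0.18
"""


def top_topics(csv_text):
    """Map each document ID to its (topic, weight) pairs, highest weight first."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    result = {}
    for row in reader:
        if not row:
            continue
        doc_id = int(row[0])
        # Columns 2 onward are assumed to alternate: topic ID, then its weight.
        pairs = [
            (int(row[i]), float(row[i + 1]))
            for i in range(2, len(row) - 1, 2)
        ]
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        result[doc_id] = pairs
    return result


ranked = top_topics(SAMPLE)
for doc_id, pairs in ranked.items():
    print(doc_id, pairs[0])
```

For a real export, `SAMPLE` would be replaced by the text of the file, for example read with `open(path, newline="").read()`.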



The Topics_Words Excel spreadsheet simply lists the words that make up each topic.
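
A topic-to-words listing like this is easy to load into a dictionary for further exploration. This is only a sketch: the `topicId` and `words` column names and the sample words are hypothetical, chosen to show the idea rather than to mirror the tool's exact output.

```python
import csv
import io

# Hypothetical sample: one row per topic, with a space-separated word list.
SAMPLE_WORDS = """topicId,words
0,king sword castle knight
1,sea ship island wind
"""


def load_topic_words(csv_text):
    """Map each topic ID to the list of words listed for that topic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["topicId"]): row["words"].split() for row in reader}


topics = load_topic_words(SAMPLE_WORDS)
for topic_id, words in topics.items():
    # Show only the first three words of each topic as a quick label.
    print(topic_id, ", ".join(words[:3]))
```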