In Lab 5a, we used the topic modeling tool to create topics from a corpus.
Unfortunately, WordPress wouldn’t let me upload the screenshot I took of the topic modeling tool, but you can kind of see it in the background of this picture, taken after it had been run. After selecting Learn Topics and setting the number of topics to 15, the tool output several files. This is one of them, called output_html, which contains all topics, docs, a CSS file, and topics.
When you click on the all topics.html link, it brings up this page, which lists all of the topics from your corpus (I used the gothic text corpus from Lab 3).
If you click on one of these topics, it takes you to a list where you can see which documents dominate that topic. For example, this is topic 1, and Dracula is the #1 text that falls into this topic.
This is the view from another folder created after running the topic modeling tool. It is labeled output_csv. In this folder there are three documents: DocsinTopics, topic words, and TopicsinDocs.
This is a screenshot of TopicsinDocs. This page shows how much of each topic is found in each document, given as percentages.
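To make the idea of "how much of each topic is in each document" a bit more concrete, here is a small Python sketch. The document names ("Text A", "Text B"), the topic numbers, and the percentages are all made up for illustration and are not my actual results, and the function name `dominant_topic` is my own. It only shows one way to read a table like this: look across a document's row and pick the topic with the largest share.

```python
# Made-up topic shares for two placeholder documents (NOT real output from the tool).
# Each inner dictionary maps a topic number to that topic's share of the document.
topics_in_docs = {
    "Text A": {1: 0.60, 2: 0.25, 3: 0.15},
    "Text B": {1: 0.10, 2: 0.70, 3: 0.20},
}


def dominant_topic(shares):
    """Return the topic number with the largest share in one document."""
    return max(shares, key=shares.get)


for title, shares in topics_in_docs.items():
    top = dominant_topic(shares)
    # Print the share as a whole-number percentage.
    print(f"{title}: topic {top} ({shares[top]:.0%})")
```

In this made-up example, topic 1 takes up the most of Text A and topic 2 takes up the most of Text B.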
This next document is simply the topic words file, which lists the topics created by the topic modeling tool in a different way than the HTML version in the other folder.
Finally, this is the DocsinTopics file. It ranks each document in each topic by how much of that document is related to that topic. For example, we can see that The Phantom of the Opera is #4 in topic 1, Frankenstein is #3 in topic 2, and so forth. This allows us to see which texts are most and least relevant to the different topics.
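The ranking idea can be sketched in a few lines of Python. This is only my guess at the general logic (sort the documents by how much of each one belongs to a topic), not necessarily how the tool itself builds the file. The document names, the shares, and the function name `rank_docs` are all made up for illustration.

```python
# Made-up shares of a single topic in four placeholder documents
# (NOT real output from the tool).
topic_shares = {
    "Text A": 0.10,
    "Text B": 0.60,
    "Text C": 0.05,
    "Text D": 0.25,
}


def rank_docs(shares):
    """Sort document titles from the largest share of the topic to the smallest."""
    return sorted(shares, key=shares.get, reverse=True)


# Number the documents starting from 1, like the ranks in the file.
for rank, title in enumerate(rank_docs(topic_shares), start=1):
    print(f"#{rank} {title} ({topic_shares[title]:.0%})")
```

With these made-up numbers, Text B comes out as #1 and Text C comes out last, which is the same kind of most-to-least relevant ordering described above.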