For Lab 5a, I downloaded the Topic Modeling Tool and used my Lab 3 folder titled “Children’s Lit-PlainText” as the input. I loaded it using “Select Input File or Dir.” I set “Select Output File” to my Documents folder so I can find the results easily on my computer. I chose 15 topics for Lab 5a. This is what I got when I clicked “Learn Topics”:
My next step was making sure the “stopwords” were applied. Luckily, the Topic Modeling Tool already applies the MALLET stopwords for you, so I did not have to change anything. However, I did change the number of topic words printed to 20, like we did in class. I hit the “OK” button to apply my changes.
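To show what applying stopwords means, here is a minimal Python sketch. The short stopword list and the function name `remove_stopwords` are my own assumptions for illustration. The real MALLET list is much longer, and this is not the tool's actual code.

```python
# A tiny, made-up stopword list for illustration only;
# it is NOT the real MALLET stopword list.
EXAMPLE_STOPWORDS = {"the", "a", "and", "of", "to", "in"}


def remove_stopwords(text, stopwords=EXAMPLE_STOPWORDS):
    """Lowercase the text, split it on whitespace, and drop stopwords."""
    words = text.lower().split()
    return [word for word in words if word not in stopwords]


print(remove_stopwords("The cat and the hat went to the garden"))
# prints ['cat', 'hat', 'went', 'garden']
```

Only the words that carry meaning are left, which is why the topics are not filled with words like “the” and “and.”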
Next, I went to my files and searched for the “output_csv” folder created by the Topic Modeling Tool. This folder contains three files: DocsInTopics, Topics_Words, and TopicsInDocs.
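The same check could also be done with a short script instead of clicking through the folders. This is only a sketch: the function name `list_csv_files` is made up, and it assumes the files end in `.csv` and that “output_csv” sits in the folder where the script is run.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted names of the .csv files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


if __name__ == "__main__":
    # Assumed location: an "output_csv" folder in the current directory.
    print(list_csv_files("output_csv"))
```

If the tool ran correctly, the printed list should include the three spreadsheets.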
My next step was to click through each individual Excel file. First is DocsInTopics. This spreadsheet shows the rankings for the documents in my plain text corpus (the 8 plain text files from my “Children’s Lit-PlainText” folder).
Next, I opened the Topics_Words Excel spreadsheet. This spreadsheet shows the 15 topics that I set in the Topic Modeling Tool.
Next, I opened the last Excel spreadsheet in the folder, TopicsInDocs. It shows which topics are most relevant in each document, using percentages to give a detailed account of each topic’s relevancy.
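To make the idea of topic percentages concrete, here is a small Python sketch that picks the most relevant topic for one document. The numbers are invented and the function name `top_topic` is hypothetical. It does not read the actual spreadsheet, because I am not assuming anything about its column layout.

```python
def top_topic(proportions):
    """Return (topic_id, share) for the topic with the largest share."""
    return max(proportions.items(), key=lambda item: item[1])


# Invented numbers for one document; not taken from my actual results.
example_doc = {0: 0.12, 4: 0.55, 9: 0.33}
topic_id, share = top_topic(example_doc)
print(f"Topic {topic_id} covers {share:.0%} of the document")
# prints Topic 4 covers 55% of the document
```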
Next, I went to the output_HTML folder. This folder contains a Docs folder and a Topics folder, as well as an all_topics icon and a malletgui icon.
When I clicked on the “Docs” folder, it took me to 8 different icons because I had 8 items in my corpus. For the sake of time, I will show you one of the results from the icons under the “Docs” folder.
I clicked on the first icon under the “Docs” folder, and this is what appeared. It shows one of my plain text files and the top topics in this doc.
Next, I went back and clicked on the Topics folder under the “output_HTML” folder. I clicked on the first Topic and found this:
This shows me the first topic and the top-ranked docs in this topic. It ranked the docs from 2-9.
After that, I went back and clicked on “all_topics.” This shows me a list of my 15 topics (like it did in the Excel spreadsheet).
That is all of my steps for Lab 5a. For some reason, the malletgui icon was acting up when I tried to click on it, and I could not open it.