Lab 5b
Because I couldn’t get MALLET to work, I further explored the TopicModelingTool. I used the same mini corpus from a previous lab: eight plain-text files of children’s literature published before 1923. First, I loaded my mini corpus into the TopicModelingTool and clicked “Learn Topics.” By default, the TopicModelingTool generates ten topics unless you enter a different number in the “Number of Topics” field. When I clicked “Learn Topics,” the tool saved its output folders to my computer, and I opened them to explore and try to analyze the data. I started with the All Topics HTML page in the output HTML folder. Then I clicked on each topic, starting with the first, to view its top-ranked documents. Next, I went back to the TopicModelingTool and repeated the same process three more times, entering “15,” then “20,” and finally “50” in the “Number of Topics” field. I went from 10 to 15 to 20 and then jumped to 50 because I was just feeling crazy.

There was honestly no method to my analysis. Or at least, I didn’t think there was. At first, I thought I was changing the number of topic words, and I didn’t really know how that would help me analyze this corpus. But as I increased the number, some topics that didn’t make much sense when there were only a few of them seemed to make a lot more sense. I also noticed throughout this process that certain topics seemed more relevant to some authors than to others. For example, with 10 topics, topic 4 read “sylvia estralla grace flora mr fort fulton girl father missy.” As you can see, this is mostly a list of names, so it didn’t give me a complete picture of that topic.
I hoped that each time I increased the number, I would learn a little more about that topic. However, when I typed “15” into the “Number of Topics” field, topic 4 no longer read “sylvia estralla grace flora mr fort fulton girl father missy.” Instead, it read “oonomoo de huron hans dat lieutenant shawnees vanderbum dey canoe.” This looked nothing like topic 4 from the 10-topic run. Apparently, it was an entirely new topic, unrelated to the previous topic 4. Each run trains a new model, so topic numbers are only labels and do not carry over from one run to the next. Eventually, I realized that the number I typed into the “Number of Topics” field determined how many topics the tool generated from the eight documents in my corpus, with each topic displayed as a list of words. I got a little confused at the beginning, but I think I now better understand how to use the TopicModelingTool. However, I’m not sure that I fully understand how to use it to analyze literature. I see the potential of the TopicModelingTool for literary analysis, but I’m not sure that I can fully appreciate everything it could offer.