Lab 5b

Because Java issues kept me from completing Lab 5b as intended, I experimented further with the Topic Modeling Tool instead. I imported my corpus into the program and, without altering any settings, simply clicked “Learn Topics.” I was given ten topics with ten topic words each. The vocabulary alone makes it clear that certain topics must be prominent for a particular author. The topics did not seem quite complete, however. For example, with only ten topic words, one topic read “uc buck dogs thornton cave half till ustane day dog,” which is confusing and hard to interpret. To see whether more words would make the topic more cohesive, I changed the number of topic words to fifteen. Once I did that, all of the topics expanded and their themes became somewhat clearer. The topic I was following now read “uc man years poor great job dead life beautiful world fact ye kallikrates cave beauty.” Not only were there more words, but the words themselves had changed. This was unexpected. I had thought that increasing the number of words would simply add five words to the existing topic; instead, it looked like a new topic entirely. Rather than looking at the overall trends across all topics, I decided to keep focusing on this one and to keep increasing the number of topic words to see how it changed. Once I set the number of topic words to twenty, it became hard to tell which topic was the one I had been following. The topic no longer contained the word “uc,” which had previously made it very easy to pick out. Again, I was surprised. It made me wonder how “uc” could be important when there were only ten words to display and less important when there were twenty. I decided to take a different approach and clicked “Learn Topics” after entering only two topic words.
When I ran the tool with two topic words, the topic read “uc thee.” However, I noticed that another topic read simply “project gutenberg.” That tells me that “project” and “gutenberg” were not added as stop words, as they should have been. I went into the TopicsinDocs Excel document to find which text had “project gutenberg” as its most prominent topic. It turned out to be Jack London’s The Call of the Wild, with this topic accounting for 34.9% of the text. It was disappointing to see that the Topic Modeling Tool did not treat these words as stop words. If I had been able to participate in the MALLET lab, I might have been able to add them myself. Watching how the topics changed as I increased or decreased the number of topic words allowed me to reflect on the benefits of using this program. I believe it can help people see which themes are prominent in a text, but the program seemed inconsistent to me. I think we need to learn how to use the computer to our advantage, which is why this tool is helpful, but I also think we need to recognize the limitations of machines.
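
The stop-word fix discussed above can be illustrated with a short sketch. This is not how the Topic Modeling Tool or MALLET handles stop words internally; it is a minimal Python example of the general idea, assuming a small hypothetical stop-word list and a made-up sample sentence. The names STOP_WORDS and remove_stop_words are invented for illustration.

```python
import re

# Hypothetical stop-word list: a few common English words plus the
# Project Gutenberg boilerplate terms that surfaced as a topic.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "project", "gutenberg"}


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


# Made-up sample resembling a Project Gutenberg header line.
sample = "The Project Gutenberg EBook of The Call of the Wild"
print(remove_stop_words(sample))
```

If the corpus were filtered this way before topic modeling, “project” and “gutenberg” would never reach the model, so they could not form a topic of their own.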