I chose to model my Lab 3 corpus for this lab. That corpus contained eight works of children's literature in plain text. Because the corpus was small, one challenge was getting an accurate representation of the eight texts. The biggest challenge, however, was learning MALLET well enough to build a topic model; I found learning its ins and outs more difficult than anything else in the lab. When we discussed the Jockers reading in class, we covered how texts must be in plain text and stopwords must be removed before a topic model can be built. I found it helpful that the topic modeling tool removes the stopwords for you, which produces a more accurate analysis.
I believe my fifth topic is the most coherent because the tool shows that it accounts for 15% of the model. This topic is associated with Tarzan of the Apes, which was a popular subject at the time, and the fact that Tarzan is one of the most famous books in children's literature supports this theory. The book probably also contains many distinctive words that the topic modeling tool picked up on, which makes this topic the most prominent in the corpus. Another coherent topic was number two, associated with The Hosts of the Air. I have never read that book, so I cannot say with confidence why the tool ranked it second, but I assume it also contains distinctive words that the tool picked up on.
The least coherent topic was number one, associated with The Cash Boy. I assume the topic modeling tool could not find many keywords that distinguished this text from the other texts in the original corpus.
I believe topic modeling shows researchers the correlation between novels and their time. As we discussed in class regarding Jockers, topic models will sometimes produce topics with no clear meaning. I therefore believe it is important to know your corpus well and to build a usable corpus so the model does not produce meaningless output. Jockers's advice on topic modeling would also help with this problem: he suggests deciding how to chunk your corpus, removing the stopwords, and using a reliable topic modeling tool such as MALLET (131). I also believe topic modeling brings a different type of close reading into play, one where you can look at multiple texts at once and compare them to each other. It allows us to answer questions about texts in relation to the time in which they were written and the impact they had on society. It also lets us see which "topics" were important during that period and compare them across texts. In this way, it shows what authors and readers were concerned with when the texts were written and which topics were popular at the time, which I find very interesting.