For this lab, I used the corpus from lab 3, which had eight texts from the “movie books” fiction bookshelf: Peter Pan, Wuthering Heights, Tarzan, Son of Tarzan, The Scarlet Letter, The Hunchback of Notre Dame, The Jungle Book, and Frankenstein. I used PowerShell to run MALLET and create twenty topics with twenty words in each. It gave me two text files: a keys file, which contained the topics and their percentages in the corpus overall, and a composition file, which displayed the texts and the prominence of each topic within each text. I copied these plain text files into Excel so I could read them more easily.
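As an alternative to pasting the output into Excel, the composition file could also be read with a short script. The sketch below is only a minimal example, and it assumes the newer MALLET layout in which each row holds a document index, a file name, and then one proportion per topic in topic order (older MALLET versions write topic/proportion pairs instead, so the parsing would need adjusting). The sample rows, file names, and function names are made up for illustration and are not actual results from this lab.

```python
import io

# Hypothetical sample in the assumed layout: doc index, file name, then one
# proportion per topic (three topics here to keep it short). Not real output.
SAMPLE = (
    "0\tfile:/corpus/frankenstein.txt\t0.10\t0.60\t0.30\n"
    "1\tfile:/corpus/junglebook.txt\t0.70\t0.05\t0.25\n"
)

def read_composition(handle):
    """Return {file name: [proportion for topic 0, topic 1, ...]}."""
    table = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and a header line starting with '#', if present.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        name = fields[1].rsplit("/", 1)[-1]  # keep only the file name
        table[name] = [float(x) for x in fields[2:]]
    return table

def rank_texts(table, topic):
    """List (file name, proportion) pairs for one topic, largest first."""
    pairs = [(name, shares[topic]) for name, shares in table.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# With a real file, open("composition.txt", encoding="utf-8") would replace
# the in-memory sample (the file name here is an assumption).
composition = read_composition(io.StringIO(SAMPLE))
for name, share in rank_texts(composition, topic=1):
    print(f"{name}: {share:.1%}")
```

Ranking the texts by a single topic this way gives the same kind of comparison I made by eye in the spreadsheet.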


Out of these twenty topics, I found topics twelve and six the most interesting. Topic twelve had words such as “time,” “project,” “eyes,” “made,” “work,” “heard,” “gutenberg,” “looked,” “full,” “make,” “sat,” “knew,” and “electronic.”


This topic occurred quite a bit across the texts, making up 22.8% of Peter Pan, 21% of The Jungle Book, and between 12% and 15% each of Wuthering Heights, Tarzan, Son of Tarzan, Frankenstein, and The Scarlet Letter. In fact, the only novel in which this topic didn’t occur much was The Hunchback of Notre Dame, where it took up only 1.4% of the text, which I found odd.


Although this topic had quite a bit of prominence in seven of the eight novels in my corpus, I thought it was the least coherent topic of the bunch. The words “project” and “gutenberg” were obviously connected to the front matter stating that each of these works came from Project Gutenberg (I think “electronic” fits in with those as well), and the rest of the words, like “time,” “sat,” and “eyes,” didn’t really seem to go together in any specific way. I think this topic only had so much prominence because these are words that are used often in many texts.
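One possible way to keep the Project Gutenberg words out of a topic like this would be to cut the header and license text from each file before importing it into MALLET. The sketch below is an assumption about how that could be done, not a step from this lab: it relies on the “*** START OF” and “*** END OF” marker lines that Project Gutenberg plain-text files typically contain (the exact wording varies, and some older files may not have them), and the helper name and sample text are made up for illustration.

```python
import re

# Marker lines typically look like
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***"; the wording varies
# between files, so the patterns are deliberately loose.
START = re.compile(r"^\*\*\*\s*START OF.*$", re.IGNORECASE | re.MULTILINE)
END = re.compile(r"^\*\*\*\s*END OF.*$", re.IGNORECASE | re.MULTILINE)

def strip_gutenberg(text):
    """Return the text between the START and END marker lines.

    Hypothetical helper: if a marker is missing, that side of the text
    is left untouched.
    """
    start = START.search(text)
    begin = start.end() if start else 0
    end = END.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

# Made-up miniature file in the assumed layout.
SAMPLE = """The Project Gutenberg EBook of Example
*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***
All children, except one, grow up.
*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***
This electronic work is provided under the Project Gutenberg License.
"""

print(strip_gutenberg(SAMPLE))
```

Running the cleaned files through MALLET would show whether topic twelve still forms without “project,” “gutenberg,” and “electronic.”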


The other topic I noticed, topic six, seemed much more coherent to me.


This topic contained words such as “night,” “mind,” “life,” “death,” “happy,” “friends,” “creature,” “earth,” “possessed,” “country,” “horror,” and “hope.” In one of my other classes right now we’re reading Jane Eyre, and these words reminded me a lot of some of the gothic trends I saw while reading it, especially “night,” “death,” “life,” “creature,” and “horror.” Because of this, I thought these words might be grouped together because they belong to a sort of gothic tradition, with its focus on small moments of happiness amid a lot of dark and horrific events. When I looked at the composition file, this hypothesis seemed even more plausible.


Topic six occurred in 29% of Frankenstein, which is probably one of the most famous gothic novels. It also occurred in 9.4% of Wuthering Heights, 5.9% of The Hunchback of Notre Dame, and 6.4% of The Scarlet Letter. I don’t really know much about The Scarlet Letter, so it might disrupt my gothic idea. The other three novels, though, have pretty strong gothic themes, so I thought perhaps I was right about this being a gothic topic, especially since the topic only occurs in 0.3% of The Jungle Book, which is clearly not gothic.


After finding this trend, I actually found MALLET a lot more valuable than I had originally thought. I think the main turnoff for many of my classmates was the extreme difficulty we had figuring out how to work MALLET. While I do think it was really hard, I can see how it would be a really valuable tool once you know what you are doing with it.