Topic Modeling MALLET – lab 5b | Digital Literary Studies ENGL 4590 Spring 2016

For lab 5b, I chose to upload my corpus of plain-text files from lab 3. I had already uploaded my corpus folder on Wednesday, and I believe I modeled the 8 plain-text files and set the number of topics to 20, like we did in class. I wanted to stick with the class example so I wouldn’t get confused if my results came out differently. 20 also seemed like a good number to use, since it’s high enough to give a good amount of information but not so high that it would overwhelm me with too much information. The following is a list of the items in my corpus: The Wonderful Wizard of Oz, In the Court of King Arthur, Robin Hood, The Well at the World’s End: A Tale, Otto of the Silver Hand, The Ruby of Kishmoor, Twilight Land, and Irish Fairy Tales. All of these texts can be found on Project Gutenberg under the “fantasy” bookshelf category.

I faced more challenges in this lab than in all of the other labs this semester combined. Most of my errors happened in the terminal. The lab itself probably wouldn’t be as intimidating if people used the terminal more often on their computers, but most of us rarely use it. The steps themselves were easy enough: type in the commands word for word and hit Enter, and the computer does the rest. The hard part is knowing how to set up MALLET and how to navigate it correctly. I had the hardest time figuring out why I couldn’t get the tutorial files into MALLET, and it turned out that I didn’t have the folder in the correct place. The terminal is very specific, and every file needs to be in the right directory for the commands to work. The other confusing piece was getting Java installed and choosing which download to use from the vast selection of links. Overall, the setup was the most difficult part. The little details were what I had a hard time with, but once the setup was over, the lab wasn’t too difficult.

After the setup, topic modeling through MALLET is easy enough. Researchers can use MALLET and the terminal to model topics quickly and accurately, as long as they enter the commands correctly. Topic modeling presents its results to researchers as lists of words, one list per topic. Each list consists of words that frequently occur together across the texts in a corpus. Researchers can then examine these lists to draw conclusions about the corpus as a whole and even link them to other studies and examples. Researchers can also use this kind of information gathering to make connections between words in certain time periods or in related novels. Topic modeling is all about finding patterns that connect the words in a corpus in some form or fashion and then using those patterns to make further connections in literary studies.
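MALLET’s topic modeling is based on latent Dirichlet allocation (LDA), which it estimates with Gibbs sampling. The Python sketch below is not MALLET’s code. It is a minimal toy version of the same idea, assuming a tiny made-up corpus of four “documents” (hypothetical word lists, not taken from my lab corpus), two topics instead of 20, and guessed smoothing values `alpha` and `beta`. The helper names `lda_gibbs` and `top_words` are also hypothetical. Every word starts in a random topic, and the assignments are repeatedly resampled so that words appearing together in the same documents tend to end up in the same topic. The most common words in each topic then form a word list, roughly like the topic lists MALLET prints.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy LDA fitted with collapsed Gibbs sampling (illustrative only).

    docs is a list of documents, each a list of word tokens.
    Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]  # tokens per topic in each doc
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics  # total tokens assigned to each topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    # Repeatedly resample each token's topic given all the other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove this token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Favor topics common in this doc and topics that already use this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=4):
    """List the n most frequent words in each topic."""
    result = []
    for counts in topic_word:
        nonzero = [(w, c) for w, c in counts.items() if c > 0]
        nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
        result.append([w for w, _ in nonzero[:n]])
    return result


# Hypothetical toy corpus: two "Oz-like" and two "Arthur-like" documents.
docs = [
    "wizard witch magic oz wizard emerald witch".split(),
    "knight king arthur sword knight castle king".split(),
    "witch magic wizard spell oz emerald".split(),
    "king castle sword knight arthur quest".split(),
]
topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {' '.join(words)}")
```

With a real corpus, each document would be one of the plain-text files split into words (usually with common stop words removed), and `num_topics` would play the role of the topic count set in MALLET, such as the 20 used in class. Because the sampling is random, the exact word lists can change from run to run, which is one reason results can come out a little differently each time.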