For Lab 5b, I chose to model the texts from the mini-corpus I made for Lab 3. The corpus contains the following texts: The Brass Bottle, A Christmas Garland, Mr. Dooley Says, Adventures of Bindle, Laughable Lyrics, Once a Week, Comic History of England, and Eliza. All of these come from the Project Gutenberg website and are listed under the “humor” category.
I had MALLET model 20 topics. I chose 20 because I wanted enough topics to cover a decent range of the material in the corpus without producing more output than I could reasonably interpret.
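As a rough illustration of the setup described above, the two MALLET steps (importing a folder of plain text files, then training a 20-topic model) can be assembled in Python. This is only a sketch under stated assumptions: the paths `bin/mallet`, `lab3_corpus`, and `lab5b_output`, the output file names, and the helper name `build_mallet_commands` are all hypothetical, and the choice to remove stopwords is an assumption rather than something recorded in this lab.

```python
from pathlib import Path


def build_mallet_commands(mallet_bin, corpus_dir, work_dir, num_topics=20):
    """Return the MALLET import and training commands as argument lists.

    All paths and output file names are hypothetical placeholders.
    """
    work = Path(work_dir)
    model_input = work / "corpus.mallet"

    # Step 1: turn a directory of plain text files into MALLET's format.
    import_cmd = [
        str(mallet_bin), "import-dir",
        "--input", str(corpus_dir),
        "--output", str(model_input),
        "--keep-sequence",
        "--remove-stopwords",  # assumption: stopwords were removed
    ]

    # Step 2: train the topic model with the chosen number of topics.
    train_cmd = [
        str(mallet_bin), "train-topics",
        "--input", str(model_input),
        "--num-topics", str(num_topics),
        "--output-topic-keys", str(work / "topic_keys.txt"),
        "--output-doc-topics", str(work / "doc_topics.txt"),
    ]
    return import_cmd, train_cmd


import_cmd, train_cmd = build_mallet_commands(
    "bin/mallet", "lab3_corpus", "lab5b_output"
)
print(" ".join(import_cmd))
print(" ".join(train_cmd))
```

With MALLET installed and the output folder created, each list could be passed to `subprocess.run(..., check=True)`. Building the commands as lists rather than typing them by hand keeps the file paths in one place, which is where most of my terminal mistakes happened.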
I didn’t face too many challenges in creating my own corpus, since I already had the plain text files from Lab 3 saved in a folder. However, because the individual plain text files in my corpus were not of equal length, my results were a little skewed. Instead of topics being distributed fairly evenly, some texts had very high proportions for certain topics while others had almost none. For example, for the second topic, Bill Nye’s Comic History of England logs 59%, yet for the same topic Max Beerbohm’s A Christmas Garland logs so little that the number is a decimal written with a negative exponent. I believe these results are skewed so greatly because of the vast differences in length among the texts I chose when I compiled the corpus in Lab 3. Most of the issues I had with MALLET itself came when I had to work with file paths and directories in the terminal. Unfortunately, I am not as technically savvy as I would like to be, so I frequently made simple mistakes that, while easily fixable, were an added difficulty nonetheless.
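One common way to reduce the effect of unequal text lengths is to split each text into chunks of roughly equal word count before importing them, so that every "document" the model sees is about the same size. I did not do this in the lab; the sketch below only shows what that step could look like. The function name `split_into_chunks` and the chunk size of 1,000 words are illustrative assumptions.

```python
def split_into_chunks(text, chunk_size=1000):
    """Split a text into chunks of about chunk_size words each.

    A short leftover chunk (under half of chunk_size) is merged into
    the previous chunk so that no chunk is much smaller than the rest.
    """
    words = text.split()
    chunks = [
        words[start:start + chunk_size]
        for start in range(0, len(words), chunk_size)
    ]

    # Fold a small trailing remainder into the chunk before it.
    if len(chunks) > 1 and len(chunks[-1]) < chunk_size // 2:
        tail = chunks.pop()
        chunks[-1].extend(tail)

    return [" ".join(chunk) for chunk in chunks]


# Stand-in text of 2,300 made-up "words" to show the chunk sizes.
sample_text = " ".join(f"word{i}" for i in range(2300))
pieces = split_into_chunks(sample_text, chunk_size=1000)
print([len(piece.split()) for piece in pieces])
```

Each chunk could then be saved as its own plain text file, named after its source text, and the whole folder imported in place of the original files. The topic weights for one book would then be read across all of its chunks rather than from a single number.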
Because topic modeling is still a relatively new method of literary analysis, many of its possible uses and applications have yet to be explored. Topic modeling ultimately gives researchers and analysts a wider point of view. By studying corpora that contain a significant amount of literature, analysts can look at trends over time, because topic modeling offers a more comprehensive look than reading any single text can. For example, when you compare close reading with topic modeling as methods of textual analysis, the results are very different. Close reading forces an in-depth relationship with one text, while topic modeling forces a more stepped-back look at a group of texts in order to understand the overall trends of a time period or the trends within a document. When modeled correctly, the weight of topics in a given document can be very telling about the purpose or even the main idea of that document.
I think the most useful way to apply topic modeling to literary analysis is in studying a group of texts from a certain time period, since you can then find out which topics were important, prolific, or hot-button issues at the time of publication. By homing in on one specific time period or one specific literary movement, researchers can get a better glimpse into that literature as a whole. For example, studying a collection of texts written during the Civil War could produce topics about slavery, human rights, government, and war, and that would be expected. However, if topic modeling revealed that most of the literature produced during this time was about humanity, love, religion, and unity, we would be exposed to an entirely different way of thinking about this era of history and literature.