In this lab, I again used my gothic plain text files from lab 3. I chose to use 5 topics. Last time I worked with these files I used 10, and I thought it’d be interesting to narrow it down to only 5 so that the topics would be broader and more all-encompassing. That’s sort of the outcome I got, but I’ll expand on that later. I used the Topic Modeling Tool because, although I have installed Java (like 3 different times), MALLET will not work on my computer. You told me in class that I could just use the Topic Modeling Tool and that the two were essentially two means to the same end. In class I was able to get MALLET to a certain point, but eventually it wouldn’t let me go any further and I couldn’t create any output files. That’s why my box uploads are different from everyone else’s. I learned a lesson from this though: computers are hard. I believe I understand the concept behind MALLET, and I learned how to navigate through the terminal a bit during this lab, so even though I didn’t get the desired outcome, I still learned a lot while working through it.
On to topics…
My 5 topics were
catherine mr heathcliff ll mrs miss linton master young time
gutenberg project tm mind work thought heart day found works
life man dorian henry lord made eyes gray long death
All of these topics were interesting to me for different reasons. First of all, I thought it was interesting that so many of the topic words were names. Obviously the names are only relevant to certain works (Dorian from The Picture of Dorian Gray, Heathcliff from Wuthering Heights). I don’t think names are a great thing to have in a topic, because they don’t really tell us anything about the texts. A name is specific to one work, doesn’t relate to any theme, and doesn’t give us information about what the corpus has in common. The first and last topics seemed to be the most telling, although they both contain a couple of names, because they contain words that one would expect to see in gothic literature and kind of give off a gothic vibe themselves. They seemed the most coherent. Also, “gutenberg project” showed up in a topic, which is weird. Project Gutenberg is the site that the texts came from, and every text mentions it, but I believe only once. It’s interesting that it was deemed useful enough to be considered part of a topic.
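One way the names and the Project Gutenberg words could be kept out of the topics is to clean the text files before modeling them. The following is only a rough sketch of that idea, not what I actually ran. It assumes each file marks its header and footer with lines like `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***`, and the stoplist and function names are hypothetical, picked from the words in my topics above.

```python
import re

# Hypothetical stoplist: character names, boilerplate words, and the
# contraction fragment "ll" that showed up in the topics above.
EXTRA_STOPWORDS = {
    "catherine", "heathcliff", "linton", "dorian", "henry", "gray",
    "gutenberg", "project", "tm", "ll",
}

# Assumed marker lines; if a file does not contain them, nothing is cut.
START_RE = re.compile(
    r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE,
)
END_RE = re.compile(
    r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE,
)


def strip_gutenberg_boilerplate(text):
    """Keep only the text between the start and end marker lines."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end else len(text)
    return text[begin:stop]


def remove_extra_stopwords(text, stopwords=EXTRA_STOPWORDS):
    """Lowercase the text, split it into words, and drop stoplisted words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in stopwords)


sample = (
    "Header license text\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "Heathcliff walked the moors at night.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "Project Gutenberg-tm license\n"
)
body = strip_gutenberg_boilerplate(sample)
cleaned = remove_extra_stopwords(body)
print(cleaned)
```

The cleaned text could then be saved back to plain text files and run through the same tool with 5 topics, to see whether the topics look more like themes and less like cast lists.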
I think in general, topic modeling is useful for researchers because it allows them to see overarching themes across a large number of texts. This is useful for analyzing, researching, and really just learning about the specific texts being studied. However, there are some flaws in topic modeling, like the names I mentioned, which aren’t relevant to the corpus as a whole. Though topic modeling has its advantages, it has disadvantages too: it could mislead researchers into thinking that certain things are important when they’re really not, which leads to incorrect generalizations.