Group 3

Project Overview:

For our part of the project, our group focused on an overarching question: Does the word “violence” appear in Gothic and Horror texts? Going in, we assumed that it would appear everywhere in our corpus of 18th- through 20th-century Gothic and Horror literature. However, our research on violence in these genres suggested that the term itself does not appear in any particular pattern; instead, subtler words surrounding the idea of violence are the most prevalent. We used topic modeling tools, text analysis tools, and data visualization tools to analyze and represent our findings.

We started with a corpus of more than 200 texts, and each group member analyzed the texts in a different way. Both genres were analyzed in AntConc, Lexos, and Voyant. Our group was curious about the results and the peculiar hypothesis we had stumbled upon. We hoped that analyzing these texts would test our hypothesis: that the term “violence” is not used frequently, but that the subtler words usually surrounding violence are incorporated into these texts.

Our individual findings are below.

Citations

cai.ucdavis.edu/waters-sites/gothicnovel/155breport.html

aijcrnet.com/journals/Vol_2_No_4_April_2012/16.pdf

Voyant

We chose Voyant to represent the word “violence” in our corpus of 18th-20th century Gothic and Horror literature. Voyant is a simple data visualization tool that produces easily interpretable information, which helped us understand the data we collected. We used the word cloud to look at the most frequent words in our corpus, the graph to view trends in the most frequent words, and the summary at the end of Voyant to view the overall word count. We believed these visualizations could give us a better overview of our corpus as it relates to “violence” in 18th-20th century literature.

Gothic Corpus (Voyant):

[Gothic Corpus Word Cloud (Voyant)]

We applied the stop word list from Lab 2 and produced this word cloud for our Gothic corpus. The largest words were “time,” “own,” “house,” “little,” “am,” and so on. Because a word cloud sizes each word by its frequency, these were the words that occurred most often across the novels in our corpus, which contained works from the 18th-20th centuries. We found it interesting that these words did not necessarily represent what we considered to be “gothic”; they seemed to be ordinary words drawn from each novel.
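The relationship between a stop word list and a word cloud can be sketched in a few lines of Python. This is a minimal sketch, not Voyant's implementation: the tokenizer (lowercase runs of letters and apostrophes), the function name `top_words`, and the sample text and stop words are all hypothetical, chosen only to show that removing stop words and counting what remains yields the frequencies a word cloud is sized by.

```python
import re
from collections import Counter

def top_words(text, stop_words, n=5):
    """Return the n most frequent words after removing stop words.

    The tokenizer (lowercase runs of letters and apostrophes) is an
    assumption and will not match Voyant's tokenizer exactly.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    return Counter(kept).most_common(n)

# Hypothetical sample text and stop word list, for illustration only.
sample = "The house was old, and the house was dark. Time passed in the house."
stops = {"the", "was", "and", "in"}
print(top_words(sample, stops, n=3))
```

Without the stop word list, “the” would dominate this sample, which is why applying one changes which words become the largest in a cloud.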

[Gothic Corpus Trends Graph (Voyant)]

Next, we looked at the graph Voyant provided. We found it interesting that the largest peaks were the words “little” and “time.” We believe that time was a “scary” concept in older Gothic novels (the passing of time, for example, could be a startling concept). The word “little” confused us because it has little to do with our group’s analysis of the word “violence.” However, we believe that “little” was a popular word in our Gothic corpus because it can carry negative connotations, as in having “little” hope or faith.

[Gothic Corpus Summary (Voyant)]

This summary shows the number of texts in the corpus, the total number of words, and the number of unique word forms. Our Gothic corpus contained 87 documents with 3,712,946 total words and 50,071 unique word forms. We included this information so that it could be compared with the Horror corpus and with the results of the other data visualization tools on our project’s page.
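The three summary figures can be computed the same way for any set of plain-text documents. This is a minimal sketch, assuming a simple letters-and-apostrophes tokenizer (Voyant's tokenizer differs, so real counts will not match exactly); `corpus_summary` and the sample documents are hypothetical. It also reports the average number of words per document, which is the figure that makes corpora with different document counts comparable.

```python
import re

def corpus_summary(documents):
    """Return document count, total tokens, unique word forms, and mean tokens per document.

    Tokenization (lowercase runs of letters and apostrophes) is an assumption
    and will not reproduce Voyant's counts exactly.
    """
    total_tokens = 0
    vocabulary = set()
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        total_tokens += len(tokens)
        vocabulary.update(tokens)  # unique word forms across the whole corpus
    average = total_tokens / len(documents) if documents else 0.0
    return {
        "documents": len(documents),
        "total_words": total_tokens,
        "unique_word_forms": len(vocabulary),
        "average_words_per_document": average,
    }

# Hypothetical two-document corpus, for illustration only.
sample_docs = ["The night was dark.", "The door opened in the night."]
print(corpus_summary(sample_docs))
```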

Horror Corpus (Voyant):

[Horror Corpus Word Cloud (Voyant)]

Next, we entered our Horror corpus into Voyant. We found it interesting that the Gothic and Horror corpora produced nearly identical word clouds. However, “night” appears more frequently in the Horror corpus than in the Gothic corpus. We think this is because horror is more closely tied to nighttime and nighttime settings. We also found it interesting that “house” was a large word in the cloud. With words such as “night,” “house,” “life,” “door,” and “strange,” the Horror corpus came closer to our expectations about “violence” than the Gothic corpus did.

[Horror Corpus Trends Graph (Voyant)]

We found it interesting that the largest peak was the word “time,” as it was in the Gothic corpus. Again, we believe this is because time was a “scary” concept in both the Gothic and Horror corpora. We found it odd that “night” did not show a strong trend in the Horror corpus, since it was a large word in the word cloud.

[Horror Corpus Summary (Voyant)]

The Horror corpus contained more documents than the Gothic corpus but had fewer total words and unique word forms, which we found interesting. This may be because the texts in the Horror corpus are, on average, shorter than the novels in the Gothic corpus. The article we found on horror as a genre states that “The beginnings of the genre of literature of horror can be derived from the formation of the tradition of gothic literature, which rose in the 18th century. However, the very beginning of the horror is connected with the works of English authors, who followed and developed the gothic tradition” (Viktória Prohászková, 134). This shared lineage may help explain why the two corpora produced such similar word clouds.

Lexos & MALLET

We chose to use Lexos to research the word “violence” in our Gothic and Horror corpora of 18th-20th century literature. Lexos can prepare texts for analysis and create data visualizations, and with the help of MALLET, which performs the topic modeling, we were able to use Lexos to its fullest extent. The data from our two corpora could be thoroughly analyzed because MALLET provides topic keys files, composition files, and word-topic counts files to its users. Lexos accepts a word-topic counts file as input to its helpful MultiCloud tool, which we used to analyze the topics MALLET generated and the texts in which they appear. We used these topics and visualizations to research the idea of violence in our corpora, and the results were surprising.

Gothic Corpus:

These examples were generated from our Gothic Corpus using the MultiCloud tool in Lexos.

[Gothic Corpus MultiClouds]

These graphs were made from the Gothic Corpus after we scrubbed it with our stop word list. They do not contain the word “violence,” but they do contain words that usually surround violence. Words such as “anguish,” “fear,” “shock,” “murder,” and “terrible” suggest that violence is being described in a subtle way. In our initial research, we expected to find many scholarly articles about violence in the Horror and Gothic genres. We were mistaken: there was actually little scholarship on this subject. What we did find was the idea that violence in 18th- through 20th-century Gothic and Horror literature was described subtly, just like the subtle words seen in these MultiClouds. The composition and keys files show the same idea through the number of topics containing words that usually surround violence.

These MultiClouds were generated from a 150-topic model loaded into Lexos, and the results are the same: the word “violence” is not used, but the subtle words surrounding violence are prevalent.

[Gothic Corpus 150 Topic MultiClouds]


We found it interesting that the keys files for both the 100-topic and 150-topic MALLET models matched the findings of our initial research. In the keys files, the most heavily weighted topics surrounding violence use more subtle language: topics one through ten include words such as “coffin,” “death,” “sorrow,” and “grief.” This further supports our team’s argument, because these are palpable examples of how Horror and Gothic literature of the 18th through 20th centuries incorporated violence subtly.

[Key Models for Gothic Corpus]


The Gothic corpus is a large corpus, and the texts in it are mostly novels. Our initial research taught us something new, because we had all come into the project thinking that violence would be prevalent in these texts. The research led us to revise our argument, and our analysis made it evident that violence is more subtle in these texts. This may be because open, direct violence was considered unbecoming, so authors wrote about it subtly so as not to alienate their audience. Today these genres are known for their violence, but this research and analysis show that they have not always been so open about violence and the way it is written.

Horror Corpus:

We discovered several new and interesting features of the Horror corpus using Lexos. Working with the topic models produced by MALLET, Lexos provided us with many charts and graphic models of the topics that appear throughout the entire corpus.

We studied the Horror corpus using three different numbers of topics created through MALLET, after using Lexos to cut the texts into segments of uniform size and then “scrub” them with a stop word list. The first number of topics we specified was 100. Because Lexos could not process these texts due to various technical issues, only the non-visualized version of the information is available here:

[Horror Corpus Keys File, 100 Topics]

This Excel spreadsheet lists all 100 topics created by MALLET. The highlighted information in the sheet was some of the most surprising in this part of the research. While sorting through this material, I focused on the word “violent” and some related ideas, and I found a wide variety of information. Everything highlighted in yellow was what I expected to find when looking at a horror corpus in the context of violence and the ideas surrounding that word: topics containing words such as “death,” “victim,” “seized,” “horror,” “gloomy,” “haunted,” “ghosts,” “spirits,” etc. The topics highlighted in blue are great examples of the surprising information we found while studying both the Horror and Gothic corpora. For example, these topics include words such as “morning,” “curious,” “church,” “heart,” “beautiful,” “sweet,” “happy,” “loved,” “affection,” etc. This information is particularly important. While one would typically expect a group of horror texts to produce topics about violence, supernatural activity, and death, there are some major outlying topics. This discovery suggests that horror texts of the 18th-20th centuries do not always contain the features we would associate with the genre in the 21st century.
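The yellow and blue highlighting described above was done by hand, but the same filtering can be sketched in Python. This assumes the tab-separated layout MALLET typically writes for a topic keys file (topic number, a Dirichlet weight, then the topic's top words separated by spaces); the function `topics_containing`, the sample lines, and the word sets are hypothetical, not taken from our spreadsheet.

```python
def topics_containing(keys_lines, target_words):
    """Return {topic_id: matched words} for topics whose key words include any target.

    Assumes each line looks like: topic number <TAB> weight <TAB> space-separated words.
    """
    matches = {}
    for line in keys_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        topic_id = int(parts[0])
        words = parts[2].split()
        hits = [w for w in words if w in target_words]
        if hits:
            matches[topic_id] = hits
    return matches

# Hypothetical keys-file lines, for illustration only.
sample_keys = [
    "0\t0.5\tdeath victim seized gloomy night",
    "1\t0.5\tmorning church heart beautiful sweet",
]
print(topics_containing(sample_keys, {"death", "victim", "haunted"}))  # "yellow" words
print(topics_containing(sample_keys, {"happy", "loved", "sweet"}))     # "blue" words
```

With a real keys file, the lines could be read with `open(path, encoding="utf-8")` and passed in directly; the word sets are where a researcher's judgment about what counts as “violent” or “atypical” comes in.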

Just as we found an interesting mixture of information related to violence in our 100-topic list, we found very similar information in the list of 150 topics created by MALLET. This time I was able to process our information using Lexos, so below are samples of the data visualizations the program produced.

[Typical Topic, Horror Corpus, 150 Topics]

This is just one of many topics that fit the “typical” criteria of what we, in the 21st century, would expect to find in a horror corpus. Its words, such as “haunted,” “scared,” “ghostly,” “black,” and “phenomena,” align with the theme of violence and fear we expected to discover throughout this process.

[Atypical Topic, Horror Corpus, 150 Topics]

This visualization is one example of an outlier we discovered. We consider it atypical of the violent theme we expected to find in the Horror corpus. Words such as “life,” “feeling,” “loved,” “satisfied,” and “sweet” make this topic visualization stand out from the norm. Again, this topic displays the surprising information we discovered about horror texts of the 18th through 20th centuries: themes of life and happiness seem to be a recurring pattern in our research.

The final number of topics we researched was 200. Because it divides the corpus into the most topics, it produced the most detailed and specific information of the three. Before running this list of topics through Lexos, we were very interested to see whether it would produce the same type of results as the other two.

[Typical Topic, Horror Corpus, 200 Topics]

This screenshot fits the pattern of violence we were anticipating in the Horror corpus. Words such as “blood,” “ceased,” “seized,” “kill,” “end,” and “escape” are great examples of the themes many would expect of the horror literary genre.

[Atypical Topic, Horror Corpus, 200 Topics]

This topic, along with several very similar ones, was especially interesting. Once again, we were presented with an outlying theme within the corpus. The words it features, such as “happy,” “lovely,” “uplifting,” “holiness,” “gentle,” and “softness,” continue the surprising theme of positive topics within the horror genre.

AntConc Screenshots (Horror Corpus):

We decided to use AntConc for our research on violence in 18th–20th century literature mainly because of its collocate and concordance plot features. In Lab 4, we learned to frame a search around one word, so we tested this approach on our project using the word “violence.” After analyzing our results, we could see that although we expected the horror genre of this period to contain many references to the word “violence,” almost the opposite was true. The concordance plot shows that the word appears only 109 times in the horror corpus, within just 31 of the stories we collected. This is far lower than we expected, but our initial research had shown that much of the violence expressed in 18th–20th century literature was represented only through symbols, such as blood drops on the floor or torn garments or cloth.

Collocates for “violence”:

[AntConc Collocate Screenshots 1-3, Horror Corpus]

These three screenshots show what we found when looking for collocates of the word “violence.” According to Jockers, collocates are words that are statistically likely to occur together (or near each other) in a given document. Because we were focusing on violence, we used that word as the search term in each tab. The top-ranked collocate of “violence” in the horror corpus is “sudden,” which means that within this corpus, “sudden violence” is the pairing that occurs most often, statistically.
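Jockers’s definition of collocates can be made concrete with a window-based count. This is a simplified sketch, assuming a fixed window of tokens on each side of the search term; AntConc lets users set the span and ranks collocates with a statistical measure, so the raw co-occurrence frequency here is a stand-in for, not a reproduction of, AntConc’s calculation. The function `window_collocates` and the sample sentence are hypothetical.

```python
from collections import Counter

def window_collocates(tokens, node, span=5):
    """Count words occurring within `span` tokens to the left or right of `node`.

    Raw co-occurrence frequency only; AntConc ranks collocates with a
    statistical measure, so its ordering can differ from this one.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(w for w in left + right if w != node)
    return counts

# Hypothetical sentence, for illustration only.
text = "a sudden violence seized him and sudden violence followed".split()
print(window_collocates(text, "violence", span=1).most_common(2))
```

A span of 1 only looks at immediate neighbors; widening it picks up words that occur near, but not directly beside, the search term.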

Bigrams for “violence” (on left):

[AntConc Bigram Screenshots 1-2, Horror Corpus]

Bigrams for “violence” (on right):

[AntConc Bigram Screenshots 3-4, Horror Corpus]

These screenshots show the top bigrams (on both the left and the right) for the word “violence” in the horror corpus. On the left, mostly articles are found next to “violence.” The results on the right are similar, but they also include bigrams such as “sudden violence,” “gentle violence,” and “personal violence.” These three bigrams in particular interest us because they show that although violence was usually represented through symbols and signs, authors of this period did not always shy away from presenting violence plainly to the reader.
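Counting bigrams around a search term comes down to looking at the token immediately before and immediately after each occurrence. AntConc’s clusters tool can place the search term on the left or right of each cluster; this sketch counts both arrangements separately rather than assuming which one each set of screenshots used. The function `bigrams_around` and the sample text are hypothetical.

```python
from collections import Counter

def bigrams_around(tokens, term):
    """Count bigrams where `term` is the second word and where it is the first word."""
    term_second = Counter()  # e.g. ("sudden", "violence")
    term_first = Counter()   # e.g. ("violence", "of")
    for i, token in enumerate(tokens):
        if token != term:
            continue
        if i > 0:
            term_second[(tokens[i - 1], token)] += 1
        if i + 1 < len(tokens):
            term_first[(token, tokens[i + 1])] += 1
    return term_second, term_first

# Hypothetical text, for illustration only.
tokens = "the violence of his sudden violence".split()
term_second, term_first = bigrams_around(tokens, "violence")
print(term_second.most_common())
print(term_first.most_common())
```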

Concordance plot for “violence”:

[AntConc Concordance Plot Screenshots 1-6, Horror Corpus]

The concordance plot is one of the main reasons we decided to use AntConc for this project. It shows how many times a chosen word (in this case, “violence”) appears in a given corpus (in this case, our horror corpus), which documents contain the word, and how many times it appears in each document. These screenshots show that “violence” was found 109 times in the horror corpus, spread across 31 documents. The word appears at most 30 times in a single document, and two documents reach that number, so those two texts alone account for more than half of the 109 occurrences. This concentration in a few texts reinforces our finding that violence was treated much more subtly in 18th–20th century horror novels and stories.
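The per-document counts behind a concordance plot can be sketched as a whole-word, case-insensitive search run over each document. This is a minimal sketch, assuming a regular expression with word boundaries approximates AntConc’s matching (its tokenization settings may differ); `hits_per_document` and the sample documents are hypothetical.

```python
import re

def hits_per_document(documents, term):
    """Return {document name: number of whole-word, case-insensitive matches of term}."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {name: len(pattern.findall(text)) for name, text in documents.items()}

# Hypothetical documents, for illustration only.
docs = {
    "story_a.txt": "Violence, then more violence.",
    "story_b.txt": "A quiet night.",
    "story_c.txt": "The violence of the storm.",
}
counts = hits_per_document(docs, "violence")
total = sum(counts.values())                        # total hits in the corpus
containing = sum(1 for c in counts.values() if c)   # documents with at least one hit
print(counts, total, containing)
```

Because `\b` treats a hyphen as a word boundary, a compound such as “self-violence” would also count as a hit here.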

AntConc Screenshots (Gothic Corpus):

Applying the same methods we used to explore the horror corpus in AntConc, we found similar results for the gothic corpus. Again using Lab 4 as a guide and “violence” as the search term, AntConc revealed that “violence” was used across the corpus at an astonishingly low rate. As in the horror corpus, “violence” appeared only 239 times across 86 texts, an average of fewer than three times per text. Like the rest of our group’s research, this indicates that violence, instead of being presented outright, was depicted more implicitly and subtly than we expected in gothic and horror works of the 18th through 20th centuries.

Collocates for “violence”:

[AntConc Collocate Screenshots 1-4, Gothic Corpus]

This quartet of screenshots shows our findings after searching for collocates of the word “violence.” As mentioned above, collocates are words that are statistically likely to occur with or near each other in a given document. The most frequent collocates are surprising, at least on the surface. The most frequent is “passions,” a word one wouldn’t typically associate with violence, though an act of violence is, by nature, usually committed with some degree of passion. The pairing of “passions” and “violence” makes more sense when we consider that passion, aside from its most prominent definition of “strong or overpowering feeling or emotion,” can also mean “a suffering or affliction of any kind” (Oxford English Dictionary).

However, the second-most frequent collocate, “gentle,” is surprising altogether. “Gentle” contradicts the idea of violence, so “gentle violence” seems like an oxymoron. Nevertheless, “gentle” fits precisely with our finding that violence was primarily portrayed subtly or through symbols.

Bigrams for “violence” (on left):

[AntConc Bigram Screenshots (on left) 1-2, Gothic Corpus]

Bigrams for “violence” (on right):

[AntConc Bigram Screenshots (on right) 1-2, Gothic Corpus]

These screenshots show the bigrams for the word “violence,” on both the left and the right, and they resemble those found in the horror corpus. As in the horror corpus, the bigrams on the left are almost entirely articles, while the bigrams on the right include pairings such as “great violence,” “self-violence,” “personal violence,” and “crude violence.” These bigrams support our earlier interpretation that authors of gothic and horror novels and stories in the 18th-20th centuries did present violence forthrightly at times, albeit much more sporadically than we anticipated heading into our research.

Concordance plot for “violence”:

[AntConc Concordance Plot Screenshots 1-7, Gothic Corpus]

Finally, these screenshots show the concordance plot for the word “violence.” As stated above, there were 239 total hits for “violence” in the corpus. The word appears at most 25 times in a single work (in two works), and only seven works contain it 10 or more times. Furthermore, 21 of the documents use the word “violence” two or fewer times. This strongly supports the overall conclusion of our group’s research: there was no substantial pattern in the appearance of the term “violence” in 18th-20th century gothic and horror works. Instead, our research showed that subtler words surrounding the idea of violence are more prevalent.
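The distribution figures in this paragraph (the maximum, how many works reach it, and how many fall above or below a threshold) can be derived from per-document hit counts. This is a minimal sketch; `summarize_hits`, its `high` and `low` thresholds, and the sample counts are hypothetical and are not the group’s actual data.

```python
def summarize_hits(hits_by_document, high=10, low=2):
    """Summarize how concentrated a term's hits are across documents.

    hits_by_document maps a document name to its hit count for one term.
    """
    counts = list(hits_by_document.values())
    if not counts:
        raise ValueError("no documents to summarize")
    peak = max(counts)
    return {
        "total_hits": sum(counts),
        "max_in_one_document": peak,
        "documents_at_max": counts.count(peak),
        f"documents_with_{high}_or_more": sum(1 for c in counts if c >= high),
        f"documents_with_{low}_or_fewer": sum(1 for c in counts if c <= low),
    }

# Hypothetical counts, for illustration only.
sample = {"novel_a": 25, "novel_b": 25, "novel_c": 12, "tale_d": 2, "tale_e": 1}
print(summarize_hits(sample))
```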

——

Tracy Bennett, Morgan Derrick, Edith Dunlap, Teylor Newsome, Gavin Oliver