Corpus Analysis – lab4 | Digital Literary Studies ENGL 4590 Spring 2016

I worked from the 2001-2014 collection. This was probably the most challenging lab for me so far. I might not even have my stats right, but I tried my best to figure it out.

For “the future” under the category of “collocates,” I got a rank of 2, frequency of 34, frequency(L) of 0, frequency(R) of 34, and a stat of 2.88325 (I just chose the numbers next to the word “future” that showed up in the collocates tab after doing a search).

I found three other collocates:

“In future”: rank = 4, frequency = 12, frequency(L) = 10, frequency(R) = 2, and stat = -2.59352

“We future”: rank = 6, frequency = 11, frequency(L) = 5, frequency(R) = 6, and stat = -2.90903

“To future”: rank = 7, frequency = 16, frequency(L) = 11, frequency(R) = 5, and stat = -3.06521
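
To make the frequency(L) and frequency(R) columns more concrete, here is a minimal Python sketch of how left and right collocate counts could be tallied by hand. It is not AntConc's own code: the tokenizer, the `collocates` function, the one-word window, and the sample sentence are all assumptions made for illustration, and it does not try to reproduce the stat column, which depends on the measure selected in AntConc.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, span=1):
    """Count the words found within `span` tokens to the left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])    # words just before the node
        right.update(tokens[i + 1:i + 1 + span])   # words just after the node
    return left, right

# Made-up sample text, not taken from the State of the Union files.
sample = "The future is ours. We build a future for the future generations."
left, right = collocates(tokenize(sample), "future")
print(left["the"], right["the"])
```

In this sketch, a word's total frequency as a collocate is its left count plus its right count, which matches how the frequency column above splits into frequency(L) and frequency(R).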

I searched “future” under clusters/N-Grams and my results were as follows: with “future” on the left side of the cluster, I got a rank of 11, and the most frequent forms were “future of” (frequency = 16), “future generations” (frequency = 11), and “future is” (frequency = 9). With “future” on the right side, the rank was also 11, and the top three were “the future” (frequency = 34), “a future” (frequency = 14), and “our future” (frequency = 14).

Selecting N-Grams gave me a frequency of “a future” = 14 and a frequency of “future is” = 9.
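
The N-Grams counts above come down to counting every pair of neighboring words. A rough Python sketch of that idea follows; the `bigram_counts` function and the sample text are my own assumptions for illustration, and AntConc may tokenize differently, so the numbers would not necessarily match its output.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count every pair of adjacent word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip pairs each token with the one after it; note that pairs can
    # span sentence boundaries because punctuation is simply dropped.
    return Counter(zip(tokens, tokens[1:]))

# Made-up sample text, not taken from the State of the Union files.
sample = "A future of hope is a future we choose. The future is ours."
counts = bigram_counts(sample)
print(counts[("a", "future")], counts[("future", "is")])
```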

The bigram I chose for the next step was “a future.” This showed up in the files: 2003-Bush, 2007-Bush, and 2009-Obama. With the file tab, I found it interesting that it showed where the bigram occurs in each of the selected files, which would be hard to determine if we weren’t using AntConc as a guide.
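
Finding which files contain a bigram can also be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `files_containing` function is hypothetical, and the file names and texts in the example are invented stand-ins, not the actual speeches.

```python
import re

def files_containing(texts, phrase):
    """Return the names of the texts in which `phrase` occurs as whole words."""
    words = [re.escape(w) for w in phrase.lower().split()]
    # Allow any amount of whitespace between the words of the phrase.
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b")
    return sorted(name for name, text in texts.items()
                  if pattern.search(text.lower()))

# Invented stand-in texts keyed by hypothetical file names.
texts = {
    "speech_a.txt": "We want a future of peace.",
    "speech_b.txt": "The future is now.",
    "speech_c.txt": "Together we build a  future.",
}
result = files_containing(texts, "a future")
print(result)
```

To run this over real files, the `texts` dictionary could be filled by reading each file in a folder and using the file name as the key.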

Expected words:

Nation (frequency = 174)
Government (frequency = 159)
Economy (frequency = 154)

Unexpected word:

Continue (frequency = 69)
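
Word frequencies like the ones listed above can be approximated with a simple counter. The sketch below is an assumption-laden illustration, not AntConc's method: the `word_frequencies` function, the tiny stoplist, and the sample sentence are all made up, and the counts it prints have nothing to do with the real corpus.

```python
import re
from collections import Counter

# A tiny, illustrative stoplist; a real one would be much longer.
STOPWORDS = {"the", "to", "and", "of", "a", "in"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count word tokens in the text, skipping any word in the stoplist."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Made-up sample text, not taken from the State of the Union files.
sample = "The nation and the economy continue to grow, and the nation will continue."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```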

Report:

As I said in the beginning, this lab was really confusing for me to figure out. It took me far too long to understand what I was doing and I had to look up a lot of tutorials for AntConc. This was the hardest lab for me in the semester so far. I tried my best and I hope my stats came out correctly, but I felt pretty overwhelmed with the number of steps in this lab. I’m not sure what the differences are between the State of the Union addresses and the “standard” American English in the Brown corpus. Both are made up of articles, conjunctions, and prepositions for the most part. I believe one difference might be that the State of the Union addresses are more concerned with law and government terms than regular English is. The State of the Union addresses also take a more aggressive tone regarding war; the Brown corpus does have war vocabulary, but it doesn’t focus mainly on that aspect and covers the language as a whole.

AntConc taught me that counting the frequency of certain words across a whole text can help in understanding its important information. I think of Mahlberg’s article on corpus linguistics when she states, “Corpus software is used to quantify linguistic phenomena and display data so that the researcher can investigate linguistic patterns” (Mahlberg, 292). There is a certain pattern to the similarities between the Brown corpus and the State of the Union addresses. These patterns tell us that software such as AntConc can really help to reveal the frequencies and groupings of the important information in a text. Jockers is another I would reference here in terms of separating important information from simple words such as “the” and “to.” Jockers discusses theme through the argument of word clusters. “A typical complaint about computational stylistics is that such studies fail to investigate the aspects of writing that readers care most deeply about, namely, plot, character, and theme” (Jockers, 118). By using something like AntConc, it is easy to find structures such as the ones Jockers mentions. Instead of searching the whole text, we can search for the key words and main themes of a text.

AntConc is a great tool for literary studies because of its brilliant design: it keeps a word count on any search, picking out the important information and discarding filler words such as “the,” “therefore,” etc. The thing about AntConc, however, is that (from my experience with it) it only shows word counts and not necessarily what those words mean to the text as a whole, unless you’ve already done research on the text in question. For example, I’m not big into politics and couldn’t tell you much about it if asked, but I do have a general understanding of it. Focusing on a word like “future” in a presidential speech made sense to me because many speeches contain talk of the future; however, if the topic were different, say, a discussion of the technical terms for what makes up cat food or something, then I’d be lost as to which words were important to look up in AntConc for that text to make sense. The topic in question needs to be researched first, and the researcher needs to have an understanding of the theme to gain the most from corpus software such as AntConc.