Lab 4 – EMD
I am using the 1951-2000 subset of the State of the Union corpus.
Collocate for “the future”:
Rank: 39
Frequency: 317
Frequency (L): 220
Frequency (R): 97
Stat: 4.01962
Three other frequently occurring and statistically significant collocates are: “in future” (freq. L 73, freq. R 43, stat 3.99101), “and future” (freq. L 60, freq. R 55, stat 3.27353), and “to future” (freq. L 66, freq. R 46, stat 3.31430).
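The left and right frequencies above can be approximated outside of AntConc with a short script. The sketch below is only an illustration under stated assumptions: the `tokenize` and `collocates` helpers, the `span` parameter, and the sample sentence are all hypothetical, and the tokenization will not match AntConc's exactly. It counts raw left/right co-occurrences only and does not attempt to reproduce the Stat column.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenization)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(tokens, node, span=5):
    """Count the words found within `span` tokens to the left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])    # words before the node
        right.update(tokens[i + 1:i + 1 + span])   # words after the node
    return left, right

# Made-up sample text, not taken from the State of the Union corpus.
sample = "We owe it to the future. In the future, the future of our nation depends on us."
left, right = collocates(tokenize(sample), "future", span=1)
print(left.most_common(3))
print(right.most_common(3))
```

With a real corpus, the text of each file would be read in and tokenized the same way, and a wider `span` would give counts closer in spirit to a collocate window.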
When I searched for bigram clusters with the word “future” on the left, the number one bigram was “future of,” which appeared 22 times in this corpus. The rest of the top three were “future is” (freq. 11) and “future generations” (freq. 10).
When I searched for bigram clusters with the word “future” on the right, however, the number one bigram was “the future,” which appeared 144 times. The rest of the top three were “our future” (freq. 35) and “for future” (freq. 14). Overall, the top five bigrams with “future” on the right had frequencies a little larger than or equal to those with “future” on the left.
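The two bigram searches described above can be sketched in a few lines of Python. This is a minimal illustration, not how AntConc computes its clusters: the `bigrams_with` helper and the sample text are hypothetical, and because the simple tokenizer ignores sentence boundaries, a bigram can span two sentences.

```python
import re
from collections import Counter

def bigrams_with(tokens, word):
    """Return two Counters: bigrams that start with `word` and bigrams that end with it."""
    word_left, word_right = Counter(), Counter()
    for first, second in zip(tokens, tokens[1:]):
        if first == word:
            word_left[f"{first} {second}"] += 1   # e.g. "future of"
        if second == word:
            word_right[f"{first} {second}"] += 1  # e.g. "the future"
    return word_left, word_right

# Made-up sample text, not taken from the State of the Union corpus.
sample_text = "The future of our nation is our future. The future is bright for future generations."
tokens = re.findall(r"[a-z']+", sample_text.lower())
future_left, future_right = bigrams_with(tokens, "future")
print(future_left.most_common(3))
print(future_right.most_common(3))
```

Sorting each counter with `most_common` gives the same kind of ranked list that the clusters view displays.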
The two bigrams for the next step: “Future of” and “the future”
“Future of” n-gram = 22
“The future” n-gram = 144
The bigram for the next step: “The Future”
File names of three specific documents where “the future” occurs:
- 1951-Truman.txt
- 1953-Eisenhower.txt
- 1979-Carter.txt
Something interesting that the file view reveals is the actual, nearly complete context of the bigram (or unigram, trigram, etc.). This cannot be determined from the clusters/n-grams, collocates, or even the concordance tabs alone.
Three expected words:
- Administration (freq. 317)
- Allies (freq. 131)
- America (freq. 849)
One unexpected word:
- Again (freq. 189)
REPORT:
Lab four was a lot more difficult (or maybe more tedious) than any of the other labs that I have done so far this semester. AntConc presented a lot of information, but it took me a while to understand its purpose. Even while I was completing the lab, I felt overwhelmed and ultimately just wanted to be done with all of the steps.
The State of the Union addresses resemble the “standard” American English found in the Brown corpus in that the highest-frequency words in these addresses are mainly articles, prepositions, and conjunctions, which is to be expected. I cannot think of a way in which these two corpora differ from each other, which could tell us something in itself: the language used in the State of the Union addresses and in the Brown corpus is very similar, which (again) should not be surprising. The language used by the “leader of the free world,” the United States, should reflect that used in “standard” American English, which is represented by the Brown corpus in this lab.
AntConc revealed that, in the 1951-2000 subset of the State of the Union corpus, the bigram “the future” was used most frequently in President Carter’s 1981 State of the Union address. The concordance plot shows that the phrase appeared 13 times (13 hits) in this speech. Since the introduction of the lab in class last Wednesday, when we were able to compare our findings with groups working on different eras, this text-analysis tool has made me more interested in the types of words that are more frequent in different eras. For example, the frequency of the word “future” varies depending on which time period one is analyzing. What interests me most after experimenting with AntConc is the concordance plot: not only can one view the general frequency of a particular unigram, bigram, trigram, etc., but one can also find the exact context of each occurrence just by clicking on the hits in the plot.
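The idea behind the concordance plot (how many hits each file has, and where in the file they fall) can be approximated with a short script. The sketch below is only an illustration: the `phrase_positions` helper, the file names, and the two sample texts are hypothetical, and each position is reported as a rough fraction of the way through the file rather than as a drawn plot.

```python
import re

def phrase_positions(text, phrase):
    """Return each hit's position as a fraction (0.0 to 1.0) of the way through the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    # Indices where the token sequence matches the whole phrase.
    hits = [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]
    return [round(i / len(tokens), 2) for i in hits] if tokens else []

# Hypothetical mini-corpus; real use would read the text of each speech file instead.
documents = {
    "speech_a.txt": "The future is ours. We build the future together.",
    "speech_b.txt": "Our past guides us toward the future.",
}
plot = {name: phrase_positions(text, "the future") for name, text in documents.items()}
busiest = max(plot, key=lambda name: len(plot[name]))  # file with the most hits

for name, positions in plot.items():
    print(name, len(positions), "hits at", positions)
print("Most hits:", busiest)
```

The file with the longest list of positions corresponds to the file with the most hits in the plot.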
More generally, AntConc and corpus analysis can do many things for researchers in literary studies. As stated previously, AntConc can be used both to view the frequency of a particular unigram, bigram, trigram, etc., and to find the exact context of these n-grams by clicking on the corresponding hits in the concordance plot. Another advantage of corpus analysis is that corpora can be compared to each other in order to observe more general ideas and themes that could be explored across different texts, time periods, etc. Among the answers that AntConc provides are the frequency of n-grams in a text or set of texts, the most significant words that appear around a specific search word, and the context in which a specific n-gram can be found directly in the text.