Lab 4

I used the 1901-1950 State of the Union corpus for this lab. First, some requested stats:

The bigram “the future” has a frequency of 279, but a stat value of only 3.77. This is barely statistically significant, despite the high frequency with which it appears.
Some statistically significant collocates were “future generations”, as we discussed in class, “near future”, and “immediate future”. This might show that the 1901-1950 speeches were more concerned with the near future than the far future.
The most frequent bigrams with “future” on the left are “future of” with a frequency of 20, “future generations” with a frequency of 7, and “future years”, also with a frequency of 7. With “future” on the right, they are “the future” with a frequency of 123, “near future” with a frequency of 12, and “of future”, also with a frequency of 12.
In the n-grams search, “future generations” has a frequency of 7, just like in the words search. “Near future” is the same with a frequency of 12.
The term “near future” appears in 1906-Roosevelt.txt, 1910-Taft.txt, and 1947-Truman.txt. 1947-Truman.txt has 2 hits, one referring to an investigation of labor disputes and the other saying he will communicate with Congress to create a single Department of National Defense. Most of the hits that I found refer to something the president plans to do in the “near future”.
When comparing with the Brown frequency list, some expected words that pop up more often than normal are “government”, “congress”, and “war”. One word that’s a little surprising is the keyness of “I”. It has a keyness of 345.
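For anyone who wants to reproduce counts like these outside AntConc, here is a minimal Python sketch of counting bigrams with a search word on the left or on the right. The tokenizer is a simplification I am assuming (lowercase, letters and apostrophes only), so its numbers would not necessarily match AntConc's. The helper names and the sample sentence are made up for illustration and do not come from the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplified tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(tokens):
    """Count every pair of adjacent tokens."""
    return Counter(zip(tokens, tokens[1:]))


def bigrams_with(counts, word, position):
    """Return bigrams containing `word` on the 'left' or 'right', most frequent first."""
    idx = 0 if position == "left" else 1
    hits = [(bigram, n) for bigram, n in counts.items() if bigram[idx] == word]
    # Sort by descending frequency, then alphabetically so ties are stable.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Invented sample sentence, not a quotation from any address.
sample = "In the near future we will plan for the future of future generations."
counts = bigram_counts(tokenize(sample))
left = bigrams_with(counts, "future", "left")
right = bigrams_with(counts, "future", "right")
```

To run this over the real corpus, you would read each .txt file, tokenize it, and add the per-file counters together before calling bigrams_with.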

As you can see, I used keywords in a way similar to what Michaela Mahlberg did in her study of corpus linguistics/stylistics. She says, “Key words can provide a first overview of a text, pointing to words that are potentially useful for more detailed analysis” (“Corpus” 296). We generated a keyword list in AntConc to do just that. As she looks at the context of “civility” in Austen’s works, so too do I look at the context of “future” in the State of the Union addresses.

What I learned from this analysis of the term “future” is that it can be concerned with either the far future (“future generations”) or the “near future”. When “near future” is invoked, the president is usually referring to something ongoing in the government or something he plans to do soon.
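Reading hits in context like this is basically a concordance (keyword-in-context) view. Below is a minimal Python sketch of how such a view could be built. The function name kwic, the width parameter, and the sample text are all hypothetical, and the sample is invented rather than taken from the speeches.

```python
import re


def kwic(text, phrase, width=40):
    """Return (left context, match, right context) for each hit of `phrase`."""
    hits = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


# Invented sample text, not a quotation from any address.
sample = "We hope to act in the near future. The Near Future is uncertain."
hits = kwic(sample, "near future", width=15)
for left, match, right in hits:
    # Right-align the left context so the search phrase lines up in one column.
    print(f"{left:>15} [{match}] {right}")
```

The search is case-insensitive here, which is an assumption on my part about how the hits should be matched.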

Unsurprisingly, there are far more references to government and law-related terms in the State of the Union addresses than in regular English. “War” is also a lot more frequent than “peace” in these texts, which makes sense, as this time period encompasses two World Wars. I’d like to explore the usage of “war” and “peace” more in the future perhaps (heh, future).

Mahlberg also writes, “…the value of applying corpus methods is defined through the links that can be made between quantitative findings and qualitative analysis” (295), which is true. For example, I can tell you that “I” has a keyness of 345, but it’s effectively meaningless right now unless I make it clear what keyness represents. According to Heather Froehlich, “Keyness is a statistical measure of ‘the frequency of a word in the text when compared with its frequency in a reference corpus.’” So that keyness of 345 says that “I” is unusually frequent in the State of the Union addresses compared with the Brown frequency list. But if I really wanted to say what “I” was doing in the speeches, I’d have to look at specific instances of the term.
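To make the idea of keyness more concrete, here is a minimal Python sketch of one commonly used keyness statistic, log-likelihood. I am assuming this measure for illustration; I have not confirmed that it is the setting behind the 345 above. The function names and the counts in the example are invented and are not the real corpus or Brown figures.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of a word, from its counts in two corpora."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected counts if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_overused(freq_target, size_target, freq_ref, size_ref):
    """True if the word is relatively more frequent in the target corpus."""
    return freq_target / size_target > freq_ref / size_ref


# Invented counts for illustration only.
score = log_likelihood(50, 1000, 10, 1000)
same_rate = log_likelihood(10, 1000, 10, 1000)
```

The score on its own has no direction, so it says only how surprising the difference is. A separate comparison of relative frequencies, like is_overused, is needed to tell whether the word is over- or under-represented in the speeches.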