Lab 4

For this lab we had to use AntConc to analyze large collections of texts; in this case, a corpus of State of the Union addresses from 1790-2014. (I used just the texts from 1951-2000.)

First, I looked at the collocates of the word “future” and found the frequency and statistical significance of “the future.”

“The future”
- Frequency: 317 (220(L), 97(R))
- Stat. value: 4.01962

Then, I found these three other significant collocates:

“generations”
- Frequency: 11 (1(L), 10(R))
- Stat. value: 8.30176
“near”
- Frequency: 9 (9(L))
- Stat. value: 8.20490
“present”
- Frequency: 11 (9(L), 2(R))
- Stat. value: 7.35690
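The “Stat. value” for each collocate is an association score. I'm assuming it was mutual information (MI), one of the measures AntConc's Collocates tool offers. The sketch below uses a basic textbook MI formula, which may not match AntConc's exact calculation (for example, in how it adjusts for window size). Tokens are assumed to be already lowercased words, and `mi_collocates` is a hypothetical helper, not part of AntConc.

```python
import math
from collections import Counter

def mi_collocates(tokens, node, left=5, right=5, min_freq=1):
    """Score words near `node` with a basic mutual-information (MI) formula.

    Assumption: MI = log2(observed * N / (f(node) * f(collocate))).
    AntConc's own formula may differ in detail.
    Returns {word: (total, left_count, right_count, mi)}.
    """
    n = len(tokens)
    freq = Counter(tokens)
    left_hits, right_hits = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count words inside the window to the left of the node word.
        for j in range(max(0, i - left), i):
            left_hits[tokens[j]] += 1
        # Count words inside the window to the right of the node word.
        for j in range(i + 1, min(n, i + right + 1)):
            right_hits[tokens[j]] += 1
    results = {}
    for word in set(left_hits) | set(right_hits):
        observed = left_hits[word] + right_hits[word]
        if observed < min_freq:
            continue
        mi = math.log2(observed * n / (freq[node] * freq[word]))
        results[word] = (observed, left_hits[word], right_hits[word], mi)
    return results
```

The left/right split in the return value mirrors the “(L)” and “(R)” counts listed above; a higher MI means the word appears near “future” more often than its overall frequency alone would predict.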

I then looked at clusters of words that contained “future” and occurred at least 3 times. These were the top three bigrams I found (both to the left and the right) along with their frequencies.

Left:
- “future of” (22)
- “future is” (11)
- “future generations” (10)
Right:
- “the future” (144)
- “our future” (35)
- “for future” (14)
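Counting these clusters amounts to sliding a two-word window over the text and keeping only the bigrams where “future” sits in the chosen position. A minimal sketch, assuming the text is already split into lowercase words; `clusters` and its parameters are hypothetical stand-ins for AntConc's cluster size, minimum frequency, and search-term-position settings.

```python
from collections import Counter

def clusters(tokens, term, size=2, min_freq=3, term_on_left=True):
    """Count word clusters of `size` words that start (or end) with `term`.

    Returns (cluster, count) pairs, most frequent first, keeping only
    clusters that occur at least `min_freq` times.
    """
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tuple(tokens[i:i + size])
        # The search term must be the first word (left) or last word (right).
        anchor = gram[0] if term_on_left else gram[-1]
        if anchor == term:
            counts[" ".join(gram)] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_freq]
```

Calling it once with `term_on_left=True` and once with `term_on_left=False` gives the two lists above (“future of” versus “the future”).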

Next, I focused on “future generations” and “our future,” sorting the common “future” bigrams by word. “Future generations” occurred 10 times and “our future” 35 times.

Then, using the Concordance tab, I found three specific texts in which “our future” occurs (it appears in many more).

1953-Trueman.txt
1997-Clinton.txt
2000-Clinton.txt

To get a closer look, I went to the Concordance Plot tab and saw that 1997-Clinton.txt had five hits, so I clicked on the fourth one for more context. This took me to the fourth instance of “our future” in the 1997 document, a passage that said, “education is a critical national security issue for our future and politics must stop at the school house door.”
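A concordance is essentially a keyword-in-context (KWIC) listing: every hit of the search phrase, with a slice of text on each side and a record of which file and which numbered hit it came from. A minimal sketch, assuming the corpus is loaded as a dictionary mapping file names to their text; `concordance` and `width` are hypothetical names, and AntConc's own display is more elaborate.

```python
import re

def concordance(texts, phrase, width=40):
    """Return (filename, hit_number, left, match, right) for every hit.

    Matching is case-insensitive and respects word boundaries;
    `width` is the number of characters of context on each side.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        # Number hits within each file, like the plot's per-file hit marks.
        for k, m in enumerate(pattern.finditer(text), start=1):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            hits.append((name, k, left, m.group(0), right))
    return hits
```

Filtering the result for `("1997-Clinton.txt", 4, ...)` would correspond to clicking the fourth hit in the plot.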

Finally, to get an idea of which words occurred more frequently in these 1951-2000 State of the Union texts, I used the Brown corpus as a reference corpus for comparison. These three expected words came up more often in the State of the Union texts than in general English (as represented by the Brown corpus); here are their frequencies.

“America” (849)
“Government” (803)
“Congress” (901)

One word that came up more frequently that I thought was unexpected was “children,” occurring 335 times.
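A keyword list ranks words by how much more common they are in the target corpus than in the reference corpus. I'm assuming the keyness measure was log-likelihood, which AntConc offers for keyword lists; the sketch below uses the common Dunning log-likelihood calculation and may differ from AntConc's settings. Both corpora are assumed to be lists of lowercase tokens, and `log_likelihood` and `keywords` are hypothetical helpers.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Dunning log-likelihood for one word.

    a, b: the word's frequency in the target and reference corpora.
    c, d: total token counts of the target and reference corpora.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top=10):
    """Rank words that are relatively more frequent in the target corpus."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = sum(t.values()), sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        # Keep only words overused in the target relative to the reference.
        if a / c > b / d:
            scored.append((word, a, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda x: x[2], reverse=True)
    return scored[:top]
```

A word like “congress” scores high because it is frequent in the speeches but comparatively rare in the Brown corpus; a word used at the same rate in both scores zero.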

Through using AntConc to compare and analyze data across a large corpus, I’m finally starting to see how valuable digital methods like this one can be for studying huge amounts of text. AntConc is definitely my favorite tool so far. Although I liked how the other tools, like Voyant and Lexos from Lab 2, give more of a visual experience of what’s in a text, I found it easier to understand what was going on with AntConc. (Granted, I’m not much of a visual person, and I have some experience in statistics, so some of the elements, like statistical significance, made a bit more sense to me.)

While the other tools give a big picture of specific works, like Only Revolutions, I think AntConc is a lot more useful for finding general trends that you might want to engage with further and examine critically later. Rather than looking for instances of words such as “us,” “together,” or “both” only in Only Revolutions, you could search a large corpus of contemporary works (assuming you could find a suitable contemporary fiction corpus) to see how Only Revolutions’ use of these words compares with their general use in contemporary fiction as a whole. This way, you could find out whether Only Revolutions is unusual in using such words (which I would assume it is) and worth exploring further, or whether this is actually common in contemporary fiction.
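Because a single novel is much shorter than a whole corpus, raw counts would not compare fairly; one common fix is to normalize them, for example per 10,000 words. A minimal sketch, assuming both the novel and the corpus are already tokenized into lowercase words; `per_10k` is a hypothetical helper.

```python
from collections import Counter

def per_10k(tokens, words):
    """Frequency of each word per 10,000 tokens, so texts of different lengths compare fairly."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: counts[w] * 10_000 / total for w in words}
```

Running it on the novel and on the corpus with the same word list (“us,” “together,” “both”) puts both results on the same scale, so a much higher rate in the novel would suggest the pattern is distinctive rather than typical.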

Another thing I really liked about AntConc, and something Michaela Mahlberg discusses in her article “Corpus Linguistics and the Study of Nineteenth-Century Fiction,” is that it lets you focus on clusters of words that frequently occur together rather than just on individual words. One feature I found particularly helpful was that you could set the minimum frequency of a cluster before pressing Start, so that the list only included clusters that actually occurred more than once (or, as in the case of this lab, five times). The statistical significance numbers also help you recognize which of the clusters you’re looking at are significant, and you can learn even more by comparing them to other corpora (as we did with the Brown corpus in the lab).

Mahlberg emphasizes how these clusters can work in analysis, and while she focuses on using clusters to identify “suspensions,” where the narrator interjects in novels, I can see how they also have value in other kinds of literary analysis, like looking at the State of the Union addresses as we did in the lab. By looking at some of the bigrams that I identified, like “near future,” “present future,” and “our future,” and how frequently and significantly they come up in these speeches, it’s easy to start forming hypotheses about these time periods. For example, perhaps from 1951-2000 presidents used these phrases to keep the focus on the present, promoting the idea that they were going to bring about positive changes within the country quickly. You could then look further into the history of this period to find out what sorts of national problems required such proactive measures toward quick improvement. The clusters and other resources that AntConc offers allow ideas and hypotheses like these to be generated quickly and easily, which is one reason I think it can be so valuable for all kinds of literary analysis.