Lab 4

For this lab, I used the 1790-1850 corpus of State of the Union Addresses, so all of the results below reflect that specific corpus.

In part four of Lab 4, we were asked to find collocates for the word “future.” Next, we were instructed to record the frequency and stat column values for the bigram “the future.”
- The frequency of the bigram “the future” was 151, and the stat column value was 3.33, which counts as significant, but only slightly, since it is barely over 3.0.
We were then asked to record three other frequently occurring and statistically significant bigram collocates. Here are the ones I chose to record:
1. The collocate “future of” had a frequency of 105 and a stat column value of 3.41.
2. The collocate “future in” had a frequency of 58 and a stat column value of 4.20.
3. The collocate “future and” had a frequency of 57 and a stat column value of 3.47.
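
As a rough sketch of what a stat value like the ones above might measure: the lab does not say which statistic the stat column reports, so this assumes it is a mutual information (MI) score, one of the collocate measures AntConc offers. The function name, the simplified formula (it ignores the collocation window), and the counts are all illustrative assumptions, not values from the corpus.

```python
import math

def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Pointwise mutual information (base 2) for a node/collocate pair.

    Simplified form: compares how often the pair was observed with how
    often it would be expected if the two words were independent. It
    ignores the width of the collocation window.
    """
    if min(pair_freq, node_freq, collocate_freq, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)

# Hypothetical counts, NOT taken from the State of the Union corpus.
score = mutual_information(pair_freq=8, node_freq=16, collocate_freq=16, corpus_size=512)
print(round(score, 2))  # 4.0
```

Under this assumption, a higher score means the two words appear together more often than their individual frequencies alone would predict, which is why a value just over 3.0 reads as only weakly significant.
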
In step five of the lab, we were told to search for clusters of words that contain the word “future” and note the results (my way of “noting the results” is to insert screenshots of them). We were then asked to find the three most frequent bigrams for “future” on both the right and left sides and record them and their frequencies in our lab report. The first screenshot shows the results for the “on right” search term position, and the second screenshot shows the results for the “on left” search term position.

As you can see, the three most frequent bigrams for the right side are as follows:
1. “the future” with a frequency of 48
2. “in future” with a frequency of 21
3. “of future” with a frequency of 11

The three most frequent bigrams of “future” on the left side are:
1. “future intercourse” with a frequency of 4
2. “future, and” with a frequency of 4
3. “future day” with a frequency of 3
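
The left/right cluster search above can be sketched in a few lines of Python. This is only an approximation of what the tool does, under stated assumptions: the function name `bigram_clusters` is hypothetical, the tokenizer is a simple regular expression that drops punctuation (so a cluster like “future, and” would be counted as “future and”), and the sample sentence is made up rather than taken from the corpus.

```python
import re
from collections import Counter

def bigram_clusters(text, term, position):
    """Count two-word clusters containing `term`.

    position="right": the term is the second word (e.g. "the future").
    position="left":  the term is the first word (e.g. "future day").
    """
    if position not in ("left", "right"):
        raise ValueError("position must be 'left' or 'right'")
    # Lowercase and keep only runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for first, second in zip(tokens, tokens[1:]):
        if position == "right" and second == term:
            counts[f"{first} {second}"] += 1
        elif position == "left" and first == term:
            counts[f"{first} {second}"] += 1
    return counts

# Made-up sample sentence, not text from the corpus.
sample = "In the future, the future day will come; at some future day we meet."
print(bigram_clusters(sample, "future", "right").most_common(3))
print(bigram_clusters(sample, "future", "left").most_common(3))
```

Calling `most_common(3)` on each result gives the same kind of top-three list recorded above, one for each search term position.
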
In step six, we were asked to search for common bigrams containing the word “future” and to record, along with their frequencies, the two bigrams from step five that we selected to use in this step of the lab. Both of these bigrams come from the lists above:
- “the future” with a frequency of 48
- “future day” with a frequency of 3
Next, we were asked to choose one of the two bigrams and find three specific documents where it occurred. I chose to continue using the bigram “future day.” The three documents I recorded for this bigram are:
1. 1801 Jefferson.txt
2. 1841 Tyler.txt
3. 1841 Tyler.txt
We were then instructed to use the File View tab for this bigram and note anything interesting that it revealed. However, I did not notice anything “interesting” as a result of this step.
In step nine, we uploaded the Brown corpus as a reference corpus to AntConc; this allowed us to compare the keyword list from the Brown corpus with the keyword list from the State of the Union Addresses corpus. We were instructed to take note of three expected words and one unexpected word apart from articles, conjunctions, and prepositions.
- Expected words:
  - “be” with a frequency of 5079
  - “which” with a frequency of 4332
  - “it” with a frequency of 4287
- Unexpected word:
  - “Mexico” with a frequency of 499
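
For readers curious how a comparison against a reference corpus can flag a word like “Mexico,” here is a minimal sketch of a log-likelihood keyness score. This assumes the keyword list is ranked by log-likelihood, which the lab does not state; the function name and parameters are hypothetical, and no Brown corpus counts are supplied here because I did not record them.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of a word in a target vs. a reference corpus.

    freq_*: how often the word occurs in each corpus.
    size_*: total number of word tokens in each corpus.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score
```

A word that is equally common in both corpora scores 0, while a word that is frequent in the target corpus but rare in the reference corpus scores high, which is the pattern one would expect for a word tied to a specific subject.
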

In the final step, we were asked to consider a few questions regarding AntConc, corpus analysis, and the challenges that these kinds of data collection present to researchers in literary studies. Like many of the other literary data collection processes we have been through this semester, the process of collecting literary data using AntConc is very tedious. While AntConc is very helpful because it can calculate the frequencies of words in entire corpora in a matter of seconds, recording that data and then analyzing it is the tedious part. Before using AntConc, I didn’t fully understand how many things could be included under the umbrella of “corpus analysis.” In class we’ve studied Michaela Mahlberg’s “Corpus Linguistics and the Study of Nineteenth-Century Fiction,” and as I was completing this lab, I was reminded of her explanation of corpus linguistics: “The analysis of concordances is only one of the methods used in corpus linguistics. Other methods include the comparison of word frequencies across corpora, annotating corpora with further information to retrieve specific grammatical patterns, and applying statistical tests to assess the significance of frequency information” (293). While I was completing the lab, this quote really came alive for me, as I was able to analyze concordances, compare word frequencies, and experience the fullness of what I think Mahlberg is getting at when she explains corpus linguistics and corpus analysis.