Lab 4
I used the State of the Union addresses from 1901-1950, a subset of the State of the Union Address corpus, which spans 1790-2014.
- The collocate pair “the future” appears 279 times: 187 times with “the” to the left of “future” and 92 times with “the” to the right. Its statistic is 3.77, which is not especially high.
- Three statistically important collocates are “a future” (right frequency 23, left frequency 21, statistic 3.53), “our future” (right frequency 10, left frequency 19, statistic 3.92), and “we future” (right frequency 23, left frequency 12, statistic 4.36).
- The most frequent bigrams with “future” on the left are “future of” (20), “future generations” (7), and “future years” (7). With “future” on the right, they are “the future” (123), “near future” (12), and “of future” (12).
- In the N-gram results, the frequency of “future generation” was again 7, and the frequency of “the future” was again 123.
- The concordance hits were spread across about 6 of the documents, with one hit in each. The top three shown were 1901-Roosevelt.txt, 1904-Roosevelt.txt, and 1907-Roosevelt.txt. I liked the way the concordance plot made the file available, because I could see the context in which “future generation” was being used. Roosevelt was talking about institutions and formation and how these will affect us not only in the present but in future generations.
- Three keywords I expected were government, legislation, and war. Two I did not expect were agriculture and islands, which surprisingly had a keyness of 264.
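The left and right collocate frequencies reported above can be illustrated with a short Python sketch. This is only a rough approximation of what a collocate tool does, not AntConc's actual implementation: the tokenizer, the window size, the function names, and the sample sentence are all assumptions made for illustration, and the sketch counts raw frequencies only, without the statistic column.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    # AntConc's own token definition may differ.
    return re.findall(r"[a-z']+", text.lower())

def collocate_counts(tokens, node, window=5):
    # Count words within `window` tokens to the left and right of each
    # occurrence of `node`. Sentence boundaries are ignored here.
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - window):i])
        right.update(tokens[i + 1:i + 1 + window])
    return left, right

# Made-up sample text, not taken from the corpus.
sample = "We plan for the future. The future of the nation depends on future generations."
tokens = tokenize(sample)

# With window=1, the counts are simply the bigrams around "future".
left, right = collocate_counts(tokens, "future", window=1)
print("left of 'future':", left.most_common())
print("right of 'future':", right.most_common())
```

With a window of one word on each side, the left counter corresponds to bigrams such as “the future” and the right counter to bigrams such as “future of”. A wider window would count words that are near “future” but not directly next to it.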
This lab was by far the most difficult we have done so far. I do not know if it was my old-school computer (which is brand new, by the way) or the way the system was working. AntConc would freeze out of the blue, and then I would have to restart and do the whole process over again. Other than these glitches, the lab went relatively smoothly. I enjoyed looking at the frequencies of the words that came up when searching for “future” and the keywords that came up when compared with the Brown common usage lists. The main difference between the two is that the State of the Union corpus focused more on words like government and war. This is to be expected, because these speeches are made by the president about the United States of America and the goals the president has for the year. This was not the case in the Brown corpus.
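To give a sense of how a keyness score can come out of comparing a target corpus with a reference corpus, here is a minimal Python sketch of the log-likelihood measure often used for keyword analysis. It assumes log-likelihood was the keyness measure, which depends on the AntConc settings used in the lab; the function name and all of the counts below are hypothetical and are not the lab's data.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    # Expected frequencies if the word were spread evenly across both corpora.
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts: a word seen 50 times in a 1,000-word target corpus
# and 5 times in a 1,000-word reference corpus.
strong = log_likelihood(50, 5, 1000, 1000)
# A word with nearly the same rate in both corpora.
weak = log_likelihood(12, 8, 1000, 1000)
print(round(strong, 2), round(weak, 2))
```

The larger the score, the more the word's frequency in the target corpus departs from what the reference corpus would predict, which is why a word can be key without being one of the most frequent words overall.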
AntConc has revealed many things to me that make me more interested in the corpus and the way the system works. There are instances where the word island is mentioned numerous times in the State of the Union corpus. The same is true of the words canal and Panama, which go hand in hand. There are multiple other words like these that have a high frequency and a high statistical value that I did not expect in the State of the Union corpus. I am now interested in understanding what political issues were going on in 1901-1950, because I know that World War I and World War II happened in this period, but these were not mentioned among the high frequencies. Why were canal and island mentioned more frequently than the major wars of this time? This is just one of the questions that I am interested in now because of AntConc and the State of the Union corpus.
The challenge I see with AntConc and similar programs that process corpora is that the program is only a computer: it puts information together based on frequencies, not because the words go together. The computer reports the words that come before and after the word future in terms of frequency and statistics, but it is up to researchers to make sense of the words that are paired with future. A bigram I saw is a good example of this. “Of future” was a statistically relevant word cluster in the State of the Union addresses, but in real life it is hard to understand in English because it is hard to make a sentence in which it fits. This is the difficulty I see researchers having. They have to comb through the two-word pairs that AntConc gives them and find the word clusters that are both statistically relevant and meaningful in English.
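One way a researcher might check whether a cluster like “of future” makes sense is to look at each hit with some of its surrounding text, as a concordance view does. The Python sketch below is a simplified, hypothetical version of that idea. The function name, the character width, and the sample sentence are assumptions for illustration, and the sentence is made up rather than quoted from the corpus.

```python
import re

def kwic(text, phrase, width=30):
    # Return each match of `phrase` with up to `width` characters of
    # context on either side, marking the match with square brackets.
    lines = []
    pattern = r"\b" + re.escape(phrase) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        lines.append(text[start:m.start()] + "[" + m.group(0) + "]" + text[m.end():end])
    return lines

# Made-up sample sentence, not taken from the corpus.
sample = "The welfare of future generations depends on the choices of the present."
hits = kwic(sample, "of future")
for line in hits:
    print(line)
```

Reading the hits in context lets the researcher decide whether a frequent word pair is a meaningful phrase or just two words that happen to sit next to each other inside a longer phrase.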