Lab 4 – Massey | Digital Literary Studies ENGL 4590 Spring 2016

Lab 4 Results

I. Future Collocates from three left to three right

Collocate	Frequency	Stat
future the	45	3.28764
future generations	11	8.98206
future security	5	5.02460
future hope	5	6.3112

Table 1

II. Future Bigrams

Bigram	Side	Frequency
future of	left	16
future generations	left	6
future is	left	4
the future	right	12
a future	right	6
our future	right	8

Table 2

III. Frequency of Two Bigrams (Step 6)

Frequency
Future generations: 11
Our future: 14

IV. Comparison & Keyness

Expected:

America: 326.087
People: 253.647
Us: 193.783

Unexpected:

Applause: 636.873

V. Response

With Lab 4, we begin to see the finer points of data analysis. Before this, we could see the data only in a visual sense; the exact figures behind a word in a cloud were hidden behind the assortment of colors and sizes.

AntConc provides a deeper experience with words and their relationships. When viewing the State of the Union addresses within AntConc, at least for the word ‘future,’ we can see how words interact. The numbers may not make complete sense on their own, but we can at least compare them to one another. In Table 1 above, we can see that ‘future’ and ‘the’ appear together far more often than ‘future’ with ‘hope,’ ‘security,’ or ‘generations.’ However, the Stat column shows us a different relationship: it weighs the number of times an n-gram appears against the number of times the secondary word appears with other words, so a rarer pairing such as ‘future generations’ (8.98206) scores well above ‘future the’ (3.28764).
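The gap between the Frequency and Stat columns in Table 1 can be sketched in a few lines of Python. This is only an illustration, not AntConc's own code: it assumes the statistic is a mutual information (MI) score in one common formulation (log2 of observed over expected co-occurrence), and the function name, the tokenizer, and the sample sentence are all invented for the example.

```python
import math
import re
from collections import Counter

def collocates(text, node, left=3, right=3):
    """Count words within a window around `node` and score each with an MI-style statistic.

    Hypothetical helper: one common MI formulation, not necessarily AntConc's exact one.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    word_freq = Counter(tokens)

    # Observed co-occurrences: every word inside the window of each node hit.
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        pair_freq.update(window)

    span = left + right
    results = {}
    for word, observed in pair_freq.items():
        # Expected co-occurrences if the two words were unrelated.
        expected = word_freq[node] * word_freq[word] * span / total
        results[word] = (observed, math.log2(observed / expected))
    return results

# Invented sample sentence, not taken from the State of the Union corpus.
sample = ("the future of the nation depends on future generations "
          "and the hope of the people")

wide = collocates(sample, "future", left=3, right=3)
narrow = collocates(sample, "future", left=1, right=1)
for word in ("the", "generations"):
    print(word, wide[word])
```

In this sample, ‘the’ co-occurs with ‘future’ more often than ‘generations’ does yet receives the lower score, because ‘the’ is common everywhere in the text. Comparing `wide` with `narrow` shows how changing the window by a word or two changes the counts.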

AntConc focuses on the relationships between words and accomplishes in seconds what would take a team of people hundreds of hours. It illustrates both the sheer scale of the material to be mined and how much minuscule shifts in the data matter. Most apparent is the change that modifying the range by a single word makes to the entire result. Yet the data on screen can only tell us so much. It must be interpreted. If I cannot explain what exactly a particular piece of data or result means, how can I hope to analyze it? We come back to how necessary it is for literary scholars to understand what is going on behind the GUIs. They may find this challenging, as they are already investing large amounts of time in the interpretation of other, equally significant data.

While going through the final steps of this project and comparing the State of the Union addresses from 2001 to 2014, I became aware of how language diverges depending on the context of the work. Let us not think of this in terms of persuasive speeches versus, say, fantasy novels. Of course those two would make use of different language, but with this comparison we are looking at commonly used words against the words within the State of the Union addresses. The fact that the two diverge quite a lot can lead to quite a few different interpretations. Perhaps speeches simply use different language, but the difference could also be attributed to the change in the English language over forty years (the Brown corpus was compiled in the 1960s). These are not the only interpretations, as this data could be used to fit any number of hypotheses.
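The keyness figures in section IV come from this kind of comparison between a target corpus and a reference corpus. A minimal sketch of one widely used keyness measure, the log-likelihood statistic, follows. It assumes that this is the measure behind the numbers above, which the lab output does not confirm, and the function name and the counts in the usage lines are hypothetical rather than drawn from the actual corpora.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word (hypothetical helper).

    freq_*: occurrences of the word in each corpus
    size_*: total number of tokens in each corpus
    """
    combined = freq_target + freq_ref
    total = size_target + size_ref

    # Expected counts if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total

    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Hypothetical counts: a word used 50 times in a 1,000-token target corpus
# and 10 times in a 1,000-token reference corpus.
strong = log_likelihood(50, 1000, 10, 1000)
weak = log_likelihood(20, 1000, 10, 1000)
same = log_likelihood(10, 1000, 10, 1000)
print(strong > weak > same)
```

A word that is equally frequent in both corpora scores zero, and the score grows as the two relative frequencies pull apart. The statistic measures only how far they diverge; it does not say why, which is the interpretive problem raised in this response.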

If this data can fit many different agendas, how can we judge its usefulness? I can say that this State of the Union data tells me that we were concerned with different facets of ‘future’ from 2001 to 2014 than in any other timespan, but someone else can say that the actual usage of the word ‘future’ has increased in the common vernacular. As in any other science, we can posit theories that need not be widely accepted. Only once we have collected a mass of information multiple times from multiple sources can we create literary law.