For this lab, the process of collecting literary data was certainly different from usual. Typically, if I needed to collect data about literature, my first instinct would not be to locate plain text files and build a mini corpus. However, after completing this lab, I have a new appreciation for corpus building as a method of gathering textual data. While gathering all of this information, I didn’t run into many issues. For once, my computer acted as it should, and I had no trouble getting the plain text files off the Project Gutenberg website and into TextEdit. Once I reviewed the key for the mini corpus metadata, it was easy to transfer the data into the Excel spreadsheet and upload the folder to Box.

“Literary data” has multiple definitions. For the purpose of a corpus, the literary data in focus centers on the surface characteristics of each work, such as author, title, and date of publication. While this information is important for understanding the literature of a certain time period as a whole, it does little to tell us about the context and substance of any individual work. Understanding a work’s substance requires a different type of literary data, such as the internal structure of the text, its topics, and notes taken while reading it. Though different, each type has value depending on what kind of information the analyst is looking to mine.

The importance of literary data is illustrated in Ted Underwood’s blog post “We don’t already understand the broad outlines of literary history.” In this piece, Underwood writes, “For instance, I haven’t figured out yet why numbers are common in first-person narrative, but I suspect it might be because there’s a persistent affinity with travel literature. As we follow up leads like that we may discover that we don’t understand point of view itself as well as we assume.” Underwood places point of view in a larger historical context and argues that while most literary scholars and students assume they already understand point of view thoroughly, we are in fact not as knowledgeable as we presume.

The above quote is interesting because of its larger implications for how we understand the literature of the period. Personally, I would not have thought that the frequency of numbers in first-person narratives might reflect an affinity with travel literature in the period Underwood studied, but it makes sense when you take a step back. For example, two masterpieces that are widely studied in university courses, Robinson Crusoe and Gulliver’s Travels, are both built around journeys, yet the connection between these two first-person narratives isn’t necessarily obvious when you are focused on close reading rather than a distant, data-driven method.

Building a corpus and mining its literary data opens up new perspectives that allow readers and scholars alike to draw parallels across many works, instead of focusing solely on one piece of literature and trying to draw universal conclusions from that single text. Though both approaches have their strengths, quantitative literary data can be especially effective for anyone looking to discover overarching themes in a given time period or genre.