I chose to collect data from the fantasy section of the Project Gutenberg website for my metadata analysis. Selecting my eight pieces and converting them into plain text was not too hard, but organizing the information in Excel was trickier. I wouldn’t call it difficult, but it was tedious, and it took me a good hour of going back and forth to check the data, names, and dates. I think a big challenge for researchers would be making sure that their texts are in the public domain and not still under copyright. They would need to confirm that their sources are actually available to use.

To my understanding, literary data means the data at the core of a text (what is there to inform the reader). By this I mean that the core information is the most important: information about the author, information about the title, background information, etc. These aspects of literary data are important as a whole because they provide textual evidence about the text. Jockers’s Macroanalysis explains how graphs and charts can help readers grasp textual evidence and see the important information of a text, or the “meat” of the text, if you will.

Mahlberg’s Corpus Linguistics and the Study of Nineteenth Century Literature gives us a good understanding of what a corpus is and how to use it. A corpus is a collection of texts that shows patterns when the texts are placed together, and it is also a set of literary data. Mahlberg states, “Corpus software is used to quantify linguistic phenomena and display data so that the researcher can investigate linguistic patterns” (Mahlberg 292). There is a certain procedure that the researcher needs to follow when collecting literary data. For example, the data needs to come from the public domain, and the work is simpler if the texts share the same genre and the same branch of writing (fiction, non-fiction, etc.). We created a corpus from a specific branch of texts, fiction, and collected from a genre that interested us.
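As a rough sketch of what organizing a small corpus's metadata could look like outside of Excel, the Python below checks each record for missing fields and writes the records out as CSV, which Excel can open. The column names and the two placeholder rows are assumptions made up for illustration, not entries from the actual corpus, and the helper functions `missing_fields` and `write_metadata` are hypothetical names.

```python
import csv
import io

# Hypothetical column names, based on the kinds of fields mentioned in this lab.
FIELDS = ["title", "author", "publication_year", "nationality", "genre"]


def missing_fields(record):
    """Return the names of any fields that are empty or absent in one record."""
    return [f for f in FIELDS if not str(record.get(f, "")).strip()]


def write_metadata(records, handle):
    """Write the records as CSV rows with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Placeholder rows for illustration only; these are not real books.
records = [
    {"title": "Placeholder Title A", "author": "Author A",
     "publication_year": 1900, "nationality": "British", "genre": "fantasy"},
    {"title": "Placeholder Title B", "author": "Author B",
     "publication_year": "", "nationality": "American", "genre": "fantasy"},
]

# Flag records that still need a date, name, or other detail looked up.
for record in records:
    gaps = missing_fields(record)
    if gaps:
        print(record["title"], "is missing:", ", ".join(gaps))

# Write to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
write_metadata(records, buffer)
print(buffer.getvalue())
```

A check like this would not replace looking up the dates and names, but it might cut down on some of the back and forth of spotting which cells are still empty.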

I think the process of collecting literary data is about gathering the important information that informs readers about a text, such as author names and titles. Jockers tells us that the important information is at the center of a text, or what matters most to the text as a whole (what informs us): “For this research, I have found it useful to visualize the topic-word distributions as word clouds. This has the effect of accentuating those words that are most central to the topic while pushing the related but less central words to the periphery of the visualization” (Jockers 130). The words that are important to a text are specific to that text, and collecting literary data lets us use them to connect it to other texts.

I always thought that literary data was easy to get because the Internet gives you immediate access to information, such as the details we were required to collect for our data. I had to use a search engine myself to find the publication dates and author nationalities for some of the pieces in my Excel sheet. It was fast and easy, and I would often get a short biography of the author with each search too.

Honestly, I learned a lot from this lab. I didn’t know how tedious collecting literary data and converting texts into plain text could be. The number of steps in the Excel process was also a learning experience, as putting my texts in order took most of my time, but in the end it was rewarding to see that hard work turned into an orderly set of collected data.