Robert Brissey Lab 1 Complete

Robert Brissey

Lab #1

English 8120 – Dr. Lindsay Thomas

Walden, and On The Duty of Civil Disobedience. Thoreau, Henry David. http://www.gutenberg.org/files/205/205-0.txt
The Tragedy of Julius Caesar. Shakespeare, William. http://www.gutenberg.org/cache/epub/1120/pg1120.txt
The Communist Manifesto. Marx, Karl and Friedrich Engels. http://www.gutenberg.org/cache/epub/61/pg61.txt
The Marriage of Heaven and Hell. Blake, William. http://www.gutenberg.org/cache/epub/45315/pg45315.txt
Dracula. Stoker, Bram. http://www.gutenberg.org/cache/epub/345/pg345.txt
Leaves of Grass. Whitman, Walt. http://www.gutenberg.org/cache/epub/1322/pg1322.txt
Dramatic Romances. Browning, Robert. http://www.gutenberg.org/cache/epub/4253/pg4253.txt
The Cask of Amontillado. Poe, Edgar Allan. http://www.gutenberg.org/cache/epub/1063/pg1063.txt
The Count of Monte Cristo. Dumas, Alexandre, père. http://www.gutenberg.org/cache/epub/1184/pg1184.txt
The Adventures of Sherlock Holmes. Doyle, Sir Arthur Conan. http://www.gutenberg.org/cache/epub/1661/pg1661.txt

Robert Brissey

Lab Report Write-Up #1

English 8120 – Dr. Thomas

2/1/2016

First, I intentionally chose Anglophone, male writers because the majority of my current scholarly pursuits revolve around such writers, and these are among my favorites. Each selection is a text that I have read at least once and with which I am therefore at least passingly familiar. Furthermore, several of these selections will in all likelihood be part of the corpus for my thesis.

I chose the following for my metadata fields:

ID – Required reference tag for each entry

Author ID – Required reference tag for each author

Author Last Name – Lists the surname of the author

Author First Name – Lists the first name of the author

Author Middle Name – Lists the middle name of the author if available

Author Gender – Indicates the biological sex of the author

Author Nationality – Indicates the country of origin for the author, regardless of where they may have lived and/or composed their work

Title – Lists the given Title of a selection

Genre – Lists the literary genre of the text

Literary Movement – Describes the text in terms of literary period/movement/era

Composition Type – Describes the format of composition of a text, varying between prose, various types of poetic verse, and drama

Publication Year – Indicates the alleged year(s) of publication of a particular selection

Text Link – Provides a link to the plain text document on Project Gutenberg
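
As a rough illustration, the fields above could be represented as a single record type. This is only a sketch: the class name TextRecord, the snake_case field names, and the sample values for Dracula are assumptions made for illustration and are not the entries actually recorded for this lab.

```python
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class TextRecord:
    """One row of the corpus metadata (hypothetical field names)."""
    id: int                            # reference tag for the entry
    author_id: int                     # reference tag for the author
    author_last_name: str
    author_first_name: str
    author_middle_name: Optional[str]  # None when no middle name is available
    author_gender: str
    author_nationality: str            # country of origin
    title: str
    genre: str
    literary_movement: str
    composition_type: str              # e.g. prose, verse, drama
    publication_year: str              # kept as text so a range of years fits
    text_link: str                     # plain text file on Project Gutenberg


# Illustrative values only; the actual entries may differ.
record = TextRecord(
    id=5,
    author_id=5,
    author_last_name="Stoker",
    author_first_name="Bram",
    author_middle_name=None,
    author_gender="Male",
    author_nationality="Irish",
    title="Dracula",
    genre="Gothic horror",
    literary_movement="Victorian",
    composition_type="Prose",
    publication_year="1897",
    text_link="http://www.gutenberg.org/cache/epub/345/pg345.txt",
)

# Column headers for a spreadsheet or CSV export, in declaration order.
column_names = [f.name for f in fields(TextRecord)]
row = asdict(record)
```

Storing the publication year as text rather than a number is a deliberate choice here, since the field is meant to hold "year(s)" and some texts have contested or multiple dates.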

I chose the preceding fields mostly out of sheer pragmatism. Most of the categories are all but required for any digital text corpus. The non-essential categories, those that could be removed without leaving the corpus incomplete, allow for discussion of a wide range of contexts, particularly with regard to country of origin, gender, genre, publication year, and literary movement. Cross-referencing such categories across a larger corpus would yield information about male-dominated text types, differentiated by year, origin, genre, etc. Among my projects, I wish to chart the progression of the 20th century American male, and the background information provided by such broad studies could very well inform that scholarship.
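
To make the idea of cross-referencing categories concrete, here is a minimal sketch that counts how often each pairing of two metadata fields occurs. The function name cross_tabulate, the dictionary keys, and the four sample rows are all hypothetical; the values are placeholders rather than the entries recorded for this lab.

```python
from collections import Counter


def cross_tabulate(rows, field_a, field_b):
    """Count how many rows share each (field_a, field_b) pairing."""
    return Counter((row[field_a], row[field_b]) for row in rows)


# Placeholder rows; a real corpus would carry every metadata field.
rows = [
    {"title": "Walden", "author_nationality": "American", "composition_type": "Prose"},
    {"title": "Leaves of Grass", "author_nationality": "American", "composition_type": "Verse"},
    {"title": "The Tragedy of Julius Caesar", "author_nationality": "English", "composition_type": "Drama"},
    {"title": "Dracula", "author_nationality": "Irish", "composition_type": "Prose"},
]

counts = cross_tabulate(rows, "author_nationality", "composition_type")

# Print each pairing with its count, most frequent first.
for (nationality, composition), n in counts.most_common():
    print(f"{nationality:10} {composition:8} {n}")
```

With only ten texts the counts say little, but the same function would work unchanged on a much larger corpus, and any two fields (for example gender and genre, or literary movement and publication year) can be paired.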

The data collection process was a simple, if slightly tedious, endeavor. It took longer to decide upon the texts for such a limited corpus than to input the data for it. The majority of the information came from Wikipedia and/or author websites, particularly regarding literary movement and publication information. The most frustrating aspect of the data collection was encountering dissenting opinions about the proper categorization of a text in terms of genre, literary movement, and, in many instances, publication year. It is easy to forget how subjective data can be, even supposedly concrete items such as publication information. With that in mind, creating and verifying the information in a much more complete corpus, with more subjective metadata fields and far more texts, would certainly be a task for teams of people. This report was completed after the readings on what constitutes data, particularly the introduction to "Raw Data" Is an Oxymoron, and I am beginning to see the objections and contentions raised in those readings. Essentially every character is a form of data, and particularly in the humanities, such minutiae are going to be scrutinized, debated, disagreed upon, and usually left unsettled. Alas, such is the life of an English scholar.