Robert Brissey

Lab Report 2

English 8120 – Dr. Lindsay Thomas


XML encoding is, at present, a rather ambivalent topic for me. The journey through what, at first, seemed like a short letter from the Dorr project turned into a several-hour-long odyssey into the abyss. Textual encoding, at least at this stage in my development, takes an extraordinarily long time even for such a limited primary text. However, upon completion, I was able to draw several conclusions about encoding, particularly with regard to longer primary sources.

First, the decision of how complicated an encoding procedure should be, i.e. which items to tag and which to assign attributes, most likely depends on the type of data needed and the type of source being encoded. Within the context of this letter, the list of items that might be deemed “tag worthy” was relatively short, limited to the names of people, organizations, and places, plus key information like dates. Furthermore, the list of possible tags was limited by the length and breadth of the letter’s content itself, as few places, people, etc. were mentioned. The task of tagging would expand enormously in any Victorian novel, as there are far more places and character names, as well as titles and organizations, of which to keep track. Moreover, encoding an item such as a poem would require absolute diligence in order to preserve the format of the original. Also, if one were attempting to encode a document with relevant physical attributes, such as parchment coloration or embedded pictures, I can only imagine the XML conversion taking on a much longer (and more sinister) quality. For instance, encoding something like a comic strip or graphic novel would present several special issues for the earnest, yet amateur, encoder.

Next, I would like to talk about possibilities and limits. The data mining aspects, which were admittedly a bit unclear to me except in the most abstract form, became much clearer as I slogged through Letter 5. The ability to assign attributes to anything the encoder deems noteworthy grants a large amount of individualization to something as seemingly cut and dried as the direct coding of already established source material: what is important or useful to one encoder might differ drastically from what matters to another. As a result, repositories of encoded texts would need to anticipate a wide array of needs on the part of the data miner, which translates to considerably more tagging of elements within any and all texts in a given corpus. A potentially limiting problem for my personal scholarship is the requirement that the texts be in the public domain. I intend, at this point, to deal primarily with texts from the 20th and 21st centuries, and the majority of that corpus is unavailable in a free and accessible format. My concern is that this strategy for data mining might become associated only with scholarship on the Victorian period and earlier. This limitation in particular might increase the reluctance of some scholars to incorporate such research into their own methodologies. I see much greater potential, however, in the realms of law and journalism, where searching precedent court cases and perusing back issues, respectively, would be greatly expedited by XML encoding and tagging procedures.
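To make the data mining point concrete, here is a minimal sketch in Python of how tagged names, places, and dates might be pulled back out of an encoded text. It is only an illustration under stated assumptions: the sample sentence is invented (it is not the text of Letter 5), the element names follow common TEI usage (persName, placeName, orgName, date), and the key values and the collect_tagged helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: an invented sentence, NOT the text of the actual letter.
# Element names follow common TEI usage; the "key" values are made up.
SAMPLE = """<letter>
  <p>On <date when="1900-01-01">January 1st</date>,
  <persName key="example_a">Mr. Example</persName> wrote from
  <placeName key="sampletown">Sampletown</placeName> to
  <persName key="reader_b">B. Reader</persName> of the
  <orgName key="society">Sample Society</orgName>.
  <persName key="example_a">Example</persName> expected a reply.</p>
</letter>"""

def collect_tagged(xml_text, tags=("persName", "placeName", "orgName", "date")):
    """Return {tag: [(attributes, text), ...]} for each tag of interest."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for elem in root.iter():  # walks every element in document order
        if elem.tag in tags:
            found[elem.tag].append((dict(elem.attrib), (elem.text or "").strip()))
    return dict(found)

results = collect_tagged(SAMPLE)

# The "key" attribute lets two different spellings count as the same person.
people = sorted({attrs["key"] for attrs, _ in results["persName"]})
print(people)
print(results["date"])
```

The attribute does the real work here: "Mr. Example" and "Example" are different strings, but the shared key lets a search treat them as one person, which is the kind of choice that varies from encoder to encoder.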

Finally, on a personal note: I figured out how to encode an ampersand! It took roughly half an hour for one character and I was quite proud when I figured it out.
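For reference, the standard XML way to write a literal ampersand is the entity &amp;amp;, since a bare & is reserved for starting entities. The short Python sketch below illustrates this with the standard library; the sample phrase "Smith & Sons" and the orgName wrapper are hypothetical and are not taken from the letter.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Smith & Sons"   # hypothetical phrase containing an ampersand
encoded = escape(raw)  # replaces & with &amp; (and < > with &lt; &gt;)
print(encoded)

# A parser turns the entity back into a plain ampersand.
decoded = ET.fromstring("<orgName>" + encoded + "</orgName>").text
print(decoded)

# A bare ampersand makes the document ill-formed, so parsing fails.
try:
    ET.fromstring("<orgName>" + raw + "</orgName>")
    bare_ampersand_ok = True
except ET.ParseError:
    bare_ampersand_ok = False
print(bare_ampersand_ok)
```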