I picked Detective Novels and Short Stories from the Golden Age (Late 19th-Early 20th century)

The Moonstone by Wilkie Collins: http://www.gutenberg.org/cache/epub/155/pg155.txt

The Purloined Letter by Edgar Allan Poe: http://www.gutenberg.org/cache/epub/2148/pg2148.txt

The Hound of the Baskervilles by Sir Arthur Conan Doyle: http://www.gutenberg.org/cache/epub/2852/pg2852.txt

The Riddle of the Frozen Flame by Thomas E. and Mary E. Hanshew: http://www.gutenberg.org/cache/epub/17180/pg17180.txt

The Triumphs of Eugene Valmont by Robert Barr: http://www.gutenberg.org/cache/epub/19369/pg19369.txt

Scarhaven Keep by J. S. Fletcher: http://www.gutenberg.org/cache/epub/9807/pg9807.txt

The Hand in the Dark by Arthur J. Rees: http://www.gutenberg.org/cache/epub/20546/pg20546.txt

The Sleuth of St. James’s Square by Melville Davisson Post: http://www.gutenberg.org/cache/epub/2861/pg2861.txt

The Angel of Terror by Edgar Wallace: http://www.gutenberg.org/cache/epub/21530/pg21530.txt

The Mystery of the Boule Cabinet by Burton Egbert Stevenson: http://www.gutenberg.org/cache/epub/10067/pg10067.txt

I began this lab by exploring Project Gutenberg’s “Detective Fiction” category and selecting a few titles with which I am personally familiar. I decided to make this the body from which I would derive my smaller corpus because I enjoy this kind of fiction and because my research interests sometimes concern novels and short stories of this kind. Selecting the specific works was a matter of finding a mix of the familiar and the unknown. I selected The Hound of the Baskervilles and “The Purloined Letter” because they are known to me and quite popular. I selected Scarhaven Keep and The Mystery of the Boule Cabinet for the opposite reason: they were unfamiliar to me. While I cannot comment with any authority on which of these works might be considered prominent, I made my selections based on a general opinion formed from my own experience.

For the metadata spreadsheet, I chose to include author name (first, middle, last), date of birth, date of death, nationality, gender, the title of the work, the location of the plain-text file that corresponds to the specific work, the original publisher of each work, the publication year, and the site from which I procured the plain-text files that make up the corpus. Biographical information provides a solid basis upon which to build an analysis of the authors who write this sort of fiction and the larger trends that might emerge from their various ages, national origins, and lifespans. The publication data is included to explore what insights might arise from acknowledging the original publishers and considering their mode of publication (magazine, periodical, traditional press) as a lens for viewing different kinds of detective fiction. The original date of publication helps to separate these recompiled and collected works from one another in a way that merely referencing the database from which they are gathered could not. I suppose a good way to encapsulate the process of selecting these metadata fields is to consider their collective existence as a hopeful overture toward what I felt might emerge. For example, in considering the metadata in the spreadsheet, I noticed that a number of the works in the corpus were published in the same year and were written by authors who died within a few years of one another. This is especially relevant considering that, in searching for the metadata compiled here, I came across the label “Golden Age of Detective Fiction.” This label meant something to me before I began this project, but the metadata gives it a sharper and more specific definition than was perhaps possible with the colloquial noises I would make to describe it.
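
To make the shape of that spreadsheet concrete, here is a minimal Python sketch of one row and of the kind of grouping described above. The field names, the `WorkRecord` class, and the two helper functions are my own illustrative assumptions, not the actual column headers of the spreadsheet. Only two works are filled in, and the publisher column is left empty rather than guessed at.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class WorkRecord:
    """One spreadsheet row; field names are hypothetical."""
    author_first: str
    author_middle: str
    author_last: str
    birth_year: int
    death_year: int
    nationality: str
    gender: str
    title: str
    text_url: str
    publication_year: int
    source_site: str
    original_publisher: Optional[str] = None  # left blank in this sketch


# Two sample rows only; the remaining works would follow the same pattern.
CORPUS = [
    WorkRecord("William", "Wilkie", "Collins", 1824, 1889, "British", "male",
               "The Moonstone",
               "http://www.gutenberg.org/cache/epub/155/pg155.txt",
               1868, "Project Gutenberg"),
    WorkRecord("Arthur", "Conan", "Doyle", 1859, 1930, "British", "male",
               "The Hound of the Baskervilles",
               "http://www.gutenberg.org/cache/epub/2852/pg2852.txt",
               1902, "Project Gutenberg"),
]


def titles_by_year(records: List[WorkRecord]) -> Dict[int, List[str]]:
    """Group titles by publication year to spot works that share a year."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for record in records:
        grouped[record.publication_year].append(record.title)
    return dict(grouped)


def age_at_publication(record: WorkRecord) -> int:
    """Approximate author age at publication (year arithmetic only)."""
    return record.publication_year - record.birth_year


if __name__ == "__main__":
    for year, titles in sorted(titles_by_year(CORPUS).items()):
        print(year, titles)
```

Even a structure this small makes it easy to ask the questions raised above, such as which works share a publication year or how old each author was when a given work appeared.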

I am struck right away by the potential benefits that even this basic exercise might have for my research process. Moretti’s claims regarding the capacity of distant reading to characterize generational shifts in reading and writing practices seem to be right on target in the light of my (admittedly small) corpus. Reading closely tends to mean shutting out these ancillary details in favor of plunging into some rich vein of textual evidence contextualized and made vibrant by theoretical discourse. Before now, I never gave a second thought to the publication information I plugged into RefWorks so that I could get down to the real business of writing papers. Just looking at the metadata spreadsheet has me crafting lines of inquiry that compare author productivity against year of publication to draw conclusions about the kinds of detective fiction that were more common or well-received in a given year or from a given author.

However, it is important to note that the task of assembling the ideal body of texts to make this reading (or even to find out whether it might be valid) is immense. Though I know little about the tools one might apply to bring such a project to fruition, I do know that one would need to find the texts and assemble the corpus first. My corpus is made up of at least some random selections. In compiling my texts this way, I hoped to cast light on the unknown because I had no idea where I should start. And while I find myself fascinated by the hint of a new perspective on literature, there can be no substitute for the extrapolations that emerge from the act of reading a text and discerning movements within that may translate without. Indeed, I started the corpus with Wilkie Collins’ epistolary The Moonstone because it reveals much to the reader who can analyze the patterns of information that unfold within its pages.

The conceptual difficulties are not the only ones. In making use of databases like Project Gutenberg, one is left to decide whether or not to include the copyright information posted at the bottom of each text file. Granted, the same disclaimer in each text in the corpus could be easily found and ignored in the results of data mining. But what if one were to procure texts from more than one database? Do the differing disclaimers then become distinct parts of each text? It is easy to shrug this off and say the copyright information cannot possibly matter. However, its placement within each file depends on the shape of the text that surrounds it and therefore could be informative in some forms of macroanalysis. Much like in traditional close readings, there really is no firm guide as to what is and what is not worth looking at.
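
One way to postpone that decision is to separate the boilerplate from the body without throwing it away. The sketch below is a rough Python illustration, assuming the plain-text files mark the body with lines beginning `*** START OF THIS PROJECT GUTENBERG EBOOK` and `*** END OF THIS PROJECT GUTENBERG EBOOK` (or `THE` in place of `THIS`). Marker wording varies between files, so the patterns are an assumption to check against each text, and `split_boilerplate` is a hypothetical helper name.

```python
import re

# Assumed marker lines; real files vary, so check these against each text.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def split_boilerplate(raw: str):
    """Return (header, body, footer) so the license text is kept, not lost."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    body_start = start.end() if start else 0
    body_end = end.start() if end else len(raw)
    if body_end < body_start:
        # Markers out of order: fall back to treating the rest as body.
        body_end = len(raw)
    header = raw[:body_start]
    body = raw[body_start:body_end].strip()
    footer = raw[body_end:]
    return header, body, footer


if __name__ == "__main__":
    sample = (
        "Header line\n"
        "*** START OF THIS PROJECT GUTENBERG EBOOK THE MOONSTONE ***\n"
        "\nChapter 1\nText.\n\n"
        "*** END OF THIS PROJECT GUTENBERG EBOOK THE MOONSTONE ***\n"
        "License text\n"
    )
    header, body, footer = split_boilerplate(sample)
    print(len(header), len(body), len(footer))
```

Because the header and footer come back as separate strings, they can be excluded from word counts now and still examined later if the disclaimers themselves turn out to be worth reading.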

It seems almost as though the solutions of distant reading create just as many problems as they solve. To elaborate, one could start by looking at all detective novels and short stories produced between the years 1870 and 1930. However, this corpus would still miss the formative texts and trends that came before and would fail to register the resonances in any that came after. But one cannot simply analyze all texts! And still this seems like an inevitability in the epistemological arms race that Moretti kicks off with his incendiary claims.