ENG 612/MLL 772 Topics in DH: Humanities Data Spring 2022

Final Project

Due:

  • Abstract: Wed, April 6 (post to website)
  • 5-6 minute presentation of final project in progress: Wed, April 27
  • Proof-of-concept dataset, codebook, and critical introduction: Mon, May 9 (post to website)
  • Final self-assessment: Mon, May 9 (email me your self-assessment individually)

Your final project in this class is a scholarly dataset and a critical introduction to that dataset. You may work with other members of the class to complete the final project. If working with others, your team will turn in 1 copy of your proof-of-concept dataset (i.e., you don’t each have to turn it in to me). You will also have a choice about how you would like to write your critical introduction: each member of your team may write their own, or, alternatively, you may turn in a collaboratively written critical introduction (or some combination of these two options). If you opt to collaborate with others on the final project, it will be important for your team to discuss with me how you plan to organize your labor so that all members contribute equally to the final product.

If you are already working on a project of your own involving data and/or the creation of a scholarly dataset, you may continue that work for your final project. We will work together to determine what exactly this work will look like so that it fulfills the requirements of this assignment.

Planning your Dataset and Writing your Abstract

Your goal is to create a proof-of-concept scholarly dataset that could be used to answer research questions in a particular field/subfield of humanistic inquiry (ideally, in your own field(s) of research). For many of you, given the focus of this class, this dataset will likely consist of works of literature, films, social media posts, and/or other cultural artifacts (whether contemporary or historical). Alternatively, you may wish to create a dataset of ethnographic data (survey results, interview transcriptions, etc.), though you should talk to me well in advance if you are interested in this option. There is no minimum number of records that your dataset needs to contain, nor is there a minimum number of metadata fields – these numbers will vary depending on the data being collected, the methods of collection, the information available, etc. – but you should strive to make the theoretical version of your dataset as complete and as fully imagined as possible. However, the actual dataset you turn in to me will likely be only a small(ish) subset of this larger, fully imagined dataset. This is what I mean by the term “proof-of-concept dataset”: because you likely won’t have time to collect all of the data that your (fully imagined) dataset contains, you may wish to focus on collecting (and organizing, describing, etc.) only a subset of this data by the end of the semester.

How you collect this data is up to you, but you should take data collection very seriously when deciding what your dataset will be. Think hard about what kind of data you can realistically collect in the time that you have to complete this assignment. When deciding what data you want to collect, consider the following criteria:

  • This dataset should not already exist.
  • Your dataset should be conceptually meaningful, meaning its entries should be grouped/collected logically and according to explicit criteria. You should keep the needs of two sometimes competing audiences in mind when creating your dataset: 1) Scholars in the particular field(s)/subfield(s) in which the dataset is located, i.e., content experts; and 2) Other humanities researchers who may wish to use your data in their own projects to answer questions you may not be fully aware of, i.e., general-use experts.
  • You should be able to collect this data ethically and transparently. This means you should be aware of copyright and/or fair use restrictions (if applicable), human subjects protocols (if applicable), and other potential barriers or complications to collection.
  • You should have ideas about how you would scale up data collection efforts if you had the time (and/or the money) to collect the full dataset (instead of just a subset, as you are doing for this class).

You (and/or your team, if you are working collaboratively) will turn in an abstract describing your plans for your dataset by Wednesday, April 6. You should post your abstract to your website (or to a partner’s or team member’s website). This abstract should include the following:

  1. If you are working with a partner or a team, who your partner or teammates are.
  2. The title of your dataset.
  3. A brief description of the kind of data your dataset will contain, how you plan to collect it, and your dataset’s boundaries/scope (~2-3 paragraphs). What is included or excluded from your dataset, and why (i.e., what are the inclusion/exclusion criteria)?
  4. The titles and brief descriptions of each metadata field in your dataset (i.e., an initial draft of your dataset documentation/codebook). This will likely change (and expand) as you work on your dataset, but you should have an initial plan.
  5. The audience(s) for your dataset. Who are you creating this dataset for? Again, you should think in terms of the overlapping (and often competing) audiences of content experts and general-use experts here.
  6. Several questions your dataset could help these audiences answer.
  7. A list of 2-3 already existing related scholarly datasets and an explanation of how your dataset offers a unique contribution/how it is different from these existing datasets (~2-3 paragraphs).

Creating your Dataset

You can create your dataset in whatever format makes the most sense for your data (my guess is that most of you will choose to present your data as an Excel/Google Sheets spreadsheet, but this is by no means the only option). If you use a spreadsheet to organize your data, each record should be 1 row of your dataset, and your metadata fields should comprise the columns of your spreadsheet. As always, please let me know if you have questions about the best format for presenting your data.
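
If it helps to see the row-per-record layout concretely, here is a minimal sketch in Python that writes a tiny CSV using only the standard library. The field names and the two sample records are hypothetical placeholders for illustration, not a required schema, and you are by no means required to use Python or CSV.

```python
import csv
import io

# Hypothetical metadata fields: these become the column headers.
FIELDS = ["title", "creator", "year", "source"]

# Hypothetical records: each dictionary becomes one row of the dataset.
RECORDS = [
    {"title": "Example Work A", "creator": "Author One", "year": 1952, "source": "library catalog"},
    {"title": "Example Work B", "creator": "Author Two", "year": 1967, "source": "publisher archive"},
]


def records_to_csv(records, fields):
    """Return CSV text with one header row and one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(records_to_csv(RECORDS, FIELDS))
```

A CSV file like this opens directly in Excel or Google Sheets, with each metadata field as a column and each record as a row.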

When you turn in your dataset, you should also turn in a codebook: documentation that lists each metadata field in your dataset with a brief description of what that field means and the kind of information it records. You can turn this in as a separate list, as a separate tab in your spreadsheet, as an appendix to your critical introduction, or in whatever format suits your data/project the best. We will look at some examples in class.
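
One optional way to check that your codebook and dataset stay in sync is sketched below in Python. It assumes the codebook is kept as a simple mapping from field name to description; the field names, descriptions, and function names are all hypothetical, and a codebook in a spreadsheet tab or an appendix is just as acceptable.

```python
# Hypothetical codebook: metadata field name -> brief description.
CODEBOOK = {
    "title": "Title of the work as it appears on the title page.",
    "creator": "Name of the primary author or creator.",
    "year": "Year of first publication (four digits).",
}


def undocumented_fields(columns, codebook):
    """Return dataset columns that have no (non-empty) codebook description."""
    return [name for name in columns if not codebook.get(name, "").strip()]


def unused_entries(columns, codebook):
    """Return codebook entries that do not match any dataset column."""
    return [name for name in codebook if name not in columns]


if __name__ == "__main__":
    # Hypothetical spreadsheet header row.
    columns = ["title", "year", "genre"]
    print("Missing from codebook:", undocumented_fields(columns, CODEBOOK))
    print("Not in dataset:", unused_entries(columns, CODEBOOK))
```

Running a check like this before you submit can catch a column you added late and forgot to document.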

Presenting your Dataset in Progress

Our class period on Wednesday, April 27 will be devoted to quick presentations of your datasets in progress. In these presentations, you should describe your dataset (including the kind of data it contains, its boundaries/scope, and its metadata fields) and quickly contextualize this dataset in relation to other existing scholarly datasets, the questions it allows researchers to answer, gaps in existing fields, etc. In brief, your presentation should answer the following 3 questions: 1) What is this dataset? 2) How are you collecting it? 3) Why is it significant?

If you are working on your final project individually, your presentation should be about 5 minutes long (I will warn you at 5 minutes and cut you off at 6 minutes). If you are working on your final project with others, talk to me in advance about the length of your presentation and its content, as team presentations may need more time.

Writing your Critical Introduction

After collecting your dataset, you will write a ~2500-3000-word critical introduction to your dataset. You may organize your critical introduction however you choose, but it should contain the following elements:

  1. A description of your dataset and (brief) documentation of how you collected it so that your collection efforts are reproducible (to the extent this is possible). Depending on what your data collection process entailed, you may wish to include discussion of data collection as a technical appendix to your critical introduction.
  2. An examination of the affordances and limits of your dataset, of the curatorial choices you made in creating your dataset, of the questions it allows researchers to ask, and/or of what other issues, questions, and/or data in its field(s) it is in conversation with. This may include some initial exploratory analysis of your dataset, though it need not. Basically, I am asking you here to contextualize your dataset in relation to existing scholarship, to discuss any unique features or affordances of your dataset, and to argue for its overall significance.
  3. A reflection on 1-2 issues, problems, or larger concepts that creating this dataset helped you to understand or to think about more clearly. What did this process illuminate for you, either about the data you chose to collect specifically or the process of data collection more generally or the concept of data itself? You should relate this discussion to at least 1-2 readings from our class, though you may also include other texts as appropriate.

While this specific genre of paper may be new to you, what I am asking for here still involves research. This means your critical introduction should demonstrate knowledge of its field (i.e., post-1945 US literature, or what have you), and it should contribute to knowledge in this field. It should include a works cited page/bibliography. You may wish to explore the pieces published in the “Data Sets” section in the Journal of Cultural Analytics or the Post45 Data Collective Peer Review Criteria to get a sense of the range of things you can discuss in your critical introduction.

If you are working with a team and you and your team plan to co-author the critical introduction, you should speak to me in advance about your plans for writing this document to ensure equitable distribution of labor. Collaborative critical introductions may be longer.

Turning in your Final Project

You (and/or your team) will post your critical introduction to your (or a specific team member’s) website as its own Markdown (.md) file. You can turn in your dataset in a way that makes sense for your data: you may want to send me a link to a Google Sheets spreadsheet or a GitHub repo, share a folder with me on Dropbox, email me the dataset, etc.

You should name your critical introduction using the following convention:

  • YYYY-MM-DD-finalproject-WHAT-WORDS-YOU-WANT.md
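
If you would like to double-check a filename against this convention, a minimal Python sketch follows. It assumes that the free-form part of the name consists of letters and digits separated by single hyphens (no spaces); that restriction, along with the function name and the sample filenames, is an assumption made for illustration and not an additional requirement.

```python
import re
from datetime import datetime

# Date, the literal "finalproject", then hyphen-separated words, then ".md".
# The allowed characters in the free-form words are an assumption.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2})-finalproject-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\.md"
)


def follows_convention(filename):
    """Return True if filename looks like YYYY-MM-DD-finalproject-WORDS.md."""
    match = PATTERN.fullmatch(filename)
    if match is None:
        return False
    try:
        # Reject impossible dates such as month 13 or day 32.
        datetime.strptime(match.group(1), "%Y-%m-%d")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical filenames for illustration.
    for name in ["2022-05-09-finalproject-my-dataset.md", "finalproject.md"]:
        print(name, follows_convention(name))
```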

Final Self-Assessment

Please assess your effort and performance across in-class discussions and the course assignments as they relate to the goals you set for yourself, your work to meet those goals, and your intellectual growth during the semester. If you collaborated with others on your final project, please also include an assessment of your contributions to the final project. Finally, please include the letter grade you would give yourself for the semester. There is no specific length requirement, but you should be detailed and specific.

Each member of the class should individually email me their final self-assessment when they submit their final project (i.e., do not post your self-assessment to your website).