Data Critique
15% of course grade
Due: Tuesday, February 10 by class (5:00 pm) to Blackboard
Over the past several weeks, we’ve discussed different definitions and understandings of data; how data relates to other kinds of knowledge production, like narratives; and the importance of considering the historical and social contexts of technological production. In the next couple of weeks, we will also be examining the ethical and political dimensions of data collection and analysis. Your first assignment is therefore as follows:
- Choose a dataset that you wish to analyze. You may choose scientific data that you’ve personally collected; personal data that has been collected about you; statistical information about Clemson, South Carolina, the US, or the world; or a dataset of your choice. Some possibilities to get you started:
  - Your Google or Facebook data (but keep in mind that Facebook data requests can take up to 48 hours to process): http://www.theguardian.com/news/datablog/2012/apr/22/download-your-data-google-facebook; https://www.facebook.com/help/212802592074644
  - Your Twitter archive (Twitter data requests may also take several days to process): https://blog.twitter.com/2012/your-twitter-archive; https://support.twitter.com/articles/20170160-downloading-your-twitter-archive
  - Data from recent United States Censuses: http://www.census.gov/2010census/data/
  - The Million Song dataset (or, more accurately, the 10,000 song subset that’s available for free): http://labrosa.ee.columbia.edu/millionsong/
  - One of the datasets from the R datasets package (there’s a wide range here): http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
    - For more on these datasets and to download individual datasets: http://vincentarelbundock.github.io/Rdatasets/
  - One of the datasets from the Gephi sample datasets (these are all social network datasets; also a wide range): https://wiki.gephi.org/index.php/Datasets
    - If you want to visualize this data, you can download Gephi, a free visualization tool, here: http://gephi.github.io/features/ (although if you use OSX Yosemite be aware: https://lbartkowski.wordpress.com/2014/11/28/gephi-0-8-2-on-apple-osx-yosemite/)
  - One of the many datasets available from Data.gov: http://www.data.gov/
  - Or, of course, a dataset of your choosing. If you have an idea but you don’t know where to look, let me know.
- In a 4-5 page paper (double spaced, 1” margins, 12 pt font; 1200-1800 words), provide an analysis of that dataset. Your analysis should contain:
  - A description of what your dataset contains; when, how, and by whom (or by what automated means) the data was collected; and any other relevant information for understanding and contextualizing the dataset.
  - An analysis of how your dataset engages the themes and issues we’ve discussed in the course. This analysis should incorporate at least one of the non-literary texts we have read so far in class.
The description of your dataset need not be longer than a page, which means that your analysis should constitute the majority of your paper. Your analysis should consist of an in-depth engagement with the themes and/or issues you choose to explore. These concepts include (but are not limited to):
- The nature of the dataset. Following Rosenberg, how is this dataset “rhetorical”? Can it be said to be true or factual? If so, how? If not, why not?
- The method of its collection. Are there ethical questions involved?
- The mode of its presentation. Is it public? Private? Should it be one or the other?
- Is there a hypothesis that the dataset seeks to prove, or is it more open-ended?
- What hypotheses do you have about the dataset? What information do you think the dataset might demonstrate, or contain?
- Does the dataset give you the full picture? What parts are missing?
- Who is the audience for the data? How might the audience have influenced the data’s collection?
These questions are intended to get you started thinking, not to provide an outline for your analysis. You are free to write about whichever aspects of your chosen dataset interest you most. Please be sophisticated and creative in analyzing your dataset. Your paper should be focused and specific, and it should integrate examples with analysis. You should demonstrate an awareness of the implications – and limitations – of your argument, and you should consider multiple perspectives when appropriate.
This assignment was originally developed by Professor Lauren Klein at Georgia Tech.