Lab 7: Data Visualization
Thanks to Mia Ridge for the original version of this lab.
Over the course of the semester, we have used various data visualization techniques in our other labs without thinking much about them. We’re now going to take a step back and think in more detail about data visualization itself: what different visualization techniques and types can tell us, and what they can’t. The goal of Lab 7 is to reflect on different kinds of data visualizations and to gain some hands-on experience with data visualization tools.
NOTE: A helpful resource for this lab is Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky’s “A Tour Through the Visualization Zoo.”
- Create a post on our course site for Lab 7 (categorize it under “Lab 7”). This is where you will post the souvenirs you collect from the lab and where you will write your lab report.
NOTE: The written component of this lab report is built into the lab itself (rather than appearing as a separate step at the end). Steps 2 – 5 of the lab ask you to experiment with various visualization tools and then to write up short analyses of these tools. Shoot for about 1 paragraph for each analysis. The TOTAL word count for this post (including all of these separate analyses) should therefore be about 500-750 words, although honestly I think 500 words is a bit slim for this lab.
- Comparing N-gram tools: In Lab 2, we experimented with Voyant, which has a built-in word frequency visualization tool. Another popular word frequency visualization tool is Google’s Ngram Viewer. Historian Ben Schmidt has also created a similar word frequency analysis tool called Bookworm. This part of the lab asks you to compare these two tools.
- Think of two words or phrases you’d like to compare over time.
- Open two browser windows.
- In one, go to http://books.google.com/ngrams.
- In the other, go to http://benschmidt.org/OL/.
- Enter your chosen words or phrases in each. In the Bookworm window, make sure you are searching for two words/phrases (i.e., you should see two lines on the graph) and that you are searching for books written in English (UK and USA). Also make sure that you remove any subject or book categories.
- Take a screenshot of the time-series graph created with each tool. Upload these screenshots to your Lab 7 post with brief explanations of what each graph depicts.
- Write up a brief analysis and comparison of these two visualizations in your lab report. These graphs are both showing the frequency of your chosen words/phrases over time, but what differences do you see? How are these graphs/tools different? Why might these differences be important? More broadly, how might humanists utilize and/or re-imagine time series graphs to foreground humanistic, interpretive principles, as Drucker and Klein argue for?
Google Ngram tips: http://books.google.com/ngrams/info
Bookworm tips: click the ‘cog’ icon next to the ‘i’ to change the time period, or click the underlined words next to the search term to change which books are searched (e.g., subject, language, country, gender of author). You can also compare the same word or phrase in a different corpus.
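One reason two tools can draw different lines for the same word is that the y-axis is rarely a raw count. Google’s Ngram Viewer plots a word’s share of all n-grams printed in a given year and, by default, smooths each year by averaging it with neighbouring years; Bookworm can display frequencies in units such as words per million (check each tool’s settings to see what yours is actually showing). The sketch below is a simplified illustration, not either tool’s real code: it uses hypothetical counts, and the function names are our own.

```python
# A simplified sketch of two choices that shape word-frequency graphs:
# normalization (dividing by the total words in each year) and smoothing.
# All counts below are hypothetical toy numbers, not real Ngram or Bookworm data.


def relative_frequency(word_counts, total_counts, scale=1_000_000):
    """Word count per `scale` words for each year (default: words per million)."""
    return {
        year: word_counts.get(year, 0) * scale / total
        for year, total in total_counts.items()
        if total > 0
    }


def moving_average(series, window):
    """Average each year with up to `window` years on either side.

    Years near the edges of the range are averaged over fewer values.
    """
    years = sorted(series)
    smoothed = {}
    for i, year in enumerate(years):
        neighbours = years[max(0, i - window): i + window + 1]
        smoothed[year] = sum(series[y] for y in neighbours) / len(neighbours)
    return smoothed


# Hypothetical counts for one word, and for all words, in each year.
word_counts = {1900: 5, 1901: 30, 1902: 10}
total_counts = {1900: 1_000_000, 1901: 2_000_000, 1902: 1_000_000}

raw = relative_frequency(word_counts, total_counts)
print(raw)                     # {1900: 5.0, 1901: 15.0, 1902: 10.0}
print(moving_average(raw, 1))  # {1900: 10.0, 1901: 10.0, 1902: 12.5}
```

Notice that the raw count for 1901 (30) is six times the count for 1900, but once you divide by the size of each year’s corpus the gap shrinks to three times, and smoothing flattens it further. Neither line is the “true” one; each reflects choices worth naming in your comparison.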
- Visualizing topic models: In Labs 5a and 5b, we experimented with topic modeling. Lexos offers a quick, easy way to visualize topics as word clouds. This part of the lab asks you to use Lexos to visualize topic modeling data.
- Our class Box folder (in the “Lab 7” folder) contains a file called “sotu-word-topic-counts.txt.” This is one of the output files MALLET can generate: it lists each word in the State of the Union Addresses corpus along with the topics to which it is assigned. Download this file and save it on your computer in some place that is easily accessible (like your Desktop).
- Go to Lexos.
- Under “Visualize,” select “Multicloud.”
- Click the slider under “Multicloud Options” so that “Topic Clouds” is displayed.
- Click the box that reads “Convert Topics to Documents.”
- Upload the “sotu-word-topic-counts.txt” file by clicking “Upload File.”
- Click “Get Graphs.”
- Take a screenshot of at least one of the topic word clouds and upload it to your lab report. In your Lab 7 post, write a brief analysis of what this tool allows you to see and why this might be useful. What is this a visualization of? What are the advantages and disadvantages of this kind of visualization? Does the visualization open up new questions about the data? More broadly, are word clouds “humanistic” graphical displays, according to how Drucker and/or Klein use that term? Why or why not? How might we re-imagine this form of graphical display to foreground humanistic, interpretive principles?
NOTE: If you want a challenge, you can create a word-topic counts file using MALLET yourself. Create a topic model of your mini-corpus (like we did in Lab 5b). To get a word-topic counts file, add this option to your train-topics command:
(That is a double dash before “word.” Remember, every MALLET option starts with a double dash.)
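It can help to see what “Convert Topics to Documents” has to work with. MALLET’s train-topics command has an option named --word-topic-counts-file, which appears to be the one this note refers to. As far as we understand its output, each line holds a word index, the word, and then topic:count pairs, for example `0 citizens 3:12 7:2`. Turning topics into “documents” then amounts to regrouping the file by topic. The sketch below assumes that line format; the function names are hypothetical, and we have not checked that Lexos does exactly this internally.

```python
from collections import defaultdict


def topics_from_word_topic_counts(lines):
    """Regroup word-topic-count lines into {topic: {word: count}}.

    Assumed line format: "<index> <word> <topic>:<count> <topic>:<count> ..."
    """
    topics = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word = fields[1]
        for pair in fields[2:]:
            topic, count = pair.split(":")
            topics[int(topic)][word] = int(count)
    return dict(topics)


def top_words(topic_words, n=10):
    """Return the n highest-count (word, count) pairs for one topic."""
    return sorted(topic_words.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample lines, not taken from the SOTU file.
sample = [
    "0 citizens 3:12 7:2",
    "1 war 7:20",
    "2 peace 3:4 7:9",
]
topics = topics_from_word_topic_counts(sample)
print(top_words(topics[7], n=2))  # [('war', 20), ('peace', 9)]
```

To try it on the lab file, you could pass an open file object instead of the sample list, e.g. `open("sotu-word-topic-counts.txt", encoding="utf-8")`, since iterating over a file yields one line at a time. The counts per topic are the kind of word weights a topic cloud would use to size its words.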
- Network visualizations: In Lab 6, we experimented with network analysis. Let’s look at network visualizations more closely.
- In your browser, go to http://fredbenenson.com/2012/12/05/the-data-behind-my-ideal-bookshelf/
- Scroll down the page to the network graph.
- Take a few minutes to explore the visualization: try holding the cursor over items, clicking, dragging, etc.
- In your Lab 7 post, write a brief analysis of this visualization. Is it clear what it’s for? Is it intuitive to use? What does “intuitive” mean in this case? Does interacting with the network graph give you more or less information than the other representations of the data further down on the same page? Does the visualization open up new questions about the data? More broadly, how might humanists utilize and/or re-imagine network graphs to foreground humanistic, interpretive principles, as Drucker and Klein argue for?
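Behind a network graph is usually just a list of nodes and weighted edges; where things sit on the screen is a separate choice made by the layout software. The sketch below builds a hypothetical book network in one common way, by connecting two books whenever the same person selected both. This is an assumption for illustration, not a reconstruction of how Benenson built his graph, and the reader and book names are placeholders.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(selections):
    """Link two books whenever the same person chose both.

    `selections` maps a person to the set of books they picked; each edge
    weight counts how many people picked that pair together.
    """
    edges = Counter()
    for books in selections.values():
        for a, b in combinations(sorted(books), 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbours for each book (edge weights ignored)."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical shelves, not Benenson's data.
selections = {
    "reader_1": {"Book A", "Book B", "Book C"},
    "reader_2": {"Book A", "Book B"},
    "reader_3": {"Book C", "Book D"},
}
edges = co_occurrence_edges(selections)
print(edges[("Book A", "Book B")])  # 2
print(degree(edges)["Book C"])      # 3
```

Each step here (what counts as a link, whether weights matter, which measure becomes node size) is a decision made before anything is drawn, and the finished picture does not always show you which decisions were made.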
- Creating your own simple visualizations using Google Fusion Tables: Google Fusion Tables gives you lots of options for easily creating different kinds of visualizations with your own data.
- Our class Box folder (in the “Lab 7” folder) contains a file called “Tate_artists_percountry.xlsx” based on data from the UK’s Tate art museums. Download this file and save it on your computer in some place that is easily accessible (like your Desktop).
- Go to https://drive.google.com/ and log into your Google account (contact me if you don’t have a Google account).
- Go to http://bit.ly/Xw0zNJ (or https://www.google.com/fusiontables/data?dsrcid=implicit) to access Fusion Tables from your account.
- You should see an “Import new table” screen with the option “From this computer” highlighted.
- Click “Choose file” and select Tate_artists_percountry.xlsx. Click “Next.”
- Click “Next” on the next screen, then click “Finish” on the following screen.
- If you want to fill in the options on the Import screen you can update them as follows: untick “Allow export.” For “Attribute data to” put “Tate” and for Attribution page link put
- The screen should load in “Row” view, which looks something like a spreadsheet with two columns.
- At the end of the row of menu options, there should be a red box with a plus sign in it. Click that, then select “Add chart.”
- Scroll down the left-hand side to find the Pie Chart option, and click it.
- On the “Configure pie chart” screen, check that Category is set to CountryOfBirth and Value is set to Number of artists.
- Change “Maximum slices” to 80.
- Click “Done” (on the right-hand side). You should now have a pie chart of your data!
- Take a screenshot of this pie chart and upload it to your Lab 7 post. Include a brief description of what information this visualization is displaying. Explore some of the other chart and graph options Google Fusion Tables gives you, and then write up a brief analysis of how we could use Google Fusion Tables in our final class project. What kind(s) of data could we visualize using this tool? More importantly, why might we want to do so? And more broadly, is there a way to use Google Fusion Tables to foreground humanistic, interpretive principles, as Drucker and Klein argue for?
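A pie chart turns each category’s share of the total into an angle, so a setting like “Maximum slices” decides which categories get their own wedge. The sketch below uses hypothetical numbers (not the Tate spreadsheet) to compute those shares, and shows one common way a slice limit can be handled: folding the smallest categories into an “Other” slice. We have not confirmed that Fusion Tables handles its limit this way; the function name is our own.

```python
def pie_slices(values, max_slices):
    """Turn {category: value} into (category, percent) pairs for a pie chart.

    Keeps the largest categories; if there are more than `max_slices`, the
    remainder is folded into one "Other" slice (an assumed behaviour).
    """
    ranked = sorted(values.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > max_slices:
        kept = ranked[: max_slices - 1]
        other = sum(value for _, value in ranked[max_slices - 1:])
        kept.append(("Other", other))
    else:
        kept = ranked
    total = sum(value for _, value in kept)
    if total == 0:
        return [(name, 0.0) for name, _ in kept]
    return [(name, 100 * value / total) for name, value in kept]


# Hypothetical counts, not the Tate figures.
artists = {"Country A": 50, "Country B": 30, "Country C": 15, "Country D": 5}
print(pie_slices(artists, max_slices=3))
# [('Country A', 50.0), ('Country B', 30.0), ('Other', 20.0)]
```

Comparing the output at a low limit with the output at 80 shows which categories vanish into “Other,” which is one concrete way into the question of what this chart shows and what it hides.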