Comparing N-grams

In previous labs, we have had the opportunity to work with Google N-grams and, possibly, Ben Schmidt’s Bookworm. For this lab, I chose to compare the words ‘awful’ and ‘fantastic.’ As the graphs below show, Bookworm and Google N-grams visualize the data similarly, with a line graph depicting word use over time. Google uses words per million while Bookworm uses frequency percentages, but both are measures of relative frequency. In terms of the shape of the lines, both graphs show that ‘awful’ is used far more often in English-language literature. However, Google shows a stretch of time (~1935-1975) when ‘fantastic’ appears more frequently. This can most likely be attributed to differences between the two corpora. As we have discussed in class, these two visualizations depict data in a more positivist manner. Though this might be a problem for humanities studies, I believe that this form of visualization serves its purpose because it is easy to read and use.
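Words per million and percentages are the same measure on different scales, so values from the two graphs can be put on a common axis before comparing them. The sketch below is only an illustration: the function names are my own, and the sample value is made up rather than read from either graph.

```python
def per_million_to_percent(per_million: float) -> float:
    """Convert a words-per-million value to a percentage of all words."""
    # 1 percent = 10,000 words per million
    return per_million / 10_000


def percent_to_per_million(percent: float) -> float:
    """Convert a percentage of all words to a words-per-million value."""
    return percent * 10_000


if __name__ == "__main__":
    # Hypothetical reading: a word that appears 25 times per million words
    sample = 25.0
    print(per_million_to_percent(sample), "percent of all words")
```

Converting one graph's values this way would only line up the units; it would not remove the differences between the two corpora.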


Graph 1 – Bookworm N-gram


Graph 2 – Google N-gram

Topic Modelling – Multicloud

We have worked through a few different ways to find topics and visualize them. The modelling tool for this lab, Lexos, can visualize topics in multiple forms. As can be seen in Cloud 1, the Lexos multicloud visualization presents each topic as a word cloud and encodes how heavily a word is used with size and color (the larger the word, the closer to yellow it sits on the color spectrum). These two topics, 17 and 2 from the State of the Union document provided for this lab, vary quite a bit in the words they contain but maintain a similar aesthetic. Though this may not help distinguish the topics at first glance, it does provide some feeling of continuity between two very different topics. As for the humanistic value of these word clouds, I find that they seem much less quantitative than most other visualizations we’ve used. However, I believe that they lack data to analyse. I know that numeric values are assigned to each word, but I would need to find those through other means. Therefore, I think that the biggest improvement to this particular word cloud would be interactivity: let the user hover over or click on a word to see its frequency and, possibly, its relative location within the documents.


Cloud 1 – Lexos Multicloud

Network Visualization

Network visualization is my favorite form of data visualization that we have worked with this semester. This particular instance, found in Fred Benenson’s “The Data Behind My Ideal Bookshelf,” is a beautiful representation of data. Each node represents a contributor’s bookshelf and can be moved and inspected to see how it interacts with the entire network. The effect a node can have on other nodes is a powerful tool that can depict the kind of data that Klein and Drucker seem to be looking for. It’s almost as if we’re not seeing data, but a group of particles that are attracted to one another in varying ways. While this seems to remove the self from the analysis, it actually puts us in a position to view data as something that is more like the people it tries to represent.


Network 1 – Contributor Shelves

Simple Pie Charts

The pie chart below depicts the input data about as simply as possible: each country is a percentage of the whole. More than any of the other data visualizations above, charts like this are built for the sciences. The data is clearly quantitative, and we know little to nothing about the artists. The tool used to create this pie chart, Google Fusion Tables, is quite powerful and can transform data into multiple visualization types. It is useful for tracking changes over time or depicting specific quantitative data. I believe that Fusion Tables can be used in our final project, but it should inform a different visualization rather than serve as the tool we use to track, display, and analyze the main point of our study/corpus.
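The arithmetic behind a chart like this is small enough to sketch. The example below assumes the input is a simple count of works per country; the function name and the counts are hypothetical placeholders, not the data from the lab.

```python
def share_of_whole(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total as a percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute shares of an empty total")
    return {name: count / total * 100 for name, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    works_per_country = {"Country A": 3, "Country B": 1}
    for country, share in share_of_whole(works_per_country).items():
        print(f"{country}: {share:.1f}%")
```

Everything the pie chart shows is contained in these percentages, which is part of why it reads as so purely quantitative.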


Chart 1 – Countries and Authorship