Lab 5a: Topic Modeling with pre-packaged tools

Thanks to Alan Liu for the original version of this lab.

The goal of this lab is to experiment with pre-packaged tools for topic modeling.

  1. Create a post on our course site for Lab 5a (categorize your post under “Lab 5a”).

  2. Experiment with David Mimno’s online In-Browser Topic Modeling, which works with a pre-set document corpus consisting of State of the Union addresses. We will work through this step together in class.

  3. Then download the Topic Modeling Tool, a Java-based graphical interface to the well-known MALLET topic modeling toolkit (see the course calendar for download instructions). Try it on some texts. If you completed Lab 3, use the mini-corpus you created as part of that lab. If you didn’t complete Lab 3, use the State of the Union speeches corpus, which is on our Box drive (in the “Lab 4” folder). Include some souvenirs of your experiments in your 5a post. If you’re not sure what I mean by “souvenirs,” take a look at what students at UCSB in Alan Liu’s undergraduate digital methods class have done here.

  4. Explain in your own words what each souvenir you post means (just as you did for Lab 2). For example, if you include a screenshot of a list of topics, explain as concisely as possible what the list represents. These explanations do not need to be long (1-3 sentences is fine); they should simply describe what is going on in each souvenir.

  5. The final step in this lab prepares you for working with MALLET, which we will do in class on Wednesday, March 2. MALLET requires you to work with the command line. We will go over some command line exercises together in class on Monday, February 29, so that you can get more comfortable with it. If you were not in class on Monday, I highly recommend that you read and follow along with this basic command line tutorial from the Praxis Lab at the University of Virginia before class on Wednesday, when you will have to use the command line.
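
If it helps when you write up a list of topics as a souvenir (step 4), here is a minimal Python sketch of what such a list contains. It assumes the three-column layout that MALLET typically writes for its topic keys (topic number, a weight, then the topic's top words, separated by tabs); the sample words below are invented for illustration, and the function names `parse_topic_keys` and `describe` are hypothetical, not part of MALLET or the Topic Modeling Tool.

```python
# Hypothetical sample in the assumed three-column topic-keys layout:
# topic number <TAB> weight <TAB> space-separated top words.
# The words are made up for illustration; they are not real model output.
SAMPLE_KEYS = (
    "0\t0.25\twar peace nations world military\n"
    "1\t0.25\ttax economy jobs budget growth\n"
)


def parse_topic_keys(text):
    """Return a list of (topic_id, weight, words) tuples, one per topic."""
    topics = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split into at most three fields; the third holds all the words.
        topic_id, weight, words = line.split("\t", 2)
        topics.append((int(topic_id), float(weight), words.split()))
    return topics


def describe(topics, top_n=3):
    """Build one short, readable line per topic from its first top_n words."""
    return [
        f"Topic {topic_id}: {', '.join(words[:top_n])}"
        for topic_id, _weight, words in topics
    ]


if __name__ == "__main__":
    for line in describe(parse_topic_keys(SAMPLE_KEYS)):
        print(line)
```

Each row is one topic: a group of words that tend to appear together in the same documents. The model does not name the topics; deciding what a group of words is "about" is the interpretive work you describe in your post.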