First Screenshot

This screenshot shows what I got after running the “Learn Topics” feature. First, I added the corpus that I gathered for lab 3 (adventure_plain-text) under “Select Input File or Dir” and left “Select Output Dir” unchanged, since I wanted my files to be stored in my downloads folder. I then set the number of topics to 15.

Second Screenshot

Next, I went to the “advanced” tab, changed the number of topic words printed to 20, and chose to remove stop words. I left the stop word file as “Mallet Default.”
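
To illustrate what these two settings do, here is a minimal Python sketch. It is not the tool's own code: the tiny stop word list and the sample sentence are made up for illustration, and plain word counts stand in for the topic weights that the tool actually computes. The idea is only that stop words are dropped before ranking, and that the "number of topic words printed" caps how many top-ranked words are shown.

```python
from collections import Counter

# Hypothetical, tiny stop word list for illustration only;
# a real default list is much longer.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "was", "his"}

def top_words(text, n, stop_words=STOP_WORDS):
    """Return the n most frequent words in text after dropping stop words."""
    tokens = [
        word
        for word in text.lower().split()
        if word.isalpha() and word not in stop_words
    ]
    return [word for word, _ in Counter(tokens).most_common(n)]

# Made-up sample text, not taken from the corpus.
sample = "the captain and the colonel spoke to the captain of the town"
print(top_words(sample, 2))
```

With stop words left in, "the" would dominate the list, which is why removing them makes the printed topic words more informative.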

Third Screenshot

After the output files appeared in my downloads folder, I opened the output_csv folder and remembered that the same information is also available in another output folder. I therefore opted to use the folder shown in the next screenshot instead of the csv folder.

Fourth Screenshot

This is the output_html folder. It contains “Docs” and “Topics” folders, which hold all of the information that is also in the csv folder, but in an easier-to-read format.


This shows what pops up when I open the “Docs” folder. Since there are eight plain text files in my corpus, there are eight documents in this folder.

Fifth Screenshot

This screenshot shows what I see when I open the Doc1 html file: the weight (as a percentage) of each of the top topics in this specific document.
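
As a rough sketch of how such a per-document view can be produced, the Python below sorts one document's topic proportions and converts them to percentages. The proportions and the helper name top_topics are hypothetical; they are not values or functions from my actual output.

```python
def top_topics(proportions, k=3):
    """Return the k heaviest topics as (topic, percent) pairs, largest first."""
    ranked = sorted(proportions.items(), key=lambda item: item[1], reverse=True)
    return [(topic, round(weight * 100, 1)) for topic, weight in ranked[:k]]

# Hypothetical topic proportions for a single document (they sum to 1.0).
doc1 = {"Topic1": 0.42, "Topic2": 0.08, "Topic3": 0.35, "Topic4": 0.15}
print(top_topics(doc1, k=2))
```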


This is what I see when I open the “Topics” folder. Since I asked for 15 topics, there are 15 html files.

Sixth Screenshot

This screen appears when I open the Topic1 html file. In short, it shows which documents make the heaviest use of this topic's words, for one of my 15 topics. In this case, it shows which documents are important for the topic: blood mr peter colonel lord pitt bishop doctor eyes voice captain governor lips men town nuttall ll king lordship english …
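
The topic view is essentially the document view turned around: instead of listing topics for one document, it ranks documents for one topic. A minimal Python sketch of that ranking follows, assuming made-up weights and a hypothetical helper named rank_documents (neither comes from my output files).

```python
def rank_documents(doc_topic, topic):
    """Rank documents by the weight they give to one topic, heaviest first."""
    pairs = [(doc, weights.get(topic, 0.0)) for doc, weights in doc_topic.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Hypothetical per-document topic weights, for illustration only.
doc_topic = {
    "Doc1": {"Topic1": 0.42, "Topic2": 0.08},
    "Doc2": {"Topic1": 0.05, "Topic2": 0.61},
    "Doc3": {"Topic1": 0.20},
}

for doc, weight in rank_documents(doc_topic, "Topic1"):
    print(doc, weight)
```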