EMD – Lab 5a
This screenshot shows what I got after running the “Learn Topics” feature. First, I added the corpus that I gathered for Lab 3 (adventure_plain-text) under “Select Input File or Dir” and left “Select Output Dir” as it was, since I wanted my files to be stored in my Downloads folder. I then set the number of topics to 15.
Next, I went to the “advanced” tab, changed the number of topic words printed to 20, and chose to remove stop words. I left the stop word file as “Mallet Default.”
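For comparison, the same settings could probably be expressed as MALLET command-line calls. The sketch below only builds and prints the commands; it does not run anything. It assumes that the tool runs MALLET in the background (suggested by the “Mallet Default” stop word file) and that a `mallet` executable is on the path. The function name and the output file names (corpus.mallet, topic_keys.txt, doc_topics.txt) are hypothetical and chosen only for illustration.

```python
def build_mallet_commands(input_dir, model_file="corpus.mallet",
                          num_topics=15, num_top_words=20):
    """Build (but do not run) MALLET commands mirroring the GUI settings.

    Assumption: `mallet` is the name of the launcher on this machine;
    on many installs it is `bin/mallet` inside the MALLET folder.
    """
    # Step 1: import the plain-text corpus and drop default stop words.
    import_cmd = [
        "mallet", "import-dir",
        "--input", input_dir,
        "--output", model_file,      # hypothetical file name
        "--keep-sequence",           # required for topic modeling
        "--remove-stopwords",        # uses MALLET's default stop list
    ]
    # Step 2: train the topic model with 15 topics and 20 words per topic.
    train_cmd = [
        "mallet", "train-topics",
        "--input", model_file,
        "--num-topics", str(num_topics),
        "--num-top-words", str(num_top_words),
        "--output-topic-keys", "topic_keys.txt",   # hypothetical file name
        "--output-doc-topics", "doc_topics.txt",   # hypothetical file name
    ]
    return import_cmd, train_cmd


import_cmd, train_cmd = build_mallet_commands("adventure_plain-text")
print(" ".join(import_cmd))
print(" ".join(train_cmd))
```

The printed lines could be pasted into a terminal, but the GUI remains the method actually used in this lab.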
After the output files appeared in my Downloads folder, I looked at the output_csv folder and remembered that the same information is also available in another output folder. Therefore, I opted to use the folder shown in the next screenshot instead of the csv folder.
This is the output_html folder. It contains folders for “Docs” and “Topics,” which hold all of the information that is also in the csv folder, but in an easier-to-read format.
This shows what pops up when I open the “Docs” folder. Since there are eight plain text files in my corpus, there are eight documents in this folder.
This screenshot is what I see when I open the Doc1 html file. It shows the weight (in percentages) of each of the top topics in this specific document.
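To illustrate how a page like this can be read, here is a minimal sketch that ranks one document's topics by weight and converts the weights to percentages. The topic numbers and weights are made up for illustration and are not taken from my output; `top_topics` is a hypothetical helper, not part of the tool.

```python
def top_topics(weights, n=3):
    """Return the n heaviest topics as (topic_id, percent) pairs.

    weights: dict mapping topic id -> proportion between 0 and 1.
    """
    # Sort topics from the largest weight to the smallest.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # Convert each proportion to a percentage with one decimal place.
    return [(topic, round(w * 100, 1)) for topic, w in ranked[:n]]


# Illustrative weights only (NOT real values from Doc1).
example_doc = {1: 0.42, 7: 0.31, 12: 0.15, 3: 0.12}
for topic, percent in top_topics(example_doc):
    print(f"Topic {topic}: {percent}%")
```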
This is what I see when I open the “Topics” folder. Since I asked for 15 topics, there are 15 html files.
This screen pops up when I open the Topic1 html file. In short, a topic page shows which documents use that topic's words most heavily, and there is one such page for each of my 15 topics. In this case, it shows which documents are important for the topic: blood mr peter colonel lord pitt bishop doctor eyes voice captain governor lips men town nuttall ll king lordship english …
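The idea behind a topic page can be sketched as a simple ranking: for one topic, order the documents by how much weight that topic has in each. The document names and weights below are invented for illustration, and `rank_docs_for_topic` is a hypothetical helper; the tool itself may compute its ranking differently.

```python
def rank_docs_for_topic(doc_topic, topic_id):
    """Order documents by the weight of one topic, heaviest first.

    doc_topic: dict mapping document name -> {topic id: proportion}.
    Documents that lack the topic are treated as having weight 0.
    """
    scores = [(doc, weights.get(topic_id, 0.0))
              for doc, weights in doc_topic.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Illustrative values only (NOT real output from my corpus).
example = {
    "DocA": {1: 0.10, 2: 0.90},
    "DocB": {1: 0.65, 2: 0.35},
    "DocC": {2: 1.00},
}
for doc, weight in rank_docs_for_topic(example, topic_id=1):
    print(doc, weight)
```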