Lab 5a
Topic Modeling
For Lab 5a, I ran the corpus that I created in Lab 3 through the Topic Modeling program. With this program, I was able to discover the main topics in the corpus and see in which texts each topic was most prevalent.
- In this screenshot, I had just loaded my plain-text corpus into the program and clicked “Advanced” to expand the results. Changing the number of topic words printed from 10 to 15 gives each topic more words.
- This screenshot was taken after I clicked “Learn Topics.” It shows the 10 topics found in the corpus, each with 15 words.
- This screenshot shows the folder that was placed on my desktop after I clicked “Learn Topics.” The folder holds the output files, which offer different ways to work with the newly generated data.
- This screenshot shows the result of clicking all_topics.html in the folder. The file opens the topics as an HTML page in Google Chrome. This way of viewing the topics is much more accessible because each link takes you directly to information about which pieces in the corpus use that topic the most.
- After I clicked on topic 2, the model showed that the text Moonfleet has the most words in this topic, and Ayesha has the second-highest usage. This is a very helpful tool because the HTML format is quick and easy to navigate, and it is easy to see which topics are dominant in which texts.
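
The ranking shown on the topic 2 page can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the word counts are made up, the third title is a placeholder, and the function name `rank_documents` is hypothetical. It does not show how the program itself works internally.

```python
def rank_documents(doc_topic_words, topic):
    """Return (title, count) pairs sorted from most to fewest words in `topic`."""
    # A text that never uses the topic counts as 0 words.
    counts = {
        title: topics.get(topic, 0)
        for title, topics in doc_topic_words.items()
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts for illustration only (not taken from the lab output).
# Each text maps topic numbers to the number of its words assigned to that topic.
doc_topic_words = {
    "Moonfleet": {2: 900, 5: 300},
    "Ayesha": {2: 650, 7: 400},
    "Placeholder Text": {5: 700, 2: 120},
}

ranking = rank_documents(doc_topic_words, topic=2)
for title, count in ranking:
    print(f"{title}: {count} words in topic 2")
```

Sorting the texts by their word count for one topic puts the text that uses the topic most at the top, which is the same kind of ordering the HTML page presents.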