Once again, I used my science fiction corpus from Lab 3 to perform the experiments.
I stuck with the default of twenty topics given in our initial command line. In the Topic Modeling Jar experiment (Lab 5a) I tried five, ten, and fifteen topics, and the number didn’t seem to make much difference to the coherence of the topics. I found them equally challenging to decipher whether there were more or fewer. I struggled to make sense of the topics I was given, and even now I only have a possible umbrella term for some of them, not all. But as Ted Underwood says in his blog post on topic modeling, “meaningfully ambiguous” topics tend to be more rewarding than intuitive ones because you can do more with them. They allow us to ask more questions.
One reason my topics are hard to interpret is the inclusion of proper names. Both Underwood and Jockers (I believe) briefly discuss them, and Underwood recommends removing them, which would be hard to do without a stoplist. Additionally, science fiction often uses made-up words. For example, the words “belphin” and “belphins” appear in the final topic. The terms come from Evelyn E. Smith’s “The Blue Tower”, and since I haven’t read the story and it doesn’t have an extensive Wikipedia (or other database) article, I can’t begin to tell you what they mean. From excerpts, my best guess is that a belphin is some kind of alien or other foreign creature. A final issue with my mini-corpus is that Project Gutenberg’s copyright and legal boilerplate skews the results. Topic thirteen, which contains terms like “gutenberg”, “tm”, “electronic”, and “trademark”, makes up at least forty percent of five of my eight documents. It makes up ninety-six percent of Therese Windser’s “Longevity” simply because the story is so short: the license text attached to every file outweighs the story itself.
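One way to keep the license text out of the model is to strip it before importing the files, and to filter a few extra words at the same time. The following is a minimal Python sketch under stated assumptions: it assumes each file marks the story with lines like `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF THE PROJECT GUTENBERG EBOOK ... ***` (the exact wording varies between files, so the patterns may need adjusting), and `EXTRA_STOPWORDS` is a hypothetical list built from the terms in topic thirteen. Proper names could be added to the same set by hand.

```python
import re

# Assumed marker formats; the exact wording differs between Gutenberg files,
# so these patterns may need adjusting for a given corpus.
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG E-?BOOK.*?\*\*\*", re.IGNORECASE
)
END_RE = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG E-?BOOK.*?\*\*\*", re.IGNORECASE
)

# Hypothetical extra stoplist: license terms seen in topic thirteen.
# Proper names from the stories could be appended here by hand.
EXTRA_STOPWORDS = {"gutenberg", "tm", "electronic", "trademark"}


def strip_gutenberg(text: str) -> str:
    """Return only the text between the START and END markers.

    If either marker is missing, the text is returned unchanged.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    if start and end and start.end() <= end.start():
        return text[start.end():end.start()]
    return text


def tokenize(text: str, stopwords: set[str]) -> list[str]:
    """Lowercase, split on non-letters, and drop any word in `stopwords`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    sample = (
        "The Project Gutenberg eBook of Example\n"
        "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "The story begins here.\n"
        "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
        "Project Gutenberg-tm electronic works, trademark license ..."
    )
    body = strip_gutenberg(sample)
    print(tokenize(body, EXTRA_STOPWORDS))  # prints ['the', 'story', 'begins', 'here']
```

The cleaned text could be written back to disk and re-imported. If the topic modeling tool accepts an additional stoplist file, the same word set could be supplied there instead of filtering in Python.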
Interestingly, my texts seem to create unique topics. Other than topic thirteen, each text has a different topic as its top one, and these topics appear to contain a few words unique to each work, such as proper names. Still, I feel that they are useful because they can potentially tell us what an individual work is concerned with, even if they say less about what several works share. For example, topic nine, which makes up forty-one percent of William Tenn’s “The Men in the Walls”, contains words like “mankind”, “burrow”, “monster”, “trap”, and “alien”. While I struggle to come up with a single word that covers all of these, to me the terms speak of a hunt, or maybe fear. Perhaps some kind of struggle.
Finally, there is a fairly general topic that appears among the top three topics of nearly all of my texts. Topic fourteen reads, “man back time eyes men long knew made face great hand turned looked left heard hands told arms asked”. It doesn’t seem to make much sense at first, but if I had to guess, it points to the five senses, with words like “eyes”, “looked”, “heard”, and “hands”. Perhaps this tells us that the senses are important to science fiction, or at least to this collection of science fiction stories.