Lab 3 consisted of gathering plain text files and building a corpus from them. In the lab and in class we learned about the different forms that text can take. We learned that plain text is a type of text that a computer can process easily, which is why we focused on gathering it. After collecting all of the plain text files, we created a corpus. The corpus arranges the files and gives them a structure, so that you don’t just have a loose collection of plain text files. The format is easy to make sense of and good for organizing what you have.
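As a rough illustration of how a corpus gives structure to a set of files, here is a minimal Python sketch that writes one row per plain text file into a table. The field names (`title`, `author`, `year`, `filename`), the function name `build_corpus_table`, and the two entries are assumptions made for this example. They are not the fields or texts from the lab itself.

```python
import csv
import io

# Hypothetical field names; the lab's actual corpus fields may differ.
FIELDS = ["title", "author", "year", "filename"]

# Illustrative entries only, not the texts gathered in the lab.
# The filenames are made up for this sketch.
ENTRIES = [
    {"title": "Pride and Prejudice", "author": "Jane Austen",
     "year": 1813, "filename": "pride_and_prejudice.txt"},
    {"title": "Frankenstein", "author": "Mary Shelley",
     "year": 1818, "filename": "frankenstein.txt"},
]


def build_corpus_table(entries):
    """Return the entries as CSV text, one row per plain text file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


print(build_corpus_table(ENTRIES))
```

Each row ties a file to the information recorded about it, which is what keeps the collection from being just a bunch of separate files.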
Literary data comes from the plain text itself. The fields we filled in for our corpus held information that was easy to find on the Internet. Plain text differs from traditional forms of text (like the printed pages of a book) in how it can be used, processed, and consumed. The computer recognizes each character and can work with the text character by character, instead of treating it as one whole image, such as a picture of a page that has been copied.
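To make the point about characters concrete, here is a minimal Python sketch of the kind of counting a computer can do with plain text but not with a picture of a page. The sample sentence and the function name `describe_text` are made up for illustration and were not part of the lab.

```python
from collections import Counter


def describe_text(text):
    """Return simple counts a computer can compute from plain text."""
    # Keep only letters, lowercased, so "T" and "t" count as the same letter.
    letters = [ch.lower() for ch in text if ch.isalpha()]
    most_common = Counter(letters).most_common(1)
    return {
        "characters": len(text),
        "words": len(text.split()),
        # None if the text contains no letters at all.
        "most_common_letter": most_common[0][0] if most_common else None,
    }


# A made-up sample sentence standing in for the contents of a plain text file.
sample = "The quick brown fox."
print(describe_text(sample))
```

None of this would work on a scanned image of the same sentence, because the computer would see only pixels and not individual characters.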
Before this lab, I didn’t know what plain text or a corpus was, so I learned what both of those things are. I learned how to find plain text: there were several resources on our course website, as well as Project Gutenberg, which we used for all of our plain text files. I learned how to save plain text files to my computer using an app called TextEdit, which I had not worked with before. I also learned what the different fields in the corpus represent.
One challenge of working with literary data is that it can be tedious. If I found gathering the data and filling out the corpus tedious and repetitive, I can only imagine how tedious it must be to type up whole books and volumes to create these plain text files in the first place. It is good that we have them and can use them to distribute different texts, though, so that is an advantage. It is challenging at first to learn how to gather the files and build the corpus, but once you get the hang of it, it’s extremely easy.
Overall, in this lab I learned a lot about literary data. I learned how to find plain text files, what they are, how they are used, and why they are used. I learned what a corpus is and how to create one. I also learned about some of the challenges of gathering information this way, as well as the advantages. Corpora are great for organizing data, and Project Gutenberg is a great resource for finding it. Before this lab, I had no knowledge of any of this.