General:Corpus Creation Instructions


These are instructions for creating a corpus to use with Voyant and Mandala.

1. Create a new folder in which to put your collection, and give it a meaningful name. Although Voyant does not require it, make sure there are no spaces in the name if you want this collection to work with Mandala as well.

2. Assemble a collection of between one and six related works, e.g. texts by the same author, or ones that are thematically, generically, or chronologically related. (In the case of short works such as poems, a larger collection is acceptable.)

Voyant accepts files in plain text (.txt), HTML (.html), XML (.xml), MS Word (.doc, .docx), RTF (.rtf), or PDF (.pdf). Mandala only accepts files in XML (.xml) and plain text (.txt), so if you want your collection to work with both Voyant and Mandala, restrict yourself to these two formats. XML files allow for some added functionality in comparison to plain text, but they can be difficult to find, so some will be provided to you at the workshop. Plain text (.txt) is the safest and easiest option, and it is fine if all your texts are in this format.

Good sources for files are Project Gutenberg and Internet Archive. (To access all file formats on Internet Archive, click: All Files: HTTP.) To download a plain text (.txt) file, click on its link and then select one of the following menu options, depending on the browser you are using:

  • Internet Explorer: “File > Save As” or “Page > Save As”
  • Firefox or Chrome: “File > Save Page As”
  • Safari: “File > Save”
  • Opera: “Menu > Page > Save as” or “File > Save As”

(XML files can usually be downloaded in the same way as shown above for text files. Alternatively, clicking on them sometimes opens a dialog box asking whether you want to save the file to your computer; if so, save the file. Don't worry if you cannot find appropriate XML files; they will be provided at the workshop. However, if you have your own XML files that you wish to work on, then by all means feel free to bring them along.)

3. Put all the files you collected into the folder you created.
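If you are comfortable with a little scripting, the naming and format rules from steps 1 and 2 can be checked automatically once your files are in place. The following is only a minimal sketch in Python: the function name check_names is made up for illustration, and the two sets of extensions are copied from the format lists in step 2.

```python
from pathlib import PurePath

# Formats listed in step 2 of these instructions.
VOYANT_EXTENSIONS = {".txt", ".html", ".xml", ".doc", ".docx", ".rtf", ".pdf"}
MANDALA_EXTENSIONS = {".txt", ".xml"}


def check_names(folder_name, file_names):
    """Return a list of warnings about a collection (hypothetical helper)."""
    warnings = []
    if " " in folder_name:
        warnings.append(f"folder name contains spaces: {folder_name}")
    for name in file_names:
        ext = PurePath(name).suffix.lower()
        if ext not in VOYANT_EXTENSIONS:
            warnings.append(f"{name}: format not accepted by Voyant")
        elif ext not in MANDALA_EXTENSIONS:
            warnings.append(f"{name}: accepted by Voyant but not by Mandala")
        if " " in name:
            warnings.append(f"{name}: file name contains spaces")
    return warnings
```

To check a real folder, pass its name together with the list returned by os.listdir for that folder, and print each warning. An empty list means the collection should be usable with both tools as far as names and formats are concerned.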

4. Open each file and delete any headers and footers that contain metadata about the file. These contain supplementary material that you will likely not want to analyze; they are located before the beginning and after the end of the actual work, and are typically well marked with asterisks and/or a statement to the effect that the work is beginning or ending.
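For plain text files, the clean-up in step 4 can also be scripted. The sketch below is in Python and rests on an assumption you should verify for each file: that the header ends with a line beginning with "***" that contains "START OF", and that the footer begins with a similar line containing "END OF", as is common in Project Gutenberg texts. The function name strip_boilerplate is made up for illustration. If either marker is missing, the text is returned unchanged and you will need to edit that file by hand.

```python
def strip_boilerplate(text):
    """Keep only the lines between the start and end marker lines.

    Assumption: marker lines begin with "***" and contain "START OF" or
    "END OF". If either marker is missing, the text is returned unchanged.
    """
    lines = text.splitlines()
    start = end = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("***"):
            continue
        upper = stripped.upper()
        if start is None and "START OF" in upper:
            start = i
        elif start is not None and "END OF" in upper:
            end = i
            break
    if start is None or end is None:
        return text
    # Drop the marker lines themselves and surrounding blank lines.
    return "\n".join(lines[start + 1:end]).strip() + "\n"
```

It is still worth opening the result and checking the first and last few lines, since some files include a title page, transcriber's notes, or other front matter inside the markers.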

5. Rename each file in the following way: Date-Title-Author.txt. This will allow you to view your texts historically in Voyant. Make sure there are no spaces in the names if you want this collection to work with Mandala as well.
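If you have many files, the Date-Title-Author.txt names from step 5 can be generated rather than typed. This is a minimal Python sketch; the function name corpus_filename and the choice to run the words of each part together (for example "Moby Dick" becomes "MobyDick") are illustrative assumptions, not requirements of Voyant or Mandala. The only rules taken from the instructions are the Date-Title-Author order and the absence of spaces.

```python
import re


def corpus_filename(date, title, author, extension=".txt"):
    """Build a Date-Title-Author file name without spaces (hypothetical helper)."""

    def clean(part):
        # Remove punctuation (including hyphens, which separate the three
        # parts of the name), then join the words without spaces.
        words = re.sub(r"[^\w\s]", "", str(part)).split()
        return "".join(word[:1].upper() + word[1:] for word in words)

    return "-".join(clean(part) for part in (date, title, author)) + extension


print(corpus_filename(1851, "Moby Dick", "Herman Melville"))
# prints: 1851-MobyDick-HermanMelville.txt
```

Writing the date as a four-digit year at the start of the name also means that an alphabetical listing of the folder is in chronological order.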

6. Zip the folder containing your files. To zip a folder using Windows XP or later, right-click on it and select “Send to > Compressed (Zipped) folder.” To zip a folder using Mac OS X 10.5 Leopard or later, right-click on it (or control-click on it if you only have one mouse button) and then select “Compress” from the menu that appears. If you have problems zipping the folder, bring your unzipped folder to the workshop and workshop staff will be happy to assist you.
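As an alternative to the right-click menus in step 6, a folder can be zipped with Python's standard library on any operating system. The sketch below uses shutil.make_archive; the function name zip_folder is made up for illustration. It writes the archive next to the folder (for example my_corpus.zip beside my_corpus) and keeps the folder itself as the top level inside the archive.

```python
import shutil
from pathlib import Path


def zip_folder(folder):
    """Create <folder>.zip next to the folder and return the archive's path."""
    folder = Path(folder).resolve()
    archive = shutil.make_archive(
        base_name=str(folder),        # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # keep the folder itself inside the archive
    )
    return Path(archive)
```

For example, calling zip_folder("my_corpus") from the directory that contains my_corpus should produce my_corpus.zip in the same place.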