General:Corpus Creation Instructions - Mandala

From CWRC

1. Create a new folder in which to put your collection, and give it a meaningful name. Make sure there are no spaces in the name.
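
If you prefer to script this step, the following is a minimal Python sketch, not part of the original instructions. The function names `safe_name` and `create_collection_folder` are hypothetical, and replacing spaces with underscores is only one possible convention for a name without spaces.

```python
import re
from pathlib import Path


def safe_name(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


def create_collection_folder(parent: str, name: str) -> Path:
    """Create the collection folder under `parent`, using a name without spaces."""
    folder = Path(parent) / safe_name(name)
    # exist_ok=True means an already existing folder is reused, not overwritten.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, `create_collection_folder(".", "Victorian Novels")` would create a folder named `Victorian_Novels` in the current directory.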

2. Assemble a collection of related works that you would like to study. Files can be in XML (.xml) or plain text (.txt) format. XML-encoded texts allow for more functionality, but plain text will also work. Good sources for files are Project Gutenberg, available at http://www.gutenberg.org/, and the Internet Archive, available at http://www.archive.org. (To access all file formats on the Internet Archive, click “All Files: HTTP.”) To download a plain text (.txt) file, click on its link and then select the following menu option, depending on the browser you are using:

  • Internet Explorer: “File > Save As” or “Page > Save As”
  • Firefox or Chrome: “File > Save Page As”
  • Safari: “File > Save”
  • Opera: “Menu > Page > Save as” or “File > Save As”

XML files can usually be downloaded in the same way as text files. Alternatively, clicking on an XML file sometimes opens a dialog box asking whether you want to save the file to your computer; if so, save the file.

3. Put all the files you collected into the folder you created.

4. Open each file and delete any headers and footers that contain metadata about the file. Headers and footers contain supplementary material that you will likely not want to analyze. They are located before the beginning and after the end of the actual work, and are typically well marked with asterisks and/or a statement to the effect that the work is beginning or ending.
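
For a large collection, this step can be partly automated. The sketch below is an illustration only: it assumes the markers are whole lines that begin with three asterisks followed by the word START or END, which is how Project Gutenberg plain-text files are commonly marked. Files from other sources may use different markers, so check the result by hand. The name `strip_header_footer` is hypothetical.

```python
import re

# Assumption: marker lines begin with "***" followed by START or END.
START_MARKER = re.compile(r"^\*\*\*\s*START\b.*$", re.IGNORECASE | re.MULTILINE)
END_MARKER = re.compile(r"^\*\*\*\s*END\b.*$", re.IGNORECASE | re.MULTILINE)


def strip_header_footer(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, nothing is removed on that side.
    """
    start = START_MARKER.search(text)
    body_start = start.end() if start else 0
    # Look for the END marker only after the header, so an early match is ignored.
    end = END_MARKER.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()
```

A file with no recognizable markers is returned unchanged apart from leading and trailing whitespace, so running the function on an already cleaned file does no harm.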

5. Rename each text file in a standard way, such as Date-Title-Author.txt. Make sure there are no spaces in the file names.
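
One way to build such names consistently is sketched below in Python. It follows the Date-Title-Author pattern from this step; the helper names `standard_name` and `rename_text`, and the choice of underscores in place of spaces, are assumptions made for illustration.

```python
import re
from pathlib import Path


def standard_name(date: str, title: str, author: str, ext: str = ".txt") -> str:
    """Build a Date-Title-Author file name that contains no spaces."""
    # Assumption: spaces inside each part become underscores.
    parts = [re.sub(r"\s+", "_", part.strip()) for part in (date, title, author)]
    return "-".join(parts) + ext


def rename_text(path: Path, date: str, title: str, author: str) -> Path:
    """Rename `path` in place, keeping its extension, and return the new path."""
    target = path.with_name(standard_name(date, title, author, path.suffix))
    path.rename(target)
    return target
```

For example, `standard_name("1813", "Pride and Prejudice", "Jane Austen")` gives `1813-Pride_and_Prejudice-Jane_Austen.txt`.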

6. Zip the folder containing your files. To zip a folder using Windows XP or later, right-click on it and select “Send to > Compressed (Zipped) folder.” To zip a folder using Mac OS X 10.5 Leopard or later, right-click on it (or control-click on it if you only have one mouse button) and then select “Compress” from the menu that appears. If you have problems zipping the folder, bring your unzipped folder to the workshop and workshop staff will be happy to assist you.
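
As an alternative to the right-click menus described above, the folder can also be zipped with Python's standard library. This is a minimal sketch rather than part of the workshop instructions; the function name `zip_folder` is hypothetical, and the archive is written next to the folder with the same name plus `.zip`.

```python
import shutil
from pathlib import Path


def zip_folder(folder: str) -> Path:
    """Zip `folder` into <folder>.zip beside it and return the archive path."""
    folder_path = Path(folder).resolve()
    archive = shutil.make_archive(
        base_name=str(folder_path),          # ".zip" is appended automatically
        format="zip",
        root_dir=str(folder_path.parent),    # paths in the archive are relative to this
        base_dir=folder_path.name,           # keep the folder itself inside the archive
    )
    return Path(archive)
```

For example, `zip_folder("My_Corpus")` would produce `My_Corpus.zip` in the same location as the `My_Corpus` folder.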