Graham Wilcock: Text Annotation with OpenNLP and UIMA

The tutorial presents a practical overview of automatic linguistic annotation of texts using freely available open-source tools. Text annotation typically involves tasks at several linguistic levels, such as sentence boundary detection, tokenization, part-of-speech tagging, phrase chunking, syntactic parsing, named entity recognition, coreference resolution, and semantic role labelling. Most of these tasks can be done with appropriate combinations of OpenNLP tools. Practical examples will show annotations of a short English text. OpenNLP tools can also be used as plugins in WordFreak, which provides an attractive, easy-to-use GUI for linguistic annotation. WordFreak is open source, written in Java, and platform-independent, and it is convenient for manually correcting annotations made by the OpenNLP tools. However, WordFreak creates annotations in its own specific XML stand-off annotation format, which raises the issue of interoperability. UIMA (Unstructured Information Management Architecture) provides solutions to many of the above issues. UIMA is also open source and written in Java, and it aims to support interoperability and scalability. Practical examples will show how to configure and use pipelines of OpenNLP tools in UIMA, and how to view the annotations in UIMA.
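
As a rough illustration of the two ideas the abstract relies on, stand-off annotation and a pipeline of annotators, the following sketch may help. It deliberately does not use the OpenNLP or UIMA APIs: the class `StandoffSketch`, its `Annotation` type, and the methods `detectSentences` and `tokenize` are hypothetical names introduced only for this example, and the JDK's `java.text.BreakIterator` stands in for the trained OpenNLP models. Each annotation stores a type label and character offsets into the unchanged source text, and the tokenizer runs over the output of the sentence detector.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not the OpenNLP, WordFreak, or UIMA API.
public class StandoffSketch {

    /** A stand-off annotation: a type label plus offsets into the source text. */
    public static final class Annotation {
        public final String type;
        public final int begin; // inclusive character offset
        public final int end;   // exclusive character offset

        public Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Recovers the annotated span from the original, unmodified text. */
        public String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    /** Step 1: sentence boundary detection (BreakIterator as a stand-in). */
    public static List<Annotation> detectSentences(String text) {
        List<Annotation> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Exclude trailing whitespace from the sentence span.
            int trimmedEnd = end;
            while (trimmedEnd > start && Character.isWhitespace(text.charAt(trimmedEnd - 1))) {
                trimmedEnd--;
            }
            if (trimmedEnd > start) {
                sentences.add(new Annotation("Sentence", start, trimmedEnd));
            }
        }
        return sentences;
    }

    /** Step 2: tokenization, run inside each sentence found by step 1. */
    public static List<Annotation> tokenize(String text, List<Annotation> sentences) {
        List<Annotation> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        for (Annotation sentence : sentences) {
            String span = sentence.coveredText(text);
            it.setText(span);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                // Skip whitespace-only segments; shift offsets back to the full text.
                if (!span.substring(start, end).trim().isEmpty()) {
                    tokens.add(new Annotation("Token", sentence.begin + start, sentence.begin + end));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello world. Bye now.";
        List<Annotation> sentences = detectSentences(text);
        List<Annotation> tokens = tokenize(text, sentences);
        List<Annotation> all = new ArrayList<>(sentences);
        all.addAll(tokens);
        for (Annotation a : all) {
            System.out.println(a.type + " [" + a.begin + "," + a.end + ") " + a.coveredText(text));
        }
    }
}
```

Because the annotations refer to the text only by offsets, several tools can add layers over the same document without rewriting it. That is the property that makes a shared annotation representation, and hence interoperability between tools, possible in principle.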