References and credits
Live grammatical analysis in the Portuguese system is based on a multi-level Constraint Grammar parser (PALAVRAS), developed by Eckhard Bick as part of a Ph.D. project.
The morphological analyzer is based on a lexicon of 50,000 base forms, and its output is processed by some 5,000 Constraint Grammar rules for morphological, syntactic and (in part) semantic disambiguation. For a description of the system, see "Eckhard Bick, The Parsing System Palavras - Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework, Århus 2000".
For an introduction to Constraint Grammar theory, see "Fred Karlsson et al., Constraint Grammar: a language-independent system for parsing unrestricted text, Berlin 1995". The present version of the system uses the CG-2 rule compiler developed and licensed by Pasi Tapanainen.
The hand-tagged closed corpus is a joint effort of the Portuguese team, supervised by Eckhard Bick. The newspaper corpus treebank (Floresta Sintá(c)tica) is a joint venture with Diana Santos and her team at Linguateca. The treebank is based on data from the CETEMPúblico and CETEMFolha corpora, and consists of manually proofread PALAVRAS output. Linguistic revision has primarily been performed by Susana Afonso and Raquel Marchi.