Corpus Linguistics
VISL - Visual Interactive Syntax Learning, University of Southern Denmark (Syddansk Universitet)

[Figure: CorpusEye corpus search interface]


VISL's grammatical research and its NLP research are both largely corpus-based. On the one hand, VISL develops taggers, parsers and computational lexica on the basis of corpus data; on the other hand, once these tools are functional, they are used to annotate large running-text corpora grammatically. The main method for automatic corpus annotation is Constraint Grammar (CG), a word-based approach in which each word is first assigned all its possible readings (part of speech, morphology, syntactic function) and context-dependent rules then discard the readings that do not fit. Hybrid systems that combine function-based phrase structure grammar with dependency grammar are used to build syntactic trees from CG output. VISL is involved in many aspects of corpus linguistics: corpus compilation, automatic corpus annotation, manual linguistic revision of corpora, online access for corpus searches, and language-specific corpus-based linguistic research.
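To illustrate what "word-based" means in CG, here is a minimal Python sketch of CG-style disambiguation. It is a toy under stated assumptions and not VISL's implementation. Each word starts as a cohort holding all of its candidate readings, and a contextual rule removes readings. The rule below roughly mirrors a CG rule such as `REMOVE (V) IF (-1C (ART))`, meaning "remove verb readings when the word immediately to the left is unambiguously an article". The `Cohort` class, the `remove_if_left` function and the tags used (ART, V, N, PR, S, PAST, PCP2) are hypothetical illustrations. Real grammars are written in the CG rule language and run by a CG engine such as VISL CG-3.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A word plus every reading still in play; each reading is a set of tags."""
    word: str
    readings: list[frozenset[str]]


def remove_if_left(cohorts: list[Cohort], target: str, context: str) -> None:
    """Hypothetical rule, roughly CG's REMOVE (target) IF (-1C (context)).

    For each word whose left neighbour has exactly one reading and that
    reading carries `context`, drop the word's readings that carry `target`.
    As in CG, a word's last remaining reading is never removed.
    """
    for prev, cur in zip(cohorts, cohorts[1:]):
        if len(prev.readings) == 1 and context in prev.readings[0]:
            kept = [r for r in cur.readings if target not in r]
            if kept:  # keep at least one reading per word
                cur.readings = kept


# Illustrative tags only: ART article, V verb, N noun, PR present,
# S singular, PAST past tense, PCP2 past participle.
sentence = [
    Cohort("the", [frozenset({"ART"})]),
    Cohort("run", [frozenset({"V", "PR"}), frozenset({"N", "S"})]),
    Cohort("ended", [frozenset({"V", "PAST"}), frozenset({"V", "PCP2"})]),
]

remove_if_left(sentence, target="V", context="ART")

for cohort in sentence:
    print(cohort.word, [sorted(r) for r in cohort.readings])
# the [['ART']]
# run [['N', 'S']]
# ended [['PAST', 'V'], ['PCP2', 'V']]
```

After the rule applies, "run" keeps only its noun reading. "ended" keeps both verb readings because no rule in this toy grammar addresses them. A real CG grammar contains many such rules, which are applied until the remaining ambiguity is as small as the context allows.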


The following table gives an overview of VISL's corpus annotation projects in its various research languages:

| Language | Corpus | Type | Size (words) | Grammatical annotation | Manual revision | Partners/Projects |
|---|---|---|---|---|---|---|
| Danish | Corpus90/2000 | News text, prose | 2 × 26 million | PoS, morphology, syntax, CG-dep. | 200,000 words | DSL |
| Danish | Arboretum | News text, prose | 10 million | Treebank (TIGER-compatible) | 200,000 words | Nordic Treebank Network |
| Danish | dfk-Skalk | Journal of Archeology | 600,000 | PoS, morphology, syntax, CG-dep. | - | Skalk |
| Danish | dfk-folketing | Parliamentary debates | 7 million | PoS, morphology, syntax, CG-dep. | - | Source: Folketing |
| Portuguese | Floresta Sintá(c)tica | Newspaper | 1 million | Treebank (TIGER-compatible) | 65,000+ words | Linguateca |
| Portuguese | CETEMPúblico | Portuguese newspaper | 192 million | PoS, morphology, syntax, CG-dep. | see Floresta Sintá(c)tica | AC/DC project, Linguateca; Ref.: Público |
| Portuguese | CETENFolha | Brazilian newspaper | 24 million | PoS, morphology, syntax, CG-dep. | see Floresta Sintá(c)tica | AC/DC project, Linguateca; Ref.: Folha de São Paulo |
| Portuguese | Europarl-pt | Parliamentary debates | 29 million | PoS, morphology, syntax, CG-dep. | - | Ref.: P. Koehn |
| Portuguese | Cartas-LR | Historical letters to/by the editor | 200,000 | PoS, morphology, syntax, treebank | 10,000 words | Ref.: Projeto para a História do Português Brasileiro |
| Portuguese | Various | Dialectal speech data, historical Portuguese | 70,000 | PoS, morphology, syntax, CG-dep. | - | (1) CORDIAL-SIN project; (2) Tycho Brahe Project |
| French | Arboratoire/Freebank | News text, prose | 130,000 | PoS, morphology, syntax, CG-dep. | 30,000 words | ATILF |
| French | ECI-FR1 | Newspaper | 4.4 million | PoS, morphology, syntax, CG-dep. | - | Ref.: Le Monde, ECI/EACL |
| French | Europarl-fr | Parliamentary debates | 29 million | PoS, morphology, syntax, CG-dep. | - | Ref.: P. Koehn |
| German | ECI-DE1 | Newspaper (Frankfurter Rundschau) | 34 million | PoS, morphology, syntax, CG-dep. | - | Ref.: Frankfurter Rundschau, ECI/EACL |
| German | BZK-tag | Newspaper | 4 million | PoS, morphology, syntax, CG-dep. | - | Bonner Zeitungskorpus |
| German | MAK-tag | Newspaper | 3 million | PoS, morphology, syntax, CG-dep. | - | Mannheimer Korpus |
| German | Europarl-de | Parliamentary debates | 29 million | PoS, morphology, syntax, CG-dep. | - | Ref.: P. Koehn |
| English | BNC-tag | News text, prose | 35 million | PoS, morphology, syntax, CG-dep. | - | Ref.: British National Corpus |
| English | Europarl-en | Parliamentary debates | 29 million | PoS, morphology, syntax, CG-dep. | - | Ref.: P. Koehn |
| English | KEMPE | Early modern play texts | 8.9 million | PoS, morphology, syntax, CG-dep. | - | University of Bristol |
| Spanish | ECI-ES2 | Newspaper | 1.4 million | PoS, morphology, syntax, CG-dep. | - | Ref.: El Diario Sur, ECI/EACL |
| Spanish | Europarl-es | Parliamentary debates | 29 million | PoS, morphology, syntax, CG-dep. | - | Ref.: P. Koehn |
| Esperanto | Monato | News magazine | 2 million | PoS, morphology, syntax, CG-dep. | - | Ref.: Monato |
| Esperanto | Eventoj | Electronic newsletter | 1.6 million | PoS, morphology, syntax, CG-dep. | - | Ref.: Eventoj |
| Esperanto | Elibrejo | Literature | 7 million | PoS, morphology, syntax, CG-dep. | - | Ref.: eLibrejo |
| Esperanto | Zamenhof | Esperanto classics | 1.5 million | PoS, morphology, syntax, CG-dep. | - | - |
| Esperanto | TTT | Internet | 3.6 million | PoS, morphology, syntax, CG-dep. | - | - |
| Estonian | Arborest | News text, prose | 3,500 | Treebank (TIGER-compatible) | at CG level | Nordic Treebank Network; Ref.: CG Annotated corpus of Estonian |


Copyright 1996-2005