University of Southern Denmark
VISL - Visual Interactive Syntax Learning

Constraint Grammar

    Constraint Grammar (CG) parsers are at the core of most of VISL's live applications. The Constraint Grammar concept was launched by Fred Karlsson in the early 1990s (Karlsson et al. 1995), and CG parsers have since been written for a large variety of languages, routinely achieving F-scores of over 99% for PoS (word class) tagging. A number of syntactic CG systems have reported F-scores of around 95%. VISL's own Constraint Grammar systems are inspired by Eckhard Bick's PALAVRAS parser for Portuguese (Bick 2000) and introduce, as novelties, subclause function, generalized dependency markers and semantic prototype tags. For most languages, a lexicon-based morphological analyzer provides input to the first CG level, while the output of the last CG level can be converted into syntactic tree structures by specially designed Phrase Structure Grammars (PSGs) that use syntactic functions, not words, as terminals. Other, hybrid combinations are also feasible: the French system, for example, uses PoS information from a probabilistic tagger.

    Constraint Grammar (CG) is a methodological paradigm for Natural Language Parsing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation (lexeme or base form), inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type etc. Each rule adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally (at defined distances) or globally (at undefined distances). Context conditions in the same rule may be linked, i.e. conditioned upon each other, and they may be negated or blocked by interfering words or tags. Typical CGs consist of thousands of rules that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, which provides a high degree of robustness.
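
    The rule mechanism described above can be illustrated with a small Python sketch. It is not VISL's implementation and it does not use real CG rule syntax: the names Cohort, Rule, apply_rule and disambiguate are hypothetical, only the REMOVE and SELECT operations are modelled, and each rule carries a single context condition at a fixed distance. The sketch assumes that a "careful" context means every remaining reading of the context word carries the required tag. It shows two points from the text: rule sets are applied in order, safe before heuristic, and no rule may delete a word's last reading.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Cohort:
    """One token with its remaining readings (each reading is a set of tags)."""
    word: str
    readings: List[FrozenSet[str]]


@dataclass
class Rule:
    """Hypothetical, simplified rule: one operation, one context condition."""
    op: str        # "REMOVE" or "SELECT"
    target: str    # tag the rule acts on
    offset: int    # relative position of the context word (-1 = left neighbour)
    context: str   # tag required on the context word
    careful: bool  # True: all readings of the context word must carry the tag


def context_holds(sentence: List[Cohort], i: int, rule: Rule) -> bool:
    j = i + rule.offset
    if not 0 <= j < len(sentence):
        return False
    hits = [rule.context in reading for reading in sentence[j].readings]
    return all(hits) if rule.careful else any(hits)


def apply_rule(sentence: List[Cohort], rule: Rule) -> bool:
    """Apply one rule to every word; return True if any reading was discarded."""
    changed = False
    for i, cohort in enumerate(sentence):
        if not context_holds(sentence, i, rule):
            continue
        if rule.op == "REMOVE":
            kept = [r for r in cohort.readings if rule.target not in r]
        else:  # SELECT keeps only the readings that carry the target tag
            kept = [r for r in cohort.readings if rule.target in r]
        # Robustness: an empty result would delete the last reading, so skip it.
        if kept and len(kept) < len(cohort.readings):
            cohort.readings = kept
            changed = True
    return changed


def disambiguate(sentence: List[Cohort], sections: List[List[Rule]]) -> List[Cohort]:
    """Run rule sets in order (safe first), repeating each until nothing changes."""
    for rules in sections:
        while any(apply_rule(sentence, rule) for rule in rules):
            pass
    return sentence


sentence = [
    Cohort("the", [frozenset({"DET"})]),
    Cohort("walks", [frozenset({"N", "PL"}), frozenset({"V", "PRES"})]),
]
safe = [Rule("REMOVE", "V", -1, "DET", careful=True)]
disambiguate(sentence, [safe])
print([sorted(r) for r in sentence[1].readings])  # [['N', 'PL']]
```

    Here the verb reading of "walks" is discarded because the word to its left is unambiguously a determiner. Had "walks" carried only a verb reading, the same rule would have left it untouched rather than strip the word of all analysis.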

    The following is an overview of VISL's different CG systems:

    | Language | Parser | Lexicon | Analyzer | Grammar | Levels | Applications |
    |---|---|---|---|---|---|---|
    | Danish | DanGram | 100,000 lexemes, 40,000 names | Full | 8,000 rules | morph., syntax, dep., psg, case roles | Teaching, corpus annotation, MT, Spell/Grammar checker, QA-systems, NER |
    | Portuguese | PALAVRAS | 70,000 lexemes, 15,000 names | Full | 7,500 rules | morph., syntax, dep., psg | Teaching, corpus annotation, MT, QA-systems, NER |
    | Spanish | HIS-PALAVRAS | 60,000 lexemes | Full | 4,500 rules | morph., syntax, dep., psg | Teaching, corpus annotation |
    | English | EngCG | 160,000 sem | Full (Lingsoft) | LS + 700 rules | morph. / syntax (Lingsoft), subclause, psg | Teaching, corpus annotation |
    | French | FrAG | 57,000 lexemes | DTT (Schmid & Stein) + analysis | 1,400 rules | morph.-correction, syntax, dep., psg | Teaching, corpus annotation |
    | German | GerGram | 25,000 val/sem | Full (Lingsoft) | LS + 1,300 rules | morph. (Lingsoft), syntax, dep., psg | Teaching, corpus annotation |
    | Esperanto | EspGram | 30,000 lexemes | Full | 2,600 rules | morph., syntax, dep. | Teaching, corpus annotation, MT |





    Copyright 1996-2005