Chapter 6. Command Line Reference

Table of Contents

vislcg3
cg-conv
cg-comp
cg-proc
cg-strictify
cg3-autobin.pl

This chapter lists the available binaries and their usage information.

vislcg3

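vislcg3 is the primary binary. It can apply a grammar's rules to an input stream, compile grammars to textual or binary form, and print diagnostics such as unused sets or the grammar parse tree.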

Usage: vislcg3 [OPTIONS]

Environment variable:
 CG3_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help                 shows this help
 -?, --?                    shows this help
 -V, --version              prints copyright and version information
     --min-binary-revision  prints the minimum usable binary grammar revision
 -g, --grammar              specifies the grammar file to use for disambiguation
     --grammar-out          writes the compiled grammar in textual form to a file
     --grammar-bin          writes the compiled grammar in binary form to a file
     --grammar-only         only compiles the grammar; implies --verbose
     --ordered              (will in future allow full ordered matching)
 -u, --unsafe               allows the removal of all readings in a cohort, even the last one
 -s, --sections             number or ranges of sections to run; defaults to all sections
     --rules                number or ranges of rules to run; defaults to all rules
     --rule                 a name or number of a single rule to run
     --nrules               a regex for which rule names to parse/run; defaults to all rules
     --nrules-v             a regex for which rule names not to parse/run
 -d, --debug                enables debug output (very noisy)
 -v, --verbose              increases verbosity
     --quiet                squelches warnings (same as -v 0)
 -2, --vislcg-compat        enables compatibility mode for older CG-2 and vislcg grammars
 -I, --stdin                file to read input from instead of stdin
 -O, --stdout               file to print output to instead of stdout
 -E, --stderr               file to print errors to instead of stderr
     --no-mappings          disables all MAP, ADD, and REPLACE rules
     --no-corrections       disables all SUBSTITUTE and APPEND rules
     --no-before-sections   disables all rules in BEFORE-SECTIONS parts
     --no-sections          disables all rules in SECTION parts
     --no-after-sections    disables all rules in AFTER-SECTIONS parts
 -t, --trace                prints debug output alongside normal output; optionally stops execution
     --trace-name-only      if a rule is named, omit the line number; implies --trace
     --trace-no-removed     does not print removed readings; implies --trace
     --trace-encl           traces which enclosure pass is currently happening; implies --trace
     --deleted              read deleted readings as such, instead of as text
     --dry-run              make no actual changes to the input
     --single-run           runs each section only once; same as --max-runs 1
     --max-runs             runs each section max N times; defaults to unlimited (0)
     --profile              gathers profiling statistics and code coverage into a SQLite database
 -p, --prefix               sets the mapping prefix; defaults to @
     --unicode-tags         outputs Unicode code points for things like ->
     --unique-tags          outputs unique tags only once per reading
     --num-windows          number of windows to keep in before/ahead buffers; defaults to 2
     --always-span          forces scanning tests to always span across window boundaries
     --soft-limit           number of cohorts after which the SOFT-DELIMITERS kick in; defaults to 300
     --hard-limit           number of cohorts after which the window is forcefully cut; defaults to 500
 -T, --text-delimit         additional delimit based on non-CG text, ensuring it isn't attached to a cohort; defaults to /(^|\n)</s/r
 -D, --dep-delimit          delimit windows based on dependency instead of DELIMITERS; defaults to 10
     --dep-absolute         outputs absolute cohort numbers rather than relative ones
     --dep-original         outputs the original input dependency tag even if it is no longer valid
     --dep-allow-loops      allows the creation of circular dependencies
     --dep-no-crossing      prevents the creation of dependencies that would result in crossing branches
     --no-magic-readings    prevents running rules on magic readings
 -o, --no-pass-origin       prevents scanning tests from passing the point of origin
     --split-mappings       keep mapped readings separate in output
 -e, --show-end-tags        allows the <<< tags to appear in output
     --show-unused-sets     prints a list of unused sets and their line numbers; implies --grammar-only
     --show-tags            prints a list of unique used tags; implies --grammar-only
     --show-tag-hashes      prints a list of tags and their hashes as they are parsed during the run
     --show-set-hashes      prints a list of sets and their hashes; implies --grammar-only
     --dump-ast             prints the grammar parse tree; implies --grammar-only
 -B, --no-break             inhibits any extra whitespace in output
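
The options above can be combined into a compile-then-apply workflow. The following is a minimal sketch, not an official recipe: the function names and every file name in the usage comments are hypothetical. It also assumes that vislcg3 accepts a previously compiled binary grammar through --grammar, and that CG3_DEFAULT holds a plain option string.

```shell
# Compile a textual grammar to binary form without applying it.
# $1 = grammar source, $2 = binary output (both hypothetical paths).
compile_grammar() {
    vislcg3 --grammar "$1" --grammar-only --grammar-bin "$2"
}

# Apply a grammar to an input file and write the traced result to a file.
# --trace is placed last because it takes an optional argument.
apply_grammar() {
    vislcg3 --grammar "$1" --stdin "$2" --stdout "$3" --trace
}

# Same idea, but tracing is supplied as a default through the environment.
# Assumption: the value of CG3_DEFAULT is written like ordinary options.
apply_with_defaults() {
    CG3_DEFAULT="--trace" vislcg3 --grammar "$1"
}

# Usage (hypothetical file names):
#   compile_grammar grammar.cg3 grammar.bin
#   apply_grammar grammar.bin input.txt output.txt
#   apply_with_defaults grammar.bin < input.txt
```

Because options passed on the command line override CG3_DEFAULT, a default set this way can still be changed per invocation; CG3_OVERRIDE works the other way around and wins over anything passed explicitly.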
    

cg-conv

cg-conv converts between stream formats. It can currently read CG, Niceline CG, Apertium, HFST/XFST, and plain text, and write CG, Niceline CG, Apertium, HFST/XFST, Matxin, or plain text, as listed in the options below. By default it tries to auto-detect the input format and converts it to CG. It is currently only meant for use in a pipe.

Usage: cg-conv [OPTIONS]

Environment variable:
 CG3_CONV_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_CONV_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help          shows this help
 -?, --?             shows this help
 -p, --prefix        sets the mapping prefix; defaults to @
 -u, --in-auto       auto-detect input format (default)
 -c, --in-cg         sets input format to CG
 -n, --in-niceline   sets input format to Niceline CG
 -a, --in-apertium   sets input format to Apertium
 -f, --in-fst        sets input format to HFST/XFST
 -x, --in-plain      sets input format to plain text
     --add-tags      adds minimal analysis to readings (implies -x)
 -C, --out-cg        sets output format to CG (default)
 -A, --out-apertium  sets output format to Apertium
 -F, --out-fst       sets output format to HFST/XFST
 -M, --out-matxin    sets output format to Matxin
 -N, --out-niceline  sets output format to Niceline CG
 -X, --out-plain     sets output format to plain text
 -W, --wfactor       FST weight factor (defaults to 1.0)
     --wtag          FST weight tag prefix (defaults to W)
 -S, --sub-delim     FST sub-reading delimiters (defaults to #)
 -r, --rtl           sets sub-reading direction to RTL (default)
 -l, --ltr           sets sub-reading direction to LTR
 -o, --ordered       tag order matters mode
 -D, --parse-dep     parse dependency (defaults to treating as normal tags)
     --unicode-tags  outputs Unicode code points for things like ->
     --deleted       read deleted readings as such, instead of as text
 -B, --no-break      inhibits any extra whitespace in output
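
Since cg-conv is meant for use in a pipe, a typical use is to wrap another tool with a conversion on each side. The sketch below uses only options listed above; the function names and the grammar file in the usage comments are hypothetical, and it assumes vislcg3 reads from standard input and writes to standard output when -I and -O are not given.

```shell
# Turn plain text on stdin into a CG stream with minimal analysis.
# --add-tags implies plain text input (-x).
plain_to_cg() {
    cg-conv --add-tags --out-cg
}

# Apply a CG grammar ($1) to an Apertium-format stream on stdin and
# convert the result back to Apertium format on stdout.
apertium_roundtrip() {
    cg-conv --in-apertium --out-cg \
        | vislcg3 --grammar "$1" \
        | cg-conv --in-cg --out-apertium
}

# Usage (hypothetical file names):
#   plain_to_cg < text.txt
#   apertium_roundtrip grammar.cg3 < analysed.txt > disambiguated.txt
```

Setting the input format explicitly, as in apertium_roundtrip, avoids relying on auto-detection when the format of the stream is already known.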
    

cg-comp

cg-comp is a lighter tool that only compiles grammars to their binary form. It requires grammars to be in Unicode (UTF-8) encoding. It was made for the Apertium toolchain.

USAGE: cg-comp grammar_file output_file
    

cg-proc

cg-proc is a grammar applicator which can handle the Apertium stream format. It works with binary grammars only, hence the need for cg-comp. It requires the input stream to be in Unicode (UTF-8) encoding. It was made for the Apertium toolchain.

USAGE: cg-proc [-t] [-s] [-d] [-g] [-r rule] grammar_file [input_file [output_file]]

Options:
        -d:      morphological disambiguation (default behaviour)
        -s:      specify number of sections to process
        -f:      set the format of the I/O stream to NUM,
                   where `0' is VISL format, `1' is
                   Apertium format and `2' is Matxin (default: 1)
        -r:      run only the named rule
        -t:      print debug output on stderr
        -w:      enforce surface case on lemma/baseform
                   (to work with -w option of lt-proc)
        -n:      do not print out the word form of each cohort
        -g:      do not surround lexical units in ^$
        -1:      only output the first analysis if ambiguity remains
        -z:      flush output on the null character
        -v:      version
        -h:      show this help
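
Because cg-proc only works with binary grammars, it is normally paired with cg-comp. The following sketch shows that pairing using the usage lines and options given above; the function names and all file names in the usage comments are hypothetical.

```shell
# Compile a UTF-8 grammar source ($1) to a binary grammar ($2).
build_binary() {
    cg-comp "$1" "$2"
}

# Disambiguate an Apertium-format stream from stdin with binary grammar $1.
# -1 keeps only the first analysis if ambiguity remains.
disambiguate() {
    cg-proc -1 "$1"
}

# Same, but read and write named files and print debug output on stderr.
disambiguate_traced() {
    cg-proc -t "$1" "$2" "$3"
}

# Usage (hypothetical file names):
#   build_binary grammar.cg3 grammar.bin
#   disambiguate grammar.bin < analysed.txt
#   disambiguate_traced grammar.bin analysed.txt disambiguated.txt
```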
    

cg-strictify

cg-strictify will parse a grammar and output a candidate STRICT-TAGS line that you can edit and then put into your grammar. Optionally, it can also output the whole grammar and strip superfluous LISTs along the way.

Usage: cg-strictify [OPTIONS] <grammar>

Options:
 -?, --help       outputs this help
 -g, --grammar    the grammar to parse; defaults to first non-option argument
 -o, --output     outputs the whole grammar with STRICT-TAGS
     --strip      removes superfluous LISTs from the output grammar; implies -o
     --secondary  adds secondary tags (<...>) to strict list
     --regex      adds regular expression tags (/../r, <..>r, etc) to strict list
     --icase      adds case-insensitive tags to strict list
     --baseforms  adds baseform tags ("...") to strict list
     --wordforms  adds wordform tags ("<...>") to strict list
     --all        same as --strip --secondary --regex --icase --baseforms --wordforms
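
As an illustration of the options above, the sketch below wraps three common invocations. The function names and the grammar file name are hypothetical, and the sketch assumes the result is written to standard output, which the help text does not state.

```shell
# Print a candidate STRICT-TAGS line for the grammar given as $1.
suggest_strict_tags() {
    cg-strictify "$1"
}

# Also consider secondary tags (<...>) and baseform tags ("...").
suggest_strict_tags_wide() {
    cg-strictify --secondary --baseforms "$1"
}

# Output the whole grammar with superfluous LISTs removed.
# --strip implies -o, so -o is not repeated here.
strictify_grammar() {
    cg-strictify --strip --grammar "$1"
}

# Usage (hypothetical file name):
#   suggest_strict_tags grammar.cg3
#   strictify_grammar grammar.cg3
```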
    

cg3-autobin.pl

cg3-autobin.pl is a thin Perl wrapper for vislcg3. It compiles the grammar to binary form on the first run and re-uses that binary on subsequent runs for the speed boost. It accepts all command line options that vislcg3 does.
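
The compile-once idea can be sketched in shell as follows. This is an illustration of the caching behaviour described above, not the wrapper's actual implementation: the function name, the location of the cached binary next to the grammar, and the recompile-when-newer check are all assumptions.

```shell
# Run vislcg3 with a cached binary grammar.
# $1 = grammar source; remaining arguments are passed on to vislcg3.
run_with_cached_binary() {
    grammar=$1
    shift
    binary="$grammar.bin"   # assumed cache location, hypothetical

    # Compile on the first run, or when the source is newer than the cache
    # (the timestamp check is an assumption of this sketch).
    if [ ! -f "$binary" ] || [ "$grammar" -nt "$binary" ]; then
        vislcg3 --grammar "$grammar" --grammar-only --grammar-bin "$binary" || return 1
    fi

    # Apply the cached binary, forwarding any extra options unchanged.
    vislcg3 --grammar "$binary" "$@"
}

# Usage (hypothetical file names):
#   run_with_cached_binary grammar.cg3 --trace < input.txt
```

With the real wrapper, the equivalent call would simply replace vislcg3 with cg3-autobin.pl on the command line, since it accepts the same options.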