Preprocessing and Tokenisation Standards in DELPH-IN Tools

Authors: Bernd Kiefer, Ann A. Copestake, Benjamin Waldron, Ulrich Schäfer


Keywords: Minimal recursion semantics, XML, Annotation, Middleware (distributed applications), Preprocessor, Programming language, Rule-based machine translation, Computer science, Interface (Java), Parsing

Abstract: We discuss preprocessing and tokenisation standards within DELPH-IN, a large-scale open-source collaboration providing multiple independent multilingual shallow and deep processors. We discuss (i) a component-specific XML interface format which has been used for some time to interface preprocessor results to the PET parser, and (ii) our implementation of a more generic XML interface format influenced heavily by the (ISO working draft) Morphosyntactic Annotation Framework (MAF). Our generic format encapsulates the information which may be passed from the preprocessing stage to a parser: it uses standoff annotation, a lattice for the representation of structural ambiguity, and intra-annotation dependencies, and it allows for highly structured annotation content. This work builds on the existing Heart of Gold middleware system, and on previous work on Robust Minimal Recursion Semantics (RMRS) as part of an inter-component interface. We give examples of usage with a number of the DELPH-IN processing components and deep grammars.
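The two central ideas named in the abstract, standoff annotation and a lattice that keeps competing tokenisations side by side, can be illustrated with a minimal Python sketch. This is not the XML format described in the paper: the class `TokenEdge`, its fields `source`, `target`, `cfrom` and `cto`, and the helpers `surface` and `paths` are hypothetical names chosen for illustration only, and the example assumes an acyclic lattice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEdge:
    """One token hypothesis, stored as an edge between two lattice nodes.

    The token text is not copied; the edge only points into the original
    string by character offsets (standoff annotation).
    """
    source: int  # lattice node the edge leaves
    target: int  # lattice node the edge enters
    cfrom: int   # character offset where the token starts
    cto: int     # character offset just past the token's last character


def surface(text, edge):
    """Recover the token string from the untouched source text."""
    return text[edge.cfrom:edge.cto]


def paths(edges, start, end):
    """Yield every edge sequence from node `start` to node `end`.

    Assumes the lattice is acyclic, so the recursion terminates.
    """
    if start == end:
        yield []
        return
    for edge in edges:
        if edge.source == start:
            for rest in paths(edges, edge.target, end):
                yield [edge] + rest


text = "don't stop"

# Two competing analyses of "don't" share the same start and end nodes,
# so the structural ambiguity is represented once, inside the lattice.
edges = [
    TokenEdge(0, 2, 0, 5),   # "don't" as a single token
    TokenEdge(0, 1, 0, 2),   # "do"
    TokenEdge(1, 2, 2, 5),   # "n't"
    TokenEdge(2, 3, 6, 10),  # "stop"
]

tokenisations = [[surface(text, e) for e in path] for path in paths(edges, 0, 3)]

for tokens in tokenisations:
    print(tokens)
# ["don't", 'stop']
# ['do', "n't", 'stop']
```

Because every edge refers to the input only by offsets, a downstream component can attach further annotation to an edge without altering the source text, and a parser can consume either path through the lattice.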
