Adaptive Sentence Boundary Disambiguation

Authors: David D. Palmer, Marti A. Hearst

DOI: 10.3115/974358.974376

Keywords:

Abstract: Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them, most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
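The abstract describes the approach only at a high level: each token near a candidate punctuation mark is represented by its prior part-of-speech probabilities from a lexicon, and a feed-forward network decides whether the mark ends a sentence. The following is a minimal sketch under stated assumptions, not the authors' implementation. The tag set `TAGS`, the `LEXICON` contents, the capitalization flag, the context size `K = 2`, the four hidden units, the learning rate, and the single 0.5 threshold are all hypothetical choices for illustration, as are the helper names `descriptor`, `features`, `Net`, and `is_boundary`.

```python
import math
import random

# Hypothetical tag set and lexicon (illustration only); each entry holds P(tag | word).
TAGS = ["noun", "verb", "abbrev", "det", "punct", "other"]
LEXICON = {
    "mr": {"abbrev": 1.0}, "dr": {"abbrev": 1.0}, "the": {"det": 1.0},
    "dog": {"noun": 0.9, "verb": 0.1}, "ran": {"verb": 1.0},
    "saw": {"verb": 0.8, "noun": 0.2}, "it": {"noun": 1.0}, ".": {"punct": 1.0},
}
WIDTH = len(TAGS) + 1  # tag priors plus a hypothetical capitalization flag
K = 2  # hypothetical context size: K tokens on each side of the mark


def descriptor(token):
    """Prior tag probabilities for a token, plus 1.0 if it is capitalized."""
    if token is None:  # padding beyond the edges of the text
        return [0.0] * WIDTH
    probs = LEXICON.get(token.lower(), {"other": 1.0})  # crude unknown-word guess
    return [probs.get(t, 0.0) for t in TAGS] + [1.0 if token[:1].isupper() else 0.0]


def features(tokens, i):
    """Concatenate the descriptors of the K tokens before and after tokens[i]."""
    vec = []
    for j in list(range(i - K, i)) + list(range(i + 1, i + K + 1)):
        vec += descriptor(tokens[j] if 0 <= j < len(tokens) else None)
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Net:
    """One hidden layer of sigmoid units feeding one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = x + [1.0]  # constant bias input
        h = [sigmoid(sum(wi * xi for wi, xi in zip(row, xb))) for row in self.w]
        y = sigmoid(sum(vi * hi for vi, hi in zip(self.v, h + [1.0])))
        return h, y

    def train(self, data, lr=0.5, epochs=2000):
        """Per-example gradient descent on cross-entropy loss (backpropagation)."""
        for _ in range(epochs):
            for x, target in data:
                h, y = self.forward(x)
                d_out = y - target  # dLoss / d(output pre-activation)
                xb, hb = x + [1.0], h + [1.0]
                for j, row in enumerate(self.w):  # hidden weights, using the old self.v
                    d_h = d_out * self.v[j] * h[j] * (1.0 - h[j])
                    for k in range(len(row)):
                        row[k] -= lr * d_h * xb[k]
                for j in range(len(self.v)):  # output weights
                    self.v[j] -= lr * d_out * hb[j]


def is_boundary(net, tokens, i, threshold=0.5):
    """Label the mark at tokens[i] as a sentence boundary if the output exceeds threshold."""
    return net.forward(features(tokens, i))[1] > threshold


# Toy training text: the periods after "Mr" and "Dr" are not boundaries.
tokens = "Mr . Smith saw Dr . Jones . The dog ran . It ran .".split()
labels = {1: 0.0, 5: 0.0, 7: 1.0, 11: 1.0, 14: 1.0}
net = Net(n_in=2 * K * WIDTH, n_hidden=4)
net.train([(features(tokens, i), labels[i]) for i in sorted(labels)])
print([is_boundary(net, tokens, i) for i in sorted(labels)])
```

The point the sketch tries to mirror is that the network never sees word identities, only tag-probability vectors, so an unseen abbreviation listed in the lexicon gets the same input as "Mr". A real system would need a much larger lexicon, a realistic tag set, and far more training data than this toy example.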

References (9)
S. M. Humphrey. Research on Interactive Knowledge-Based Indexing: The MedIndEx Prototype. Annual Symposium on Computer Application in Medical Care, pp. 527-533 (1989).
William A. Gale, Kenneth W. Church. A Program for Aligning Sentences in Bilingual Corpora. Computational Linguistics, vol. 19, pp. 75-102 (1993). DOI: 10.5555/972450.972455
Richard A. Olshen, Charles J. Stone, Leo Breiman, Jerome H. Friedman. Classification and Regression Trees (1983).
Michael D. Riley. Some Applications of Tree-Based Modelling to Speech and Language. Proceedings of the Workshop on Speech and Natural Language (HLT '89), pp. 339-352 (1989). DOI: 10.3115/1075434.1075492
Doug Cutting, Julian Kupiec, Jan Pedersen, Penelope Sibun. A Practical Part-of-Speech Tagger. Conference on Applied Natural Language Processing, pp. 133-140 (1992). DOI: 10.3115/974499.974523
Kenneth Ward Church. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. Conference on Applied Natural Language Processing, pp. 136-143 (1988). DOI: 10.3115/974235.974260
Anders Krogh, Richard G. Palmer, John Hertz. Introduction to the Theory of Neural Computation (1991).
Martin Röscheisen, Martin Kay. Text-Translation Alignment. Computational Linguistics, vol. 19, pp. 121-142 (1993).