Authors: David D. Palmer, Marti A. Hearst
Keywords:
Abstract: Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them, most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. This work demonstrates the feasibility of using prior probabilities of part-of-speech assignments, as opposed to words or definite part-of-speech assignments, as contextual information. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.