Automatic word sense discrimination

Author: Hinrich Schütze

Keywords: Cluster analysis; Word (computer architecture); Computer science; Artificial intelligence; Word lists by frequency; Natural language processing; Space (commercial competition); Similarity (psychology); Closeness; SemEval; Semantic similarity

Abstract: This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
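
The second-order co-occurrence idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`word_vectors`, `context_vector`, `discriminate`), the raw sentence-level counts, the toy corpus, and the simple 2-means clustering under cosine similarity are all assumptions chosen for brevity. The point it shows is that two contexts of an ambiguous word can land in the same group even when their words never co-occur directly, as long as those words share neighbours in the training corpus.

```python
from collections import Counter, defaultdict
from math import sqrt


def word_vectors(sentences):
    """First-order vectors: for each word, counts of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j, v in enumerate(sent):
                if i != j:
                    vectors[w][v] += 1
    return vectors


def context_vector(context, target, vectors):
    """Second-order vector: sum of the word vectors of the target's neighbours."""
    total = Counter()
    for w in context:
        if w != target:
            total.update(vectors[w])  # Counter.update adds counts
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def discriminate(contexts, target, vectors, iterations=10):
    """Assumed stand-in clustering: split contexts into two groups with 2-means."""
    ctx = [context_vector(c, target, vectors) for c in contexts]
    # Deterministic seeding: the first context and the context least similar to it.
    far = min(range(len(ctx)), key=lambda i: cosine(ctx[0], ctx[i]))
    centroids = [Counter(ctx[0]), Counter(ctx[far])]
    labels = [0] * len(ctx)
    for _ in range(iterations):
        labels = [max((0, 1), key=lambda k: cosine(v, centroids[k])) for v in ctx]
        for k in (0, 1):
            members = [v for v, lab in zip(ctx, labels) if lab == k]
            if members:
                centroid = Counter()
                for m in members:
                    centroid.update(m)
                centroids[k] = centroid
    return labels


# Toy training corpus (hypothetical): "river" and "stream" never co-occur,
# but both co-occur with "water" and "shore".
train = [
    ["river", "water", "shore"],
    ["stream", "water", "shore"],
    ["loan", "money", "interest"],
    ["deposit", "money", "interest"],
]
contexts = [["bank", "river"], ["bank", "stream"], ["bank", "loan"], ["bank", "deposit"]]

labels = discriminate(contexts, "bank", word_vectors(train))
print(labels)  # [0, 0, 1, 1]
```

In this toy setup the first two contexts share no word other than "bank", yet their second-order vectors coincide, so they fall into the same sense group without any labeled instances.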
