Author: Hinrich Schütze
DOI:
Keywords: Cluster analysis, Word (computer architecture), Computer science, Artificial intelligence, Word lists by frequency, Natural language processing, Space (commercial competition), Similarity (psychology), Closeness, SemEval, Semantic similarity
Abstract: This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
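The idea of second-order co-occurrence can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the function names (`word_vectors`, `context_vector`, `cluster`), the window size, and the use of a plain cosine k-means with farthest-first seeding are all assumptions made for illustration. Each word gets a vector of co-occurrence counts from a training corpus; each context of the ambiguous word is represented by the sum of the vectors of its words; the context vectors are then clustered without any labels.

```python
from collections import Counter, defaultdict
import math


def word_vectors(sentences, window=2):
    """First-order vectors: for each word, count neighbors within the window."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def context_vector(context, vecs, target):
    """Second-order vector: sum of the word vectors of the context words."""
    total = Counter()
    for w in context:
        if w != target and w in vecs:
            total.update(vecs[w])
    return total


def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(vectors, k=2, iters=10):
    """Cosine k-means with farthest-first seeding (an assumed stand-in
    for the clustering step, chosen only because it is short)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector least similar to all seeds chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iters):
        labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        sums = [Counter() for _ in range(k)]
        for v, lab in zip(vectors, labels):
            sums[lab].update(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sums[c] if sums[c] else centroids[c] for c in range(k)]
    return labels


# Hypothetical training corpus (unlabeled) and contexts of the word "bank".
training = [
    ["money", "loan", "deposit", "cash"],
    ["loan", "interest", "cash", "money"],
    ["deposit", "interest", "money"],
    ["river", "water", "shore", "fish"],
    ["water", "fish", "shore", "river"],
    ["shore", "river", "water"],
]
contexts = [
    ["bank", "loan", "deposit"],
    ["bank", "interest", "cash"],
    ["bank", "river", "shore"],
    ["bank", "water", "fish"],
]

vecs = word_vectors(training)
ctx_vecs = [context_vector(c, vecs, target="bank") for c in contexts]
labels = cluster(ctx_vecs, k=2)
print(labels)
```

In this toy setup the first two contexts share no word other than "bank", yet they fall into the same cluster because their neighbors ("loan", "deposit", "interest", "cash") occur with similar words in the training corpus. That is the property the abstract attributes to second-order co-occurrence.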