Dimensions of meaning

Author: H. Schütze

DOI: 10.5555/147877.148132

Abstract: The representation of documents and queries as vectors in a high-dimensional space is well established in information retrieval. The author proposes that the semantics of words and contexts in a text be represented as vectors. The dimensions of the space are words, and the initial vectors are determined by the words occurring close to the entity to be represented, which implies that the space has several thousand dimensions (words). This makes the vector representations (which are dense) too cumbersome to use directly. Therefore, dimensionality reduction by means of a singular value decomposition is employed. The author analyzes the structure of the vector representations and applies them to word sense disambiguation and thesaurus induction.
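The abstract outlines a pipeline: count the words occurring near each word, reduce the resulting high-dimensional vectors with a singular value decomposition, and compare the reduced vectors by similarity. The Python below is a minimal sketch of that idea, not the author's implementation. Assumptions: raw co-occurrence counts in a symmetric window of 2 tokens (the paper's actual window, weighting, and choice of dimension words are not given here); a truncated SVD computed by plain power iteration on A^T A instead of a library routine; and a context represented as the sum of its words' reduced vectors. All function names (cooccurrence_vectors, reduce_vectors, nearest_neighbors, context_vector, and so on) are hypothetical.

```python
import math
import random
from collections import defaultdict


def cooccurrence_vectors(tokens, window=2):
    """For each word, count how often every other token occurs within +/- `window` positions.
    The window size is an assumption for illustration."""
    counts = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[word][tokens[j]] += 1.0
    return counts


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_right_singular_vectors(rows, k, iters=300, seed=0):
    """Approximate the top-k right singular vectors of matrix A (given as `rows`)
    by power iteration on A^T A, orthogonalizing against vectors already found."""
    n = len(rows[0])
    # Gram matrix A^T A (n x n): symmetric and positive semidefinite.
    gram = [[sum(r[p] * r[q] for r in rows) for q in range(n)] for p in range(n)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [dot(g, v) for g in gram]
            for b in basis:  # modified Gram-Schmidt against earlier components
                proj = dot(w, b)
                w = [wi - proj * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no spectrum left: the matrix rank is below k
                return basis
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def reduce_vectors(counts, k=2):
    """Project each word's co-occurrence row onto the top-k right singular
    vectors, i.e. approximately the rows of U_k * Sigma_k."""
    vocab = sorted(counts)
    dims = sorted({c for row in counts.values() for c in row})
    rows = [[counts[w].get(d, 0.0) for d in dims] for w in vocab]
    basis = top_right_singular_vectors(rows, k)
    return {w: [dot(r, b) for b in basis] for w, r in zip(vocab, rows)}


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(vectors, word, n=3):
    """Thesaurus-induction style lookup: the n other words with the highest cosine."""
    others = [w for w in vectors if w != word]
    return sorted(others, key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)[:n]


def context_vector(vectors, context_words):
    """Represent a context as the sum of the vectors of its known words (an assumption)."""
    k = len(next(iter(vectors.values())))
    total = [0.0] * k
    for w in context_words:
        if w in vectors:
            total = [t + x for t, x in zip(total, vectors[w])]
    return total


if __name__ == "__main__":
    corpus = ("the cat sat on the mat the dog sat on the rug "
              "the cat ate fish the dog ate meat").split()
    reduced = reduce_vectors(cooccurrence_vectors(corpus, window=2), k=2)
    print(nearest_neighbors(reduced, "cat", n=2))
    print(context_vector(reduced, ["the", "dog", "ate"]))
```

Power iteration keeps the sketch dependency-free; for a space of several thousand dimensions, a sparse truncated SVD routine (for example SciPy's scipy.sparse.linalg.svds) would be the practical choice. Word sense disambiguation would additionally require grouping the context vectors of an ambiguous word into senses, which this sketch omits.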

References (13)
Douglas Lenat, Ramanathan V. Guha. Building Large Knowledge-Based Systems (1989).
Kenneth Ward Church, Patrick Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, vol. 16, pp. 22-29 (1990). DOI: 10.5555/89086.89095
David E. Rumelhart, James L. McClelland. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations. Computational Models of Cognition and Perception (1986). DOI: 10.7551/MITPRESS/5236.001.0001
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983).
Peter Cheeseman, James Kelly, Matthew Self, John Stutz, Will Taylor, Don Freeman. AutoClass: a Bayesian classification system. International Conference on Machine Learning, pp. 431-441 (1993). DOI: 10.1016/B978-0-934613-64-4.50011-6
Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey. Scatter/Gather: a cluster-based approach to browsing large document collections. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 318-329 (1992). DOI: 10.1145/3130348.3130362
David Yarowsky. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. Proceedings of the 14th Conference on Computational Linguistics, vol. 2, pp. 454-460 (1992). DOI: 10.3115/992133.992140