Authors: Giannis Nikolentzos, Polykarpos Meladianos, Francois Rousseau, Yannis Stavrakas, Michalis Vazirgiannis
DOI: 10.18653/V1/E17-2072
Keywords:
Abstract: Recently, there has been a lot of activity in learning distributed representations of words in vector spaces. Although there are models capable of learning high-quality distributed representations of words, how to generate representations of the same quality for phrases or documents still remains a challenge. In this paper, we propose to model each document as a multivariate Gaussian distribution based on the distributed representations of its words. We then measure the similarity between two documents based on the similarity of their distributions. Experiments on eight standard text categorization datasets demonstrate the effectiveness of the proposed approach in comparison with state-of-the-art methods.
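
The idea in the abstract can be illustrated with a minimal sketch. It assumes each document is already available as an array of word vectors, fits a Gaussian by taking the empirical mean and covariance of those vectors, and compares two documents through their Gaussians. The abstract does not say which similarity measure is used, so the measure below (cosine similarity of the means mixed with a normalized inner product of the covariance matrices) is an assumption made for illustration. The names `document_gaussian`, `gaussian_similarity`, and the weight `alpha` are likewise hypothetical.

```python
import numpy as np


def document_gaussian(word_vectors):
    """Fit a multivariate Gaussian to a document's word vectors.

    word_vectors: array-like of shape (n_words, dim).
    Returns the empirical mean (dim,) and covariance (dim, dim).
    """
    X = np.asarray(word_vectors, dtype=float)
    mean = X.mean(axis=0)
    # bias=True gives the maximum-likelihood estimate (divide by n_words).
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    return mean, cov


def gaussian_similarity(g1, g2, alpha=0.5):
    """Illustrative similarity between two (mean, covariance) pairs.

    ASSUMPTION: this particular measure is chosen for the sketch only;
    it is not taken from the abstract.
    """
    (m1, c1), (m2, c2) = g1, g2
    eps = 1e-12  # guards against division by zero for degenerate inputs
    # Cosine similarity between the two mean vectors.
    mean_sim = (m1 @ m2) / (np.linalg.norm(m1) * np.linalg.norm(m2) + eps)
    # Frobenius inner product of the covariances, normalized by their norms.
    cov_sim = np.sum(c1 * c2) / (np.linalg.norm(c1) * np.linalg.norm(c2) + eps)
    return alpha * mean_sim + (1.0 - alpha) * cov_sim


# Toy 2-dimensional "word vectors" for two short documents (made-up values).
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_b = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.9]]

gauss_a = document_gaussian(doc_a)
gauss_b = document_gaussian(doc_b)
similarity_ab = gaussian_similarity(gauss_a, gauss_b)
```

In a text categorization setting, pairwise similarities of this kind could be collected into a matrix and passed to a kernel-based classifier. Whether the result is a valid kernel depends on the measure chosen.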