A multi-level matching method with hybrid similarity for document retrieval

Authors: Haijun Zhang, Tommy W.S. Chow

DOI: 10.1016/J.ESWA.2011.08.128

Keywords:

Highlights: We propose a multi-level-structured representation to express more semantic information of a document. A multi-level matching method incorporating the EMD distance, solved by linear programming, is introduced. A hybrid similarity including the global and local semantics is used to enhance retrieval accuracy. Experimental results corroborate that our proposed method works well for lengthy documents. Our two-step system can serve as a general, computationally efficient solution for DR.

Abstract: This paper presents a multi-level matching method for document retrieval (DR) using a hybrid document similarity. Documents are represented by a multi-level structure including a document level and a paragraph level. This multi-level-structured representation is designed to model the underlying semantics in a more flexible and accurate way, which conventional flat term histograms find hard to cope with. The matching between documents is then transformed into an optimization problem with Earth Mover's Distance (EMD). A hybrid similarity is used to synthesize the global and local semantics in documents to improve the retrieval accuracy. In this paper, we have performed an extensive experimental study and verification. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.
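The abstract's pipeline (document-level histogram, paragraph-level matching via EMD, then a hybrid score) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `cosine`, `emd`, `hybrid_similarity` and the weight `lam` are hypothetical, and so are three modelling choices: paragraph weights proportional to paragraph length, a ground distance of `1 - cosine` between paragraph histograms, and a linear blend of the global and local scores. To stay dependency-free, the EMD transportation problem is solved here with a small successive-shortest-path min-cost-flow routine rather than the general linear-programming solver the abstract mentions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def emd(wp, wq, dist, eps=1e-12):
    """EMD between weight vectors wp and wq with ground distances dist[i][j].

    Assumption: solved as min-cost flow (successive shortest paths),
    a stand-in for a general LP solver on the same transportation problem.
    """
    m, n = len(wp), len(wq)
    s, t = m + n, m + n + 1
    graph = [[] for _ in range(m + n + 2)]  # edge: [to, capacity, cost, rev]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i in range(m):
        add_edge(s, i, wp[i], 0.0)
        for j in range(n):
            add_edge(i, m + j, math.inf, dist[i][j])
    for j in range(n):
        add_edge(m + j, t, wq[j], 0.0)

    flow = cost = 0.0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best = [math.inf] * len(graph)
        prev = [None] * len(graph)
        best[s] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if best[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > eps and best[u] + c < best[v] - eps:
                        best[v] = best[u] + c
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if prev[t] is None:
            break  # no capacity left: all movable mass has been shipped
        # Find the bottleneck capacity along the path, then push flow.
        push, v = math.inf, t
        while v != s:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = t
        while v != s:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        cost += push * best[t]
    return cost / flow if flow > 0 else 0.0


def hybrid_similarity(doc_a, doc_b, lam=0.5):
    """Blend global (document-level) and local (paragraph-level) similarity.

    Each document is a list of paragraphs; each paragraph is a list of terms.
    The linear blend with weight lam is an illustrative assumption.
    """
    paras_a = [Counter(p) for p in doc_a]
    paras_b = [Counter(p) for p in doc_b]
    global_sim = cosine(sum(paras_a, Counter()), sum(paras_b, Counter()))
    total_a = sum(sum(p.values()) for p in paras_a)
    total_b = sum(sum(p.values()) for p in paras_b)
    wa = [sum(p.values()) / total_a for p in paras_a]
    wb = [sum(p.values()) / total_b for p in paras_b]
    dist = [[1.0 - cosine(pa, pb) for pb in paras_b] for pa in paras_a]
    local_sim = 1.0 - emd(wa, wb, dist)
    return lam * global_sim + (1.0 - lam) * local_sim
```

Under these assumptions, two documents with the same paragraphs in a different order score approximately 1.0, because the paragraph-level EMD matches paragraphs regardless of position. Two documents with no shared terms score 0.0.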

References (28)
George J. Minty, Victor Klee. How Good Is the Simplex Algorithm. Inequalities, pp. 159-175 (1970).
Gerald J. Lieberman, Frederick Stanton Hillier. Introduction to Mathematical Programming. McGraw-Hill (1995).
Yossi (Joseph) Rubner, Carlo Tomasi. Perceptual Metrics for Image Database Navigation (1999).
David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937
Gerard Salton, Michael J. McGill. Introduction to Modern Information Retrieval (1983).
Justin Zobel, Alistair Moffat. Exploring the Similarity Space. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 32, pp. 18-34 (1998). DOI: 10.1145/281250.281256
Xiao-Bing Xue, Zhi-Hua Zhou. Distributional Features for Text Categorization. IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 428-442 (2009). DOI: 10.1109/TKDE.2008.166
Gerard Salton, Christopher Buckley. Term Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, vol. 24, pp. 323-328 (1988). DOI: 10.1016/0306-4573(88)90021-0
Güneş Erkan. Language Model-Based Document Clustering Using Random Walks. Language and Technology Conference, pp. 479-486 (2006). DOI: 10.3115/1220835.1220896
Yossi Rubner, Jan Puzicha, Carlo Tomasi, Joachim M. Buhmann. Empirical Evaluation of Dissimilarity Measures for Color and Texture. Computer Vision and Image Understanding, vol. 84, pp. 25-43 (2001). DOI: 10.1006/CVIU.2001.0934