A refinement framework for cross language text categorization

Authors: Bao-Liang Lu, Ke Wu

DOI: 10.5555/1786374.1786427

Keywords:

Abstract: Cross language text categorization is the task of exploiting labelled documents in a source language (e.g. English) to classify documents in a target language (e.g. Chinese). In this paper, we focus on investigating the use of a bilingual lexicon for cross language text categorization. To this end, we propose a novel refinement framework for cross language text categorization. The framework consists of two stages. In the first stage, a cross language model transfer is proposed to generate initial labels of documents in the target language. In the second stage, an expectation maximization algorithm based on the naive Bayes model is introduced to yield the resulting labels of documents. Preliminary experimental results on collected corpora show that the proposed framework is effective.
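The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the transfer rule, so the even split of word counts across lexicon translations, the Laplace smoothing parameter `alpha`, the function names (`transfer_counts`, `posteriors`, `em_refine`), and the toy lexicon and documents are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def transfer_counts(source_counts, lexicon):
    """Stage 1 (assumed rule): map per-class source-word counts onto target
    words, splitting each count evenly across the word's translations."""
    target_counts = {}
    for label, counts in source_counts.items():
        mapped = defaultdict(float)
        for word, n in counts.items():
            translations = lexicon.get(word, [])
            for t in translations:
                mapped[t] += n / len(translations)
        target_counts[label] = dict(mapped)
    return target_counts


def posteriors(doc, priors, counts, vocab, alpha=1.0):
    """Naive Bayes class posteriors for one tokenized document."""
    log_scores = {}
    for label, prior in priors.items():
        # Normalize over the target vocabulary only, with Laplace smoothing.
        total = sum(counts[label].get(w, 0.0) for w in vocab) + alpha * len(vocab)
        score = math.log(prior)
        for w in doc:
            score += math.log((counts[label].get(w, 0.0) + alpha) / total)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def em_refine(docs, priors, counts, iterations=10, alpha=1.0):
    """Stage 2: refine the transferred model on unlabelled target documents."""
    vocab = {w for d in docs for w in d}
    for _ in range(iterations):
        # E-step: soft labels for every target document under the current model.
        resp = [posteriors(d, priors, counts, vocab, alpha) for d in docs]
        # M-step: re-estimate word counts and priors from the soft labels.
        counts = {c: defaultdict(float) for c in priors}
        mass = {c: 0.0 for c in priors}
        for d, r in zip(docs, resp):
            for c, p in r.items():
                mass[c] += p
                for w in d:
                    counts[c][w] += p
        priors = {c: (mass[c] + alpha) / (len(docs) + alpha * len(mass)) for c in mass}
    resp = [posteriors(d, priors, counts, vocab, alpha) for d in docs]
    return [max(r, key=r.get) for r in resp]


# Toy data (hypothetical): English class counts, a tiny lexicon, Chinese documents.
source_counts = {
    "sports": {"football": 5, "goal": 3},
    "finance": {"bank": 4, "stock": 6},
}
lexicon = {"football": ["足球"], "goal": ["进球", "目标"], "bank": ["银行"], "stock": ["股票"]}
docs = [["足球", "进球", "比赛"], ["银行", "股票", "市场"], ["比赛", "足球"], ["市场", "股票"]]
priors = {"sports": 0.5, "finance": 0.5}

labels = em_refine(docs, priors, transfer_counts(source_counts, lexicon))
```

In this toy setup, words such as 比赛 and 市场 have no lexicon entry, so the transferred model alone knows nothing about them. The EM iterations can pick up their association with a class from co-occurrence in the target documents, which is the kind of refinement the second stage is meant to provide.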

References (18)
Nuria Bel, Cornelis H. A. Koster, Marta Villegas. Cross-Lingual Text Categorization. International Conference on Theory and Practice of Digital Libraries, pp. 126-139 (2003). DOI: 10.1007/978-3-540-45175-4_13
Chris Buckley. Implementation of the SMART Information Retrieval System. Cornell University (1985).
Marianne Lykke, Birger Larsen, Haakon Lund, Peter Ingwersen. Developing a Test Collection for the Evaluation of Integrated Search. Lecture Notes in Computer Science, pp. 627-630 (2010). DOI: 10.1007/978-3-642-12275-0_63
Yaoyong Li, John Shawe-Taylor. Using KCCA for Japanese-English cross-language information retrieval and document classification. Intelligent Information Systems, vol. 27, pp. 117-133 (2006). DOI: 10.1007/S10844-006-1627-Y
Philip Resnik, Noah A. Smith. The Web as a parallel corpus. Computational Linguistics, vol. 29, pp. 349-380 (2003). DOI: 10.1162/089120103322711578
A. P. Dempster, N. M. Laird, D. B. Rubin. Maximum Likelihood from Incomplete Data Via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22 (1977). DOI: 10.1111/J.2517-6161.1977.TB01600.X
J. Scott Olsson, Douglas W. Oard, Jan Hajič. Cross-language text classification. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 645-646 (2005). DOI: 10.1145/1076034.1076170
Kamal Nigam, Andrew Kachites McCallum, Sebastian Thrun, Tom Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, vol. 39, pp. 103-134 (2000). DOI: 10.1023/A:1007692713085
L. Rigutini, M. Maggini, Bing Liu. An EM Based Training Algorithm for Cross-Language Text Categorization. Web Intelligence, pp. 529-535 (2005). DOI: 10.1109/WI.2005.29
Hang Li, Cong Li. Word translation disambiguation using bilingual bootstrapping. Computational Linguistics, vol. 30, pp. 1-22 (2004). DOI: 10.1162/089120104773633367