Unsupervised Feature Generation using Knowledge Repositories for Effective Text Categorization

Authors: Rajendra Prasath, Sudeshna Sarkar

Keywords: Feature vector, Feature generation, Structure mining, Computer science, Human knowledge, Information retrieval, Context (language use), Bag-of-words model, Hyperlink, Text categorization

Abstract: We propose an unsupervised feature generation algorithm that uses repositories of human knowledge for effective text categorization. The conventional bag-of-words (BOW) model relies on the presence or absence of keywords to classify documents. To capture the actual context behind these keywords, we use concepts and hyperlinks from external sources, obtained through content and structure mining of Wikipedia. The generated features are then clustered to produce cluster vectors, with which input documents are mapped into a high-dimensional space where classification is performed. The simulation results show that the proposed approach identifies the associated concepts in the collection and yields improved classification accuracy.
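The pipeline in the abstract (keywords enriched with external concepts, concepts grouped into clusters, documents mapped to cluster vectors, then classified) can be illustrated with a toy sketch. It is not the authors' algorithm. The keyword-to-concept table `CONCEPTS` and the concept link list `LINKS` are hypothetical stand-ins for what content and structure mining of Wikipedia would produce. Clustering concepts by connected components of the link graph and classifying by nearest centroid under cosine similarity are simple swapped-in choices made only for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical stand-in for content mining: keyword -> related concepts.
# These entries are invented for illustration, not real Wikipedia output.
CONCEPTS = {
    "striker": ["Association football"],
    "goal": ["Association football"],
    "wicket": ["Cricket"],
    "batsman": ["Cricket"],
    "stock": ["Stock market"],
    "shares": ["Stock market"],
    "dividend": ["Stock market"],
}

# Hypothetical stand-in for structure mining: hyperlinks between concepts.
LINKS = [("Association football", "Cricket")]

TRAIN = [
    ("the striker scored a goal", "sports"),
    ("stock prices and shares fell", "finance"),
]


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_concepts(concept_map, links):
    """Group concepts into clusters = connected components of the link graph."""
    concepts = {c for cs in concept_map.values() for c in cs}
    concepts.update(c for pair in links for c in pair)
    parent = {c: c for c in concepts}
    for a, b in links:
        parent[find(parent, a)] = find(parent, b)
    return {c: find(parent, c) for c in concepts}


def cluster_vector(text, concept_map, cluster_of):
    """Map a document to counts over concept clusters (naive whitespace tokens)."""
    vec = Counter()
    for token in text.lower().split():
        for concept in concept_map.get(token, []):
            vec[cluster_of[concept]] += 1
    return vec


def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(labelled_docs, concept_map, cluster_of):
    """Sum cluster vectors per label; cosine ignores scale, so sums act as centroids."""
    sums = defaultdict(Counter)
    for text, label in labelled_docs:
        sums[label].update(cluster_vector(text, concept_map, cluster_of))
    return sums


def classify(text, centroids, concept_map, cluster_of):
    vec = cluster_vector(text, concept_map, cluster_of)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


cluster_of = cluster_concepts(CONCEPTS, LINKS)
centroids = train_centroids(TRAIN, CONCEPTS, cluster_of)
print(classify("the batsman lost his wicket", centroids, CONCEPTS, cluster_of))  # prints: sports
```

The test sentence shares no content keyword with the sports training document, so a pure presence/absence BOW model has nothing to match on. It is still labelled `sports` because "batsman" and "wicket" map to a concept that the link graph places in the same cluster as the training document's concepts.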
