Learning to Classify Short Text with Topic Model and External Knowledge

Authors: Ying Zhu, Li Li, Le Luo

DOI: 10.1007/978-3-642-39787-5_41

Keywords: Computer science, Focus (computing), Categorization, Baseline (configuration management), Artificial intelligence, Natural language processing, Topic analysis, Topic model

Abstract: Many methods have been developed to utilize topic analysis models to deal with the noise and sparseness of short text. However, using a topic model alone is sometimes unable to achieve the expected high performance, so it is necessary to improve current topic models to cope with the characteristics of short texts and specific requirements. In this paper, we focus on two tasks. One is to make use of different external corpora to identify topics from short text for better categorization. The other is to add weight to a few features in short text to get some representative ones among those of the topic model. We further evaluate the performance of the two tasks against baseline results. The experiments show that our proposed method can achieve higher accuracy in short text classification. The approach can find truly representative words, which may contribute to wide acceptance in micro-blog analysis.
