Multi-classification of business types on Twitter based on topic model

Authors: Chanattha Thongsuk, Choochart Haruechaiyasak, Somkid Saelee

DOI: 10.1109/ECTICON.2011.5947886

Keywords: Encyclopedia, Topic model, Latent Dirichlet allocation, The Internet, Set (abstract data type), Machine learning, Marketing channel, Feature (computer vision), Artificial intelligence, Computer science, Bag-of-words model

Abstract: Today many businesses have adopted Twitter as a new marketing channel to promote their products and services. One potentially useful application is to recommend businesses for users to follow that match their interests. A possible solution is to apply a classification algorithm that assigns users' posts to a set of predefined business categories. Because Twitter posts are very short, classifying them is difficult and challenging. In this paper, we propose a feature processing framework for constructing text categorization models. A topic model is constructed from a set of terms using the Latent Dirichlet Allocation (LDA) algorithm. We propose two different approaches: (1) feature transformation, i.e., using a set of topics as features, and (2) feature expansion, i.e., appending a set of topics to the set of terms. Experimental results show that the highest accuracy, 95.7%, is obtained with the feature expansion technique, an improvement of 18.7% over the Bag of Words (BOW) model.
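
The two feature strategies in the abstract can be sketched as below. This is a minimal, standard-library-only Python sketch under stated assumptions, not the authors' implementation: it trains LDA with collapsed Gibbs sampling (the abstract names LDA but not its inference method), estimates a tweet's topic proportions by EM-style fold-in with the topic-word distributions held fixed, and interprets "appending topics to terms" as concatenating the topic proportions onto the BOW count vector. The toy tweets, the hyperparameters `alpha` and `beta`, and all function names are hypothetical.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed inference method).

    docs: list of token lists. Returns (vocab, phi), where phi[k][v] is
    the probability of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)  # random initial assignment
            zd.append(k)
            n_dk[d][k] += 1; n_kw[k][word_id[w]] += 1; n_k[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1; n_kw[k][v] -= 1; n_k[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][v] += 1; n_k[k] += 1
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, phi


def infer_topics(doc, vocab, phi, alpha=0.1, iterations=50):
    """EM-style fold-in: estimate topic proportions with phi held fixed."""
    word_id = {w: i for i, w in enumerate(vocab)}
    tokens = [word_id[w] for w in doc if w in word_id]  # drop unseen words
    K = len(phi)
    theta = [1.0 / K] * K
    for _ in range(iterations):
        counts = [alpha] * K
        for v in tokens:
            resp = [theta[k] * phi[k][v] for k in range(K)]
            total = sum(resp)
            for k in range(K):
                counts[k] += resp[k] / total
        s = sum(counts)
        theta = [c / s for c in counts]
    return theta


def bow_vector(doc, vocab):
    """Baseline Bag of Words: raw term counts over the vocabulary."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def transform_features(doc, vocab, phi):
    """Feature transformation: the topic proportions replace the terms."""
    return infer_topics(doc, vocab, phi)


def expand_features(doc, vocab, phi):
    """Feature expansion: topic proportions appended to the term counts."""
    return bow_vector(doc, vocab) + infer_topics(doc, vocab, phi)


# Hypothetical toy tweets from two business types (food, electronics).
tweets = [
    "new pizza menu delivery tonight".split(),
    "pizza pasta restaurant deal tonight".split(),
    "laptop phone sale gadget store".split(),
    "phone gadget discount laptop deal".split(),
]
vocab, phi = train_lda(tweets, num_topics=2)
new_tweet = "cheap pizza delivery".split()
print(len(bow_vector(new_tweet, vocab)))                # 14 term features
print(len(transform_features(new_tweet, vocab, phi)))   # 2 topic features
print(len(expand_features(new_tweet, vocab, phi)))      # 14 + 2 = 16
```

The resulting vectors would then be passed to whichever classifier is used. Transformation shrinks each short tweet to a dense K-dimensional vector, while expansion keeps the sparse term evidence and adds topic information that can link tweets sharing no words, which is consistent with expansion outperforming BOW in the reported results.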
