Authors: Chanattha Thongsuk, Choochart Haruechaiyasak, Somkid Saelee
DOI: 10.1109/ECTICON.2011.5947886
Keywords: Encyclopedia, Topic model, Latent Dirichlet allocation, The Internet, Set (abstract data type), Machine learning, Marketing channel, Feature (computer vision), Artificial intelligence, Computer science, Bag-of-words model
Abstract: Today many businesses have adopted Twitter as a new marketing channel to promote their products and services. One potentially useful application is to recommend to Twitter users the businesses to follow that match their interests. One possible solution is to apply a classification algorithm to assign a user's posts to predefined business categories. Because Twitter posts are very short, classifying them is difficult and challenging. In this paper, we propose a feature processing framework for constructing text categorization models. A topic model is constructed from the set of terms using the Latent Dirichlet Allocation (LDA) algorithm. We propose two different approaches: (1) feature transformation, i.e., using the topics as features, and (2) feature expansion, i.e., appending the topics to the terms. Experimental results show that the highest accuracy, 95.7%, is obtained with the feature expansion technique, an improvement of 18.7% over the Bag-of-Words (BOW) model.
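The two feature-processing approaches named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It rests on several assumptions. LDA is fitted with collapsed Gibbs sampling, and a new post's topic proportions are inferred by fold-in sampling with the topic-word distributions held fixed; the abstract does not say how the model was estimated. The four-post corpus is hypothetical toy data. "Expansion" is interpreted here as appending the top `top_n` words of the post's most probable topic. All helper names (`train_lda`, `infer_topics`, `transform_features`, `expand_features`) and the `alpha`, `beta`, `iters`, and `top_n` values are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (assumed estimator, not the paper's).

    docs: list of token lists. Returns (phi, vocab), where phi[k][w]
    estimates P(vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional P(z = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][wi] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return phi, vocab


def infer_topics(post, phi, vocab, alpha=0.1, iters=50, seed=0):
    """Estimate a post's topic proportions with phi fixed (fold-in Gibbs)."""
    rng = random.Random(seed)
    index = {w: i for i, w in enumerate(vocab)}
    words = [index[w] for w in post if w in index]  # unseen words are skipped
    K = len(phi)
    if not words:
        return [1.0 / K] * K  # no known words: fall back to uniform proportions
    z = [rng.randrange(K) for _ in words]
    counts = Counter(z)
    for _ in range(iters):
        for i, w in enumerate(words):
            counts[z[i]] -= 1
            weights = [(counts[t] + alpha) * phi[t][w] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            counts[z[i]] += 1
    total = len(words) + K * alpha
    return [(counts[t] + alpha) / total for t in range(K)]


def transform_features(post, phi, vocab):
    """Approach (1), transformation: represent the post by its K topic weights."""
    return infer_topics(post, phi, vocab)


def expand_features(post, phi, vocab, top_n=3):
    """Approach (2), expansion (assumed rule): keep the post's own terms and
    append the top_n words of its most probable topic."""
    theta = infer_topics(post, phi, vocab)
    best = max(range(len(theta)), key=theta.__getitem__)
    ranked = sorted(range(len(vocab)), key=lambda w: phi[best][w], reverse=True)
    return list(post) + [vocab[w] for w in ranked[:top_n]]


# Hypothetical toy corpus: two "business categories" (electronics, food).
posts = [
    "new phone camera sale".split(),
    "phone battery camera deal".split(),
    "pizza pasta dinner deal".split(),
    "pasta pizza lunch menu".split(),
]
phi, vocab = train_lda(posts, num_topics=2)
print(transform_features(["camera", "deal"], phi, vocab))       # K topic weights
print(Counter(expand_features(["camera", "deal"], phi, vocab)))  # expanded BOW
```

In both cases, the resulting vector (topic weights, or the bag-of-words counts of the expanded post) would then go to an ordinary text classifier. The benefit of expansion is that a short post gains extra co-occurring terms, which is the sparsity problem the abstract points to.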