#Bieber + #Blast = #BieberBlast: Early Prediction of Popular Hashtag Compounds

Authors: Suman Kalyan Maity, Ritvik Saraf, Animesh Mukherjee

DOI: 10.1145/2818048.2820019

Keywords: Artificial intelligence, Baseline (configuration management), Compounding, Computer science, Early prediction, Natural language processing, Natural language

Abstract: Compounding of natural language units is a very common phenomenon. In this paper, we show, for the first time, that Twitter hashtags, which could be considered as correlates of such linguistic units, undergo compounding. We identify reasons for this compounding and propose a prediction model that can identify with 77.07% accuracy if a pair of hashtags compounding in the near future (i.e., 2 months after compounding) shall become popular. At longer times T = 6 and 10 months, the accuracies are 77.52% and 79.13% respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place. Further, humans can predict compounds with an overall accuracy of only 48.7% (treated as the baseline). Notably, while humans can discriminate the relatively easier cases, the automatic framework is successful in classifying the relatively harder cases.
