Authors: Shiva Prasad Kasiviswanathan, Prem Melville, Arindam Banerjee, Vikas Sindhwani
Keywords:
Abstract: Streaming user-generated content in the form of blogs, microblogs, forums, and multimedia sharing sites provides a rich source of data from which invaluable information and insights may be gleaned. Given the vast volume of such social media data being continually generated, one of the challenges is to automatically tease apart the emerging topics of discussion from the constant background chatter. Such emerging topics can be identified by the appearance of multiple posts on a unique subject matter that is distinct from previous online discourse. We address the problem of identifying emerging topics through the use of dictionary learning. We propose a two-stage approach based, respectively, on detection and clustering of novel user-generated content. We derive a scalable approach by using the alternating directions method to solve the resulting optimization problems. Empirical results show that our proposed approach is more effective than several baselines in detecting emerging topics in traditional news story and newsgroup data. We also demonstrate the practical application of the approach to social media analysis, based on a study of streaming data from Twitter.