BlogPulse: Automated Trend Discovery for Weblogs

Authors: Natalie S. Glance, Matthew Hurst, Takashi Tomokiyo

Abstract: Over the past few years, weblogs have emerged as a new communication and publication medium on the Internet. In this paper, we describe the application of data mining, information extraction, and NLP algorithms for discovering trends across our subset of approximately 100,000 weblogs. We publish daily lists of key persons, key phrases, and key paragraphs to a public web site, BlogPulse.com. In addition, we maintain a searchable index of weblog entries. On top of the search index, we have implemented trend search, which graphs the normalized trend line over time for a search query and provides a way to estimate the relative buzz of word of mouth for given topics over time.
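
The abstract does not say how the trend line is normalized. As a minimal sketch, assuming normalization means dividing each day's matching entries by the total number of entries indexed for that day, the idea could look like the following. The function name `trend_line`, the substring matching, and the sample entries are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from datetime import date


def trend_line(entries, query):
    """Return {day: percent of that day's entries that contain the query}.

    Assumption: "normalized" means hits divided by the day's total entries,
    so days with more indexed posts do not automatically look like spikes.
    """
    totals = Counter()  # entries indexed per day
    hits = Counter()    # entries per day that contain the query
    needle = query.lower()
    for day, text in entries:
        totals[day] += 1
        if needle in text.lower():  # naive substring match, for illustration
            hits[day] += 1
    return {day: 100.0 * hits[day] / totals[day] for day in sorted(totals)}


# Hypothetical sample data: (posting date, entry text)
entries = [
    (date(2004, 5, 1), "Thoughts on the new camera phone"),
    (date(2004, 5, 1), "Weekend hiking trip"),
    (date(2004, 5, 2), "Camera phone review"),
    (date(2004, 5, 2), "More camera phone pictures"),
    (date(2004, 5, 2), "Why I returned my camera phone"),
    (date(2004, 5, 2), "Recipe of the day"),
]

trend = trend_line(entries, "camera phone")
for day, percent in trend.items():
    print(day.isoformat(), f"{percent:.1f}%")
```

Plotting a percentage rather than a raw count keeps the line comparable across days even when the number of crawled entries changes, which is one plausible reason to normalize at all.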

References (11)
Russell Swan, James Allan. Automatic generation of overview timelines. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 49-56 (2000). DOI: 10.1145/345508.345546
James Allan, Ron Papka, Victor Lavrenko. On-line new event detection and tracking. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 37-45 (1998). DOI: 10.1145/3130348.3130366
Russell Swan, James Allan. Extracting significant time varying features from text. Conference on Information and Knowledge Management, pp. 38-45 (1999). DOI: 10.1145/319950.319956
Takashi Tomokiyo, Matthew Hurst. A Language Model Approach to Keyphrase Extraction. Meeting of the Association for Computational Linguistics, pp. 33-40 (2003). DOI: 10.3115/1119282.1119287
David Godes, Dina Mayzlin. Using Online Conversations to Study Word-of-Mouth Communication. Marketing Science, vol. 23, pp. 545-560 (2004). DOI: 10.1287/MKSC.1040.0071
D. M. Pennock, G. W. Flake, S. Lawrence, E. J. Glover, C. L. Giles. Winners don't take all: Characterizing the competition for links on the web. Proceedings of the National Academy of Sciences of the United States of America, vol. 99, pp. 5207-5211 (2002). DOI: 10.1073/PNAS.032085699
R. Swan. TimeMines: Constructing Timelines with Statistical Models of Word Usage. Knowledge Discovery and Data Mining (2000).
Ramakrishnan Srikant, Rakesh Agrawal. Fast algorithms for mining association rules. Very Large Data Bases, pp. 580-592 (1998).
Patrick Pantel, Dekang Lin. A Statistical Corpus-Based Term Extractor. Lecture Notes in Computer Science, pp. 36-46 (2001). DOI: 10.1007/3-540-45153-6_4
Lada A. Adamic, Bernardo A. Huberman, A.-L. Barabási, R. Albert, H. Jeong, G. Bianconi. Power-Law Distribution of the World Wide Web. Science, vol. 287, p. 2115 (2000). DOI: 10.1126/SCIENCE.287.5461.2115A