MBCrawler: A Software Architecture for Micro-Blog Crawler

Authors: Gang Lu, Shumei Liu, Kevin Lü

DOI: 10.1007/978-3-642-34531-9_13

Abstract: Obtaining data is a precondition for research on micro-blogging services. Because micro-blogging sites use Web 2.0 techniques such as AJAX, the contents of micro-blog pages are generated dynamically and change rapidly, which makes them hard for a traditional page crawler to crawl. Micro-blogging services, however, provide APIs through which well-structured data can be obtained easily. MBCrawler is a software architecture for micro-blogging service crawlers, designed on top of the APIs that these services provide. The architecture is modular and scalable, so it can be fitted to the specific features of different micro-blogging services. SinaMBCrawler, a crawler application based on MBCrawler for Sina Weibo, has been developed. It automatically invokes the Sina Weibo APIs to crawl data, and the crawled data is saved into a local database.
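The abstract's idea of a modular, API-driven crawler that writes to a local database can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and does not come from the paper: the `MicroBlogAPI` interface, the `FakeAPI` stand-in (which returns canned records rather than calling any real service), the `Storage` class, and the `statuses` table layout. SQLite is used only as a convenient local database; the abstract does not say which one SinaMBCrawler uses.

```python
import sqlite3
from typing import Iterable, Protocol


class MicroBlogAPI(Protocol):
    """Hypothetical service adapter: one implementation per micro-blogging service."""

    def fetch_statuses(self, user_id: str) -> Iterable[dict]:
        ...


class FakeAPI:
    """Stand-in adapter that returns canned data instead of calling a real API."""

    def fetch_statuses(self, user_id: str) -> Iterable[dict]:
        return [
            {"id": f"{user_id}-1", "user_id": user_id, "text": "hello"},
            {"id": f"{user_id}-2", "user_id": user_id, "text": "world"},
        ]


class Storage:
    """Local database module (assumed schema, SQLite chosen for illustration)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS statuses ("
            "id TEXT PRIMARY KEY, user_id TEXT, text TEXT)"
        )

    def save(self, status: dict) -> None:
        # INSERT OR IGNORE keeps re-crawled statuses from being stored twice.
        self.conn.execute(
            "INSERT OR IGNORE INTO statuses (id, user_id, text) VALUES (?, ?, ?)",
            (status["id"], status["user_id"], status["text"]),
        )
        self.conn.commit()

    def count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM statuses").fetchone()[0]


def crawl(api: MicroBlogAPI, storage: Storage, user_ids: Iterable[str]) -> int:
    """Invoke the API for each user and save every returned status locally."""
    fetched = 0
    for user_id in user_ids:
        for status in api.fetch_statuses(user_id):
            storage.save(status)
            fetched += 1
    return fetched


if __name__ == "__main__":
    storage = Storage()
    crawl(FakeAPI(), storage, ["u1", "u2"])
    print(storage.count(), "statuses stored")
```

Because the crawl loop depends only on the adapter interface, supporting another service in this sketch would mean writing a new adapter class while leaving `crawl` and `Storage` unchanged.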
