Graph-Based AJAX Crawl: Mining Data from Rich Internet Applications

Authors: Zhaomeng Peng, Nengqiang He, Chunxiao Jiang, Zhihua Li, Lei Xu

DOI: 10.1109/ICCSEE.2012.38

Abstract: AJAX (Asynchronous JavaScript and XML) is becoming more popular with the rise of Web 2.0. However, traditional crawlers fail to retrieve information from AJAX applications because of their complex operations. Moreover, within a single AJAX application one URL may have several different page states, which violates the traditional rule that one URL corresponds to one unique page. The application can be modeled as a state transition graph, and the crawl as a traversal of that graph without prior knowledge of its structure. In this paper, we distinguish events that are not well defined in previous work and propose the Graph-based AJAX State Traversal (GAST) algorithm, which aims for minimal edge visits. If the topology of the graph is given, the optimization problem turns into a Directed Rural Postman Problem (DRPP), from which an optimal lower bound can be obtained. Experimental results show that GAST approaches the optimum and exhibits better performance than existing work.
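
To illustrate the modeling idea in the abstract, the following is a minimal sketch of crawling an unknown state transition graph while counting edge visits. It is not the GAST algorithm, whose details are not given here; it swaps in a plain greedy heuristic that always walks to the nearest state with an unfired event. The names `crawl` and `app`, the state names, and the event names are all hypothetical, and the sketch assumes deterministic events and a strongly connected graph, so no reset to the start page is needed.

```python
from collections import deque


def crawl(app, start):
    """Fire every event reachable from `start`; return (known, visits).

    `app` maps state -> {event: next_state}. It stands in for the live
    application: the crawler reads app[s] only for states it has reached.
    """
    known = {}    # state -> {event: target} for events already fired
    visits = 0    # total edge traversals (event firings), the cost to minimize
    current = start

    def unfired(state):
        fired = known.setdefault(state, {})
        return [e for e in app[state] if e not in fired]

    def path_to_frontier(src):
        # BFS over already-discovered edges to the nearest state
        # that still has at least one unfired event.
        parent = {src: None}
        queue = deque([src])
        while queue:
            s = queue.popleft()
            if unfired(s):
                path = []
                while parent[s] is not None:
                    s, event = parent[s]
                    path.append(event)
                return path[::-1]
            for event, target in known[s].items():
                if target not in parent:
                    parent[target] = (s, event)
                    queue.append(target)
        return None  # nothing left to explore from here

    while True:
        path = path_to_frontier(current)
        if path is None:
            return known, visits
        for event in path:                 # replay known events to get there
            current = known[current][event]
            visits += 1
        event = unfired(current)[0]
        target = app[current][event]       # "fire" the event, observe new state
        known[current][event] = target
        current = target
        visits += 1


# Hypothetical four-state application with six events (edges).
app = {
    "home": {"click_news": "news", "click_about": "about"},
    "news": {"click_home": "home", "click_more": "news_more"},
    "about": {"click_home": "home"},
    "news_more": {"click_back": "news"},
}
known, visits = crawl(app, "home")
edge_count = sum(len(events) for events in app.values())
```

Here `visits` can never be smaller than `edge_count`, since every event must be fired at least once; the gap between the two is the overhead of walking back through known edges, which is the quantity an edge-minimizing strategy tries to reduce.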

References (13)
Stephen Kwek. On a Simple Depth-First Search Strategy for Exploring Unknown Graphs. Workshop on Algorithms and Data Structures, pp. 345-353 (1997). DOI: 10.1007/3-540-63307-3_73
Rudolf Fleischer, Gerhard Trippen. Exploring an Unknown Graph Efficiently. Algorithms – ESA 2005, vol. 3669, pp. 11-22 (2005). DOI: 10.1007/11561071_4
Kamara Benjamin, Gregor von Bochmann, Mustafa Emre Dincturk, Guy-Vincent Jourdan, Iosif Viorel Onut. A Strategy for Efficient Crawling of Rich Internet Applications. Lecture Notes in Computer Science, pp. 74-89 (2011). DOI: 10.1007/978-3-642-22233-7_6
Rudolf Fleischer, Gerhard Trippen. Experimental Studies of Graph Traversal Algorithms. Lecture Notes in Computer Science, vol. 2647, pp. 120-133 (2003). DOI: 10.1007/3-540-44867-5_10
X. Deng, C.H. Papadimitriou. Exploring an Unknown Graph. Foundations of Computer Science, pp. 355-361 (1990). DOI: 10.1109/FSCS.1990.89554
Susanne Albers, Monika R. Henzinger. Exploring Unknown Environments. SIAM Journal on Computing, vol. 29, pp. 1164-1188 (2000). DOI: 10.1137/S009753979732428X
Moses S. Charikar. Similarity Estimation Techniques from Rounding Algorithms. Symposium on the Theory of Computing, pp. 380-388 (2002). DOI: 10.1145/509907.509965
V. Campos, J. V. Savall. A Computational Study of Several Heuristics for the DRPP. Computational Optimization and Applications, vol. 4, pp. 67-77 (1995). DOI: 10.1007/BF01299159
Cristian Duda, Gianni Frey, Donald Kossmann, Chong Zhou. AJAXSearch: Crawling, Indexing and Searching Web 2.0 Applications. Very Large Data Bases, vol. 1, pp. 1440-1443 (2008). DOI: 10.14778/1454159.1454195
H. A. Eiselt, Michel Gendreau, Gilbert Laporte. Arc Routing Problems, Part I: The Chinese Postman Problem. Operations Research, vol. 43, pp. 231-242 (1995). DOI: 10.1287/OPRE.43.2.231