GUIDE: an interactive and incremental approach for crawling Web applications

Authors: Chien-Hung Liu, Woei-Kae Chen, Chi-Chia Sun

DOI: 10.1007/S11227-018-2335-4

Abstract: The Internet, having a sea of Web applications, is one of the largest data stores for big data analysis. To explore and retrieve the states (pages) from Web applications, crawlers have been extensively used. Most crawlers allow users to define a few crawling directives so as to increase the coverage that the crawler can explore. A directive can, for example, assign an input value to a specified input field, or the application can be instructed to perform a specific action to visit some special states. Note that a crawler is supposedly capable of exploring an unknown application. But, given an unknown application, how could the user possibly prepare the required directives in advance? This paper proposes an interactive approach called GUIDE to overcome this issue. Instead of passively receiving directives from the user, GUIDE actively asks the user for directives when pages containing input fields are found. In addition, GUIDE offers a hierarchical directive structure, allowing multiple input values for the same input field. A case study with three Web applications indicated that (1) the directives were very useful in increasing the code coverage being explored (up to 10.3–50.5% improvement can be achieved), and (2) using GUIDE is more efficient than using a traditional crawler (given the same amount of time, up to 11% coverage improvement can be achieved).
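The interactive, hierarchical idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Directive` class, the `crawl` function, the toy `APP` page model, and the scripted `ask_user` callback are all hypothetical names introduced here for illustration. The sketch assumes only what the abstract states: the crawler asks for directives when it finds a page with input fields, and several sets of input values may be given for the same fields.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

@dataclass
class Directive:
    """Hypothetical directive node: input values for one page plus follow-up directives."""
    page: str
    values: Dict[str, str]
    children: List["Directive"] = field(default_factory=list)

# Toy application model (an assumption, not a real Web app):
# page name -> (input field names, function mapping submitted values to the next page).
Page = Tuple[List[str], Callable[[Dict[str, str]], Optional[str]]]

APP: Dict[str, Page] = {
    "login": (
        ["user", "password"],
        lambda v: "dashboard" if v == {"user": "admin", "password": "secret"} else "error",
    ),
    "dashboard": (["query"], lambda v: "results"),
    "results": ([], lambda v: None),
    "error": ([], lambda v: None),
}

def crawl(app, page, ask_user, visited: Set[str], parent: Directive) -> None:
    """Depth-first crawl that asks for directives whenever a page has input fields."""
    if page in visited:
        return
    visited.add(page)
    fields, submit = app[page]
    if not fields:
        return  # nothing to ask about on this page
    # The user may supply several value sets for the same fields.
    for values in ask_user(page, fields):
        node = Directive(page, values)
        parent.children.append(node)  # record the directive under its parent
        next_page = submit(values)
        if next_page is not None:
            crawl(app, next_page, ask_user, visited, node)

# Scripted stand-in for a human answering the crawler's questions.
ANSWERS = {
    "login": [
        {"user": "admin", "password": "secret"},
        {"user": "guest", "password": "wrong"},
    ],
    "dashboard": [{"query": "books"}],
}

def ask_user(page: str, fields: List[str]) -> List[Dict[str, str]]:
    return ANSWERS.get(page, [])

root = Directive(page="<root>", values={})
visited: Set[str] = set()
crawl(APP, "login", ask_user, visited, root)
```

In this toy run the two value sets for the login form lead to different pages, and the directive for the dashboard's search field is stored beneath the successful login directive, which is one plausible reading of a "hierarchical directive structure".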
