Indexing Rich Internet Applications Using Components-Based Crawling

Authors: Ali Moosavi, Salman Hooshmand, Sara Baghbanzadeh, Guy-Vincent Jourdan, Gregor V. Bochmann

DOI: 10.1007/978-3-319-08245-5_12

Keywords: Web crawler, Component (UML), Ajax, JavaScript, Finite-state machine, Rich Internet application, Computer science, Search engine indexing, Crawling, Real-time computing

Abstract: Automatic crawling of Rich Internet Applications (RIAs) is a challenge because client-side code modifies the client dynamically, fetching server-side data asynchronously. Most existing solutions model RIAs as state machines, with DOMs as states and JavaScript event executions as transitions. This approach fails when used with "real-life", complex RIAs, because the size of the produced model is much too large to be practical. In this paper, we propose a new method to crawl AJAX-based RIAs in an efficient manner by detecting "components", which are areas of the DOM that are independent from each other, and by crawling each component separately. This leads to a dramatic reduction of the required state space for the model, without loss of content coverage. Our method does not require prior knowledge of the RIA nor a predefined definition of components. Instead, we infer the components by observing the behavior of the RIA during crawling. Our experimental results show that our method can quickly and completely index industrial RIAs that are simply out of reach for traditional methods.
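The state-space argument in the abstract can be illustrated with a small sketch. This is not the authors' crawler: the widget names and their local states below are hypothetical, and the code only assumes that the components are fully independent. Under that assumption, a whole-DOM model must represent every combination of local states (a product), while a per-component model represents each local state once (a sum).

```python
from itertools import product
from math import prod

# Hypothetical toy RIA: three independent widgets, each with its own local states.
# These names are illustrative assumptions, not taken from the paper.
components = {
    "menu": ["collapsed", "expanded"],
    "tabs": ["tab1", "tab2", "tab3"],
    "news": ["page1", "page2", "page3", "page4"],
}


def global_state_count(components):
    """States in a whole-DOM model: one per combination of local states."""
    return prod(len(states) for states in components.values())


def component_state_count(components):
    """States in a per-component model: each local state is stored once."""
    return sum(len(states) for states in components.values())


def enumerate_global_states(components):
    """List every whole-DOM state as a mapping from component name to local state."""
    names = list(components)
    return [dict(zip(names, combo)) for combo in product(*components.values())]


if __name__ == "__main__":
    print("whole-DOM states:    ", global_state_count(components))
    print("per-component states:", component_state_count(components))
```

Adding a fourth independent widget with five local states multiplies the whole-DOM count by five but only adds five to the per-component count, which is the kind of reduction the abstract refers to.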

References (27)
Iosif Viorel Onut, Gregor von Bochmann, Mustafa Emre Dincturk, Seyed M. Mirtaheri, Suryakant Choudhary, Guy-Vincent Jourdan, Ali Moosavi. Crawling rich internet applications: the state of the art. Conference of the Centre for Advanced Studies on Collaborative Research, pp. 146-160 (2012)
Salman Hooshmand, Iosif Viorel Onut, Gregor V. Bochmann, Mustafa Emre Dinçtürk, Seyed M. Mirtaheri, Guy-Vincent Jourdan. A brief history of web crawlers. Conference of the Centre for Advanced Studies on Collaborative Research, pp. 40-54 (2013)
Piero Fraternali, Gustavo Rossi, Fernando Sánchez-Figueroa. Rich Internet Applications. IEEE Internet Computing, vol. 14, pp. 9-12 (2010). DOI: 10.1109/MIC.2010.76
Alex Q. Chen. Widget identification and modification for web 2.0 access technologies (WIMWAT). ACM SIGACCESS Accessibility and Computing, vol. 96, pp. 11-18 (2010). DOI: 10.1145/1731849.1731851
Zhaomeng Peng, Nengqiang He, Chunxiao Jiang, Zhihua Li, Lei Xu, Yipeng Li, Yong Ren. Graph-Based AJAX Crawl: Mining Data from Rich Internet Applications. International Conference on Computer Science and Electronics Engineering, vol. 3, pp. 590-594 (2012). DOI: 10.1109/ICCSEE.2012.38
Domenico Amalfitano, Anna Rita Fasolino, Armando Polcaro, Porfirio Tramontana. The DynaRIA tool for the comprehension of Ajax web applications by dynamic analysis. Innovations in Systems and Software Engineering, vol. 10, pp. 41-57 (2014). DOI: 10.1007/S11334-013-0207-X
Mustafa Emre Dincturk, Guy-Vincent Jourdan, Gregor V. Bochmann, Iosif Viorel Onut. A Model-Based Approach for Crawling Rich Internet Applications. ACM Transactions on the Web, vol. 8, pp. 19- (2014). DOI: 10.1145/2626371
Amin Milani Fard, Ali Mesbah. Feedback-directed exploration of web applications to derive test models. International Symposium on Software Reliability Engineering, pp. 278-287 (2013). DOI: 10.1109/ISSRE.2013.6698880
Cor-Paul Bezemer, Ali Mesbah, Arie van Deursen. Automated security testing of web widget interactions. Foundations of Software Engineering, pp. 81-90 (2009). DOI: 10.1145/1595696.1595711
Iyad Abu Doush, Faisal Alkhateeb, Eslam Al Maghayreh, Mohammed Azmi Al-Betar. The design of RIA accessibility evaluation tool. Advances in Engineering Software, vol. 57, pp. 1-7 (2013). DOI: 10.1016/J.ADVENGSOFT.2012.11.004