Authors: Chien-Hung Liu, Woei-Kae Chen, Chi-Chia Sun
DOI: 10.1007/S11227-018-2335-4
Keywords:
Abstract: The Internet, having a sea of Web applications, is one of the largest data stores for big data analysis. To explore and retrieve the states (pages) from Web applications, crawlers have been extensively used. Most crawlers allow users to define a few crawling directives so as to increase the coverage that the crawler can explore. A directive can, for example, assign an input value to a specified input field so that the application can be instructed to perform a specific action or visit some special states. Note that a crawler is supposedly capable of exploring an unknown application. But, given an unknown application, how could the user possibly prepare the required directives in advance? This paper proposes an interactive crawling approach called GUIDE to overcome this issue. Instead of passively receiving directives from the user, GUIDE actively asks the user for directives when pages containing input fields are found. In addition, GUIDE offers a hierarchical directive structure, allowing the user to assign multiple input values to the same input field. A case study with three Web applications indicated that (1) the directives were very useful in increasing the code coverage being explored, with improvements of up to 10.3–50.5% achieved, and (2) using GUIDE was more efficient than using a traditional crawler: given the same amount of time, an improvement in code coverage of up to 11% can be achieved.
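
The interactive idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `DirectiveNode`, `crawl`, `fields_of`, `submit`, and `ask_user`, and the toy five-page application, are all hypothetical, and the tree layout is only one plausible reading of "hierarchical directive structure". The crawler asks for directives only when it reaches a page with input fields, and the user may answer with several value sets for the same fields.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Values = Dict[str, str]  # maps an input field name to the value to type in


@dataclass
class DirectiveNode:
    """One directive: input values for a page, plus directives for pages reached from it."""
    page: str
    values: Values
    children: List["DirectiveNode"] = field(default_factory=list)


def crawl(
    start: str,
    fields_of: Callable[[str], List[str]],
    submit: Callable[[str, Values], str],
    ask_user: Callable[[str, List[str]], List[Values]],
) -> Tuple[Set[str], List[DirectiveNode]]:
    """Explore pages depth-first, asking for directives whenever input fields are found."""
    visited: Set[str] = set()
    roots: List[DirectiveNode] = []

    def visit(page: str, siblings: List[DirectiveNode]) -> None:
        if page in visited:
            return
        visited.add(page)
        fields = fields_of(page)
        if not fields:
            return  # nothing to fill in, so no question for the user
        # The user may return several value sets for the same fields.
        for values in ask_user(page, fields):
            node = DirectiveNode(page, values)
            siblings.append(node)
            visit(submit(page, values), node.children)

    visit(start, roots)
    return visited, roots


# --- Hypothetical toy application, used only to exercise the sketch ---
FIELDS = {"login": ["user", "password"], "admin": ["query"]}


def fields_of(page: str) -> List[str]:
    return FIELDS.get(page, [])


def submit(page: str, values: Values) -> str:
    if page == "login":
        if values.get("password") != "secret":
            return "error"
        return "admin" if values.get("user") == "root" else "home"
    if page == "admin":
        return "results"
    return page


def scripted_user(page: str, fields: List[str]) -> List[Values]:
    """Stands in for a human answering the crawler's questions."""
    answers = {
        "login": [
            {"user": "alice", "password": "secret"},
            {"user": "root", "password": "secret"},
            {"user": "alice", "password": "wrong"},
        ],
        "admin": [{"query": "x"}],
    }
    return answers.get(page, [])


visited, roots = crawl("login", fields_of, submit, scripted_user)
print(sorted(visited))
```

In this toy setup, a single fixed login value would leave some pages unreached, while three value sets for the same login form reach the home, admin, and error pages. This mirrors, in a simplified way, the coverage argument made in the abstract.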