Web object identification for web automation and meta-search

Authors: Iraklis Kordomatis, Christoph Herzog, Ruslan R. Fayzrakhmanov, Bernhard Krüpl-Sypien, Wolfgang Holzinger

DOI: 10.1145/2479787.2479798

Abstract: Web object identification plays an important role in research fields such as information extraction, web automation, and web form understanding for building meta-search engines. In contrast to other works, we approach this problem by analyzing various spatial, visual, functional, and textual characteristics of web pages. We compute 49 unique features for all visible page elements, which are then applied to machine learning classifiers in order to identify similar elements on previously unexamined web pages. We evaluate our approach with different scenarios to analyze the relevance of the chosen features and the classification rate of the classifiers. These scenarios focus on search forms from the transportation domain, particularly flight, train, and bus connections. The results of the evaluation are very promising.
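
The pipeline described in the abstract (describe each visible element by numeric features, then let a classifier label elements on a page it has not seen) can be sketched roughly as follows. This is only an illustration: the six features, the keyword set, the sample elements, and the 1-nearest-neighbour rule are all assumptions made for the example, not the paper's 49 features or the classifiers it evaluates.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Element:
    """A visible page element with a few illustrative (hypothetical) properties."""
    x: float        # left edge in pixels (spatial)
    y: float        # top edge in pixels (spatial)
    width: float    # rendered width in pixels (visual)
    height: float   # rendered height in pixels (visual)
    is_input: bool  # True for form controls (functional)
    label: str      # nearby visible text (textual)


# Hypothetical keyword set hinting at transport search forms.
KEYWORDS = {"from", "to", "depart", "return", "date"}


def features(e: Element) -> list[float]:
    """Map an element to a numeric vector (a stand-in for a real feature set)."""
    words = set(e.label.lower().split())
    return [
        e.x / 1000.0,
        e.y / 1000.0,
        e.width / 1000.0,
        e.height / 1000.0,
        1.0 if e.is_input else 0.0,
        1.0 if words & KEYWORDS else 0.0,
    ]


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify(train: list[tuple[Element, str]], target: Element) -> str:
    """Return the label of the training element closest to the target (1-NN)."""
    vec = features(target)
    _, label = min(
        ((distance(features(e), vec), lab) for e, lab in train),
        key=lambda pair: pair[0],
    )
    return label


# Labelled elements from an already examined page (made-up values).
train = [
    (Element(100, 200, 180, 30, True, "From"), "search_field"),
    (Element(300, 200, 180, 30, True, "To"), "search_field"),
    (Element(100, 260, 120, 30, True, "Departure date"), "search_field"),
    (Element(0, 0, 960, 80, False, "Cheap flights"), "other"),
]

# Elements from a previously unexamined page (made-up values).
unseen_field = Element(120, 210, 170, 28, True, "Departing from")
unseen_footer = Element(0, 900, 960, 60, False, "Contact us")

print(classify(train, unseen_field))
print(classify(train, unseen_footer))
```

A nearest-neighbour rule is used here only because it fits in a few lines of standard-library Python; any classifier that accepts numeric feature vectors could be substituted.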
