Authors: Iraklis Kordomatis, Christoph Herzog, Ruslan R. Fayzrakhmanov, Bernhard Krüpl-Sypien, Wolfgang Holzinger
Keywords:
Abstract: Web object identification plays an important role in research fields such as information extraction, web automation, and form understanding for building meta-search engines. In contrast to other works, we approach this problem by analyzing various spatial, visual, functional, and textual characteristics of web pages. We compute 49 unique features for all visible page elements, which are then applied to machine learning classifiers in order to identify similar elements on previously unexamined pages. We evaluate our approach with different scenarios that assess the relevance of the chosen features and the classification rate of the classifiers. These scenarios focus on search forms from the transportation domain, particularly flight, train, and bus connections. The results of the evaluation are very promising.