Authors: Frank Emmert-Streib, Jürgen Kilian, Alexander Mehler, Matthias Dehmer
DOI:
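Keywords: Graph similarity, hierarchical graphs, hypertext, generalized trees, web structure mining.
Abstract: Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics, and tree measures in terms of their DOM-Trees. Other methods measure similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-Trees, which represent only directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignments to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to different test sets consisting of graphs representing web-based documents.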
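The idea of reducing graph similarity to string similarity can be illustrated with a minimal Python sketch. It is not the authors' measure: the abstract does not say which structural properties are encoded or how alignments are scored. Here the property string is assumed to be the out-degree of each node in breadth-first order, the alignment is a standard global (Needleman-Wunsch style) dynamic program, and the names `property_string`, `alignment_cost`, and `similarity`, along with the gap penalty and the cost-to-similarity mapping, are all hypothetical choices made for illustration.

```python
from collections import deque


def property_string(children, root):
    """Encode a rooted graph as a list of integers.

    Assumed encoding (not from the paper): the out-degree of every
    node, listed in breadth-first order starting at `root`.
    `children` maps a node to the list of nodes its edges point to.
    """
    degrees = []
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        targets = children.get(node, [])
        degrees.append(len(targets))
        for target in targets:
            # Generalized trees may contain extra edges between nodes,
            # so each node is visited only once.
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return degrees


def alignment_cost(a, b, gap=1):
    """Minimum cost of a global alignment of two integer strings.

    Aligning x with y costs |x - y|; aligning a symbol with a gap
    costs `gap`. Both scores are illustrative assumptions.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]),  # match or substitute
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    return dp[n][m]


def similarity(a, b, gap=1):
    """Map the alignment cost into (0, 1]; 1.0 means identical strings."""
    return 1.0 / (1.0 + alignment_cost(a, b, gap))


# Two small hypertext structures; the second has one extra edge c -> d.
tree_1 = {"a": ["b", "c"], "b": ["d"]}
tree_2 = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

s1 = property_string(tree_1, "a")  # [2, 1, 0, 0]
s2 = property_string(tree_2, "a")  # [2, 1, 1, 0]
print(similarity(s1, s2))          # 0.5
```

In this toy case the two property strings differ in a single position, so the best alignment has cost 1 and the similarity is 0.5. A richer encoding (for example, one string per level, or separate strings for different edge types) would plug into the same alignment step.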