Authors: Alexander Mehler, Matthias Dehmer, Rüdiger Gleim
DOI: 10.1007/11553762_14
Keywords:
Abstract: Facing the retrieval problem posed by the overwhelming set of documents online, the adaptation of text categorization to web units has recently been pushed. The aim is to utilize categories of web sites and pages as an additional retrieval criterion. In this context, the bag-of-words model has been utilized, as have HTML tags and link structures. In spite of promising results, this adaptation stays in the framework of IR-specific models, since it neglects the content-based structuring inherent to hypertext units. This paper approaches hypertext modelling from the perspective of graph theory. It presents an XML-based format for representing websites as hypergraphs. These hypergraphs are used to shed light on the relation between hypertext structure types and their web-based instances. We place emphasis on two characteristics of this relation. In terms of realizational ambiguity, we speak of functional equivalents in the manifestation of the same structure type. In terms of polymorphism, we speak of a single web unit which manifests different structure types. It is shown that polymorphism is a prevalent characteristic of web-based units. This is done by means of a categorization experiment which analyses a corpus of hypertexts in the content area of conference websites. On this background, we plead for a revision of text representation models by means of hypergraphs which are sensitive to the manifold structuring of web documents.