Authors: G. Poonkuzhali, G. V. Uma, K. Sarukesi
DOI:
Keywords:
Abstract: In this Internet era, the WWW is flooded with a voluminous amount of information, including more and more replicated and irrelevant web pages. Because unnecessary duplicated pages increase indexing space and time complexity, finding and removing them has become a significant issue in the retrieval and mining research communities, since most people rely on search engines to get the information they need. Web content outlier mining plays a decisive role in addressing all these aspects. Existing algorithms for this task focus on applying weightage only to structured documents, whereas in this work a mathematical approach based on two-way rectangular representations and a signed trust rating correlation method is developed for retrieving the right web pages, free of the duplicates present in both structured and unstructured documents.
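The abstract does not spell out the algorithm, so the sketch below only illustrates the general idea of correlation-based duplicate removal. It assumes that a "two-way rectangular representation" can be read as a documents-by-terms count matrix, and it uses plain Pearson correlation (a signed value in [-1, 1]) as a stand-in for the authors' signed trust rating correlation, which may work differently. The function names and the 0.95 threshold are hypothetical choices for illustration, not part of the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(docs):
    """Build a rectangular documents x terms matrix of raw term counts (assumed representation)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return matrix, vocab


def pearson(x, y):
    """Signed Pearson correlation in [-1, 1]; returns 0.0 for empty or constant rows."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def remove_duplicates(docs, threshold=0.95):
    """Keep each document unless its term-count row correlates at or above
    `threshold` with a document already kept (hypothetical stand-in method)."""
    matrix, _ = term_document_matrix(docs)
    kept = []
    for i, row in enumerate(matrix):
        if all(pearson(row, matrix[j]) < threshold for j in kept):
            kept.append(i)
    return [docs[i] for i in kept]


if __name__ == "__main__":
    pages = [
        "search engines index web pages",
        "Search engines index web pages!",
        "outlier mining finds irrelevant pages",
    ]
    # The second page differs only in case and punctuation, so it is dropped.
    print(remove_duplicates(pages))
    # -> ['search engines index web pages', 'outlier mining finds irrelevant pages']
```

Pearson correlation is used here because it is signed. Pages with opposing term profiles score negatively and are clearly kept, while near-identical pages score close to 1 and are filtered. A real system would likely add term weighting and a tuned threshold.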