Web content outlier mining through mathematical approach and trust rating

Authors: G. Poonkuzhali, G. V. Uma, K. Sarukesi

Abstract: In this Internet era, the WWW is flooded with a voluminous amount of information, including more and more replicated and irrelevant web pages. Unnecessary duplicated pages increase indexing space and time complexity. Because most people rely on search engines to get the information they need, finding and removing these pages has become a significant issue in the information retrieval and web mining research communities. Web content outlier mining plays a decisive role in addressing all of these aspects. Existing algorithms for web content outlier mining focus on applying weightage only to structured documents. In this work, a mathematical approach based on two-way rectangular representations, a signed approach, trust rating, and a correlation method is developed for retrieving the right information, without duplicates, from both structured and unstructured documents.
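The abstract names the techniques but does not spell them out. To illustrate only the correlation step, here is a minimal Python sketch. It assumes that each page is reduced to a row of term frequencies in a documents-by-terms table, and that two pages count as duplicates when the Pearson correlation of their rows reaches a threshold. The function names `term_table`, `pearson` and `remove_duplicates`, the tokenizer, and the 0.9 threshold are hypothetical choices for this sketch. The authors' signed approach and trust rating are not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: a correlation-only duplicate filter. The tokenizer,
# table layout and threshold are assumptions, not the paper's exact method.

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_table(docs):
    """Build a documents x terms frequency table (one row per document)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def remove_duplicates(docs, threshold=0.9):
    """Return indices of kept documents; a later document is dropped when its
    row correlates with an already kept row at or above the threshold."""
    _, table = term_table(docs)
    kept = []
    for i, row in enumerate(table):
        if all(pearson(row, table[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```

For example, given the three pages "web content outlier mining detects irrelevant pages", "Web content outlier mining detects irrelevant pages!" and "search engines index duplicated web pages", `remove_duplicates` keeps indices `[0, 2]`. The second page has the same term row as the first, so their correlation is 1.0 and the second page is dropped.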

References (9)
Malik Agyemang, Ken Barker, Reda Alhajj. Hybrid Approach to Web Content Outlier Mining Without Query Vector. Data Warehousing and Knowledge Discovery, pp. 285-294, 2005. doi:10.1007/11546849_28
G. Poonkuzhali, G. V. Uma, K. Thiagarajan, K. Sarukesi. Signed Approach for Mining Web Content Outliers. World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol. 3, pp. 2124-2128, 2009.
Zhongming Han, Qian Mo, Hongzhi Liu, Jianzhi Sun. Effectively and Efficiently Detect Web Page Duplication. International Conference on Digital Information Management, pp. 1-6, 2009. doi:10.1109/ICDIM.2009.5356801
Malik Agyemang, Ken Barker, Reda S. Alhajj. Mining Web Content Outliers Using Structure Oriented Weighting Techniques and N-grams. Proceedings of the 2005 ACM Symposium on Applied Computing (SAC '05), pp. 482-487, 2005. doi:10.1145/1066677.1066788
G. Poonkuzhali, R. Kishore Kumar, R. Kripa Keshav, P. Sudhakar, K. Sarukesi. Correlation Based Method to Detect and Remove Redundant Web Document. Advanced Materials Research, vol. 171-172, pp. 543-546, 2010. doi:10.4028/WWW.SCIENTIFIC.NET/AMR.171-172.543
Yunhe Weng, Lei Li, Yixin Zhong. Semantic Keywords-Based Duplicated Web Pages Removing. International Conference on Natural Language Processing and Knowledge Engineering, pp. 1-7, 2008. doi:10.1109/NLPKE.2008.4906751
Min-yan Wang, Dong-sheng Liu. The Research of Web Page De-duplication Based on Web Pages Reshipment Statement. Database Technology and Applications, pp. 271-274, 2009. doi:10.1109/DBTA.2009.64
Malik Agyemang, Ken Barker, Reda Alhajj. Framework for Mining Web Content Outliers. ACM Symposium on Applied Computing, pp. 590-594, 2004. doi:10.1145/967900.968022
M. Agyemang, K. Barker, R. S. Alhajj. WCOND-Mine: Algorithm for Detecting Web Content Outliers from Web Documents. International Symposium on Computers and Communications, pp. 885-890, 2005. doi:10.1109/ISCC.2005.155