Web Page Classification based on Document Structure

Authors: Arul Prakash Asirvatham, Kranthi Kumar

Abstract: The web is a huge repository of information, and there is a need for categorizing web documents to facilitate the search and retrieval of pages. Existing algorithms rely solely on the text content of web pages for classification. However, the web has a lot of information contained in the structure, images, video, etc. present in a document. In this paper, we propose a method for the automatic classification of web pages into a few broad categories based on the structure of the document and the characteristics of the images in it.
