Discovery of concept entities from web sites using web unit mining

Authors: Ming Yin, Dion Hoe-lian Goh, Ee-Peng Lim, Aixin Sun


Abstract: A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. In order to discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine the web pages that constitute a concept entity and to classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers as it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents two methods to solve this problem. The first method introduces a more effective web fragment construction method so as to reduce later classification errors. The second method incorporates site-specific knowledge to discover and handle incomplete web units. Experiments show that incomplete web units can be removed and that overall accuracy is significantly improved, especially on the precision and F1 measures.
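The incomplete web unit problem mentioned in the abstract can be illustrated with a toy sketch. This is not iWUM and not either of the paper's two methods: it assumes, purely for illustration, that candidate units are connected components over hyperlinks and that a predicted unit counts as correct only when it exactly matches a true concept entity. The URLs, the function names `connected_units` and `precision_recall_f1`, and the exact-match scoring rule are all hypothetical.

```python
from collections import defaultdict


def connected_units(pages, links):
    """Group pages into candidate units (assumption: connected components
    over hyperlinks treated as undirected edges)."""
    parent = {p: p for p in pages}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for src, dst in links:
        if src in parent and dst in parent:
            parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for p in pages:
        groups[find(p)].add(p)
    return {frozenset(g) for g in groups.values()}


def precision_recall_f1(predicted, truth):
    """Score predicted units against true entities (assumption: exact match)."""
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical site: one three-page course entity and one single-page entity.
pages = [
    "/course/cs101/",
    "/course/cs101/syllabus.html",
    "/course/cs101/hw.html",
    "/faculty/lee/",
]
truth = {frozenset(pages[:3]), frozenset(pages[3:])}

full_links = [(pages[0], pages[1]), (pages[0], pages[2])]
partial_links = [(pages[0], pages[1])]  # the link to hw.html is missed

complete = connected_units(pages, full_links)
incomplete = connected_units(pages, partial_links)

print(precision_recall_f1(complete, truth))
print(precision_recall_f1(incomplete, truth))
```

In this toy setting, missing a single link splits the course entity into two fragments, so three units are predicted for two entities and precision falls further than recall. That is consistent with the abstract's remark that removing incomplete units helps most on precision and F1, although the paper's actual evaluation protocol may differ.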

