Authors: Ming Yin Ming, Dion Hoe‐lian Goh, Ee‐Peng Lim, Aixin Sun
DOI: 10.1108/17440080580000088
Keywords:
Abstract: A web site usually contains a large number of concept entities, each consisting of one or more web pages connected by hyperlinks. To discover these concept entities for more expressive web site queries and other applications, the web unit mining problem has been proposed. Web unit mining aims to determine the web pages that constitute a concept entity and to classify concept entities into categories. Nevertheless, the performance of an existing web unit mining algorithm, iWUM, suffers because it may create more than one web unit (incomplete web units) from a single concept entity. This paper presents two methods to solve this problem. The first method introduces a more effective web fragment construction to reduce later classification errors. The second method incorporates site-specific knowledge to handle incomplete web units. Experiments show that incomplete web units can be removed and that overall accuracy is significantly improved, especially on the precision and F1 measures.