Authors: Chul Su Lim, Kong Joo Lee, Gil Chang Kim
DOI: 10.1016/J.IPM.2004.06.004
Keywords:
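Abstract: With the increase of information on the Web, it is difficult to quickly find the desired information among the documents retrieved by a search engine. One way to solve this problem is to classify web documents according to various criteria. Most document classification has focused on the subject or topic of a document. A genre or style is another view of a document, different from its subject or topic. The genre is also a criterion by which to classify documents. In this paper, we suggest multiple sets of features to classify the genres of web documents. The basic set of features, which have been proposed in previous studies, is acquired from the textual properties of documents, such as the number of sentences, the number of occurrences of a certain word, etc. However, web documents are different from textual documents in that they contain URL and HTML tags within the pages. We introduce new sets of features specific to web documents, which are extracted from the URL and HTML tags. The present work is an attempt to evaluate the performance of the proposed sets of features and to discuss their characteristics. Finally, we conclude which set of features is appropriate for automatic genre classification of web documents.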