Multiple sets of features for automatic genre classification of web documents

Authors: Chul Su Lim, Kong Joo Lee, Gil Chang Kim

DOI: 10.1016/J.IPM.2004.06.004

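Abstract: With the increase of information on the Web, it is difficult to quickly find the desired information among the documents retrieved by a search engine. One way to solve this problem is to classify web documents according to various criteria. Most document classification has focused on the subject or topic of a document. A genre or style is another view of a document, different from its subject or topic, and genre is also a criterion for classifying documents. In this paper, we suggest multiple sets of features for classifying the genres of web documents. The basic set of features, which has been proposed in previous studies, is acquired from the textual properties of documents, such as the number of sentences, the number of occurrences of a certain word, etc. However, web documents differ from textual documents in that they contain URL and HTML tags within the pages. We introduce new sets of features specific to web documents, which are extracted from URL and HTML tags. The present work is an attempt to evaluate the performance of the proposed sets of features and to discuss their characteristics. Finally, we conclude which set of features is appropriate for automatic genre classification of web documents.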
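The abstract names three sources of features: textual properties, the URL, and HTML tags. The following is a minimal sketch of what extracting one feature vector per page from those three sources could look like, using only the Python standard library. The function name `extract_features`, the feature keys, and the specific features chosen (sentence count, word count, URL path depth, query-string flag, per-tag counts) are illustrative assumptions, not the feature sets defined in the paper, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TagAndTextCollector(HTMLParser):
    """Counts opening tags and gathers visible text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; they are not visible text.
        if self._skip_depth == 0:
            self.text_parts.append(data)


def extract_features(url, html):
    """Return a dict of illustrative textual, URL, and HTML-tag features."""
    parser = _TagAndTextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)

    # Textual features: naive sentence split on ., ! and ? (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())

    # URL features: path depth and presence of a query string.
    parsed = urlparse(url)
    path_segments = [seg for seg in parsed.path.split("/") if seg]

    features = {
        "text.num_sentences": len(sentences),
        "text.num_words": len(words),
        "url.depth": len(path_segments),
        "url.has_query": int(bool(parsed.query)),
    }
    # HTML features: one count per tag name seen in the page.
    for tag, count in parser.tag_counts.items():
        features["html.tag." + tag] = count
    return features
```

Calling `extract_features(url, html)` on each page yields a dictionary that can be turned into a fixed-length vector for any standard classifier. Prefixing the keys with `text.`, `url.`, and `html.` keeps the three feature sets separable, so each set can be evaluated alone or in combination.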

