Authors: Franca Debole, Fabrizio Sebastiani
DOI: 10.1002/ASI.V56:6
Keywords:
Abstract: The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have “carved” different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets. © 2005 Wiley Periodicals, Inc.