Author: Hieu Quang Le
DOI:
Keywords:
Abstract: Multi-class classification is the task of organizing data samples into multiple predefined categories. In this thesis, we address two different research problems of multi-class classification, one specific and the other general. The first problem is to categorize structured data sources on the Web. While prior works use all features once they are extracted from search interfaces, we further refine the feature set. In our approach, we use only the text content of the search interfaces. We choose a subset of features that is suited to classifying web sources, using a feature selection technique with a new metric and scheme. Using this aggressive feature selection approach together with a Support Vector Machine categorizer, we obtained high classification performance in an evaluation over real web data. The second problem is to develop a general multi-label classification algorithm. In a multi-label classification problem, a data sample can be assigned to one or more categories. Given a problem of m categories, the commonly used One-Vs-All (OVA) approach transforms the problem into m independent binary classifications between each category and the rest (the category's complement). Based on the OVA approach, we propose a new method named Multi-Pair (MP). The MP method further decomposes each of the OVA binary classifications into multiple smaller and easier pair comparisons between a category and a subset of the category's complement. Furthermore, we incorporate the SCutFBR.1 thresholding strategy into the MP method. In our experiments with three benchmark collections, the MP method outperforms the OVA approach in both cases, with and without SCutFBR.1. A common aspect of both works is that we make use of category structure in our methods. This aspect distinguishes our research from prior research.
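The One-Vs-All transformation mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation; the function name `one_vs_all_targets` and the toy data are hypothetical, chosen only to show how a multi-label problem over m categories becomes m independent binary problems (each category versus its complement).

```python
from typing import Dict, List, Sequence, Set


def one_vs_all_targets(
    label_sets: Sequence[Set[str]], categories: Sequence[str]
) -> Dict[str, List[int]]:
    """Turn multi-label annotations into one binary target vector per category.

    For category c, sample i gets 1 if c is among its labels and 0 otherwise
    (meaning the sample belongs to the category's complement).
    """
    return {
        c: [1 if c in labels else 0 for labels in label_sets]
        for c in categories
    }


# Hypothetical toy data: three samples, m = 3 categories.
label_sets = [{"sports"}, {"sports", "politics"}, {"science"}]
categories = ["sports", "politics", "science"]

targets = one_vs_all_targets(label_sets, categories)
for category, y in targets.items():
    print(category, y)
# sports [1, 1, 0]
# politics [0, 1, 0]
# science [0, 0, 1]
```

Each of the m target vectors could then be handed to any binary classifier. The MP method described in the abstract further splits each such binary problem into pair comparisons; that step is not sketched here because the abstract does not specify how the subsets of the complement are formed.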