Making use of category structure for multi-class classification

Author: Hieu Quang Le

Abstract: Multi-class classification is the task of organizing data samples into multiple predefined categories. In this thesis, we address two different research problems of multi-class classification, one specific and the other general. The first problem is to categorize structured sources on the Web. While prior works use all features once they are extracted from search interfaces, we further refine the feature set. In our approach, we use only the text content of the search interfaces. We choose a subset of features suited to classifying web sources, using a feature selection technique with a new metric and selection scheme. Using this aggressive feature selection together with a Support Vector Machine categorizer, we obtained high performance in an evaluation over real data. The second, general problem is to develop a multi-label classification algorithm. In this problem, a data sample can be assigned to one or more categories. Given m categories, the commonly used One-Vs-All (OVA) approach transforms the problem into m independent binary classifications between each category and the rest (the category's complement). Based on the OVA approach, we propose a method named Multi-Pair (MP). The MP method decomposes each OVA binary classification into multiple smaller and easier pair comparisons between a category and a subgroup of its complement. Furthermore, we incorporate the SCutFBR.1 thresholding strategy into the MP method. In experiments on three benchmark collections, the MP method outperforms the OVA approach both with and without SCutFBR.1. A common aspect of our approaches is that we make use of the category structure in our feature selection and multi-label classification methods. This aspect distinguishes our work from other related research.
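The following is a minimal sketch of how the training data for the two decompositions could be built, assuming samples are identified by index and each sample carries a set of one or more category labels. The function names `ova_split` and `pair_split` are hypothetical. In particular, defining a "subgroup of the complement" as the samples labelled with some other category d but not with c is an illustrative assumption, not the thesis's definition. The sketch only builds the positive and negative index sets. It trains no classifier, and it does not implement MP's decision rule or SCutFBR.1.

```python
from typing import Dict, List, Set, Tuple

# labels[i] is the set of categories assigned to sample i (one or more per sample).
Labels = List[Set[str]]
Split = Tuple[List[int], List[int]]  # (positive indices, negative indices)


def ova_split(labels: Labels, categories: List[str]) -> Dict[str, Split]:
    """One-Vs-All: one binary problem per category, category vs. its complement."""
    problems = {}
    for c in categories:
        pos = [i for i, ls in enumerate(labels) if c in ls]
        neg = [i for i, ls in enumerate(labels) if c not in ls]
        problems[c] = (pos, neg)
    return problems


def pair_split(labels: Labels, categories: List[str]) -> Dict[Tuple[str, str], Split]:
    """Hypothetical pair decomposition: category c vs. subgroups of its complement.

    ASSUMPTION (illustrative only): the subgroup for (c, d) is every sample
    labelled with d but not with c. Because every sample has at least one
    label, these subgroups together cover c's OVA complement (they may overlap).
    """
    problems = {}
    for c in categories:
        pos = [i for i, ls in enumerate(labels) if c in ls]
        for d in categories:
            if d == c:
                continue
            sub = [i for i, ls in enumerate(labels) if d in ls and c not in ls]
            if sub:  # skip empty subgroups: nothing to compare against
                problems[(c, d)] = (pos, sub)
    return problems


if __name__ == "__main__":
    labels = [{"a"}, {"b"}, {"a", "c"}, {"c"}]
    cats = ["a", "b", "c"]
    print(ova_split(labels, cats)["a"])          # ([0, 2], [1, 3])
    print(pair_split(labels, cats)[("a", "c")])  # ([0, 2], [3])
```

Each pair problem keeps all of category c's positives but faces only a slice of the complement. This mirrors the abstract's point that the pair comparisons are smaller than the corresponding OVA problem.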
