A Hybrid Approach for Automatic Model Recommendation

Authors: Roman Vainshtein, Asnat Greenstein-Messica, Gilad Katz, Bracha Shapira, Lior Rokach

DOI: 10.1145/3269206.3269299


Abstract: One of the challenges of automating machine learning applications is the automatic selection of an algorithmic model for a given problem. We present AutoDi, a novel and resource-efficient approach for model selection. Our approach combines two sources of information: metafeatures extracted from the data itself and word-embedding features extracted from a large corpus of academic publications. This hybrid approach enables AutoDi to select top-performing algorithms for both widely and rarely used datasets by utilizing its two types of feature sets. We demonstrate the effectiveness of our proposed approach on a large dataset of 119 datasets and 179 classification algorithms grouped into 17 families. We show that AutoDi can reach an average of 98.8% of the optimal accuracy and select the optimal classification algorithm in 49.5% of all cases.
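
The two figures reported in the abstract (average share of the optimal accuracy, and how often the optimal algorithm is selected) can be read as simple per-dataset statistics. The following is a minimal sketch of how such numbers could be computed, assuming a table of per-dataset accuracies and one recommended algorithm per dataset. The function names, algorithm names, and accuracy values are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Dict, List, Tuple


def relative_accuracy(accuracies: Dict[str, float], chosen: str) -> float:
    """Accuracy of the chosen algorithm as a fraction of the best accuracy
    achieved by any candidate algorithm on the same dataset."""
    best = max(accuracies.values())
    return accuracies[chosen] / best if best > 0 else 0.0


def summarize(results: List[Dict[str, float]],
              choices: List[str]) -> Tuple[float, float]:
    """Return (mean fraction of optimal accuracy, fraction of datasets on
    which the chosen algorithm ties the best accuracy)."""
    ratios = [relative_accuracy(acc, c) for acc, c in zip(results, choices)]
    hits = [acc[c] == max(acc.values()) for acc, c in zip(results, choices)]
    return sum(ratios) / len(ratios), sum(hits) / len(hits)


# Toy data (hypothetical): accuracy of each algorithm on two datasets.
results = [
    {"random_forest": 0.90, "svm": 0.80, "naive_bayes": 0.60},
    {"random_forest": 0.75, "svm": 0.50, "naive_bayes": 0.70},
]
# Hypothetical recommendations, one per dataset.
choices = ["random_forest", "naive_bayes"]

mean_ratio, hit_rate = summarize(results, choices)
print(f"mean fraction of optimal accuracy: {mean_ratio:.3f}")
print(f"optimal algorithm selected on {hit_rate:.0%} of datasets")
```

In this toy case the first recommendation is optimal and the second reaches 0.70 / 0.75 of the best accuracy, so a recommender can score close to the optimum on average while picking the single best algorithm only part of the time, which is the pattern the abstract reports.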
