A Bayes-true data generator for evaluation of supervised and unsupervised learning methods

Authors: Janick V. Frasch, Aleksander Lodwich, Faisal Shafait, Thomas M. Breuel

DOI: 10.1016/J.PATREC.2011.04.010

Abstract: Benchmarking pattern recognition, machine learning and data mining methods commonly relies on real-world data sets. However, there are some disadvantages in using real-world data. On the one hand, collecting real-world data can become difficult or impossible for various reasons; on the other hand, real-world variables are hard to control, even in the problem domain. In the feature domain, where most statistical learning methods operate, exercising control is even more difficult and hence rarely attempted. This is at odds with scientific experimentation guidelines mandating the use of variables that are as directly controllable and as directly observable as possible. Because of this, synthetic data possesses certain advantages over real-world data sets. In this paper we propose a method that produces synthetic data with guaranteed global and class-specific statistical properties. The method is based on overlapping class densities placed on the corners of a regular k-simplex. The generator can be used for algorithm testing and fair performance evaluation of statistical learning methods. Because of the strong properties of this generator, researchers can reproduce each other's experiments by knowing the parameters used, instead of transmitting large data sets.
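The idea of placing overlapping class densities on the corners of a regular k-simplex can be illustrated with a minimal sketch. This is not the authors' generator: the abstract does not specify the density family or the parameterization, so the isotropic Gaussian densities, the equal class priors, and the names `simplex_vertices`, `sample_classes`, `bayes_labels`, `edge`, and `sigma` are all assumptions made for illustration. Under those assumptions the Bayes-optimal decision is simply the nearest simplex corner, so the true labels and the Bayes-optimal labels can be compared directly.

```python
import numpy as np


def simplex_vertices(k, edge=1.0):
    """Return the k+1 corners of a regular k-simplex in R^k, centered at the origin."""
    # Rows are the standard basis vectors of R^(k+1) minus their centroid;
    # these k+1 points are pairwise sqrt(2) apart and span a k-dimensional subspace.
    centered = np.eye(k + 1) - 1.0 / (k + 1)
    # The first k right-singular vectors form an orthonormal basis of that subspace.
    _, _, vt = np.linalg.svd(centered)
    coords = centered @ vt[:k].T
    # Rescale so that every pair of corners is exactly `edge` apart.
    return coords * (edge / np.sqrt(2.0))


def sample_classes(k, n_per_class, edge=1.0, sigma=0.5, seed=0):
    """Draw n_per_class points per class; class c is centered on corner c.

    Assumption (not from the paper): isotropic Gaussian class densities with
    a shared standard deviation `sigma` and equal class priors.
    """
    rng = np.random.default_rng(seed)
    verts = simplex_vertices(k, edge)
    labels = np.repeat(np.arange(k + 1), n_per_class)
    points = verts[labels] + sigma * rng.standard_normal((labels.size, k))
    return points, labels, verts


def bayes_labels(points, verts):
    """Bayes-optimal labels under the assumptions above: the nearest corner."""
    sq_dist = ((points[:, None, :] - verts[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


if __name__ == "__main__":
    X, y, V = sample_classes(k=3, n_per_class=1000, edge=2.0, sigma=0.7, seed=42)
    # Empirical estimate of the Bayes error for this particular configuration.
    print("estimated Bayes error:", np.mean(bayes_labels(X, V) != y))
```

Because the corners are mutually equidistant, the ratio of `sigma` to `edge` controls the overlap between every pair of classes in the same way, and a data set is fully described by a handful of parameters plus the random seed.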
