BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees

Authors: Yongjoo Park, Jingyi Qing, Xiaoyang Shen, Barzan Mozafari

DOI: 10.1145/3299869.3300077

Keywords: Poisson regression, Probabilistic logic, Speedup, Sampling (statistics), Entropy (information theory), Linear regression, Logistic regression, Computer science, Algorithm, Generalized linear model, Maximum likelihood, Hyperparameter

Abstract: The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during the initial stage of their analysis to make quick decisions (e.g., what features or hyperparameters to use), and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., a full model), they can quickly train an approximate model, with quality guarantees, using a sample. BlinkML ensures that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model.
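The core idea in the abstract, that an MLE model trained on a small uniform sample often makes (nearly) the same predictions as the model trained on the full data, can be illustrated with a minimal sketch. This is not BlinkML's algorithm or API (BlinkML additionally estimates the sampling error and picks a sample size that meets the requested agreement probability); the sketch below only trains a logistic regression by MLE on a 5% sample and measures its prediction agreement with the full-data model, on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, steps=500, lr=0.1):
    """Fit logistic regression by gradient ascent on the log-likelihood (MLE)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted P(y=1)
        w += lr * X.T @ (y - p) / len(y)        # average log-likelihood gradient
    return w

# Synthetic "full" dataset (hypothetical stand-in for an enterprise-scale table)
n, d = 20000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (1.0 / (1.0 + np.exp(-X @ true_w)) > rng.uniform(size=n)).astype(float)

# Full model vs. approximate model trained on a 5% uniform sample
w_full = train_logreg(X, y)
idx = rng.choice(n, size=n // 20, replace=False)
w_sample = train_logreg(X[idx], y[idx])

# Fraction of points on which the two models predict the same class
agree = np.mean((X @ w_full > 0) == (X @ w_sample > 0))
print(f"prediction agreement: {agree:.3f}")
```

In practice the agreement is high even at small sample sizes; BlinkML's contribution is to certify such agreement a priori (e.g., at the 95% level quoted above) rather than measuring it after the fact, which requires the full model.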
