Online Censoring for Large-Scale Regressions with Application to Streaming Big Data

作者: Dimitris Berberidis , Vassilis Kekatos , Georgios B. Giannakis

DOI: 10.1109/TSP.2016.2546225

关键词: Statistical inferenceApproximation algorithmEstimatorRandom projectionComputer scienceDimensionality reductionCensoring (statistics)Censoring (clinical trials)Estimation theoryMathematical optimizationStochastic approximationMaximum likelihoodOnline algorithmLinear regression

摘要: On par with data-intensive applications, the sheer size of modern linear regression problems creates an ever-growing demand for efficient solvers. Fortunately, a significant percentage data accrued can be omitted while maintaining certain quality statistical inference affordable computational budget. This work introduces means identifying and omitting less informative observations in online data-adaptive fashion. Given streaming data, related maximum-likelihood estimator is sequentially found using first- second-order stochastic approximation algorithms. These schemes are well suited when inherently censored or aim to save communication overhead decentralized learning setups. In different operational scenario, task joint censoring estimation put forth solve large-scale regressions centralized setup. Novel algorithms developed enjoying simple closed-form updates provable (non)asymptotic convergence guarantees. To attain desired patterns levels dimensionality reduction, thresholding rules investigated too. Numerical tests on real synthetic datasets corroborate efficacy proposed methods compared data-agnostic random projection-based alternatives.

参考文章(39)
Dimitri P. Bertsekas, Convex Optimization Algorithms ,(2015)
Robert Tibshirani, Trevor Hastie, Jerome H. Friedman, The Elements of Statistical Learning ,(2001)
Olivier Cappé, Aurélien Garivier, Emilie Kaufmann, On the complexity of best-arm identification in multi-armed bandit models Journal of Machine Learning Research. ,vol. 17, pp. 1- 42 ,(2016)
Garvesh Raskutti, Michael Mahoney, A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares arXiv: Machine Learning. ,(2014)
Michael W. Mahoney, Algorithmic and Statistical Perspectives on Large-Scale Data Analysis arXiv: Data Structures and Algorithms. ,(2010)
Jian Sun, Yongjian Cai, Gang Wang, Jie Chen, Power Scheduling of Kalman Filtering in Wireless Sensor Networks with Data Packet Drops arXiv: Systems and Control. ,(2013)
Sailes K. Sengijpta, Fundamentals of Statistical Signal Processing: Estimation Theory Technometrics. ,vol. 37, pp. 465- 466 ,(1995) , 10.1080/00401706.1995.10484391
Ameya Agaskar, Chuang Wang, Yue M. Lu, Randomized Kaczmarz algorithms: Exact MSE analysis and optimal sampling probabilities ieee global conference on signal and information processing. pp. 389- 393 ,(2014) , 10.1109/GLOBALSIP.2014.7032145
Takeshi Amemiya, Tobit models: A survey Journal of Econometrics. ,vol. 24, pp. 3- 61 ,(1984) , 10.1016/0304-4076(84)90074-5