Variance Reduction in Training Forecasting Models with Subgroup Sampling.

Authors: Christopher De Sa, Yuyang Wang, Dean Foster, Youngsuk Park, Yucheng Lu

Abstract: In real-world applications of large-scale time series, one often encounters the situation where the temporal patterns of time series, while drifting over time, differ from one another within the same dataset. In this paper, we provably show that under such heterogeneity, training a forecasting model with commonly used stochastic optimizers (e.g. SGD) potentially suffers from large gradient variance, and thus requires long training. To alleviate this issue, we propose a sampling strategy named Subgroup Sampling, which mitigates the large variance via sampling over pre-grouped time series. We further introduce SCott, a variance-reduced SGD-style optimizer that co-designs subgroup sampling with the control variate method. In theory, we provide a convergence guarantee for SCott on smooth non-convex objectives. Empirically, we evaluate SCott and other baseline optimizers on both synthetic and real-world time series forecasting problems, and show that SCott converges faster with respect to both iterations and wall clock time. Additionally, we show that two SCott variants can speed up Adam and Adagrad without compromising the generalization of forecasting models.
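The variance argument in the abstract can be illustrated with a toy sketch. This is not the authors' SCott implementation, and it omits the control variate step entirely. It assumes each series contributes a scalar "gradient", that series are already grouped, and that gradients within a group are similar while groups differ strongly. The helper names (make_groups, uniform_estimate, subgroup_estimate) and all constants are hypothetical and chosen only for illustration.

```python
import random
import statistics


def make_groups(rng, n_groups=4, per_group=50):
    """Build hypothetical per-series gradients: each group clusters around its own center."""
    groups = []
    for g in range(n_groups):
        center = 10.0 * g  # assumed strong heterogeneity between groups
        groups.append([rng.gauss(center, 1.0) for _ in range(per_group)])
    return groups


def uniform_estimate(rng, groups, batch_size):
    """Mean of gradients drawn uniformly (with replacement) from the pooled dataset."""
    pooled = [x for grp in groups for x in grp]
    return statistics.fmean(rng.choices(pooled, k=batch_size))


def subgroup_estimate(rng, groups, draws_per_group):
    """Stratified estimate: sample inside every group, weight each group by its size."""
    total = sum(len(grp) for grp in groups)
    estimate = 0.0
    for grp in groups:
        draws = rng.choices(grp, k=draws_per_group)
        estimate += (len(grp) / total) * statistics.fmean(draws)
    return estimate


def estimator_variance(estimator, trials=2000):
    """Empirical variance of a gradient estimator over repeated draws."""
    return statistics.pvariance([estimator() for _ in range(trials)])


rng = random.Random(0)
groups = make_groups(rng)

# Both estimators use 8 gradient evaluations per step (8 pooled vs. 2 per group).
var_uniform = estimator_variance(lambda: uniform_estimate(rng, groups, 8))
var_subgroup = estimator_variance(lambda: subgroup_estimate(rng, groups, 2))

print(f"uniform sampling variance:  {var_uniform:.3f}")
print(f"subgroup sampling variance: {var_subgroup:.3f}")
```

Under these assumptions the uniform estimator inherits the spread between groups, while the stratified estimator only sees the spread inside each group, so its variance should come out far smaller for the same number of gradient evaluations. How well this carries over to a real forecasting model depends on how well the pre-grouping actually separates dissimilar series.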
