Missing Data Recovery in Large-Scale, Sparse Datacenter Traces: An Alibaba Case Study

Authors: Yi Liang, Linfeng Bi, Xing Su

DOI: 10.1109/CCGRID.2019.00039

Abstract: Trace analysis is of prominent importance for datacenter performance optimization. However, due to errors and the low execution priority of trace collection tasks, modern datacenter traces suffer from a serious missing-data problem. Previous works handle trace data recovery via statistical imputation methods. However, such methods either recover the missing data with fixed values or require users to decide the relationship model among trace attributes, which is neither feasible nor accurate when dealing with two trends in datacenter traces: data sparsity and complex correlations among trace attributes. To this end, we focus on a trace released by Alibaba and propose a tensor-based trace data recovery model to facilitate efficient and accurate data recovery for large-scale, sparse datacenter traces. The proposed model consists of two main phases. First, discretization and attribute selection methods work together to select the trace attributes that have strong correlations with the value-missing attribute. Then, a tensor is constructed and the missing values are recovered by employing a CANDECOMP/PARAFAC decomposition-based tensor completion method. The experimental results demonstrate that our model achieves higher accuracy than six machine learning-based methods.
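The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the discretization scheme, the attribute selection criterion, or the completion algorithm, so the code below assumes equal-width binning and a simple impute-then-refit alternating least squares (ALS) loop for the CANDECOMP/PARAFAC (CP) model. All function names (`discretize`, `khatri_rao`, `unfold`, `cp_reconstruct`, `cp_complete`) and the synthetic data are hypothetical.

```python
import numpy as np

def discretize(values, n_bins):
    """Equal-width binning (an assumed scheme): map values to bin indices 0..n_bins-1."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    return np.digitize(values, edges[1:-1])

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def unfold(t, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def cp_reconstruct(factors):
    """Rebuild a 3-way tensor from its three CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)

def cp_complete(tensor, mask, rank=2, n_iter=200, seed=0):
    """Fill entries where mask is False using a rank-`rank` CP model.

    Each iteration runs one ALS sweep on the currently filled tensor,
    then overwrites only the missing entries with the model's estimate.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in tensor.shape]
    # Start from the mean of the observed entries at every missing position.
    filled = np.where(mask, tensor, tensor[mask].mean())
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(filled, mode) @ kr @ np.linalg.pinv(gram)
        filled = np.where(mask, tensor, cp_reconstruct(factors))
    return filled

# Synthetic example: a rank-2 tensor with roughly 30% of its entries hidden.
rng = np.random.default_rng(42)
true_factors = [rng.standard_normal((n, 2)) for n in (6, 5, 4)]
truth = cp_reconstruct(true_factors)
mask = rng.random(truth.shape) > 0.3          # True where a value was observed
observed = np.where(mask, truth, np.nan)      # NaN marks a missing value
recovered = cp_complete(observed, mask, rank=2)
```

In a trace setting, each tensor axis would correspond to one selected (discretized) attribute, so `discretize` would supply the coordinates at which a record's value is stored. Observed entries are never modified by `cp_complete`; only positions where `mask` is False receive model estimates.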
