Classification in Sparse, High Dimensional Environments Applied to Distributed Systems Failure Prediction

Authors: José M. Navarro, Hugo A. Parada G., Juan C. Dueñas

DOI: 10.1007/978-3-319-19369-4_63

Keywords: Elastic net regularization; Reliability (computer networking); Artificial intelligence; Distributed computing; Rare events; Hyperparameter; Systems management; Stability (learning theory); Machine learning; Asset (computer security); Key (cryptography); Computer science

Abstract: Network failures are still one of the main causes of the lack of reliability in distributed systems. To address this problem, we present an improvement to a failure prediction system based on Elastic Net Logistic Regression and the application of rare events techniques, which is able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameter, and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when trying to take a proactive approach to management.
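The abstract does not describe how the model is built. As a rough illustration only, the following Python sketch fits an elastic net logistic regression to sparse rows (stored as hypothetical `{feature_index: value}` dicts) using proximal gradient descent, then applies the prior correction from King and Zeng's "Logistic Regression in Rare Events Data" (2001) to shift the intercept when failures were over-represented in the training sample. The function names, the parameters `alpha`, `l1_ratio`, `lr`, `epochs` and `tau`, and the choice of solver are all assumptions, not the authors' implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of t * |w|: handles the L1 part of the elastic net.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_enet_logreg(X, y, n_features, alpha=0.01, l1_ratio=0.5, lr=0.5, epochs=200):
    """Hypothetical elastic net logistic regression via full-batch proximal gradient descent.

    X: list of sparse rows, each a dict {feature_index: value}
    y: list of 0/1 labels (1 = failure)
    Penalty: alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||_2^2);
    the intercept b is not penalized.
    """
    n = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            # Only the non-zero entries of each sparse row are touched here.
            z = b + sum(w[j] * v for j, v in row.items())
            err = sigmoid(z) - label
            grad_b += err
            for j, v in row.items():
                grad[j] += err * v
        b -= lr * grad_b / n
        for j in range(n_features):
            # Gradient step on mean log-loss plus the smooth L2 term,
            # then the L1 proximal step, which sets small weights to exactly 0.
            g = grad[j] / n + alpha * (1 - l1_ratio) * w[j]
            w[j] = soft_threshold(w[j] - lr * g, lr * alpha * l1_ratio)
    return w, b


def prior_correct_intercept(b, y_sample_mean, tau):
    """King and Zeng (2001) prior correction of the intercept.

    y_sample_mean: fraction of failures in the training sample
    tau: assumed fraction of failures in the full population
    """
    return b - math.log(((1 - tau) / tau) * (y_sample_mean / (1 - y_sample_mean)))


def predict_proba(row, w, b):
    # Probability of failure for one sparse row.
    return sigmoid(b + sum(w[j] * v for j, v in row.items()))


if __name__ == "__main__":
    # Toy data: feature 0 appears only in failure rows; feature 4 never appears.
    X = [{0: 1.0, 3: 1.0}, {0: 1.0}, {1: 1.0}, {2: 1.0}, {1: 1.0, 2: 1.0}, {0: 1.0, 1: 1.0}]
    y = [1, 1, 0, 0, 0, 1]
    w, b = fit_enet_logreg(X, y, n_features=5)
    # Suppose failures are really about 1% of the population, not 50%.
    b_corr = prior_correct_intercept(b, sum(y) / len(y), tau=0.01)
    print(w, b, b_corr, predict_proba({0: 1.0}, w, b_corr))
```

For real sparse, high-dimensional data a library solver would be the practical choice; scikit-learn's `LogisticRegression` with `penalty="elasticnet"`, `solver="saga"` and an `l1_ratio` value, for example, fits the same kind of penalized model directly on SciPy sparse matrices.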
