Author: Ali Imran Jehangiri
DOI:
Keywords:
Abstract: An increasing number of applications are being hosted on cloud based platforms. Cloud platforms are serving as a general computing facility, and the applications hosted on these platforms range from simple multi-tier web applications to complex social networking, eCommerce and Big Data applications. High availability, performance and auto-scaling are key requirements of Cloud based applications. Cloud platforms serve these requirements using dynamic provisioning of resources in an on-demand, multi-tenant fashion. A challenge for Cloud service providers is to ensure the Quality of Service (QoS), as a user / customer requires more explicit guarantees of QoS for the provisioning of services. Performance problems can directly lead to extensive financial losses. Thus, control and verification of QoS become a vital concern for any production level deployment. Therefore, it is crucial to address performance as a managed objective. The success of Cloud services depends critically on automated problem diagnostics and predictive analytics, enabling organizations to manage their performance proactively. Moreover, effective and advance monitoring is equally important for performance management support in clouds. In this thesis, we explore techniques for developing monitoring and performance management systems to achieve robust Cloud systems. At first, two case studies are presented as motivation for the need for a scalable monitoring and analytics framework. It includes a case study of performance issues of a software service, which is hosted on a virtualized platform. In the second case study, we analyzed Cloud services that are offered by a large IT service provider. A generalization of the case studies forms the basis for the requirement specifications, which are used for the state-of-the-art analysis. Although some solutions for particular challenges have already been provided, a scalable approach for performance problem diagnosis and prediction is still missing. For addressing this issue, a distributed scalable monitoring and analytics framework is presented in the first part of this thesis. We conducted a thorough analysis of the technologies to be used by our framework. The framework makes use of existing monitoring and analytics technologies. However, we develop custom collectors to retrieve data non-intrusively from different layers of the cloud. In addition, we develop subscriber and publisher components to retrieve service related events from different APIs and send alerts to the SLA Management component for taking corrective measures. Further, we implemented an Open Cloud Computing Interface (OCCI) monitoring extension using the OCCI Mixin mechanism. To deal with performance problem diagnosis, a novel scalable parallel approach for anomaly detection is presented.
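First, all anomalous metrics are found in the database within a time-series window. A comparative analysis of three selected light-weight statistical anomaly detection techniques is presented. We extend our work to the MapReduce paradigm and assess and compare the methods in terms of precision, recall, execution time, speedup and scale up. Next, we correlate the anomalous metrics with the target SLO in order to locate the suspicious metrics. The approach is implemented and evaluated on a Cloud encompassing the Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) service models. Experimental results confirm that our approach is efficient and effective in capturing the metrics causing performance anomalies. Finally, we present the design and implementation of an online anomaly prediction system for Cloud computing infrastructures. We further present an experimental evaluation of a set of anomaly prediction methods that aim at predicting upcoming periods of high utilization or poor performance with enough lead time to enable the appropriate scheduling, scaling, and migration of virtual resources. Using real data sets gathered from the Cloud platforms of a university data center, we compare several approaches ranging from time-series methods (e.g. auto regression (AR)) to statistical classification methods (e.g. Bayesian classifier). We observe that linear time-series models, especially AR models, are most likely suitable to model QoS measures and forecast their future values. Moreover, linear time-series models can be integrated with Machine Learning (ML) methods to improve proactive QoS management.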
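
The two-step diagnosis summarized in the abstract (flag anomalous metrics within a time-series window, then correlate them with the target SLO) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the z-score test, the thresholds, the function names, and the sample metrics are all assumptions chosen for illustration, and the sketch runs on one machine rather than in the MapReduce paradigm.

```python
from math import sqrt
from statistics import mean, pstdev


def is_anomalous(window, threshold=3.0):
    """Flag a metric whose latest sample deviates strongly from the
    earlier samples in the window (simple z-score test, an assumption)."""
    history, latest = window[:-1], window[-1]
    spread = pstdev(history)
    if spread == 0:
        # A flat history: any change in the latest sample counts as anomalous.
        return latest != history[0]
    return abs(latest - mean(history)) / spread > threshold


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm else 0.0


def suspicious_metrics(metrics, slo, threshold=3.0, min_corr=0.8):
    """Step 1: keep only anomalous metrics.
    Step 2: keep those correlated with the SLO series, strongest first."""
    ranked = []
    for name, window in metrics.items():
        if is_anomalous(window, threshold):
            r = pearson(window, slo)
            if abs(r) >= min_corr:
                ranked.append((name, r))
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sample data: three metrics and a response-time SLO series.
metrics = {
    "cpu_util": [20, 22, 21, 19, 20, 21, 20, 95],
    "disk_io": [5, 6, 5, 5, 6, 5, 6, 5],
    "net_rx": [10, 40, 12, 38, 11, 39, 10, 90],
}
response_time_ms = [100, 105, 102, 98, 101, 103, 100, 900]

result = suspicious_metrics(metrics, response_time_ms)
print(result)
```

Because each metric is tested independently in step 1, the per-metric check could be distributed as a map task, with the correlation ranking done in a reduce step. That mapping is a design assumption of this sketch, not a description of the thesis code.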