Artificial Intelligence for IT Operations (AIOPS) Workshop White Paper.

Authors: Odej Kao, Florian Schmidt, Jorge Cardoso, Alexander Acker, Sören Becker

Abstract: Artificial Intelligence for IT Operations (AIOps) is an emerging interdisciplinary field arising at the intersection of the research areas of machine learning, big data, streaming analytics, and the management of IT operations. AIOps, as a field, is a candidate to produce the future standard for IT operation management. To that end, AIOps faces several challenges. First, it needs to combine separate research branches from other fields, such as software reliability engineering. Second, novel modelling techniques are needed to understand the dynamics of different systems. Furthermore, it requires laying out the basis for assessing time horizons and uncertainty for imminent SLA violations, the early detection of emerging problems, autonomous remediation, decision making, and support for various optimization objectives. Moreover, a good understanding and interpretability of these aiding models is important for building trust between the employed tools and the domain experts. Finally, all this will result in faster adoption of AIOps, further increase interest in this research field, and contribute to bridging the gap towards fully autonomous operating IT systems. The main aim of the AIOPS workshop is to bring together researchers from both academia and industry to present their experiences, results, and work in progress in this field. The workshop aims to strengthen the community and unite it towards the goal of joining efforts to solve the main challenges the field is currently facing. A consensus on, and adoption of, the principles of openness and reproducibility will boost research in this emerging area significantly.
