Authors: Tao Xie, Hucheng Zhou, Tian Xiao, Haoxiang Lin, Sihan Li
Keywords:
Abstract: SCOPE is adopted by thousands of developers from tens of different product teams in Microsoft Bing for daily web-scale data processing, including index building, search ranking, and advertisement display. A SCOPE job is composed of declarative SQL-like queries and imperative C# user-defined functions (UDFs), which are executed in a pipeline across many machines. Many SCOPE jobs run on SCOPE clusters every day, and some of them fail after a long execution time, wasting tremendous resources. Reducing such failures would therefore save significant resources. This paper presents a comprehensive characteristic study of 200 SCOPE failures and their fixes, together with 50 SCOPE failures with debugging statistics, from Microsoft Bing. The study investigates not only the major failure types, failure sources, and fixes, but also current debugging practice. Our findings include: (1) most of the failures (84.5%) are caused by defects in data processing rather than defects in code logic; (2) table-level failures (22.5%) are mainly caused by programmers' mistakes and frequent schema changes, while row-level failures (62%) are mainly caused by exceptional data; (3) 93.0% of the fixes do not change the data processing logic; (4) 8.0% of the failures have a root cause that is not at the failure-exposing stage, which makes current debugging practice insufficient in these cases. Our results provide valuable guidelines for the future development of data-parallel programs. We believe that these guidelines are not limited to SCOPE, but can also be generalized to other similar data-parallel platforms.