Authors: Jay Lepreau, Peter Hoogenboom
DOI:
Keywords: Data mining, Expert system, Unix, Task (computing), Workload, Process (computing), Real-time computing, Scalability, Host (network), Workstation, Computer science
Abstract: Computer systems require monitoring to detect performance anomalies such as runaway processes, but problem detection and diagnosis is a complex task requiring skilled attention. Human attention was never ideal for this task, and as networks of computers grow larger and their interactions more complex, it falls far short. Existing computer-aided management systems require the administrator to manually specify fixed "trouble" thresholds. In this paper we report on an expert system that automatically sets thresholds, and detects and diagnoses problems on a network of Unix computers. Key to the success and scalability of this system are the time series models we developed to model the variations in workload on each host. Analysis of the load average records of 50 machines yielded models that, for workstations with simulated problem injection, show false positive and false negative rates of less than 1%. The server, the most difficult machine to model, still gave false positive/negative rates of only 6%/32%. Observed values exceeding the expected range for a particular host cause the expert system to focus on that machine. There it applies tools with finer resolution and discrimination, including per-command profiles gleaned from process accounting records. It makes one of 18 specific diagnoses and notifies the administrator and, optionally, the user [a].
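The abstract does not say what form the per-host time series models take. To illustrate the idea of thresholds that are set automatically rather than fixed by an administrator, the sketch below assumes a much simpler stand-in: a seasonal baseline that records the mean and spread of the load average for each hour-of-day slot, with the "trouble" threshold placed `k` standard deviations above the slot mean. The function names (`fit_host_model`, `upper_threshold`, `is_anomalous`) and the parameters `period`, `k`, and `floor` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical stand-in for a per-host workload model: a seasonal baseline
# storing the mean and spread of the load average for each time slot
# (hour of day by default). This is NOT the authors' time series model.

def fit_host_model(history, period=24):
    """Fit a baseline from (t, load_avg) samples, where t is an integer slot index."""
    slots = defaultdict(list)
    for t, load in history:
        slots[t % period].append(load)
    return {slot: (mean(v), pstdev(v)) for slot, v in slots.items()}

def upper_threshold(model, t, period=24, k=3.0, floor=0.1):
    """Automatically derived 'trouble' threshold for time t."""
    mu, sigma = model[t % period]
    # floor keeps very quiet slots from producing a near-zero band
    return mu + k * max(sigma, floor)

def is_anomalous(model, t, load, period=24, k=3.0, floor=0.1):
    """True if the observed load exceeds the expected range for this slot."""
    return load > upper_threshold(model, t, period, k, floor)

if __name__ == "__main__":
    # Three days of synthetic hourly load averages: busy 09:00-17:00, idle otherwise
    history = []
    for day in range(3):
        for hour in range(24):
            base = 1.5 if 9 <= hour < 17 else 0.2
            history.append((day * 24 + hour, base + 0.1 * (day % 2)))
    model = fit_host_model(history)
    print(is_anomalous(model, 3 * 24 + 3, 1.2))   # True: 1.2 at 03:00 is unusual
    print(is_anomalous(model, 3 * 24 + 12, 1.2))  # False: 1.2 at noon is normal
```

Fitting one such baseline per host lets a single rule ("flag loads above the slot's expected range") adapt to each machine's own workload, which is the property the abstract credits for scalability; a real model would also need to handle trends and day-of-week effects.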