摘要: In many domains, data now arrive faster than we are able to mine it. To avoid wasting these data, must switch from the traditional “one-shot” mining approach systems that continuous, high-volume, open-ended streams as they arrive. this article identify some desiderata for such systems, and outline our framework realizing them. A key property of is it minimizes time required build a model on stream while guaranteeing (as long iid) learned effectively indistinguishable one would be obtained using infinite data. Using framework, have successfully adapted several learning algorithms massive streams, including decision tree induction, Bayesian network learning, k-means clustering, EM algorithm mixtures Gaussians. These process order billions examples per day off-the-shelf hardware. Building this, currently develo...