A General Framework for Mining Massive Data Streams

Authors: Pedro Domingos, Geoff Hulten

DOI: 10.1198/1061860032544

Keywords:

Abstract: In many domains, data now arrive faster than we are able to mine it. To avoid wasting these data, we must switch from the traditional “one-shot” data mining approach to systems that are able to mine continuous, high-volume, open-ended data streams as they arrive. In this article we identify some desiderata for such systems, and outline our framework for realizing them. A key property of our approach is that it minimizes the time required to build a model on a data stream while guaranteeing (as long as the data are iid) that the model learned is effectively indistinguishable from the one that would be obtained using infinite data. Using this framework, we have successfully adapted several learning algorithms to massive data streams, including decision tree induction, Bayesian network learning, k-means clustering, and the EM algorithm for mixtures of Gaussians. These algorithms are able to process on the order of billions of examples per day using off-the-shelf hardware. Building on this, we are currently develo...
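The abstract's guarantee (a model built from a finite prefix of the stream that is effectively indistinguishable from the infinite-data model) is the kind of statement that concentration bounds such as Hoeffding's inequality, listed in the references, make possible. The following is a minimal sketch of that bound only, not the authors' framework: for n independent observations of a variable with range R, the sample mean is within ε = sqrt(R² ln(1/δ) / (2n)) of the true mean with probability at least 1 − δ. The function names `hoeffding_epsilon` and `examples_needed` are hypothetical and chosen for illustration.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Hoeffding deviation bound for the mean of n independent observations.

    With probability at least 1 - delta, the sample mean of a variable whose
    values span `value_range` is within the returned epsilon of its true mean.
    Illustrative helper; the name is an assumption, not from the article.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def examples_needed(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding bound is at most epsilon.

    Obtained by solving epsilon = sqrt(R^2 ln(1/delta) / (2n)) for n.
    """
    if epsilon <= 0.0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Example values (assumed): range 1.0, failure probability 1e-7.
    n = examples_needed(value_range=1.0, delta=1e-7, epsilon=0.01)
    print("examples needed:", n)
    print("bound at that n:", hoeffding_epsilon(1.0, 1e-7, n))
```

The required n depends on the desired precision and confidence, not on the length of the stream, which is why a learner using such a bound can in principle stop reading examples for a given decision once the bound is tight enough.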

References (6)
Geoff Hulten, Pedro Domingos. A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering. International Conference on Machine Learning, pp. 106-113 (2001).
Geoff Hulten, Laurie Spencer, Pedro Domingos. Mining Time-Changing Data Streams. Knowledge Discovery and Data Mining, pp. 97-106 (2001). DOI: 10.1145/502512.502529
Pedro Domingos, Geoff Hulten. Mining High-Speed Data Streams. Knowledge Discovery and Data Mining, pp. 71-80 (2000). DOI: 10.1145/347090.347107
Geoff Hulten, Pedro Domingos. Mining Complex Models from Arbitrarily Large Databases in Constant Time. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '02), pp. 525-531 (2002). DOI: 10.1145/775047.775124
Geoff Hulten, Pedro Domingos. Learning from Infinite Data in Finite Time. Neural Information Processing Systems, pp. 673-680 (2001).
Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, vol. 58, pp. 13-30 (1963). DOI: 10.1007/978-1-4612-0865-5_26