Online Machine Learning in Big Data Streams.

Authors: Róbert Pálovics, András A. Benczúr, Levente Kocsis

Abstract: The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as the fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is a reference material and not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All related sub-fields, online algorithms, online learning, and distributed data processing, are hugely dominant in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several survey results, both for distributed data processing and for online machine learning. Compared to past surveys, our article is different because we discuss recommender systems in extended detail.
