Approximating Aggregated SQL Queries With LSTM Networks.

Authors: Asaf Shabtai, Lior Rokach, Nir Regev

Abstract: Despite continuous investments in data technologies, the latency of querying data still poses a significant challenge. Modern analytic solutions require near real-time responsiveness both to make them interactive and to support automated processing. Current technologies (Hadoop, Spark, Dataflow) scan the dataset to execute queries. They focus on providing scalable data storage to maximize task execution speed. We argue that these solutions fail to offer an adequate level of interactivity, since they depend on continual access to the data. In this paper we present a method for query approximation, also known as approximate query processing (AQP), that reduces the need to scan data during inference (query calculation), thus enabling a rapid query processing tool. We use an LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for predicting query results. Our method (referred to as "Hunch") produces a lightweight LSTM network which provides high query throughput. We evaluated our method using 12 datasets. The results show that our method predicted queries' results with a normalized root mean squared error (NRMSE) ranging from approximately 1% to 4%. Moreover, our method was able to predict up to 120,000 queries in a second (streamed together), with a single-query latency of no more than 2 ms.
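
The abstract reports accuracy as NRMSE but does not say how the error is normalized. As a minimal sketch, assuming normalization by the range (max minus min) of the true query results, the metric could be computed as follows. The function name `nrmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def nrmse(actual, predicted):
    """RMSE divided by the range of the actual values.

    Assumption: range normalization. The abstract does not state which
    normalization (range, mean, or standard deviation) the authors used.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range; NRMSE is undefined")
    return math.sqrt(mse) / value_range


# Hypothetical true and predicted results of four aggregated queries.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [102.0, 198.0, 303.0, 397.0]
print(f"NRMSE = {nrmse(actual, predicted):.4f}")
```

Under this convention, a value of 0.01 to 0.04 would correspond to the 1% to 4% range quoted in the abstract.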

References (24)
Charles Elkan, Zachary C. Lipton, John Berkowitz. A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv: Learning (2015).
Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica. Discretized streams: fault-tolerant streaming computation at scale. Symposium on Operating Systems Principles, pp. 423-438 (2013). DOI: 10.1145/2517349.2522737
Sameer Agarwal, Henry Milner, Ariel Kleiner, Ameet Talwalkar, Michael Jordan, Samuel Madden, Barzan Mozafari, Ion Stoica. Knowing when you're wrong: building fast and reliable approximate query processing systems. International Conference on Management of Data, pp. 481-492 (2014). DOI: 10.1145/2588555.2593667
Kai Zeng, Sameer Agarwal, Ankur Dave, Michael Armbrust, Ion Stoica. G-OLA: Generalized On-Line Aggregation for Interactive Analysis on Big Data. International Conference on Management of Data, pp. 913-918 (2015). DOI: 10.1145/2723372.2735381
Chris Jermaine, Minos Garofalakis, Peter J. Haas, Graham Cormode. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches (2012).
Sameer Agarwal, Anand P. Iyer, Aurojit Panda, Samuel Madden, Barzan Mozafari, Ion Stoica. Blink and it's done. Proceedings of the VLDB Endowment, vol. 5, pp. 1902-1905 (2012). DOI: 10.14778/2367502.2367533
Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, Ion Stoica. BlinkDB. Proceedings of the 8th ACM European Conference on Computer Systems (EuroSys '13), pp. 29-42 (2013). DOI: 10.1145/2465351.2465355
Rajeev Motwani, Mayur Datar, Brian Babcock. Sampling from a moving window over streaming data. Symposium on Discrete Algorithms, pp. 633-634 (2002). DOI: 10.5555/545381.545465
Niranjan Kamat, Prasanth Jayachandran, Karthik Tunga, Arnab Nandi. Distributed and interactive cube exploration. International Conference on Data Engineering, pp. 472-483 (2014). DOI: 10.1109/ICDE.2014.6816674
Amitabha Bagchi, Amitabh Chaudhary, David Eppstein, Michael T. Goodrich. Deterministic sampling and range counting in geometric data streams. ACM Transactions on Algorithms, vol. 3, pp. 16- (2007). DOI: 10.1145/1240233.1240239