Database Learning: Toward a Database that Becomes Smarter Every Time

Authors: Michael Cafarella, Barzan Mozafari, Yongjoo Park, Ahmad Shahab Tajik

DOI: 10.1145/3035918.3064013

Abstract: In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that database learning supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.
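The idea that a past answer can sharpen a new one can be illustrated with a small sketch. It is not Verdict's implementation. It assumes that the true answers to a past query and a new query are jointly Gaussian (the maximum-entropy distribution once means and covariances are fixed), and that each raw sample-based answer equals the true answer plus independent zero-mean noise. The function name `improve_answer`, its parameters, and all numbers are hypothetical and chosen only for illustration.

```python
def improve_answer(prior_mean, prior_cov, raw, raw_var):
    """Posterior mean and variance of the NEW query's true answer.

    prior_mean: (mean_past, mean_new), assumed prior means of the true answers
    prior_cov:  2x2 assumed prior covariance of the true answers
    raw:        (raw_past, raw_new), sample-based answers actually observed
    raw_var:    (var_past, var_new), sampling-error variances of those answers
    """
    (s11, s12), (s21, s22) = prior_cov

    # Covariance of the observed raw answers: prior covariance plus
    # independent sampling noise on the diagonal.
    a, b = s11 + raw_var[0], s12
    c, d = s21, s22 + raw_var[1]
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))

    # Gain for the new query: second row of prior_cov times the inverse.
    g_past = s21 * inv[0][0] + s22 * inv[1][0]
    g_new = s21 * inv[0][1] + s22 * inv[1][1]

    # Standard Gaussian conditioning on both raw answers.
    mean = (prior_mean[1]
            + g_past * (raw[0] - prior_mean[0])
            + g_new * (raw[1] - prior_mean[1]))
    var = s22 - (g_past * s12 + g_new * s22)
    return mean, var


if __name__ == "__main__":
    # Illustrative numbers only: strongly correlated queries, an accurate
    # past answer, and a noisy new raw answer.
    mean, var = improve_answer(
        prior_mean=(100.0, 100.0),
        prior_cov=((25.0, 20.0), (20.0, 25.0)),
        raw=(104.0, 97.0),
        raw_var=(1.0, 16.0),
    )
    print(f"improved answer: {mean:.2f}, variance: {var:.2f} (raw variance: 16.00)")
```

Under these assumptions the posterior variance is never larger than the raw answer's sampling variance, which mirrors the abstract's claim that the improved answers are, in expectation, at least as accurate as the sample-based ones. The more strongly the two queries are correlated, the larger the reduction.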
