Authors: Michael Cafarella, Barzan Mozafari, Yongjoo Park, Ahmad Shahab Tajik
Keywords:
Abstract: In today's databases, previous query answers rarely benefit answering future queries. For the first time, to the best of our knowledge, we change this paradigm in an approximate query processing (AQP) context. We make the following observation: the answer to each query reveals some degree of knowledge about the answer to another query, because their answers stem from the same underlying distribution that has produced the entire dataset. Exploiting and refining this knowledge should allow us to answer queries more analytically, rather than by reading enormous amounts of raw data. Also, processing more queries should continuously enhance our knowledge of the underlying distribution, and hence lead to increasingly faster response times for future queries. We call this novel idea, learning from past query answers, Database Learning. We exploit the principle of maximum entropy to produce answers, which are in expectation guaranteed to be more accurate than existing sample-based approximations. Empowered by this idea, we build a query engine on top of Spark SQL, called Verdict. We conduct extensive experiments on real-world query traces from a large customer of a major database vendor. Our results demonstrate that database learning supports 73.7% of these queries, speeding them up by up to 23.0x for the same accuracy level compared to existing AQP systems.