Skyline queries in crowd-enabled databases

Authors: Christoph Lofi, Kinda El Maarry, Wolf-Tilo Balke

DOI: 10.1145/2452376.2452431

Keywords: Tuple, Heuristics, Heuristic, Database, Information extraction, Computer science, Missing data, Skyline, Data mining, Information integration, Personalization

Abstract: Skyline queries are a well-established technique for database query personalization and are widely acclaimed for their intuitive formulation mechanisms. However, when operating on incomplete datasets, skyline queries are severely hampered and often have to resort to highly error-prone heuristics. Unfortunately, incomplete datasets are a frequent phenomenon, especially when they are generated automatically using various information extraction or integration approaches. Here, the recent trend of crowd-enabled databases promises a powerful solution: during execution, some operators can be dynamically outsourced to human workers in exchange for monetary compensation, therefore enabling the elicitation of missing values at runtime. This feature, however, heavily impacts response times and (monetary) execution costs. In this paper, we present an innovative hybrid approach combining dynamic crowd-sourcing with heuristic techniques in order to overcome current limitations. We will show that by assessing the individual risk a tuple poses with respect to the overall result quality, the efforts for eliciting missing values can be narrowly focused on only those tuples that may degenerate the expected quality most strongly. This leads to an algorithm for computing skyline sets on incomplete data with maximum result quality while optimizing
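
For readers unfamiliar with the skyline operator itself, the dominance test on a complete dataset can be sketched as below. This is only an illustration of the standard definition, not the hybrid crowd-sourcing algorithm described in the abstract. The function names, the sample data, and the convention that larger values are better in every dimension are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b in every dimension
    and strictly better in at least one (assumption: larger is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (rating, cheapness score) per hotel, all values known.
hotels: List[Point] = [(3, 5), (4, 4), (2, 2), (5, 1), (4, 3)]
print(skyline(hotels))
```

The sketch assumes every attribute value is present. On incomplete data the dominance test cannot be evaluated for tuples with missing values, which is the situation the abstract addresses by combining heuristics with crowd-sourced value elicitation.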
