Skyline Queries over Incomplete Data - Error Models for Focused Crowd-Sourcing

Authors: Christoph Lofi, Kinda El Maarry, Wolf-Tilo Balke

DOI: 10.1007/978-3-642-41924-9_25

Abstract: Skyline queries are a well-known technique for explorative retrieval, multi-objective optimization problems, and personalization tasks in databases. They are widely acclaimed for their intuitive query formulation mechanisms. However, when operating on incomplete datasets, skyline processing is severely hampered and often has to resort to error-prone heuristics. Unfortunately, incomplete datasets are a frequent phenomenon due to the widespread use of automated information extraction and aggregation. In this paper, we evaluate and compare various established heuristics for adapting skylines to incomplete datasets, focusing specifically on the error they impose on the skyline result. Building upon these results, we argue for improving the skyline result quality by employing crowd-enabled databases. This allows dynamic outsourcing of some database operators to human workers, therefore enabling the elicitation of missing values during runtime. Unfortunately, each crowd-sourcing operation will result in monetary and query runtime costs. Therefore, our main contribution is introducing a sophisticated error model, allowing us to concentrate on those tuples that are highly likely to be error-prone, while relying on established heuristics for safer tuples. This focused crowd-sourcing effort allows us to strike a perfect balance between costs and the result's quality.
