Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation.

Authors: Mourad Ouzzani, Saravanan Thirumuruganathan, Nan Tang

Keywords: Data science, Variety (cybernetics), Data curation, Artificial intelligence, Deep learning, Data management, Computer science, Process (engineering)

Abstract: Past. Data curation, the process of discovering, integrating, and cleaning data, is one of the oldest data management problems. Unfortunately, it is still the most time-consuming and least enjoyable work of data scientists. So far, successful data curation stories are mainly ad-hoc solutions that are either domain-specific (for example, ETL rules) or task-specific (for example, entity resolution). Present. The power of current data curation solutions is not keeping up with the ever-changing data ecosystem in terms of volume, velocity, variety, and veracity, mainly due to the high human cost, rather than machine cost, needed to provide the ad-hoc solutions mentioned above. Meanwhile, deep learning is making strides, achieving remarkable successes in areas such as image recognition, natural language processing, and speech recognition. This is largely due to its ability to learn features that are neither domain-specific nor task-specific. Future. Data curation solutions need to keep pace with the fast-changing data ecosystem, where the main hope is to devise domain-agnostic and task-agnostic solutions. To this end, we start a new research project, called AutoDC, to unleash the potential of deep learning towards self-driving data curation. We will discuss how different deep learning concepts can be adapted and extended to solve various data curation problems, and we showcase some low-hanging fruits from the early encounters between deep learning and data curation happening in AutoDC. We believe that the directions pointed out by this work will not only drive AutoDC towards democratizing data curation, but also serve as a cornerstone for researchers and practitioners to move into a new realm of data curation solutions.