Pooling Hybrid Representations for Web Structured Data Annotation.

Authors: Bianca Zadrozny, Breno W. Carvalho, Luciano Barbosa

Abstract: Automatically identifying the data types of web structured data is a key step in the process of web data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map the attributes of an entity in a given domain to pre-specified classes of attributes in the same domain, based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes' values. It does so without any pre-processing or use of pre-defined hand-crafted features. The hybrid network combines sequence-based neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), to learn the sequence structure of the attributes' values. The CNN captures short-distance dependencies in these sequences through a sliding window approach, and the RNN captures long-distance dependencies by storing information about previous characters. These networks create different vector representations of the input sequence, which are combined using a pooling layer. This layer applies a specific operation to these vectors in order to capture the most useful patterns for the task. Finally, on top of the pooling layer, a softmax function predicts the label of a given attribute value. We evaluate our strategy in four domains. The results show that the pooling network outperforms approaches that use some kind of pre-processing in all domains.
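The pipeline described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the weights are random and untrained, and the alphabet, layer sizes, label set, and function names (`embed`, `cnn_repr`, `rnn_repr`, `predict`) are hypothetical. Two simplifications are assumed as well: a plain tanh recurrent cell stands in for whatever recurrent unit the paper uses, and an element-wise max stands in for the pooling operation, which the abstract does not specify.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is reproducible

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,-$/@"  # assumed character set
EMB, HID, WINDOW = 8, 6, 3              # hypothetical layer sizes
LABELS = ["price", "phone", "name"]     # hypothetical attribute classes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


# Randomly initialised parameters; a real model would learn these.
E = rand_matrix(len(ALPHABET) + 1, EMB)  # last row is shared by unknown characters
W_cnn = rand_matrix(HID, WINDOW * EMB)
W_xh = rand_matrix(HID, EMB)
W_hh = rand_matrix(HID, HID)
W_out = rand_matrix(len(LABELS), HID)


def embed(text):
    """Map raw characters to vectors, with no tokenisation or hand-crafted features."""
    # str.find returns -1 for characters outside ALPHABET, which indexes the last row of E.
    return [E[ALPHABET.find(c)] for c in text.lower()]


def cnn_repr(chars):
    """Short-distance patterns: slide a window over the characters, then max over positions."""
    seq = chars + [[0.0] * EMB] * (WINDOW - 1)  # zero-pad so short values yield a window
    feats = []
    for i in range(len(seq) - WINDOW + 1):
        window = [x for vec in seq[i:i + WINDOW] for x in vec]
        feats.append([math.tanh(a) for a in matvec(W_cnn, window)])
    return [max(col) for col in zip(*feats)]


def rnn_repr(chars):
    """Long-distance patterns: the hidden state carries information from earlier characters."""
    h = [0.0] * HID
    for x in chars:
        h = [math.tanh(p + q) for p, q in zip(matvec(W_xh, x), matvec(W_hh, h))]
    return h


def predict(value):
    """Pool the two representations (element-wise max, an assumption) and apply softmax."""
    chars = embed(value)  # value is assumed to be non-empty
    pooled = [max(c, r) for c, r in zip(cnn_repr(chars), rnn_repr(chars))]
    return dict(zip(LABELS, softmax(matvec(W_out, pooled))))


print(predict("$ 1,299.00"))
```

Because the parameters are untrained, the printed probabilities are meaningless as predictions; the sketch only shows how a raw attribute value flows through the two sequence encoders, the pooling step, and the softmax layer.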
