Authors: Bianca Zadrozny, Breno W. Carvalho, Luciano Barbosa
DOI:
Keywords:
Abstract: Automatically identifying the data types of web structured data is a key step in the process of data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map the attributes of an entity in a given domain to pre-specified classes of attributes in the same domain, based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes' values. It does so without any pre-processing and without pre-defined hand-crafted features. The network combines two sequence-based neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), to learn the sequence structure of attribute values. The CNN captures short-distance dependencies in these sequences through a sliding-window approach, while the RNN captures long-distance dependencies by storing information about previous characters. These networks create different vector representations of the input sequence, which are combined in a pooling layer. This layer applies a specific operation to these vectors in order to capture the patterns most useful for the task. Finally, in the top layer, a softmax function predicts the label of the attribute value. We evaluate our strategy in four domains. The results show that it outperforms approaches that use some kind of pre-processing in all domains.
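The architecture summarized in the abstract can be illustrated with a minimal, untrained forward-pass sketch. It is written in pure Python so it runs without a deep-learning framework, and it is not the authors' implementation. Several details are assumptions that the abstract does not state: the character vocabulary, the layer sizes (`EMB`, `HID`, `WIN`), a plain tanh RNN standing in for whatever recurrent cell the paper uses, and element-wise max as the "specific operation" that merges the CNN and RNN vectors. The class `HybridCharNet` and its methods are hypothetical names, and the weights are random rather than learned.

```python
import math
import random

# Assumed character vocabulary; index 0 is reserved for characters not listed here.
VOCAB = {ch: i + 1 for i, ch in enumerate(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,-/:@$()")}
EMB, HID, WIN = 8, 16, 3  # assumed sizes: embedding dim, hidden dim, CNN window width


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class HybridCharNet:
    """Hypothetical sketch: char embeddings -> CNN and RNN -> pooling -> softmax."""

    def __init__(self, num_classes, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(len(VOCAB) + 1, EMB, rng)
        self.conv = rand_matrix(HID, WIN * EMB, rng)  # filters over WIN consecutive chars
        self.w_xh = rand_matrix(HID, EMB, rng)        # RNN input-to-hidden weights
        self.w_hh = rand_matrix(HID, HID, rng)        # RNN hidden-to-hidden weights
        self.out = rand_matrix(num_classes, HID, rng) # softmax layer weights

    def encode(self, value):
        return [self.embed[VOCAB.get(ch, 0)] for ch in value]

    def cnn(self, xs):
        # Sliding window over neighbouring characters: short-distance dependencies.
        zero = [0.0] * EMB
        half = WIN // 2
        padded = [zero] * half + xs + [zero] * half  # one window per input character
        maps = []
        for i in range(len(padded) - WIN + 1):
            window = [v for vec in padded[i:i + WIN] for v in vec]  # concatenate embeddings
            maps.append([max(0.0, s) for s in matvec(self.conv, window)])  # ReLU
        # Max-over-time pooling gives a fixed-size vector for any string length.
        return [max(col) for col in zip(*maps)]

    def rnn(self, xs):
        # The hidden state carries information about all previous characters.
        h = [0.0] * HID
        for x in xs:
            pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
            h = [math.tanh(p) for p in pre]
        return h

    def predict_proba(self, value):
        if not value:
            raise ValueError("attribute value must be non-empty")
        xs = self.encode(value)
        # Assumed merge operation: element-wise max of the CNN and RNN vectors.
        pooled = [max(a, b) for a, b in zip(self.cnn(xs), self.rnn(xs))]
        logits = matvec(self.out, pooled)
        m = max(logits)  # subtract the max logit for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


labels = ["date", "price", "phone"]  # hypothetical attribute classes
net = HybridCharNet(num_classes=len(labels))
probs = net.predict_proba("$19.99")
print(max(zip(probs, labels)))  # weights are untrained, so this label is arbitrary
```

The CNN and RNN both read the raw character sequence, which mirrors the abstract's point that no pre-processing or hand-crafted features are needed; training (for example, cross-entropy on labelled attribute values) is omitted from this sketch.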