Authors: Anveshi Charuvaka, Huzefa Rangwala
DOI: 10.1109/ICDM.2012.27
Keywords: Artificial intelligence, Computer science, Machine learning, Process (engineering), Data mining, Hierarchical database model, Binary classification, Biological database, Generalization, Protein sequencing, Task (project management), Multi-task learning
Abstract: Several biological databases organize information in taxonomies/hierarchies. These databases differ in terms of curation process, input data, coverage and annotation errors. SCOP and CATH are examples of two databases that classify proteins hierarchically into structurally related groups based on experimentally determined structures. Given the large number of protein sequences with unavailable structure, there is a need to develop prediction methods that classify protein sequences into structural classes. We have developed a novel classification approach that utilizes the underlying relationships across multiple hierarchical source databases within a multi-task learning (MTL) framework. MTL is used to simultaneously learn multiple related tasks, and has been shown to improve generalization performance. Specifically, we developed and evaluated an MTL approach for predicting the structural class, as defined by two hierarchical databases, CATH and SCOP, using protein sequence information only. We define one task per node of the hierarchies and formulate the MTL problem as a combination of these binary classification tasks. Our experimental evaluation demonstrates that the MTL approach that integrates both hierarchies outperforms the baseline approach that trains independent models per task, as well as an MTL approach that integrates tasks across a single hierarchical database. We also performed extensive experiments to evaluate different regularization penalties and to incorporate different task relationships that achieve superior performance.
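The "one task per node" formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input layout, and the toy labels below are all hypothetical, and the dotted label strings only imitate the general shape of SCOP and CATH identifiers. The sketch shows how each node of each hierarchy yields one binary task, where a protein is a positive example for a node if the node lies on the path to its label.

```python
def node_prefixes(label):
    """Return every node on the path to a dotted hierarchical label, top level first."""
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_binary_tasks(labels_by_db):
    """Map (database, node) -> list of 0/1 targets, one entry per protein.

    labels_by_db: dict of database name -> list of dotted labels, aligned by
    protein index. This input layout is an assumption made for illustration.
    """
    tasks = {}
    for db, labels in labels_by_db.items():
        # Collect every node that appears anywhere in this hierarchy.
        nodes = sorted({n for lab in labels for n in node_prefixes(lab)})
        for node in nodes:
            # A protein is positive for a node if the node is on its label path.
            tasks[(db, node)] = [
                1 if node in node_prefixes(lab) else 0 for lab in labels
            ]
    return tasks


# Toy, made-up labels for three proteins (not real database entries).
toy = {
    "SCOP": ["a.1.1", "a.1.2", "b.2.1"],
    "CATH": ["1.10.8", "1.10.8", "2.40.5"],
}
tasks = build_binary_tasks(toy)
```

In an MTL setting, the resulting binary tasks from both hierarchies would be trained jointly, with a regularization penalty coupling related tasks; the choice of penalty and of task relationships is what the abstract says was evaluated, and it is not reproduced here.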