Multi-task Learning for Classifying Proteins Using Dual Hierarchies

Authors: Anveshi Charuvaka, Huzefa Rangwala

DOI: 10.1109/ICDM.2012.27

Keywords: Artificial intelligence, Computer science, Machine learning, Process (engineering), Data mining, Hierarchical database model, Binary classification, Biological database, Generalization, Protein sequencing, Task (project management), Multi-task learning

Abstract: Several biological databases organize information in taxonomies/hierarchies. These databases differ in terms of curation process, input data, coverage and annotation errors. SCOP and CATH are examples of two databases that classify proteins hierarchically into structurally related groups based on experimentally determined structures. Given the large number of protein sequences with unavailable structure, there is a need to develop prediction methods that classify protein sequences into structural classes. We have developed a novel classification approach that utilizes the underlying relationships across multiple hierarchical source databases within a multi-task learning (MTL) framework. MTL is used to simultaneously learn multiple related tasks, and has been shown to improve generalization performance. Specifically, we have developed and evaluated an MTL approach for predicting the protein structural class, as defined by two hierarchical databases, CATH and SCOP, using protein sequence information only. We define one task per node of the hierarchies and formulate the MTL problem as a combination of these binary classification tasks. Our experimental evaluation demonstrates that the MTL approach that integrates both hierarchies outperforms the baseline approach that trains independent models per task, as well as an MTL approach that integrates tasks across a single hierarchical database. We also performed extensive experiments to evaluate different regularization penalties and to incorporate different task relationships that achieve superior performance.
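The "one task per node" formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each protein carries a dotted path label in each database, and that a protein is a positive example for a node when its label falls under that node. The function names `node_prefixes` and `build_node_tasks` and the toy identifiers are hypothetical and chosen only for illustration.

```python
def node_prefixes(path):
    """Return every node on a dotted hierarchy path, root-most first."""
    parts = path.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def build_node_tasks(proteins):
    """Map (database, node) -> list of +1/-1 labels, one label per protein.

    Assumption: a protein is positive (+1) for a node if its annotation in
    that database lies at or below the node, and negative (-1) otherwise.
    """
    # Collect every node that appears in either hierarchy.
    nodes = set()
    for annotations in proteins:
        for db, path in annotations.items():
            for node in node_prefixes(path):
                nodes.add((db, node))

    # One binary classification task per node.
    tasks = {}
    for db, node in sorted(nodes):
        tasks[(db, node)] = [
            1 if db in ann and node in node_prefixes(ann[db]) else -1
            for ann in proteins
        ]
    return tasks


# Toy annotations (hypothetical identifiers, not real database entries).
proteins = [
    {"SCOP": "a.1.1", "CATH": "1.10.490"},
    {"SCOP": "a.1.2", "CATH": "1.10.760"},
    {"SCOP": "b.1.1", "CATH": "2.60.40"},
]
tasks = build_node_tasks(proteins)
```

Each entry of `tasks` is the label vector for one binary problem. A multi-task learner would then train all of these problems jointly, for example by coupling their weight vectors through a shared regularization penalty, rather than fitting each one independently.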
