Machine Learning for Protein Function

Author: Dan Ofer

Abstract: Systematic identification of protein function is a key problem in current biology. Most traditional methods fail to identify functionally equivalent proteins if they lack similar sequences, structural data, or extensive manual annotations. In this thesis, I focus on feature engineering and machine learning for identifying diverse protein classes that share functional relatedness but little sequence similarity, notably Neuropeptide Precursors (NPPs). I aim to do so solely using unannotated primary sequences from any organism. This thesis focuses on representations of whole proteins derived from engineered features, their extraction, frameworks for their usage by machine learning (ML) models, and the application of ML models to biological tasks, focusing on high-level functions. I implemented these ideas to develop a platform (called NeuroPID) that extracts meaningful features for the classification of overlooked NPPs. The platform allows mass discovery of new NPs. It was expanded as a webserver. I then extended our approach towards other challenging classes. The result is a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of biophysical attributes, allowing [...] proteins. It was applied to many benchmark datasets with state-of-the-art performance. Its success applies to a wide range of high-level functions such as metagenomic analysis, subcellular localization, structure, and unique properties (e.g. thermophiles, nucleic acid binding). These represent a valuable resource [...] science.
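
To make the idea of engineered features from an unannotated primary sequence concrete, here is a minimal sketch in Python. It does not use the NeuroPID or ProFET code or API. The function names, the choice of features (amino acid composition, density of adjacent basic residues, and sequence length), and the example sequence are all assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors align.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dibasic_density(sequence):
    """Fraction of adjacent residue pairs in which both residues are K or R.

    Illustrative assumption: pairs of basic residues are used here as one
    example of a motif-style feature; the thesis may use different ones.
    """
    seq = sequence.upper()
    if len(seq) < 2:
        return 0.0
    pairs = sum(
        1 for i in range(len(seq) - 1)
        if seq[i] in "KR" and seq[i + 1] in "KR"
    )
    return pairs / (len(seq) - 1)


def feature_vector(sequence):
    """Concatenate all features into one fixed-length numeric vector."""
    return composition_features(sequence) + [
        dibasic_density(sequence),
        float(len(sequence)),
    ]


if __name__ == "__main__":
    # Made-up sequence, not a real protein.
    example = "MKRSAKRLLGKRAAW"
    vector = feature_vector(example)
    print(len(vector), "features")
```

Because every sequence maps to a vector of the same length regardless of how long the sequence is, such vectors can be passed directly to a standard classifier, which is the general setting the abstract describes.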
