Author: Dan Ofer
DOI:
Keywords:
Abstract: Systematic identification of protein function is a key problem in current biology. Most traditional methods fail to identify functionally equivalent proteins if they lack similar sequences, structural data or extensive manual annotations. In this thesis, I focused on feature engineering and machine learning for identifying diverse classes of proteins that share functional relatedness but little sequence similarity, notably Neuropeptide Precursors (NPPs). I aim to do so solely using unannotated primary sequences from any organism. This thesis focuses on representations of whole proteins as derived engineered features, their extraction, frameworks for their usage by machine learning (ML) models, and the application of ML models to biological tasks, focusing on high-level functions. I implemented these ideas to develop a platform (called NeuroPID) that extracts meaningful features for the classification of overlooked NPPs. The platform allows mass discovery of new NPs. It was expanded as a webserver. Our approach was then extended towards other challenging classes, resulting in a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of biophysical attributes, allowing ML models to be applied to proteins. It was applied to many benchmark datasets with state-of-the-art performance. This success applies to a wide range of high-level functions, such as metagenomic analysis, subcellular localization, structure and unique properties (e.g. thermophiles, nucleic acid binding). These tools represent a valuable resource for science.