Author: Simon Bull
DOI:
Keywords:
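Abstract: The discovery of drug targets is a vital component in the development of therapeutic treatments, as it is only through the modulation of a target's activity that a drug can alleviate symptoms or cure. Accurate identification of drug targets is therefore an important part of any development programme, and has an outsized impact on the programme's success due to its position as the first step in the pipeline. This makes the stringent selection of potential targets all the more important when attempting to control the increasing cost and time needed to successfully complete a programme, and in order to increase the throughput of the entire pipeline.
In this work, a computational approach was taken to the investigation of protein drug targets. First, a new heuristic, Leaf, for the approximation of a maximum independent set was developed, and evaluated in terms of its ability to remove redundancy from protein datasets, the goal being to generate the largest possible non-redundant dataset. Leaf was compared to pre-existing heuristics and to an optimal algorithm, Cliquer. Not only did Leaf find unbiased non-redundant sets that were around 10% larger than those of the commonly used PISCES algorithm, it found ones that were no more than one protein smaller than the sets found by Cliquer.
Following this, the human proteome was mined to discover properties of proteins that may be important in determining their suitability for pharmaceutical modulation. Data was gathered concerning each protein's sequence, post-translational modifications, secondary structure, germline variants, expression profile and drug target status. The data was then analysed to determine the features for which the target and non-target proteins had significantly different values. This analysis was repeated for subsets of the proteome consisting of all GPCRs, ion channels, kinases and proteases, as well as for a subset consisting of all proteins that are implicated in cancer. Next, machine learning was used to quantify the ability of the proteins in each dataset to serve as a drug target. For each dataset, this was accomplished by inducing a random forest that could distinguish between targets and non-targets, and then using the random forest to quantify the drug target likeness of the non-targets.
The properties that can best differentiate targets from non-targets are primarily those directly related to a protein's sequence (e.g. secondary structure). Germline variants, expression levels and interactions between proteins had minimal discriminative power. Overall, the best indicators of drug target likeness were the proteins' hydrophobicities, in vivo half-lives, propensity for being membrane bound and the fraction of non-polar amino acids in their sequences.
In terms of predicting targets, datasets including those of ion channels and cancer proteins were able to induce random forests that were highly capable of distinguishing between targets and non-targets. The non-target proteins predicted to be targets by these random forests comprise the most suitable potential future drug targets, and are likely to produce the best results if used as the basis for building a drug development programme.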
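Illustrative note: the abstract frames redundancy removal as approximating a maximum independent set, where proteins are vertices and an edge joins any two proteins judged too similar. The sketch below shows that framing with a generic greedy minimum-degree heuristic. It is not the Leaf heuristic, whose details the abstract does not give, and the function name, protein labels and similarity graph are all hypothetical.

```python
from typing import Dict, Hashable, Set


def greedy_independent_set(adjacency: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Approximate a maximum independent set with a greedy min-degree rule.

    `adjacency` maps each protein to the set of proteins it is redundant with.
    The graph is assumed to be symmetric and to list every protein as a key.
    This is a generic textbook heuristic, not the Leaf heuristic itself.
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {node: set(neighbours) for node, neighbours in adjacency.items()}
    kept: Set[Hashable] = set()
    while graph:
        # Keep the protein with the fewest remaining redundant neighbours;
        # ties are broken by label so the result is deterministic.
        node = min(graph, key=lambda n: (len(graph[n]), str(n)))
        kept.add(node)
        # Discard the kept protein and everything redundant with it.
        removed = graph[node] | {node}
        for gone in removed:
            graph.pop(gone, None)
        for neighbours in graph.values():
            neighbours -= removed
    return kept


# Hypothetical similarity graph: a chain A-B-C-D plus an unrelated protein E.
similar = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}
kept = greedy_independent_set(similar)
print(sorted(kept))
```

No two proteins in the returned set share an edge, so the set is non-redundant under the chosen similarity threshold. A greedy rule of this kind gives no optimality guarantee, which is why a comparison against an exact algorithm, as the abstract describes for Cliquer, is informative.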