UbiSitePred: A novel method for improving the accuracy of ubiquitination sites prediction by using LASSO to select the optimal Chou's pseudo components

Authors: Xiaowen Cui, Zhaomin Yu, Bin Yu, Minghui Wang, Baoguang Tian

DOI: 10.1016/J.CHEMOLAB.2018.11.012

Abstract: Ubiquitination is an essential process in protein post-translational modification and plays a crucial role in cell life activities such as proteasomal degradation, transcriptional regulation, and DNA damage repair. Recognition of ubiquitination sites is therefore a key step toward understanding the molecular mechanisms of ubiquitination. However, experimental verification of the numerous ubiquitination sites is time-consuming and costly, so a computational approach is needed to predict them. This paper proposes a new method, UbiSitePred, which predicts ubiquitination sites by combining least absolute shrinkage and selection operator (LASSO) feature selection with a support vector machine. First, binary encoding (BE), pseudo-amino acid composition (PseAAC), the composition of k-spaced amino acid pairs (CKSAAP), and position-specific propensity matrices (PSPM) are used to extract sequence information, yielding the initial feature space. Second, LASSO is applied to remove redundant information and select the optimal feature subset. Finally, this subset is input into a support vector machine (SVM) to predict ubiquitination sites. Five-fold cross-validation shows that the model achieves better prediction performance than other methods: the AUC values on Set1, Set2, and Set3 are 0.9998, 0.8887, and 0.8481, respectively, and the overall accuracy rates are 98.33%, 81.12%, and 76.90%, respectively. The results demonstrate that the proposed method is significantly superior to state-of-the-art methods and provides a new approach to predicting post-translational modification sites in proteins. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/UbiSitePred/ .
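The first two sequence encodings listed in the abstract can be sketched in a few lines. This is a minimal illustration, not the UbiSitePred code. It assumes each sample is a fixed-length peptide window centred on a candidate lysine and uses the 20 standard amino acids as the alphabet; any other symbol (for example a hypothetical padding character 'X') becomes an all-zero vector. For CKSAAP it uses the usual definition, in which the pair counts for each gap k are divided by the number of residue pairs in the window. The helper names `binary_encode` and `cksaap` and the default `k_max=3` are illustrative assumptions.

```python
from itertools import product

# Assumption: 20 standard amino acids; other symbols encode as all zeros.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {p: i for i, p in enumerate(PAIRS)}


def binary_encode(window: str) -> list[int]:
    """Binary (one-hot) encoding: 20 bits per residue, concatenated."""
    vec = []
    for aa in window:
        bits = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            bits[AA_INDEX[aa]] = 1
        vec.extend(bits)
    return vec


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, count ordered pairs (window[i], window[i + k + 1]) and divide
    by the number of pairs examined. Output length is 400 * (k_max + 1).
    """
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in PAIR_INDEX:
                counts[PAIR_INDEX[pair]] += 1
        features.extend(c / total if total > 0 else 0.0 for c in counts)
    return features


if __name__ == "__main__":
    window = "ACAC"
    print(len(binary_encode(window)))    # 20 bits per residue
    print(len(cksaap(window, k_max=1)))  # 400 pairs per gap value
```

Binary encoding grows linearly with the window length, while CKSAAP has a fixed size that depends only on `k_max`; concatenating such blocks is what produces the large, redundant initial feature space that the LASSO step is meant to prune.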
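PSPM features differ from the encodings above because they are estimated from labelled training windows rather than computed from a single window. The abstract does not give the exact definition, so the sketch below assumes one common formulation: the per-position amino acid frequency in the positive (ubiquitinated) windows minus the frequency in the negative windows. A window is then encoded by looking up, for each residue, the value at its position. The function names `frequency_matrix`, `build_pspm` and `pspm_encode` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(windows: list[str]) -> list[dict[str, float]]:
    """Per-position amino acid frequencies over equal-length windows."""
    length = len(windows[0])
    freq = [dict.fromkeys(AMINO_ACIDS, 0.0) for _ in range(length)]
    for w in windows:
        for j, aa in enumerate(w):
            if aa in freq[j]:  # skip non-standard symbols
                freq[j][aa] += 1.0 / len(windows)
    return freq


def build_pspm(positives: list[str], negatives: list[str]) -> list[dict[str, float]]:
    """Assumed PSPM: positive-set frequency minus negative-set frequency at each position."""
    f_pos = frequency_matrix(positives)
    f_neg = frequency_matrix(negatives)
    return [
        {aa: f_pos[j][aa] - f_neg[j][aa] for aa in AMINO_ACIDS}
        for j in range(len(f_pos))
    ]


def pspm_encode(window: str, pspm: list[dict[str, float]]) -> list[float]:
    """One value per position: the propensity of the residue found there (0.0 if unknown)."""
    return [pspm[j].get(aa, 0.0) for j, aa in enumerate(window)]


if __name__ == "__main__":
    pspm = build_pspm(positives=["AK", "AK"], negatives=["CK", "AK"])
    print(pspm_encode("AK", pspm))
    print(pspm_encode("CK", pspm))
```

Because the matrix is built from labelled data, under five-fold cross-validation it would normally be rebuilt from the training folds only, so that held-out windows do not leak into their own features.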
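The selection-then-classification stage (LASSO feature selection followed by an SVM, evaluated by five-fold cross-validation) maps naturally onto a scikit-learn pipeline. This is a sketch under stated assumptions, not the authors' implementation. The standardisation step, the LASSO penalty `alpha=0.05`, the RBF kernel with `C=1.0`, and the helper names `build_model`, `five_fold_scores` and `selected_features` are hypothetical choices. The random matrix in the demo only stands in for the concatenated BE, PseAAC, CKSAAP and PSPM features.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(alpha: float = 0.05):
    """Standardise -> LASSO-based feature selection -> RBF-kernel SVM.

    alpha, C and gamma are illustrative values, not the paper's settings.
    """
    return make_pipeline(
        StandardScaler(),
        # Keep features whose LASSO coefficient is numerically non-zero
        # (SelectFromModel uses |coef| >= 1e-5 by default for Lasso).
        SelectFromModel(Lasso(alpha=alpha, max_iter=10_000)),
        SVC(kernel="rbf", C=1.0, gamma="scale"),
    )


def five_fold_scores(X, y, alpha: float = 0.05, seed: int = 0):
    """Mean AUC and accuracy over stratified five-fold cross-validation."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    res = cross_validate(
        build_model(alpha), X, y, cv=cv, scoring=("roc_auc", "accuracy")
    )
    return res["test_roc_auc"].mean(), res["test_accuracy"].mean()


def selected_features(X, y, alpha: float = 0.05):
    """Indices of the features LASSO keeps when fitted on all of X."""
    model = build_model(alpha).fit(X, y)
    return np.flatnonzero(model[1].get_support())  # step 1 is the selector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 40))           # synthetic stand-in features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # only features 0 and 1 carry signal
    print("kept features:", selected_features(X, y))
    print("mean AUC, accuracy:", five_fold_scores(X, y))
```

Placing the selector inside the pipeline means LASSO is refitted on each training fold, so the held-out fold never influences which features are kept; selecting features once on the full dataset before cross-validation would give optimistically biased AUC and accuracy.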
