Mapping sequence to feature vector using numerical representation of codons targeted to amino acids for alignment-free sequence analysis

Authors: Jayanta Kumar Das, Antara Sengupta, Pabitra Pal Choudhury, Swarup Roy

DOI: 10.1016/J.GENE.2020.145096

Abstract: Phylogenetic analysis based on sequence similarity, targeted to real biological taxa, is a major challenge. In this paper, we propose a novel alignment-free method, CoFASA (Codon Feature Amino acid Sequence Analyser), for nucleotide sequences. At first, we assign numerical weights to the four nucleotides. We then calculate a score for each codon from the values of its constituent nucleotides, termed the degree of the codon. Accordingly, we obtain the degree of each amino acid from the degrees of the codons targeted towards that specific amino acid. Utilizing the degrees of the twenty amino acids and their relative abundance within a given sequence, we generate 20-dimensional features for every coding DNA or protein sequence. We use these features for performing phylogenetic analysis of a set of candidate sequences. For experimental assessments, we use multiple sequences derived from Beta-globin (BG), NADH dehydrogenase subunit 5 (ND5), Transferrins (TFs), Xylanases, and low identity (<40%) and high identity (⩾40%) sequences (encompassing 533 and 1064 families). We compare our results with sixteen (16) well-known methods, including both alignment-based and alignment-free methods. Various assessment indices are used for performance analysis, such as the Pearson correlation coefficient, RF (Robinson-Foulds) distance, and ROC analysis. When compared with alignment-based methods (ClustalW, ClustalΩ, MAFFT, and MUSCLE), CoFASA shows very similar results. Further, it performs better in comparison to the alignment-free methods LZW-Kernal, jD2Stat, FFP, spaced, and AFKS-D2s in predicting the taxonomic relationship among taxa. Overall, we observe that the features derived by CoFASA are very useful in isolating sequences according to their taxonomic labels. The method is cost-effective and, at the same time, produces consistent and satisfactory outcomes.
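The pipeline outlined in the abstract (nucleotide weights, codon degree, amino acid degree, 20-dimensional feature) might be sketched in Python roughly as follows. The abstract does not give the actual weights or combining rules, so everything specific here is an assumption made for illustration: the `WEIGHTS` values, the use of a plain sum for the codon degree, the sum over synonymous codons for the amino acid degree, and the product of degree and relative abundance for each feature component. The function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# ASSUMPTION: placeholder nucleotide weights; the abstract does not state the real ones.
WEIGHTS = {"A": 1, "C": 2, "G": 3, "T": 4}

# The twenty amino acids in a fixed (alphabetical) order, stop symbol excluded.
AA_ORDER = sorted(set(AMINO) - {"*"})


def codon_degree(codon):
    """ASSUMPTION: codon degree taken as the sum of its nucleotide weights."""
    return sum(WEIGHTS[base] for base in codon)


# ASSUMPTION: amino acid degree taken as the sum of the degrees of its codons.
AA_DEGREE = {
    aa: sum(codon_degree(c) for c, a in CODON_TABLE.items() if a == aa)
    for aa in AA_ORDER
}


def translate(dna):
    """Translate a coding DNA string in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3])
        if aa is None:      # skip codons with ambiguous bases such as 'N'
            continue
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def feature_vector(protein):
    """ASSUMPTION: each component is amino acid degree times relative abundance."""
    counts = Counter(aa for aa in protein.upper() if aa in AA_DEGREE)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AA_ORDER)
    return [AA_DEGREE[aa] * counts[aa] / total for aa in AA_ORDER]


if __name__ == "__main__":
    vec = feature_vector(translate("ATGAAAGTTTAA"))
    print(len(vec), vec)
```

Given such vectors, pairwise distances between sequences (for example Euclidean) could feed a standard tree-building step; the abstract does not say which distance or tree method the paper uses.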
