Algorithm for coding DNA sequences into "spectrum-like" and "zigzag" representations.

Authors: Jure Zupan, Milan Randić

DOI: 10.1021/CI040104J

Abstract: An algorithm is suggested for encoding long strings of building blocks, such as the 4 DNA bases (adenine A, cytosine C, thymine T, and guanine G), the 20 natural amino acids (from alanine, Ala, to valine, Val, plus the stop triplet), or all 64 possible base triplets (from AAA to TTT), into “zigzag” and “spectrum-like” representations. The new scheme can be derived in 3-, 2-, or 1-dimensional form, depending on the user's wishes. Besides the string for which the representation is sought, the only information required is the initial positioning of the complete set of units from which the string is composed, i.e., four positions for the DNA bases, positions for the amino acids plus stop, etc. This initialization can also be made in 1-D form. As an illustration, a visual chemometric comparison of the first exon of the beta-globin gene of 10 different species, each consisting of about 100 basic a...
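The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea it describes, under explicit assumptions. Each unit of the alphabet is given a hypothetical 1-D initial position (`BASE_POSITIONS`, which are not the authors' values), and the string is then turned into the points (i, position of the i-th unit); joining consecutive points gives a zigzag trace. The names `BASE_POSITIONS` and `zigzag` are illustrative only, and neither the spectrum-like variant nor the 2-D and 3-D initializations are reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 1-D initial positions for the four DNA bases.
# The positions actually used by Zupan and Randić are not given in the abstract.
BASE_POSITIONS: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def zigzag(
    units: Sequence[str],
    positions: Dict[str, float] = BASE_POSITIONS,
) -> List[Tuple[int, float]]:
    """Map the i-th unit of the string to the point (i, position of that unit).

    Joining consecutive points yields a zigzag trace. Any alphabet works
    (bases, amino acids, codons) as long as every unit has a position.
    """
    points: List[Tuple[int, float]] = []
    for i, unit in enumerate(units, start=1):
        if unit not in positions:
            raise ValueError(f"unit {unit!r} has no initial position")
        points.append((i, positions[unit]))
    return points


if __name__ == "__main__":
    # Toy string, not taken from the paper's beta-globin data.
    print(zigzag("GATTACA"))
    # → [(1, 3.0), (2, 1.0), (3, 4.0), (4, 4.0), (5, 1.0), (6, 2.0), (7, 1.0)]
```

Because the function only needs a position for every unit, the same sketch covers the triplet case: split the DNA string into codons first (e.g. `[s[i:i + 3] for i in range(0, len(s) - 2, 3)]`) and pass a mapping with one entry per codon.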