An inexact-suffix-tree-based algorithm for detecting extensible patterns

Authors: Abhijit Chattaraj, Laxmi Parida

DOI: 10.1016/J.TCS.2004.12.013

Keywords:

Abstract: Given an input sequence of data, a rigid pattern is a repeating sequence, possibly interspersed with dont-care characters. The data could be a sequence of characters, sets of characters, or even real values. In practice, the patterns or motifs of interest are the ones that also allow a variable number of gaps (or dont-care characters): these patterns with spacers are termed extensible patterns. In a bioinformatics context, similar patterns have also been called flexible patterns or motifs. The extensibility is succinctly defined by a single integer parameter D ≥ 1, which is interpreted as the allowable space, between 1 and D characters, between two successive solid characters in a reported motif. We introduce a data structure called the inexact-suffix tree and present an algorithm based on this data structure. This has been tested primarily on biological data such as DNA and protein sequences. However, the generality of the system makes it equally applicable in other data mining, clustering, and knowledge extraction applications.
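To make the role of the parameter D concrete, the following is a minimal sketch of how occurrences of a single extensible motif could be checked against a sequence. It is not the inexact-suffix-tree algorithm of the paper: it simply translates the motif into a regular expression. The motif notation (`.` for exactly one dont-care character, `-` for a gap of 1 to D dont-care characters) and the function names `extensible_to_regex` and `occurrences` are assumptions made for illustration only.

```python
import re


def extensible_to_regex(motif: str, d: int) -> re.Pattern:
    """Compile a motif into a regular expression.

    Assumed notation (not taken from the paper):
      '.'  exactly one dont-care character
      '-'  a gap of 1 to d dont-care characters
    Any other character is a solid character and is matched literally.
    """
    if d < 1:
        raise ValueError("D must be >= 1")
    parts = []
    for ch in motif:
        if ch == ".":
            parts.append(".")
        elif ch == "-":
            parts.append(".{1,%d}" % d)
        else:
            parts.append(re.escape(ch))
    # DOTALL so that a dont-care can stand for any character at all.
    return re.compile("".join(parts), re.DOTALL)


def occurrences(motif: str, text: str, d: int) -> list:
    """Return every start position at which the motif matches the text."""
    pattern = extensible_to_regex(motif, d)
    return [i for i in range(len(text)) if pattern.match(text, i)]


if __name__ == "__main__":
    text = "ABCABBCABBBC"
    print(occurrences("A-C", text, 2))
    print(occurrences("A-C", text, 3))
```

In this sketch, raising D from 2 to 3 admits the occurrence whose two solid characters are separated by three characters, which is the sense in which D bounds the spacer length.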

References (16)
Laxmi Parida. Some Results on Flexible-Pattern Discovery. Combinatorial Pattern Matching, pp. 33-45 (2000). DOI: 10.1007/3-540-45123-4_5
Marie-France Sagot, Alain Viari. A Double Combinatorial Approach to Discovering Patterns in Biological Sequences. Combinatorial Pattern Matching, pp. 186-208 (1996). DOI: 10.1007/3-540-61258-0_15
Alvis Brazma, Inge Jonassen, Ingvar Eidhammer, David Gilbert. Approaches to the Automatic Discovery of Patterns in Biosequences. Journal of Computational Biology, vol. 5, pp. 279-305 (1998). DOI: 10.1089/CMB.1998.5.279
Timothy L. Bailey, Michael Gribskov. Methods and Statistics for Combining Motif Match Scores. Journal of Computational Biology, vol. 5, pp. 211-221 (1998). DOI: 10.1089/CMB.1998.5.211
Inge Jonassen, John F. Collins, Desmond G. Higgins. Finding flexible patterns in unaligned protein sequences. Protein Science, vol. 4, pp. 1587-1595 (1995). DOI: 10.1002/PRO.5560040817
Alberto Apostolico, Laxmi Parida. Incremental paradigms of motif discovery. Journal of Computational Biology, vol. 11, pp. 15-25 (2004). DOI: 10.1089/106652704773416867
Inge Jonassen. Efficient discovery of conserved patterns using a pattern graph. Bioinformatics, vol. 13, pp. 509-522 (1997). DOI: 10.1093/BIOINFORMATICS/13.5.509
Alberto Apostolico, Mikhail J. Atallah. Compact Recognizers of Episode Sequences. Information & Computation, vol. 174, pp. 180-192 (2002). DOI: 10.1006/INCO.2002.3143
Isidore Rigoutsos, Aris Floratos. Motif discovery without alignment or enumeration (extended abstract). Research in Computational Molecular Biology, pp. 221-227 (1998). DOI: 10.1145/279069.279118