Mining of Sequential Patterns with Variable Wildcard Regions using Modified PrefixSpan Method

Authors: Keiichi Tamura, Susumu Kuroki, Yasuma Mori, Shigetaka Tono, Hajime Kitakami

DOI:

Keywords: Data Mining, Bioinformatics, Knowledge Discovery, Knowledge Management, Performance Evaluation

Abstract: To extract frequent patterns with a "variable wildcard region" from a sequence database, we propose a new Modified PrefixSpan method. The new method is developed by adding an input parameter, called the maximum error count, to the former method. We verify it on two kinds of datasets, Leucine Zipper and Zinc Finger, both included in PROSITE. The results show that it has 8 to 9 times superior capacity for extracting patterns.
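
As an illustration of what a pattern with a "variable wildcard region" is, the following minimal Python sketch counts how many sequences of a toy database contain such a pattern. It is not the Modified PrefixSpan method and does not model the maximum error count, whose exact definition the abstract does not give. The pattern syntax follows the PROSITE convention of writing a variable-length wildcard as x(m,n); the function names prosite_to_regex and support, and the toy sequences, are hypothetical and chosen only for this example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported elements, separated by "-": a single residue (C),
    a residue class ([LIVM]), a bare wildcard (x), a fixed-length
    wildcard x(3), and a variable-length wildcard x(2,4).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            # A residue or a [class]; both are already valid regex.
            parts.append(element)
        elif m.group(1) is None:
            # Bare x: exactly one arbitrary residue.
            parts.append(".")
        elif m.group(2) is None:
            # x(n): exactly n arbitrary residues.
            parts.append(".{%s}" % m.group(1))
        else:
            # x(m,n): between m and n arbitrary residues.
            parts.append(".{%s,%s}" % (m.group(1), m.group(2)))
    return "".join(parts)


def support(pattern: str, sequences: list[str]) -> int:
    """Count the sequences that contain the pattern at least once."""
    regex = re.compile(prosite_to_regex(pattern))
    return sum(1 for seq in sequences if regex.search(seq))


# Toy sequence database (hypothetical data, not taken from PROSITE).
toy_db = ["ACAACA", "CAAAAC", "CAC", "CAAAAAC"]

fixed_support = support("C-x(2)-C", toy_db)       # fixed wildcard region
variable_support = support("C-x(2,4)-C", toy_db)  # variable wildcard region
```

In this toy database the variable region x(2,4) matches both the sequence with two residues between the two C's and the one with four, whereas the fixed region x(2) matches only the former. A pattern would be called frequent when this count reaches a chosen minimum support threshold.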

References (16)
Behzad Mortazavi-Asl, Umeshwar Dayal, Qiming Chen, Jiawei Han, Jian Pei, Meichun Hsu, Helen Pinto, PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth. International Conference on Data Engineering, pp. 215-224 (2001)
Marie-France Sagot, Alain Viari, A Double Combinatorial Approach to Discovering Patterns in Biological Sequences. Combinatorial Pattern Matching, pp. 186-208 (1996), 10.1007/3-540-61258-0_15
Charles Elkan, Timothy L. Bailey, Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Intelligent Systems in Molecular Biology, vol. 2, pp. 28-36 (1994)
Yasuma Mori, Susumu Kuroki, Yukiko Yamazaki, Hajime Kitakami, Tomoki Kanbara, Modified PrefixSpan Method for Motif Discovery in Sequence Databases. Pacific Rim International Conference on Artificial Intelligence, pp. 482-491 (2002), 10.1007/3-540-45683-X_52
Alvis Brazma, Inge Jonassen, Ingvar Eidhammer, David Gilbert, Approaches to the Automatic Discovery of Patterns in Biosequences. Journal of Computational Biology, vol. 5, pp. 279-305 (1998), 10.1089/CMB.1998.5.279
I. Rigoutsos, A. Floratos, Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics, vol. 14, pp. 55-67 (1998), 10.1093/BIOINFORMATICS/14.1.55
Inge Jonassen, John F. Collins, Desmond G. Higgins, Finding flexible patterns in unaligned protein sequences. Protein Science, vol. 4, pp. 1587-1595 (1995), 10.1002/PRO.5560040817
Saul B. Needleman, Christian D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, vol. 48, pp. 443-453 (1970), 10.1016/0022-2836(70)90057-4
Isidore Rigoutsos, Aris Floratos, Motif discovery without alignment or enumeration (extended abstract). Research in Computational Molecular Biology, pp. 221-227 (1998), 10.1145/279069.279118
A. Bairoch, P. Bucher, K. Hofmann, The PROSITE database, its status in 1997. Nucleic Acids Research, vol. 25, pp. 217-221 (1997), 10.1093/NAR/25.1.217