Motif matching using gapped patterns

作者: Emanuele Giaquinta , Kimmo Fredriksson , Szymon Grabowski , Alexandru I. Tomescu , Esko Ukkonen

DOI: 10.1016/J.TCS.2014.06.032

关键词: Variable lengthMathematicsFixed lengthAlphabetCombinatoricsMotif (music)

摘要: We consider the problem of matching a set \(\mathcal{P}\) gapped patterns against given text T length n, where pattern is sequence strings (keywords), over finite alphabet Σ size σ, such that there gap fixed between each two consecutive strings.We assume RAM model, with words w in bits.We are interested computing list ofmatching for position text. This specific instance Variable Length Gaps [2] (VLG problem) multiple and has applications discovery transcription factor (TF) binding sites DNA sequences when using generalized versions PositionWeightMatrix (PWM) model to representTF specificities. The paper [5] describes howa motif represented as generalizedPWM can bematched patternswith unit-length keywords, presents algorithms restricted case keywords.

参考文章(19)
Seppo Sippu, Eljas Soisalon-Soininen, Online Matching of Multiple Regular Patterns with Gaps and Character Classes language and automata theory and applications. pp. 523- 534 ,(2013) , 10.1007/978-3-642-37064-9_46
M. Sohel Rahman, Costas S. Iliopoulos, Inbok Lee, Manal Mohamed, William F. Smyth, Finding patterns with variable length gaps or don’t cares computing and combinatorics conference. pp. 146- 155 ,(2006) , 10.1007/11809678_17
Tuukka Haapasalo, Panu Silvasti, Seppo Sippu, Eljas Soisalon-Soininen, Online dictionary matching with variable-length gaps symposium on experimental and efficient algorithms. pp. 76- 87 ,(2011) , 10.1007/978-3-642-20662-7_7
Kimmo Fredriksson, Szymon Grabowski, Nested Counters in Bit-Parallel String Matching Language and Automata Theory and Applications. pp. 338- 349 ,(2009) , 10.1007/978-3-642-00982-2_29
Eilon Sharon, Shai Lubliner, Eran Segal, A feature-based approach to modeling protein-DNA interactions research in computational molecular biology. ,vol. 4, pp. 77- 91 ,(2007) , 10.1371/JOURNAL.PCBI.1000154
Viet-An Nguyen, Jordan Boyd-Graber, Stephen F. Altschul, Dirichlet Mixtures, the Dirichlet Process, and the Structure of Protein Space Journal of Computational Biology. ,vol. 20, pp. 1- 18 ,(2013) , 10.1089/CMB.2012.0244
Philip Bille, Inge Li Gørtz, Hjalte Wedel Vildhøj, David Kofoed Wind, String matching with variable length gaps Theoretical Computer Science. ,vol. 443, pp. 25- 34 ,(2012) , 10.1016/J.TCS.2012.03.029
Gonzalo Navarro, Mathieu Raffinot, Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology. ,vol. 10, pp. 903- 923 ,(2003) , 10.1089/106652703322756140
Yingtao Bi, Hyunsoo Kim, Ravi Gupta, Ramana V. Davuluri, Tree-based position weight matrix approach to model transcription factor binding site profiles. PLOS ONE. ,vol. 6, ,(2011) , 10.1371/JOURNAL.PONE.0024210