String indexing for patterns with wildcards

作者: Philip Bille , Inge Li Gørtz , Hjalte Wedel Vildhøj , Søren Vind

DOI: 10.1007/978-3-642-31155-0_25

关键词: Space (mathematics)MathematicsAlgorithmSearch engine indexingWildcardBinary logarithmLinear spaceSuffix treeCombinatoricsWildcard characterString (computer science)

摘要: We consider the problem of indexing a string t length n to report occurrences query pattern p containing m characters and j wildcards. Let occ be number in t, σ size alphabet. obtain following results. – A linear space index with time O(m+σj loglogn+occ). This significantly improves previously best known by Lam et al. [ISAAC 2007], which requires Θ(jn) worst case. – An O(m+j+occ) using $O(\sigma^{k^2} \log^k\log n)$, where k is maximum wildcards allowed pattern. first non-trivial bound this time. – time-space trade-off, generalizing Cole [STOC 2004]. Our results are obtained novel combination well-known new techniques, could independent interest.

参考文章(45)
Moshe Lewenstein, Indexing with Gaps String Processing and Information Retrieval. pp. 135- 143 ,(2011) , 10.1007/978-3-642-24583-1_14
Costas S. Iliopoulos, M. Sohel Rahman, Pattern Matching Algorithms with Don't Cares. conference on current trends in theory and practice of informatics. pp. 116- 126 ,(2007)
Erkki Sutinen, Ricardo A. Baeza-Yates, Jorma Tarhio, Gonzalo Navarro, Indexing methods for approximate string matching IEEE Data(base) Engineering Bulletin. ,vol. 24, pp. 19- 27 ,(2001)
M. S. Paterson, M. J. Fischer, STRING-MATCHING AND OTHER PRODUCTS Massachusetts Institute of Technology. ,(1974)
M. Sohel Rahman, Costas S. Iliopoulos, Inbok Lee, Manal Mohamed, William F. Smyth, Finding patterns with variable length gaps or don’t cares computing and combinatorics conference. pp. 146- 155 ,(2006) , 10.1007/11809678_17
Amos Marc Bairoch, Philip Bucher, A Generalized Profile Syntax for Biomolecular Sequence Motifs and its Function in Automatic Sequence Interpretation intelligent systems in molecular biology. ,vol. 2, pp. 53- 61 ,(1994)
Torben Hagerup, Sorting and Searching on the Word RAM Untitled Event. pp. 366- 398 ,(1998)
Djamal Belazzougui, Faster and Space-Optimal Edit Distance 1 Dictionary combinatorial pattern matching. pp. 154- 167 ,(2009) , 10.1007/978-3-642-02441-2_14
Alan Tam, Edward Wu, Tak-Wah Lam, Siu-Ming Yiu, Succinct Text Indexing with Wildcards string processing and information retrieval. ,vol. 5721, pp. 39- 50 ,(2009) , 10.1007/978-3-642-03784-9_5
Kimmo Fredriksson, Szymon Grabowski, Nested Counters in Bit-Parallel String Matching Language and Automata Theory and Applications. pp. 338- 349 ,(2009) , 10.1007/978-3-642-00982-2_29