Compact Approximation of Lattice Functions with Applications to Large-Alphabet Text Search

作者: Paolo Boldi , Sebastiano Vigna

DOI:

关键词: Theoretical computer scienceBloom filterAlgorithmLattice (order)Computer scienceFull text searchData structure

摘要: We propose a very simple randomised data structure that stores an approximation from above of lattice-valued function. Computing the function value requires constant number steps, and error probability can be balanced with space usage, much like in Bloom filters. The is particularly well suited for functions are bottom on most their domain. then show how to use our methods store compact way bad-character shift variants Boyer-Moore text search algorithms. As result, we obtain practical implementations these algorithms used large alphabets, such as Unicode collation elements, small setup time. ideas described this paper have been implemented free software under GNU General Public License within MG4J project (this http URL).

参考文章(13)
Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, Luca Trevisan, Counting Distinct Elements in a Data Stream randomization and approximation techniques in computer science. pp. 1- 10 ,(2002) , 10.1007/3-540-45726-7_1
Paolo Boldi, Sebastiano Vigna, Rethinking Java strings principles and practice of programming in java. pp. 27- 30 ,(2003) , 10.5555/957289.957300
Daniel M. Sunday, A very fast substring search algorithm Communications of the ACM. ,vol. 33, pp. 132- 142 ,(1990) , 10.1145/79173.79184
L. Colussi, Fastest pattern matching in strings Journal of Algorithms. ,vol. 16, pp. 163- 189 ,(1994) , 10.1006/JAGM.1994.1008
M. Crochemore, A. Czumaj, L. Gasieniec, S. Jarominek, T. Lecroq, W. Plandowski, W. Rytter, Speeding up two string-matching algorithms Algorithmica. ,vol. 12, pp. 247- 267 ,(1994) , 10.1007/BF01185427
Alberto Apostolico, Raffaele Giancarlo, The Boyer Moore Galil string searching strategies revisited SIAM Journal on Computing. ,vol. 15, pp. 98- 105 ,(1986) , 10.1137/0215007
Philippe Flajolet, G. Nigel Martin, Probabilistic counting algorithms for data base applications Journal of Computer and System Sciences. ,vol. 31, pp. 182- 209 ,(1985) , 10.1016/0022-0000(85)90041-8
Thierry Lecroq, A variation on the Boyer-Moore algorithm Theoretical Computer Science. ,vol. 92, pp. 119- 144 ,(1992) , 10.1016/0304-3975(92)90139-7
Burton H. Bloom, Space/time trade-offs in hash coding with allowable errors Communications of the ACM. ,vol. 13, pp. 422- 426 ,(1970) , 10.1145/362686.362692