Engineering a Lightweight Suffix Array Construction Algorithm

作者: Giovanni Manzini , Paolo Ferragina

DOI: 10.1007/S00453-004-1094-1

关键词:

摘要: In this paper we describe a new algorithm for building the suffix array of string. This task is equivalent to problem lexicographically sorting all suffixes input Our based on approach called deep–shallow sorting: use “shallow” sorter with short common prefix, and “deep” long prefix. All known algorithms either require large amount space or are inefficient when string contains many repeated substrings. has been designed overcome dichotomy. “lightweight” in sense that it uses very small addition required by itself. At same time our fast even repetitions: shown extensive experiments inputs size up 110 Mb. The source code algorithm, as well C library providing simple API, available under GNU GPL.

参考文章(30)
Pang Ko, Srinivas Aluru, Space efficient linear time construction of suffix arrays combinatorial pattern matching. pp. 200- 210 ,(2003) , 10.1007/3-540-44888-8_15
Kunihiko Sadakane, Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array international symposium on algorithms and computation. pp. 410- 421 ,(2000) , 10.1007/3-540-40996-3_35
Dong Kyue Kim, Jeong Seop Sim, Heejin Park, Kunsoo Park, Linear-time construction of suffix arrays combinatorial pattern matching. pp. 186- 199 ,(2003) , 10.1007/3-540-44888-8_14
Juha Kärkkäinen, Stefan Burkhardt, E. Chávez, M. Crochemore, R. Baeza-Yates, Fast Lightweight Suffix Array Construction and Checking Untitled Event. pp. 55- 69 ,(2003)
Ricardo A. Baeza-Yates, Gaston H. Gonnet, Tim Snider, New indices for text: PAT Trees and PAT arrays Information Retrieval. pp. 66- 82 ,(1992)
R. Grossi, Jeffrey Scott Vitter, Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching symposium on the theory of computing. pp. 397- 406 ,(2000)
H. Itoh, H. Tanaka, An efficient method for in memory construction of suffix arrays string processing and information retrieval. pp. 81- 88 ,(1999) , 10.1109/SPIRE.1999.796581
J. Seward, On the performance of BWT sorting algorithms data compression conference. pp. 173- 182 ,(2000) , 10.1109/DCC.2000.838157
Giovanni Manzini, An analysis of the Burrows—Wheeler transform Journal of the ACM. ,vol. 48, pp. 407- 430 ,(2001) , 10.1145/382780.382782