Exposition and Analysis of a Suffix Sorting Algorithm

作者: Simon J. Puglisi

DOI:

关键词:

摘要: This paper focuses on the suffix sorting algorithm of Maniscalco [10], which at time writing is available only as C++ source code Internet. We will refer to program MSufSort. MSufSort computes Inverse Suffix Array (ISA) an input string, equivalent computing (converting one other discussed in section 8). Recall that for i ∈ [0..n − 1], ISA[i] gives lexicographic rank x[i..n 1] amongst all suffixes string x[0..n 1]. Experiments summarized [10] suggest outperforms fastest known programs, while using little extra space aside from 4n bytes hold array and n (in terms [11] it would be lightweight). It also purported perform well periodic strings, are catastrophic worst cases some algorithms. addresses need a more formal examination what appears very robust sorter. examine describe inner workings algorithm, try explain why performs by analyzing its asymptotic behavior. As published crosses several classes files not easy absorb single sitting. The presented this constitutes complete rewrite original just few C functions, included optimization, but rather aid explanation approach. After introducing notation Section 2, basic described 3 before two powerful heuristics introduced Sections 4 5. 6 7 consider usage. 8 discusses ISA SA transformation. In 9 we extensively test compare performance leading sorters. Possible areas future work outlined 10 11, brief conclusions offered 12. assumed reader familiar with concept applications, particularly Burrows-Wheeler Transformation (BWT).

参考文章(15)
Jens Stoye, Klaus-Bernd Schürmann, An Incomplex Algorithm for Fast Suffix Array Construction. Proc. of ALENEX/ANALCO 2005. pp. 78- 85 ,(2005)
Juha Kärkkäinen, Stefan Burkhardt, E. Chávez, M. Crochemore, R. Baeza-Yates, Fast Lightweight Suffix Array Construction and Checking Untitled Event. pp. 55- 69 ,(2003)
H. Itoh, H. Tanaka, An efficient method for in memory construction of suffix arrays string processing and information retrieval. pp. 81- 88 ,(1999) , 10.1109/SPIRE.1999.796581
J. Seward, On the performance of BWT sorting algorithms data compression conference. pp. 173- 182 ,(2000) , 10.1109/DCC.2000.838157
DAVID R. MUSSER, Introspective sorting and selection algorithms Software - Practice and Experience. ,vol. 27, pp. 983- 993 ,(1997) , 10.1002/(SICI)1097-024X(199708)27:8<983::AID-SPE117>3.0.CO;2-#
J. Ian Munro, David R. Clark, Efficient suffix trees on secondary storage symposium on discrete algorithms. pp. 383- 391 ,(1996) , 10.5555/313852.314087
W.F. Smyth, Repetitive perhaps, but certainly not boring Theoretical Computer Science. ,vol. 249, pp. 343- 355 ,(2000) , 10.1016/S0304-3975(00)00067-0
Robert Sedgewick, Jon L. Bentley, Fast algorithms for sorting and searching strings symposium on discrete algorithms. pp. 360- 369 ,(1997) , 10.5555/314161.314321
William Smyth, Computing Patterns in Strings ,(2003)
Giovanni Manzini, Paolo Ferragina, Engineering a Lightweight Suffix Array Construction Algorithm Algorithmica. ,vol. 40, pp. 33- 50 ,(2004) , 10.1007/S00453-004-1094-1