Compressed string dictionaries

Authors: Nieves R. Brisaboa, Rodrigo Cánovas, Francisco Claude, Miguel A. Martínez-Prieto, Gonzalo Navarro

DOI: 10.1007/978-3-642-20662-7_12

Abstract: The problem of storing a set of strings (a string dictionary) in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing or for indexing text collections), recent applications in Web engines, RDF graphs, Bioinformatics, and many others handle very large string dictionaries, whose size is a significant fraction of the whole data. Thus efficient approaches to compress them are necessary. In this paper we empirically compare the time and space performance of some existing alternatives, as well as new ones we propose. We show that space reductions of up to 20% of the original size of the strings are possible while supporting dictionary searches within a few microseconds, and up to 10% within a few tens or hundreds of microseconds.

References (37)
Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.). Addison Wesley Longman Publishing Co., Inc. (1998)
T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 2nd edition (2001)
Veli Mäkinen, Gonzalo Navarro, Implicit Compression Boosting with Applications to Self-indexing. String Processing and Information Retrieval, pp. 229-241 (2007), 10.1007/978-3-540-75530-2_21
Nieves R. Brisaboa, Susana Ladra, Gonzalo Navarro, Directly Addressable Variable-Length Codes. String Processing and Information Retrieval, vol. 5721, pp. 122-130 (2009), 10.1007/978-3-642-03784-9_12
Christopher D. Manning, Hinrich Schütze, Foundations of Statistical Natural Language Processing (1999)
Alistair Moffat, Jyrki Katajainen, In-Place Calculation of Minimum-Redundancy Codes. Workshop on Algorithms and Data Structures, pp. 393-402 (1995), 10.1007/3-540-60220-8_79
Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutierrez, Compact Representation of Large RDF Data Sets for Publishing and Exchange. International Semantic Web Conference, pp. 193-208 (2010), 10.1007/978-3-642-17746-0_13
R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval (1999)
Ming Yin, Dion Hoe-Lian Goh, Ee-Peng Lim, Aixin Sun, Discovery of Concept Entities from Web Sites Using Web Unit Mining. International Journal of Web Information Systems, vol. 1, pp. 123-135 (2005), 10.1108/17440080580000088