Authors: Kamran Tirdad, Pedram Ghodsnia, J. Ian Munro, Alejandro López-Ortiz
DOI: 10.1007/978-3-642-24583-1_31
Keywords: Index (publishing), Coca, Signature (logic), Algorithm, Data structure, Bloom filter, Search engine indexing, Co-occurrence, Computer science
Abstract: We propose an indexing data structure based on a novel variation of Bloom filters. Signature files have been proposed in the past as a method to index large text databases, though they suffer from a high false positive error problem. In this paper we introduce COCA Filters, a new type of Bloom filters which exploits the co-occurrence probability of words in documents to reduce the false positive error. We show experimentally that by using this technique we can reduce the false positive error by up to 21.6 times for the same index size. Furthermore, Bloom filters can be replaced by COCA Filters wherever the co-occurrence of any two members of the universe is identifiable.