FAST: near real-time searchable data analytics for the cloud

Authors: Yu Hua, Hong Jiang, Dan Feng

DOI: 10.1109/SC.2014.67

Abstract: With the explosive growth in data volume and complexity and the increasing need for highly efficient searchable data analytics, existing cloud storage systems have largely failed to offer an adequate capability for real-time data analytics. Since the true value or worth of data heavily depends on how efficiently data analytics can be carried out on the data in (near-) real-time, large fractions of data end up with their values being lost or significantly reduced due to data staleness. To address this problem, we propose a near-real-time and cost-effective searchable data analytics methodology, called FAST. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing to significantly reduce the processing latency, while incurring an acceptably small loss of data-search accuracy. The near-real-time property of FAST enables rapid identification of correlated files and a significant narrowing of the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. We conduct a real-world use case in which children reported missing in an extremely crowded environment (e.g., a highly popular scenic spot on a peak tourist day) are identified in a timely fashion by analyzing 60 million images using FAST. Extensive experimental results demonstrate the efficiency and efficacy of FAST in terms of performance improvements and energy savings.
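The abstract names correlation-aware hashing and flat-structured addressing but does not spell out either mechanism. As a rough illustration only, the sketch below assumes random-hyperplane locality-sensitive hashing as a stand-in for the hashing step and a plain dictionary of buckets as a stand-in for the flat address space. The names `FlatIndex`, `make_hyperplanes`, and `signature` are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Pack one sign bit per hyperplane into an integer bucket address."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


class FlatIndex:
    """Hypothetical flat (non-hierarchical) index: bucket address -> file names."""

    def __init__(self, num_bits, dim, seed=0):
        self.hyperplanes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def insert(self, name, vec):
        # Feature vectors pointing in similar directions tend to share a bucket.
        self.buckets[signature(vec, self.hyperplanes)].append(name)

    def candidates(self, vec):
        # Only one bucket is inspected, which narrows the data to be processed.
        return list(self.buckets.get(signature(vec, self.hyperplanes), []))


index = FlatIndex(num_bits=8, dim=4)
index.insert("a.jpg", [1.0, 2.0, 3.0, 4.0])
print(index.candidates([2.0, 4.0, 6.0, 8.0]))  # same direction, so same bucket
```

A query touches a single bucket rather than the whole collection, which is the kind of scope narrowing the abstract describes; the price is that a correlated file hashed to a neighbouring bucket can be missed, matching the stated trade-off of a small loss in search accuracy.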

References (54)
Fred Douglis, Philip Shilane, Sazzala Reddy, Kai Li, Wei Dong, Hugo Patterson. Tradeoffs in scalable data routing for deduplication clusters. File and Storage Technologies, pp. 2-2 (2011). DOI: 10.5555/1960475.1960477
Kave Eshghi, Mark Lillibridge, Deepavali Bhagwat, Peter Camble, Vinay Deolalikar, Greg Trezise. Sparse indexing: large scale, inline deduplication using sampling and locality. File and Storage Technologies, pp. 111-123 (2009).
Shankar Pasupathy, Andrew W. Leung, Timothy Bisson, Ethan L. Miller, Minglong Shao. Spyglass: fast, scalable metadata search for large-scale storage systems. File and Storage Technologies, pp. 153-166 (2009).
Kai Li, Hugo Patterson, Benjamin Zhu. Avoiding the disk bottleneck in the Data Domain deduplication file system. File and Storage Technologies, pp. 18- (2008).
Piotr Indyk, Aristides Gionis, Rajeev Motwani. Similarity Search in High Dimensions via Hashing. Very Large Data Bases, pp. 518-529 (1999).
Sudipta Sengupta, Biplob Debnath, Jin Li. ChunkStash: speeding up inline storage deduplication using flash memory. USENIX Annual Technical Conference, pp. 16-16 (2010).
Pushkar Chitnis, Ashok Anand, Chitra Muthukrishnan, Bhavish Aggarwal, George Varghese, Athula Balachandran, Aditya Akella, Ramachandran Ramjee. EndRE: an end-system redundancy elimination service for enterprises. Networked Systems Design and Implementation, pp. 28-28 (2010). DOI: 10.5555/1855711.1855739
D. Bhagwat, K. Eshghi, D.D.E. Long, M. Lillibridge. Extreme Binning: Scalable, parallel deduplication for chunk-based file backup. Modeling, Analysis, and Simulation on Computer and Telecommunication Systems, pp. 1-9 (2009). DOI: 10.1109/MASCOT.2009.5366623
Arif M Khan, David F Gleich, Alex Pothen, Mahantesh Halappanavar. A multithreaded algorithm for network alignment via approximate matching. IEEE International Conference on High Performance Computing Data and Analytics, pp. 1-11 (2012). DOI: 10.5555/2388996.2389083
Cristiana Amza, Madalin Mihailescu, Gokul Soundararajan. MixApart: decoupled analytics for shared storage systems. File and Storage Technologies, pp. 133-146 (2013). DOI: 10.5555/2591272.2591287