Real-Time Semantic Search Using Approximate Methodology for Large-Scale Storage Systems

Authors: Yu Hua, Hong Jiang, Dan Feng

DOI: 10.1109/TPDS.2015.2425399

Abstract: The challenges of handling the explosive growth in data volume and complexity cause increasing needs for semantic queries. Semantic queries can be interpreted as correlation-aware retrieval, while containing approximate results. Existing cloud storage systems mainly fail to offer an adequate capability for semantic queries. Since the true value or worth of data heavily depends on how efficiently semantic search can be carried out on the data in (near-) real-time, large fractions of data end up with their values being lost or significantly reduced due to data staleness. To address this problem, we propose a near-real-time and cost-effective semantic-queries-based methodology, called FAST. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing to significantly reduce the processing latency, while incurring an acceptably small loss of data-search accuracy. The near-real-time property of FAST enables rapid identification of correlated files and a significant narrowing of the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. We conduct a real-world use case in which children reported missing in an extremely crowded environment (e.g., a highly popular scenic spot on a peak tourist day) are identified in a timely fashion by analyzing 60 million images using FAST. FAST is further improved by using a semantic-aware namespace to provide dynamic and adaptive namespace management for ultra-large storage systems. Extensive experimental results demonstrate the efficiency and efficacy of FAST in the performance improvements.
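The abstract's central idea, hashing correlated items to the same address in a flat structure so that a query only touches a small candidate set, can be illustrated with a generic locality-sensitive hashing sketch. The Python below is not the authors' implementation. It assumes random-hyperplane signatures as a stand-in for correlation-aware hashing and a plain dictionary as a stand-in for flat-structured addressing; the names `make_hyperplanes`, `signature`, and `FlatIndex`, as well as the feature vectors, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on.

    Vectors separated by a small angle agree on most bits, so they tend
    to share a signature; this is the locality-sensitive property.
    """
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


class FlatIndex:
    """Flat (non-hierarchical) bucket table keyed by signature."""

    def __init__(self, num_bits, dim, seed=0):
        self.planes = make_hyperplanes(num_bits, dim, seed)
        self.buckets = defaultdict(list)

    def insert(self, name, vec):
        # Items with the same signature land in the same bucket.
        self.buckets[signature(vec, self.planes)].append(name)

    def query(self, vec):
        # Approximate lookup: only one bucket is inspected, which is fast
        # but may miss similar items that hashed to a neighboring bucket.
        return list(self.buckets.get(signature(vec, self.planes), []))


if __name__ == "__main__":
    index = FlatIndex(num_bits=8, dim=4)
    index.insert("photo_a", [0.9, 0.1, 0.0, 0.2])    # hypothetical feature vector
    index.insert("photo_b", [-0.5, 0.8, 0.3, -0.1])  # hypothetical feature vector
    print(index.query([1.8, 0.2, 0.0, 0.4]))         # same direction as photo_a
```

The trade-off matches the one the abstract describes: a lookup inspects a single bucket instead of scanning the dataset, at the cost of occasionally missing a similar item. More signature bits give smaller buckets and lower recall; practical systems usually compensate with several independent hash tables.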
