Scalable Distributed Stream Processing

Authors: Mitch Cherniack, Hari Balakrishnan, Magdalena Balazinska, Donald Carney, Ugur Cetintemel

DOI:

Keywords:

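Abstract: Stream processing fits a large class of new applications for which conventional DBMSs fall short. Because many stream-oriented systems are inherently geographically distributed, and because distribution offers scalable load management and higher availability, future stream processing systems will operate in a distributed fashion. They will run across the Internet on computers typically owned by multiple cooperating administrative domains. This paper describes the architectural challenges facing the design of large-scale distributed stream processing systems, and discusses novel approaches for addressing load management, high availability, and federated operation issues. We describe two stream processing systems, Aurora* and Medusa, which are being designed to explore complementary solutions to these challenges. This paper discusses architectural issues encountered in the design of two complementary large-scale distributed stream processing systems. We begin in Section 2 with a brief description of our centralized stream processing system, Aurora [4]. We then discuss two complementary efforts to extend Aurora to a distributed environment: Aurora* and Medusa. Aurora* assumes an environment in which all nodes fall under a single administrative domain. Medusa provides the infrastructure to support federated operation of nodes across administrative boundaries. After describing the architectures of these two systems in Section 3, we consider three design challenges common to both: infrastructures and protocols supporting communication amongst nodes (Section 4), load sharing in response to variable network conditions (Section 5), and high availability in the presence of failures (Section 6). We also discuss the high-level policy specifications employed by the two systems in Section 7. For all of these issues, we believe that the push-based nature of stream-based applications not only raises new challenges but also offers the possibility of new domain-specific solutions.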

References (17)
M. Kamath, G. Alonso, R. Günthör, C. Mohan. Providing high availability in very large workflow management systems. Advances in Database Technology (EDBT '96), pp. 425-442 (1996). doi:10.1007/BFB0014169
Jim Gray, Andreas Reuter. Transaction Processing: Concepts and Techniques (1992).
Daniela Florescu, Patrick Valduriez, Luc Bouganim. Dynamic Load Balancing in Hierarchical Parallel Database Systems. Very Large Data Bases (VLDB), pp. 436-447 (1996).
Amol Deshpande, Joseph M. Hellerstein, Vijayshankar Raman, Samuel Madden, Mehul A. Shah, Sirish Chandrasekaran, Michael J. Franklin, Kris Hildrum. Adaptive Query Processing: Technology in Evolution. IEEE Data Engineering Bulletin, vol. 23, pp. 7-18 (2000).
H. Balakrishnan, S. Seshan. The Congestion Manager. RFC 3124, pp. 1-22 (2001).
David DeWitt, Jim Gray. Parallel database systems. Communications of the ACM, vol. 35, pp. 85-98 (1992). doi:10.1145/129888.129894
Hari Balakrishnan, Hariharan S. Rahul, Srinivasan Seshan. An integrated congestion management architecture for Internet hosts. ACM Special Interest Group on Data Communication (SIGCOMM), vol. 29, pp. 175-187 (1999). doi:10.1145/316188.316220
David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, Daniel Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. Symposium on the Theory of Computing (STOC), pp. 654-663 (1997). doi:10.1145/258533.258660
Derek L. Eager, Edward D. Lazowska, John Zahorjan. Adaptive load sharing in homogeneous distributed systems. IEEE Transactions on Software Engineering, vol. 12, pp. 662-675 (1986). doi:10.1109/TSE.1986.6312961
Witold Litwin, Marie-Anna Neimat, Donovan A. Schneider. LH* - a scalable, distributed data structure. ACM Transactions on Database Systems, vol. 21, pp. 480-525 (1996). doi:10.1145/236711.236713