作者: Ahmed Metwally , Divyakant Agrawal , Amr El Abbadi
DOI: 10.1007/978-3-540-30570-5_27
关键词: Computation 、 Order (group theory) 、 Set (abstract data type) 、 Computer science 、 Data stream mining 、 Algorithm 、 Data stream 、 Integrated approach 、 Space (mathematics) 、 Distribution (mathematics)
摘要: We propose an integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream. Our technique is efficient and exact if the alphabet under consideration is small. In the more practical large alphabet case, our solution is space efficient and reports both top-k and frequent elements with tight guarantees on errors. For general data distributions, our top-k algorithm can return a set of k′ elements, where k′≈ k, which are guaranteed to be the top-k'elements; and we use …