Method for computing frequency distribution for many fields in one pass in parallel

Authors: Jerry Lee Callen, Michael James Beckerle

Keywords: Algorithm, Identifier, Statistics, Computer science, Value (computer science), Frequency distribution, Table (database), Set (abstract data type), Distribution (mathematics), Sorting, Field (computer science)

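Abstract: Provided are techniques for determining a frequency distribution for a set of records. A count table of frequency distributions is built in memory for each field in the set of records, wherein each record in the count table includes a field identifier, a field value, and a count of the number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value. It is determined that at least one count table is approaching a maximum amount of memory allocated to the count table. The records in the count table are sent for sorting and additional counting, wherein the records that are sent include composite key values.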
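The flow the abstract describes can be illustrated with a minimal, single-process sketch. It is not the patented implementation: the function name `frequency_distributions`, the `max_entries` parameter (an entry-count stand-in for the memory limit), the tuple used as the composite key, and the in-memory list standing in for the downstream sort stage are all assumptions made for illustration. Parallel execution is omitted.

```python
from collections import Counter
from itertools import groupby


def frequency_distributions(records, max_entries=1000):
    """Count occurrences of every (field, value) pair in one pass.

    Hypothetical sketch: `max_entries` approximates "approaching the
    maximum memory allocated to the count table" by a simple entry count.
    """
    table = Counter()   # in-memory count table keyed by composite key
    spilled = []        # stand-in for records sent on for sorting

    for record in records:
        for field_id, value in enumerate(record):
            # Composite key: field identifier combined with field value
            # (a tuple here, rather than a concatenated string).
            table[(field_id, value)] += 1
            if len(table) >= max_entries:
                # Table is full: send the partial counts downstream.
                spilled.extend(table.items())
                table.clear()

    # Flush whatever is left at end of input.
    spilled.extend(table.items())

    # Sort by composite key, then sum partial counts for equal keys
    # (the "additional counting" step).
    spilled.sort(key=lambda item: item[0])
    result = {}
    for key, group in groupby(spilled, key=lambda item: item[0]):
        result[key] = sum(count for _, count in group)
    return result


if __name__ == "__main__":
    rows = [("a", "x"), ("b", "x"), ("a", "y")]
    for key, count in frequency_distributions(rows, max_entries=2).items():
        print(key, count)
```

Because the same composite key can be spilled more than once, each spilled entry carries only a partial count. Sorting brings equal keys together so the partial counts can be summed. The sketch assumes values within a single field are mutually comparable, since the sort orders them.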

References (12)
Bhashyam Ramesh, Olli Pekka Kostamaa. Statistical representation of skewed data (2003)
Brian C. Edem, Richard P. Helliwell, John T. Johnston. Stable sorting for a sort accelerator (1989)
Gary Graunke, Leonard D. Shapiro, Sujata Ramamoorthy. Parallel merge sort method and apparatus (1996)
Michael L. Horowitz. Sort system for merging database entries (1999)
Ravi Kothuri, Siva Ravada, Jayant Sharma, Jayanta Banerjee. Heirarchical indexing of multi-attribute data by sorting, dividing and storing subsets (1999)
Michael L. Horowitz. Sort system for text retrieval (2001)