Dimensionality reduction for k-means clustering

作者： Cameron N. Musco

DOI:

关键词: k-means clustering 、 Heuristics 、 Principal component analysis 、 Dimensionality reduction 、 Cluster analysis 、 Random projection 、 Theoretical computer science 、 Data science 、 Supervisor 、 Sketch 、 Computer science

摘要: In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, consider how to quickly compress smaller dataset (a sketch), such that solving the clustering problem on sketch will give an approximately optimal solution original dataset. First, provide exposition of technical results [CEM15], which show provably accurate is possible using common as principal component analysis, random projection, and sampling. We next present empirical evaluations supplement our theoretical results. algorithms, along with heuristics based these indeed perform well in practice. Finally, discuss extensions work neurally plausible algorithms reduction. This joint Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, Madalina Persu. Thesis Supervisor: A. Lynch Title: Professor Electrical Engineering Computer Science

mit.edu 本地加速

cameronmusco.com PDF 下载加速

mit.edu PDF 下载加速

cameronmusco.com LINK 下载加速

参考文章(81)

Dimitris Papailiopoulos, Alexandros Dimakis, Megasthenis Asteris, Nonnegative Sparse PCA with Provable Guarantees international conference on machine learning. pp. 1728- 1736 ,(2014)

Sid Ray, Rose H Turi, Determination of Number of Clusters in K-Means Clustering and Application in Colour Image Segmentation international conference on advances in pattern recognition. pp. 137- 143 ,(2000)

Michael B. Cohen, Jelani Nelson, David P. Woodruff, Optimal approximate matrix product in terms of stable rank arXiv: Data Structures and Algorithms. ,(2015)

Jan-Philipp W. Kappmeier, Daniel R. Schmidt, Melanie Schmidt, Solving k-means on High-Dimensional Big Data symposium on experimental and efficient algorithms. pp. 259- 270 ,(2015) , 10.1007/978-3-319-20086-6_20

Mina Ghashami, Edo Liberty, Jeff M. Phillips, David P. Woodruff, Frequent Directions: Simple and Deterministic Matrix Sketching SIAM Journal on Computing. ,vol. 45, pp. 1762- 1792 ,(2016) , 10.1137/15M1009718

Rosa I. Arriaga, Santosh Vempala, An algorithmic theory of learning: robust concepts and random projection foundations of computer science. ,vol. 63, pp. 161- 182 ,(1999) , 10.1007/S10994-006-6265-7

Venkatesan Guruswami, Ali Kemal Sinop, Optimal column-based low-rank matrix reconstruction symposium on discrete algorithms. ,vol. 2012, pp. 1207- 1214 ,(2012) , 10.5555/2095116.2095211

Andrea Vattani, Kamalika Chaudhuri, Sanjoy Dasgupta, Learning Mixtures of Gaussians Using the k-Means Algorithm arXiv: Learning. ,(2009)

Lloyd N. Trefethen, David Bau, Numerical Linear Algebra ,(1997)

10.

M. Inaba, Application of weighted Voronoi diagrams and randomization to variance-based k-clustering symposium on computational geometry. ,(1994)

Dimensionality reduction for k-means clustering

来源期刊

我的账户

Dimensionality reduction for k-means clustering

来源期刊

相似文章 6

Input Sparsity Time Low-Rank Approximation via Ridge Leverage Score Sampling

Structural Conditions for Projection-Cost Preservation via Randomized Matrix Multiplication

Scalable Batched Iterative Matrix Sketching: Theory and Practice

Structural conditions for projection-cost preservation via randomized matrix multiplication

Growing neural gas with random projection method for high-dimensional data stream clustering

Projection-Cost-Preserving Sketches: Proof Strategies and Constructions.

我的账户