作者: Cameron N. Musco
DOI:
关键词: k-means clustering 、 Heuristics 、 Principal component analysis 、 Dimensionality reduction 、 Cluster analysis 、 Random projection 、 Theoretical computer science 、 Data science 、 Supervisor 、 Sketch 、 Computer science
摘要: In this thesis we study dimensionality reduction techniques for approximate k-means clustering. Given a large dataset, consider how to quickly compress smaller dataset (a sketch), such that solving the clustering problem on sketch will give an approximately optimal solution original dataset. First, provide exposition of technical results [CEM15], which show provably accurate is possible using common as principal component analysis, random projection, and sampling. We next present empirical evaluations supplement our theoretical results. algorithms, along with heuristics based these indeed perform well in practice. Finally, discuss extensions work neurally plausible algorithms reduction. This joint Michael Cohen, Samuel Elder, Nancy Lynch, Christopher Musco, Madalina Persu. Thesis Supervisor: A. Lynch Title: Professor Electrical Engineering Computer Science