Authors: Tianbao Yang, Haiqin Yang, Yi Xu, Lijun Zhang
DOI:
Keywords: Subspace topology, Approximation error, Computer science, Benchmark (computing), Matrix multiplication, Generalization, Random projection, Clustering high-dimensional data, Mathematical optimization, Matrix (mathematics), Training set, Reduction (complexity)
Abstract: In this paper, we address learning problems for high-dimensional data. Previously, oblivious random projection based approaches, which project features onto a random subspace drawn independently of the data, have been used in practice for tackling the high-dimensionality challenge in machine learning. Recently, various non-oblivious randomized reduction methods have been developed and deployed for solving numerical problems such as matrix product approximation, low-rank matrix approximation, etc. However, they are less explored for machine learning tasks, e.g., classification. More seriously, the theoretical analysis of excess risk bounds for risk minimization, an important measure of generalization performance, has not been established for non-oblivious randomized reduction methods. It therefore remains an open problem what the benefit is of using them over previous oblivious random projection based approaches. To tackle these challenges, we propose an algorithmic framework that employs a non-oblivious randomized reduction method for general empirical risk minimization, where the original high-dimensional features are projected onto a random subspace derived from the data with a small approximation error. We then derive the first excess risk bound for the proposed approach without requiring strong assumptions on the training data. The bound exhibits that the proposed approach provides much better performance, and it also sheds more insight on the differences among randomized reduction approaches. Finally, we conduct extensive experiments on both synthetic and real-world benchmark datasets, whose dimension scales to $O(10^7)$, to demonstrate the efficacy of our approach.
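To make the contrast in the abstract concrete, the sketch below compares an oblivious random projection (a Gaussian matrix drawn independently of the data) with a simple non-oblivious, data-derived subspace obtained by a randomized range finder (sketch the data, then orthonormalize). This is only an illustrative NumPy sketch of the two families of reductions, not the paper's actual algorithm or analysis; all variable names and the choice of range finder are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n examples with d features (the paper's experiments scale d to ~1e7).
n, d, k = 200, 500, 20
X = rng.standard_normal((n, d))

# Oblivious random projection: the projection matrix G is drawn
# without ever looking at the data X.
G = rng.standard_normal((d, k)) / np.sqrt(k)
X_oblivious = X @ G  # reduced features, shape (n, k)

# Non-oblivious reduction (illustrative): derive the subspace from the
# data itself via a randomized range finder (sketch-then-orthonormalize).
Omega = rng.standard_normal((n, k))
Y = X.T @ Omega            # (d, k) sketch spanning (approximately) the row space of X
Q, _ = np.linalg.qr(Y)     # orthonormal basis for a data-derived subspace
X_nonoblivious = X @ Q     # reduced features, shape (n, k)

# The approximation error of the data-derived reduction in Frobenius norm:
# ||X - X Q Q^T||_F, the quantity the excess risk analysis depends on.
err = np.linalg.norm(X - X_nonoblivious @ Q.T, ord="fro")
```

Both reductions map each example from d dimensions to k dimensions; the difference is only in how the subspace is chosen, which is exactly the distinction the excess risk analysis in the paper quantifies.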