Authors: Bala Rajaratnam, Alfred O. Hero
DOI:
Keywords:
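Abstract: When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far fewer than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both the variable dimension and the sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where the matrix of pairwise and partial correlations among the variables is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.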
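Why the purely high dimensional regime (fixed $n$, growing $p$) needs its own analysis can be seen in a small simulation. The following is a minimal sketch, not taken from the paper: it assumes $p$ independent standard Gaussian variables (so every true correlation is zero), a fixed sample size $n$, and a hypothetical screening threshold `rho`. The function names `sample_correlation` and `count_spurious_pairs` are illustrative only. It counts variable pairs whose sample correlation magnitude exceeds `rho`, i.e. false discoveries in a naive correlation screen.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_pairs(n, p, rho, seed=0):
    """Draw n samples of p independent N(0, 1) variables (true correlation 0)
    and count variable pairs whose |sample correlation| exceeds rho.

    Hypothetical illustration; columns are generated in a fixed order, so a
    larger p with the same seed reuses the columns of a smaller p.
    """
    rng = random.Random(seed)
    # columns[j] holds the n observations of variable j
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    hits = 0
    for i in range(p):
        for j in range(i + 1, p):
            if abs(sample_correlation(columns[i], columns[j])) > rho:
                hits += 1
    return hits


if __name__ == "__main__":
    n, rho = 10, 0.8  # assumed: fixed sample size and screening threshold
    for p in (10, 50, 200):
        print(p, count_spurious_pairs(n, p, rho))
```

With $n$ held fixed, each null pair exceeds the threshold with the same small probability, while the number of pairs grows as $p(p-1)/2$; the count of spurious "discoveries" therefore grows roughly quadratically in $p$. Controlling such errors as $p \to \infty$ with $n$ fixed is the kind of question the high dimensional framework addresses.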