Foundational principles for large scale inference: Illustrations through correlation mining

Authors: Bala Rajaratnam, Alfred O. Hero

Abstract: When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far fewer than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension $p$ is fixed and the sample size $n$ goes to infinity; 2) the mixed asymptotic regime, where both $n$ and $p$ go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where $p$ goes to infinity and $n$ is fixed. Each regime has its niche, but only the last applies to exa-scale data dimensions. We illustrate this framework on the problem of correlation mining, where the quantities of interest are the matrices of pairwise and partial correlations among the variables. We demonstrate correlation mining based on a unifying perspective of learning rates for different structured covariance models.
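The two objects named above, pairwise and partial correlation matrices, and the difficulty of the sample-starved regime can be sketched in a few lines. What follows is a minimal, dependency-free Python illustration under our own assumptions. It is not the authors' procedure, and the function names (`sample_correlation`, `invert`, `partial_correlation`, `max_spurious_correlation`) are hypothetical.

The sketch does three things. It computes a sample correlation matrix. It computes partial correlations via $\rho_{ij\cdot\text{rest}} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$ with $P = R^{-1}$, and it declines direct inversion when $p \ge n$, because the centered sample covariance then has rank at most $n-1$. Finally, it measures the largest spurious correlation among $p$ independent Gaussian variables when $n$ is held fixed.

```python
import math
import random


def sample_correlation(X):
    """Pearson correlation matrix of X, given as n rows of p variables.

    Assumes every column has nonzero sample variance.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    cov = [[sum(Z[k][i] * Z[k][j] for k in range(n)) / (n - 1) for j in range(p)]
           for i in range(p)]
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


def invert(M, tol=1e-12):
    """Gauss-Jordan inverse with partial pivoting; raises if M is (near) singular."""
    m = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
         for i, row in enumerate(M)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < tol:
            raise ValueError("matrix is singular or nearly singular")
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[m:] for row in A]


def partial_correlation(X):
    """Partial correlations rho_{ij|rest} = -P_ij / sqrt(P_ii P_jj), P = R^{-1}.

    Direct inversion needs n > p: the centered sample covariance has rank at
    most n - 1, so it is singular whenever p >= n (the sample-starved regime).
    """
    n, p = len(X), len(X[0])
    if p >= n:
        raise ValueError(f"sample correlation is singular for p={p} >= n={n}")
    P = invert(sample_correlation(X))
    return [[1.0 if i == j else -P[i][j] / math.sqrt(P[i][i] * P[j][j])
             for j in range(p)] for i in range(p)]


def max_spurious_correlation(n, p, seed=0):
    """Largest |off-diagonal| sample correlation among p independent N(0,1) variables."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    R = sample_correlation(X)
    return max(abs(R[i][j]) for i in range(p) for j in range(i + 1, p))


if __name__ == "__main__":
    n = 10  # fixed sample size; the dimension p grows
    for p in (5, 50, 200):
        print(p, round(max_spurious_correlation(n, p), 3))
```

With $n$ fixed, the printed maximum typically climbs toward 1 as $p$ grows, even though every true correlation is zero. This is one way to see why the purely high dimensional regime needs its own sample-complexity analysis and structured (for example, sparse) covariance models rather than direct inversion.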