On Quantifying Dependence: A Framework for Developing Interpretable Measures

Authors: Matthew Reimherr, Dan L. Nicolae

DOI: 10.1214/12-STS405

Keywords: Probabilistic logic; Range (mathematics); Data mining; Measure (mathematics); Econometrics; Kruskal's algorithm; Selection (linguistics); Contrast (statistics); Interpretability; Mathematics; Axiom

Abstract: We present a framework for selecting and developing measures of dependence when the goal is the quantification of a relationship between two variables, not simply the establishment of its existence. Much of the literature on dependence measures is focused, at least implicitly, on detection, or revolves around the inclusion or exclusion of particular axioms and discussing which measures satisfy said axioms. In contrast, we start with only a few nonrestrictive guidelines focused on existence, range and interpretability, which provide a very open and flexible framework. For quantification, the most crucial is the notion of interpretability, whose foundation can be found in the work of Goodman and Kruskal [Measures of Association for Cross Classifications (1979) Springer], and whose importance can be seen in the popularity of tools such as the $R^2$ in linear regression. While Goodman and Kruskal focused on probabilistic interpretations for their measures, we demonstrate how more general interpretations can be used to achieve the same goal. To that end, we present a strategy for building dependence measures that is designed to allow practitioners to tailor measures to their needs. We demonstrate how many well-known measures fit in our framework and conclude the paper by presenting two real data examples. Our first example explores U.S. income and education, where we demonstrate how this methodology can help guide the selection and development of a dependence measure. Our second example examines measures of dependence for functional data and illustrates them using data on geomagnetic storms.
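
The abstract singles out interpretability as the key property for quantification, pointing to $R^2$ and to the probabilistic measures of Goodman and Kruskal. As a rough illustration only (this is not the authors' framework or code), the sketch below computes two classical measures that carry this kind of operational meaning: $R^2$ from a simple least-squares line, read as the proportion of the variance of Y removed by predicting it linearly from X, and Goodman and Kruskal's lambda, read as the proportional reduction in the probability of a wrong guess of categorical Y once categorical X is known. The function names `r_squared` and `goodman_kruskal_lambda` are hypothetical, and plain Python is used so the example is self-contained.

```python
from collections import Counter


def r_squared(x, y):
    """R^2 of a simple least-squares line: the fraction of the total squared
    error of predicting y by its mean that is removed by predicting y linearly from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares of the fitted line vs. total sum of squares about the mean.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


def goodman_kruskal_lambda(pairs):
    """Goodman-Kruskal lambda for predicting categorical Y from categorical X,
    given a list of (x, y) observations: the proportional reduction in the
    error rate when the modal Y is guessed within each level of X instead of overall."""
    n = len(pairs)
    # Error rate when always guessing the overall modal category of Y.
    overall_mode_count = Counter(y for _, y in pairs).most_common(1)[0][1]
    err_without_x = 1.0 - overall_mode_count / n
    if err_without_x == 0.0:
        raise ValueError("lambda is undefined when Y takes a single value")
    # Error rate when guessing the modal category of Y within each level of X.
    by_x = {}
    for xv, yv in pairs:
        by_x.setdefault(xv, Counter())[yv] += 1
    correct_with_x = sum(c.most_common(1)[0][1] for c in by_x.values())
    err_with_x = 1.0 - correct_with_x / n
    return (err_without_x - err_with_x) / err_without_x


if __name__ == "__main__":
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
    print(goodman_kruskal_lambda([("a", "u"), ("a", "u"), ("b", "v"), ("b", "v")]))
```

Both values answer a concrete question about prediction error, which is the sense of interpretability the abstract emphasizes. Note that lambda can equal 0 even when X and Y are dependent, because it only registers changes in the modal guess; its meaning is tied to that specific prediction task.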
