Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances.

Authors: Carlos Sáez, Montserrat Robles, Juan M García-Gómez

DOI: 10.1177/0962280214545122

Keywords: Mathematics, Data mining, Stability (probability), Context (language use), Estimator, Information geometry, Probability distribution, Probabilistic logic, Multi-source, Data quality

Abstract: Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulties uncovering such variabilities when dealing with multi-modal, multi-type, multi-variate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions, and defined in the context of data quality assessment. Specifically, a global probabilistic deviation and a source probabilistic outlyingness metric are proposed. The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of normalized standard deviation of PDFs. The second bou...
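The idea of measuring multi-source variability from distances between source PDFs can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the Jensen-Shannon distance, the use of classical (Torgerson) multidimensional scaling, the helper names, and the toy histograms are all assumptions made for illustration, and the normalization that makes the published metrics bounded is deliberately left out because the abstract above is truncated before it is described.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs), which lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Kullback-Leibler divergence, skipping empty bins of `a`.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))


def pairwise_distances(hists):
    """Symmetric matrix of distances between every pair of source histograms."""
    n = len(hists)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_distance(hists[i], hists[j])
    return d


def classical_mds(d):
    """Classical MDS: centered coordinates whose Euclidean distances approximate d."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    keep = eigvals > 1e-12  # drop null and numerically negative directions
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def centroid_distances(hists):
    """Unnormalized distance of each source to the centroid of all sources."""
    coords = classical_mds(pairwise_distances(hists))
    centered = coords - coords.mean(axis=0)
    return np.linalg.norm(centered, axis=1)


if __name__ == "__main__":
    # Hypothetical histograms of one variable over the same bins in three sources.
    sources = [[10, 20, 30, 40], [10, 20, 30, 40], [40, 30, 20, 10]]
    per_source = centroid_distances(sources)
    print("distance to centroid per source:", per_source)
    print("spread across sources (std):", per_source.std())
```

In this sketch a source whose distribution departs from the others lies farther from the centroid, and the spread of those distances gives a rough, unbounded indication of overall variability among sources.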

References (28)
Richard Y. Wang, Diane M. Strong, Beyond accuracy: what data quality means to data consumers. Journal of Management Information Systems, vol. 12, pp. 5-33 (1996), 10.1080/07421222.1996.11518099
Robert Detrano, Andras Janosi, Walter Steinbrunn, Matthias Pfisterer, Johann-Jakob Schmid, Sarbjit Sandhu, Kern H. Guppy, Stella Lee, Victor Froelicher, International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, vol. 64, pp. 304-310 (1989), 10.1016/0002-9149(89)90524-9
Warren S. Torgerson, Multidimensional scaling: I. Theory and method. Psychometrika, vol. 17, pp. 401-419 (1952), 10.1007/BF02288916
Joshua B Tenenbaum, Vin de Silva, John C Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, vol. 290, pp. 2319-2323 (2000), 10.1126/SCIENCE.290.5500.2319
Harold R. Parks, Dean C. Wills, An elementary calculation of the dihedral angle of the regular n-simplex. American Mathematical Monthly, vol. 109, pp. 756-758 (2002), 10.1080/00029890.2002.11919910
I. Borg, P. Groenen, Modern Multidimensional Scaling: Theory and Applications. Journal of Educational Measurement, vol. 40, pp. 277-280 (2003), 10.1111/J.1745-3984.2003.TB01108.X
Hugh S. Markus, Rob Ackerstaff, Viken Babikian, Chris Bladin, Dirk Droste, Donald Grosset, Chris Levi, David Russell, Mario Siebler, Charles Tegeler, Intercenter Agreement in Reading Doppler Embolic Signals: A Multicenter International Study. Stroke, vol. 28, pp. 1307-1310 (1997), 10.1161/01.STR.28.7.1307
B. Jarman, S. Gault, B. Alves, A. Hider, S. Dolan, A. Cook, B. Hurwitz, L. I. Iezzoni, Explaining differences in English hospital death rates using routinely collected data. BMJ, vol. 318, pp. 1515-1520 (1999), 10.1136/BMJ.318.7197.1515
Hideaki Shimazaki, Shigeru Shinomoto, A Method for Selecting the Bin Size of a Time Histogram. Neural Computation, vol. 19, pp. 1503-1527 (2007), 10.1162/NECO.2007.19.6.1503
Carlos Saez, Montserrat Robles, Juan Miguel Garcia-Gomez, Comparative study of probability distribution distances to define a metric for the stability of multi-source biomedical research data. International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2013, pp. 3226-3229 (2013), 10.1109/EMBC.2013.6610228