Discovering Data Source Stability Patterns in Biomedical Repositories Based on Simplicial Projections from Probability Distribution Distances

Authors: Pablo Ferri Borreda, Carlos Saez, Juan Miguel Garcia Gomez

DOI: 10.1109/CBMS.2017.153

Abstract: The degree of homogeneity of statistical distributions among data sources is a critical issue when reusing Integrated Data Repositories (IDRs). Evaluating this source stability is of utmost importance to ensure confident reuse. This work tackles the task of discovering and classifying source stability patterns in multiple IDRs. It does so with a novel approach based on simplicial projections from probability distribution distances, combined with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results, evaluated on 20 public repositories, support the existence of four main source stability patterns in biomedical repositories: the global stability pattern (GSP), the local stability pattern (LSP), the sparse stability pattern (SSP) and the instability pattern (IP).
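The abstract describes the pipeline only at a high level. The following is a minimal sketch, not the authors' implementation, and it assumes each data source is summarised by a histogram of the same variable over shared bins. It computes pairwise Jensen-Shannon distances between sources, using the square root of the JS divergence, which Endres and Schindelin (2003, listed below) show to be a metric. It then embeds the sources with Torgerson's classical multidimensional scaling as a stand-in for the simplicial projection of Sáez et al. (2017), and clusters the projected sources with DBSCAN. The function names, the toy histograms and the `eps`/`min_pts` values are hypothetical. The exact projection, and how cluster structure is mapped to the GSP, LSP, SSP and IP patterns, are not stated in the abstract, so the sketch stops at the cluster labels. It requires NumPy.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()   # normalise counts to probabilities
    m = 0.5 * (p + q)                 # mixture distribution

    def kl(a, b):
        mask = a > 0                  # 0 * log(0 / x) is taken as 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))


def distance_matrix(sources):
    """Pairwise JS distances between the per-source distributions."""
    n = len(sources)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_distance(sources[i], sources[j])
    return d


def classical_mds(d, k=2):
    """Torgerson's classical MDS (assumed stand-in for the simplicial projection)."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)                 # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]               # keep the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = np.full(n, -2)                        # -2 = not yet visited
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    cluster = 0
    for i in range(n):
        if labels[i] != -2:
            continue
        neighbours = np.flatnonzero(dist[i] <= eps)  # includes i itself
        if len(neighbours) < min_pts:
            labels[i] = -1                         # noise (may become a border point later)
            continue
        labels[i] = cluster
        seeds = neighbours.tolist()
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster                # noise reached from a core point: border
            if labels[j] != -2:
                continue
            labels[j] = cluster
            j_neighbours = np.flatnonzero(dist[j] <= eps)
            if len(j_neighbours) >= min_pts:       # j is a core point: keep expanding
                seeds.extend(j_neighbours.tolist())
        cluster += 1
    return labels


# Hypothetical histograms of one variable over 4 bins, one row per data source.
sources = [
    [30, 30, 20, 20],
    [31, 29, 20, 20],
    [29, 31, 21, 19],
    [5, 10, 35, 50],
    [6, 9, 36, 49],
    [5, 11, 34, 50],
]
embedding = classical_mds(distance_matrix(sources), k=2)
labels = dbscan(embedding, eps=0.1, min_pts=2)  # illustrative parameters
print(labels.tolist())  # prints [0, 0, 0, 1, 1, 1]
```

With two groups of three near-identical sources, DBSCAN returns two clusters and no noise points. On real repositories, the per-source histograms might instead be kernel density estimates (the Parzen and Silverman entries in the reference list cover that technique), and `eps` would have to be tuned to the scale of the projected distances.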

References (14):
Kwanghee Jung, Yoshio Takane. Multidimensional Scaling I. International Encyclopedia of the Social & Behavioral Sciences (Second Edition), pp. 34-39 (2015). DOI: 10.1016/B978-0-08-097086-8.42045-3
David G Clayton. Generalized Linear Mixed Models. Encyclopedia of Biostatistics, pp. 845-852 (2003). DOI: 10.1002/9781118445112.STAT07540
Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226-231 (1996).
Carlos Sáez, Montserrat Robles, Juan M García-Gómez. Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances. Statistical Methods in Medical Research, vol. 26, pp. 312-336 (2017). DOI: 10.1177/0962280214545122
Warren S. Torgerson. Multidimensional scaling: I. Theory and method. Psychometrika, vol. 17, pp. 401-419 (1952). DOI: 10.1007/BF02288916
Karen Kafadar, Adrian W. Bowman, Adelchi Azzalini. Applied smoothing techniques for data analysis: the kernel approach with S-plus illustrations. Journal of the American Statistical Association, vol. 94, pp. 982- (1999). DOI: 10.2307/2670015
I. Borg, P. Groenen. Modern Multidimensional Scaling: Theory and Applications. Journal of Educational Measurement, vol. 40, pp. 277-280 (2003). DOI: 10.1111/J.1745-3984.2003.TB01108.X
Emanuel Parzen. On Estimation of a Probability Density Function and Mode. Annals of Mathematical Statistics, vol. 33, pp. 1065-1076 (1962). DOI: 10.1214/AOMS/1177704472
B.W. Silverman. Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability (1986). DOI: 10.1201/9781315140919
D.M. Endres, J.E. Schindelin. A new metric for probability distributions. IEEE Transactions on Information Theory, vol. 49, pp. 1858-1860 (2003). DOI: 10.1109/TIT.2003.813506