Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories

Authors: Carlos Sáez, Oscar Zurriaga, Jordi Pérez-Panadés, Inma Melchor, Montserrat Robles

DOI: 10.1093/JAMIA/OCW010

Keywords:

Abstract: Objective: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ). Materials and Methods: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and to multitype, multivariate, and multimodal data. Results: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Punctual temporal anomalies were noticed due to a punctual increment in missing data, along with outlying and clustered health departments due to differences in populations or in practices. Discussion: Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. Conclusion: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be part of systematic DQ procedures.

References (41)
R. Rampatige, L. Mikkelsen, C. AbouZahr, A. D. Lopez. Strengthening civil registration and vital statistics for births, deaths and causes of death: resource kit. World Health Organization (2013).
Sandra L MacKenzie, Matt C Wyatt, Robert Schuff, Jessica D Tenenbaum, Nick Anderson. Practices and perspectives on building integrated data repositories: results from a 2010 CTSA survey. Journal of the American Medical Informatics Association, vol. 19 (2012). DOI: 10.1136/AMIAJNL-2011-000508
Barbara L Massoudi, Kenneth W Goodman, Ivan J Gotham, John H Holmes, Lisa Lang, Kathleen Miner, David D Potenziani, Janise Richards, Anne M Turner, Paul C Fu. An informatics agenda for public health: summarized recommendations from the 2011 AMIA PHI Conference. Journal of the American Medical Informatics Association, vol. 19, pp. 688-695 (2012). DOI: 10.1136/AMIAJNL-2011-000507
Mohamed Medhat Gaber, Joao Gama. Learning from Data Streams: Processing Techniques in Sensor Networks. Springer (2007).
Carlos Sáez, Montserrat Robles, Juan M García-Gómez. Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances. Statistical Methods in Medical Research, vol. 26, pp. 312-336 (2017). DOI: 10.1177/0962280214545122
Mingfeng Lin, Henry C. Lucas, Galit Shmueli. Research Commentary - Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research, vol. 24, pp. 906-917 (2013). DOI: 10.1287/ISRE.2013.0480
Carlos Sáez, Pedro Pereira Rodrigues, João Gama, Montserrat Robles, Juan M García-Gómez. Probabilistic change detection and visualization methods for the assessment of temporal stability in biomedical data quality. Data Mining and Knowledge Discovery, vol. 29, pp. 950-975 (2015). DOI: 10.1007/S10618-014-0378-6
G. L. Knatterud. Management and Conduct of Randomized Controlled Trials. Epidemiologic Reviews, vol. 24, pp. 12-25 (2002). DOI: 10.1093/EPIREV/24.1.12
Oscar Zurriaga, Hermelinda Vanaclocha, Miguel A Martinez-Beneito, Paloma Botella-Rocamora. Spatio-temporal evolution of female lung cancer mortality in a region of Spain, is it worth taking migration into account? BMC Cancer, vol. 8, p. 35 (2008). DOI: 10.1186/1471-2407-8-35