Authors: Ligia Adamska, Naomi Allen, Robin Flaig, Cathie Sudlow, Michael Lay
DOI: 10.1186/1745-6215-16-S2-O68
Keywords: Coding (social sciences), Health care, Bioinformatics, Biobank, Data quality, Primary care, Data science, Medicine, Schema (psychology), Data integrity, Prospective cohort study
Abstract: Increased availability of electronic healthcare records (EHR) has transformed how health research is conducted in the UK by enabling linkages between various health-related datasets. UK Biobank is a prospective cohort study of 500,000 men and women aged 40-69 years, recruited throughout the UK between 2006 and 2010. To follow participants' health over time, UK Biobank is linked to national death and cancer registries and hospital admissions data, with linkage to primary care data under development. This exercise involves linking to data from separate providers, determining which data-fields are of most value for research, mapping changes in clinical coding systems, and distinguishing data that can be standardised from those that should remain specific to the dataset of origin, before integration into a single dataset amenable to analysis by external researchers. Data linkage for research poses several challenges, including different regulatory processes across each devolved nation as well as differences in matching algorithms, data formats and schema, in addition to the sheer volume of data to be processed. One of the biggest challenges is defining rules for handling ambiguities whilst preserving data integrity and provenance. UK Biobank is the first to map hospital episode data across England, Scotland and Wales. These datasets vary in terms of content, quality, and geographical and temporal coverage, and considerable expertise is required to integrate, document and present these data in an accessible way for researchers.