Authors: David Borland, Wenyuan Wang, Jonathan Zhang, Joshua Shrestha, David Gotz
DOI: 10.1109/TVCG.2019.2934209
Keywords:
Abstract: The collection of large, complex datasets has become common across a wide variety of domains. Visual analytics tools increasingly play a key role in exploring and answering complex questions about these large datasets. However, many visualizations are not designed to concurrently visualize the large number of dimensions present in complex datasets (e.g., tens of thousands of distinct codes in an electronic health record system). This fact, combined with the ability of many visual analytics systems to enable rapid, ad-hoc specification of groups, or cohorts, of individuals based on a small subset of visualized dimensions, leads to the possibility of introducing selection bias: when the user creates a cohort based on a specified set of dimensions, differences across many other unseen dimensions may also be introduced. These unintended side effects may result in the cohort no longer being representative of the larger population intended to be studied, which can negatively affect the validity of subsequent analyses. We present techniques for selection bias tracking and visualization that can be incorporated into high-dimensional exploratory visual analytics systems, with a focus on medical data with existing data hierarchies. These techniques include: (1) tree-based cohort provenance visualization, including a user-specified baseline cohort that all other cohorts are compared against, and visual encoding of the “drift” for each cohort, which indicates where selection bias may have occurred, and (2) a set of visualizations, including a novel icicle-plot based visualization, to compare in detail the per-dimension differences between the baseline and a user-specified focus cohort. These techniques are integrated into a medical temporal event sequence visual analytics tool. We present example use cases and report findings from domain expert interviews.