Authors: Carlos Sáez, Montserrat Robles, Juan M. García-Gómez
Keywords: Mathematics, Data mining, Stability (probability), Context (language use), Estimator, Information geometry, Probability distribution, Probabilistic logic, Multi-source, Data quality
Abstract: Biomedical data may be composed of individuals generated from distinct, meaningful sources. Because of possible contextual biases in the processes that generate the data, there may exist undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulty uncovering such variability when dealing with multi-modal, multi-type, multivariate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions and defined in the context of data quality assessment. Specifically, a global probabilistic deviation metric and a probabilistic outlyingness metric are proposed. The first provides a bounded degree of multi-source variability, designed as an estimator equivalent to the notion of the normalized standard deviation of PDFs. The second bou...
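This record does not state how the two metrics are computed. As a purely illustrative stand-in, and explicitly not the authors' global probabilistic deviation or outlyingness metric, the sketch below shows one standard way to obtain bounded scores of multi-source variability: Jensen-Shannon distances (base 2, which lie in [0, 1]) between per-source PDFs. The helper names (`normalize`, `js_distance`, `mean_pairwise_js`, `distance_to_mean`), the use of binned counts as PDFs, and the averaging choices are hypothetical assumptions made for illustration only.

```python
import math
from itertools import combinations


def normalize(counts):
    """Turn a list of non-negative bin counts into a probability vector."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def kl_divergence_base2(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; bins with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance in bits: square root of the JS divergence, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # m_i > 0 wherever p_i or q_i > 0
    jsd = 0.5 * kl_divergence_base2(p, m) + 0.5 * kl_divergence_base2(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error


def mean_pairwise_js(pdfs):
    """Hypothetical global variability score: mean pairwise JS distance, in [0, 1]."""
    pairs = list(combinations(pdfs, 2))
    if not pairs:
        return 0.0
    return sum(js_distance(p, q) for p, q in pairs) / len(pairs)


def distance_to_mean(pdfs):
    """Hypothetical per-source score: JS distance of each source to the averaged PDF."""
    k = len(pdfs)
    centre = [sum(col) / k for col in zip(*pdfs)]
    return [js_distance(p, centre) for p in pdfs]


# Three hypothetical sources of one categorical variable; the third is shifted.
sources = [normalize(c) for c in ([50, 30, 20], [48, 32, 20], [10, 20, 70])]
print(round(mean_pairwise_js(sources), 3))
print([round(d, 3) for d in distance_to_mean(sources)])
```

Identical sources score 0 and sources with disjoint support score the maximum of 1, which is what keeps both scores bounded. In a multi-type, multivariate setting the per-source PDFs would first have to be estimated; this sketch simply assumes they are already available as binned counts.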