Authors: Slavé Petrovski, Ayal B. Gussow, Quanli Wang, Matt Halvorsen, Yujun Han
DOI: 10.1371/JOURNAL.PGEN.1005492
Keywords:
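Abstract: Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of this same noncoding regulatory sequence, using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.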
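
Illustrative sketch (not from the paper): the abstract describes ncRVIS as a comparison of observed and predicted levels of standing variation in a gene's regulatory sequence. One way to picture such a score is as a regression residual, sketched below in Python. The choice of predictor (total variant count), the response (common variant count), the simple standardization, and the gene names and counts are all assumptions made for illustration; they are not the authors' implementation or data.

```python
from statistics import mean, pstdev


def residual_scores(total_variants, common_variants):
    """Fit common ~ total by ordinary least squares and return one
    standardized residual per gene.

    Assumption: a negative score means fewer common variants than the
    fit predicts (interpreted here as more intolerant); a positive
    score means more than predicted.
    """
    x_bar = mean(total_variants)
    y_bar = mean(common_variants)
    # Closed-form OLS slope and intercept for a single predictor.
    sxx = sum((x - x_bar) ** 2 for x in total_variants)
    sxy = sum(
        (x - x_bar) * (y - y_bar)
        for x, y in zip(total_variants, common_variants)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Observed minus predicted level of common variation.
    residuals = [
        y - (intercept + slope * x)
        for x, y in zip(total_variants, common_variants)
    ]
    # Simplification: divide by the population standard deviation of the
    # residuals rather than computing fully studentized residuals.
    spread = pstdev(residuals)
    return [r / spread for r in residuals]


# Hypothetical toy data: variant counts in each gene's regulatory sequence.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
total = [10, 20, 30, 40, 50]
common = [2, 3, 9, 7, 12]

scores = dict(zip(genes, residual_scores(total, common)))
for gene, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{gene}\t{score:+.2f}")
```

In this toy input, genes with fewer common variants than the fitted line predicts sort to the top of the printed list. The sketch does not handle the degenerate case where every gene has the same total count.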