Authors: Weiguang Mao, Elena Zaslavsky, Boris M. Hartmann, Stuart C. Sealfon, Maria Chikina
DOI: 10.1101/116061
Keywords:
Abstract: Genome-scale molecular datasets are often highly structured, with many correlated measurements. This general phenomenon can be related to the underlying data-generating process. In assays of mixed cell populations, such as blood, variation in cell-type proportion induces a complex correlation structure at the gene level. Likewise, groups of genes are often co-regulated/co-expressed through shared transcription factors and signaling pathways. Many applications of gene expression analysis rely on their ability to reflect these unobserved biological processes in order to draw mechanistic conclusions. On the other hand, the correlated patterns may also reflect nuisance factors, such as batch effects, which can interfere with correct interpretation. The choice of analysis method is heavily dependent on which process (nuisance or interesting-biological) is believed to account for more of the variance, and the optimal analysis strategy remains an open question. In this study we describe a method to infer a biologically grounded data model that provides estimates of the underlying processes, including explicitly identified pathway-level effects. Specifically, we formulate a new matrix decomposition framework, PLIER (Pathway-level Information ExtractoR), that incorporates prior pathway knowledge. Using simulations, we demonstrate the superiority of our method in recovering the true underlying model. Using real data, we show that our approach is able to recover biologically interpretable variables, reproduce previous findings in a simplified analysis, distinguish biological from technical variation, and provide additional biological insight. The method and auxiliary functions are compiled in an R package available at https://github.com/wgmao/PLIER.