Authors: Raphael Cohen, Iddo Aviram, Michael Elhadad, Noémie Elhadad
DOI: 10.1371/JOURNAL.PONE.0087555
Keywords:
Abstract: The clinical notes in a given patient record contain much redundancy, in large part due to clinicians' documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining, and of topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling the content of clinical notes. To assess the value of Red-LDA, we experiment with three baselines and our novel redundancy-aware topic modeling method. Given a large collection of patient records, we (i) apply vanilla LDA to all documents in all input records; (ii) identify and remove all redundancy by choosing a single representative document for each record as input to LDA; (iii) identify and remove all redundant paragraphs in each record, leaving partial, non-redundant documents as input to LDA; and (iv) apply Red-LDA to all documents in all input records. Both a quantitative evaluation, carried out through log-likelihood on held-out data and topic coherence of the produced topics, and a qualitative assessment, carried out by physicians, show that Red-LDA produces superior models to all three baseline strategies. This research contributes to the emerging field of understanding the characteristics of the electronic health record and how to account for them in the framework of data mining. The code for the two redundancy-elimination baselines and Red-LDA is made publicly available to the community.
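As a rough illustration of baseline (iii), the sketch below drops paragraphs that already appeared earlier in the same patient record before the notes would be passed to a topic model. It is not the authors' released code: the function name `drop_repeated_paragraphs`, the blank-line paragraph delimiter, and the exact-match test after case and whitespace normalization are all assumptions made for this example. Real copy-and-paste redundancy is often approximate, so a practical implementation would likely need fuzzy matching.

```python
from typing import List


def drop_repeated_paragraphs(record: List[str]) -> List[str]:
    """Keep only the first occurrence of each paragraph across a record's notes.

    `record` is a hypothetical representation: one string per note, in
    chronological order, with paragraphs separated by blank lines.
    """
    seen = set()
    reduced = []
    for note in record:
        kept = []
        for para in note.split("\n\n"):
            # Assumption: two paragraphs are "redundant" if they are identical
            # after lowercasing and collapsing whitespace.
            key = " ".join(para.lower().split())
            if key and key not in seen:
                seen.add(key)
                kept.append(para.strip())
        # A note may become empty if everything in it was copied.
        reduced.append("\n\n".join(kept))
    return reduced


record = [
    "Patient reports chest pain.\n\nHistory of hypertension.",
    "History of hypertension.\n\nStarted on aspirin.",
]
reduced = drop_repeated_paragraphs(record)
```

In this toy record the second note keeps only its new paragraph, which corresponds to the "partial, non-redundant documents" that baseline (iii) feeds to LDA.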