Redundancy-Aware Topic Modeling for Patient Record Notes

Authors: Raphael Cohen, Iddo Aviram, Michael Elhadad, Noémie Elhadad

Abstract: The clinical notes in a given patient record contain much redundancy, in large part due to clinicians’ documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining and topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling the content of clinical notes. To assess the value of Red-LDA, we experiment with three baselines and our novel redundancy-aware topic modeling method: given a large collection of patient records, (i) apply vanilla LDA to all documents in all input records; (ii) identify and remove all redundancy by choosing a single representative document for each record as input to LDA; (iii) identify and remove all redundant paragraphs in each record, leaving partial, non-redundant documents as input to LDA; and (iv) apply Red-LDA to all documents in all input records. Both quantitative evaluation, carried out through log-likelihood on held-out data and topic coherence of the produced topics, and qualitative assessment carried out by physicians show that Red-LDA produces superior models to all three baseline strategies. This research contributes to the emerging field of understanding the characteristics of the electronic health record and how to account for them in the framework of data mining. The code for the two redundancy-elimination baselines and Red-LDA is made publicly available to the community.
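Baseline (iii) can be illustrated with a minimal sketch. This is not the authors' released code: it assumes paragraphs are separated by blank lines and treats a paragraph as redundant only if an identical paragraph (after lowercasing and whitespace normalization) already appeared earlier in the same patient record. The function names `normalize`, `drop_redundant_paragraphs`, and `reduce_corpus` are hypothetical, and the exact-match rule is a deliberate simplification of whatever redundancy detection the paper actually uses.

```python
import re
from typing import Dict, List


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def drop_redundant_paragraphs(notes: List[str]) -> List[str]:
    """Keep only the first occurrence of each paragraph within one patient record.

    `notes` is the chronologically ordered list of notes for a single patient.
    Splitting on blank lines is an assumption of this sketch.
    """
    seen = set()
    reduced = []
    for note in notes:
        kept = []
        for paragraph in re.split(r"\n\s*\n", note):
            key = normalize(paragraph)
            if key and key not in seen:
                seen.add(key)
                kept.append(paragraph.strip())
        # A note whose paragraphs were all copied becomes an empty string.
        reduced.append("\n\n".join(kept))
    return reduced


def reduce_corpus(records: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply the paragraph filter independently to every patient record."""
    return {pid: drop_redundant_paragraphs(notes) for pid, notes in records.items()}
```

The reduced notes would then be passed to an ordinary LDA implementation. Redundancy is tracked per record, so text shared between different patients is left untouched, matching the abstract's description of redundancy as a within-record phenomenon.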
