Dataset and Enhanced Model for Eligibility Criteria-to-SQL Semantic Parsing

Authors: Anxiao Jiang, Xiaoqian Jiang, Yang Yang, Tianlong Chen, Xiaojing Yu

DOI:

Keywords: Parsing, Computer science, Clinical trial, Research opportunities, Executable, Task (project management), SQL, Information retrieval

Abstract: Clinical trials often require that patients meet eligibility criteria (e.g., have specific conditions) to ensure the safety and effectiveness of studies. However, retrieving eligible patients for a trial from an electronic health record (EHR) database remains a challenging task for clinicians, since it requires not only medical knowledge about the eligibility criteria but also an adequate understanding of structured query language (SQL). In this paper, we introduce a new dataset that includes a first-of-its-kind eligibility-criteria corpus and the corresponding queries for criteria-to-SQL (Criteria2SQL), the task of translating eligibility criteria into executable SQL queries. Compared to existing datasets, the queries here are derived from the eligibility criteria of clinical trials and include Order-sensitive, Counting-based, and Boolean-type cases that have not been seen before. In addition to the dataset, we propose a novel neural semantic parser as a strong baseline model. Extensive experiments show that the proposed parser outperforms state-of-the-art general-purpose text-to-SQL models while highlighting the challenges presented by the new dataset. The uniqueness and diversity of the dataset leave many research opportunities for future improvement.
