Recognition of Disease Genetic Information from Unstructured Text Data Based on BiLSTM-CRF for Molecular Mechanisms

Authors: Tianyin Chen, Li Zhang, Xingxing Zhang, Lejun Gong

DOI: 10.1155/2021/6635027

Keywords:

Abstract: Identifying disease-relevant entities is an important task in mining unstructured text data from the biomedical literature to acquire knowledge. Autism spectrum disorder (ASD) is a neurological and developmental disorder characterized by deficits in communication and social interaction and by repetitive behaviour. However, this kind of disease remains unclear to date. This study identifies entities associated with the disease using machine learning, as a computational way to collect data on the molecular mechanisms related to ASD. Entities are extracted from the autism literature using a deep learning model that combines bidirectional long short-term memory (BiLSTM) with a conditional random field (CRF). Compared with previous works, the approach is promising for identifying entities related to the disease. The proposed approach, which covers five entity types, is evaluated on the GENIA corpus and obtains an F-score of 76.81%. The work extracted 9146 proteins, 145 RNAs, 7680 DNAs, 1058 cell-types, and 981 cell-lines after removing repeated entities. Finally, we perform GO and KEGG analyses on the test dataset. This study could serve as a reference for further studies on the etiology of the disease on the basis of molecular mechanisms and provide a way to explore disease genetic information.

References (40)
Ari Rosenberg, Jaclyn Sky Patterson, Dora E. Angelaki. A computational perspective on autism. Proceedings of the National Academy of Sciences of the United States of America, vol. 112, pp. 9158-9165 (2015). DOI: 10.1073/PNAS.1510583112
Zhihua Liao, Hongguang Wu. Biomedical Named Entity Recognition Based on Skip-Chain CRFS. International Conference on Industrial Control and Electronics Engineering, pp. 1495-1498 (2012). DOI: 10.1109/ICICEE.2012.393
Wilco W.M. Fleuren, Wynand Alkema. Application of text mining in the biomedical domain. Methods, vol. 74, pp. 97-106 (2015). DOI: 10.1016/J.YMETH.2015.01.015
Antonio Jimeno Yepes, Rafael Berlanga. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of Biomedical Informatics, vol. 53, pp. 300-307 (2015). DOI: 10.1016/J.JBI.2014.11.015
Lejun Gong, Ronggen Yang, Qin Yan, Xiao Sun. Prioritization of Disease Susceptibility Genes Using LSM/SVD. IEEE Transactions on Biomedical Engineering, vol. 60, pp. 3410-3417 (2013). DOI: 10.1109/TBME.2013.2257767
Lejun Gong, Xiao Sun, Dongke Jiang, Shengtao Gong. AUTMINER: A System for Extracting ASD-Related Genes Using Text Mining. Journal of Biological Systems, vol. 19, pp. 113-125 (2011). DOI: 10.1142/S0218339011003828
Buzhou Tang, Hongxin Cao, Xiaolong Wang, Qingcai Chen, Hua Xu. Evaluating word representation features in biomedical named entity recognition tasks. BioMed Research International, vol. 2014, pp. 240403-240403 (2014). DOI: 10.1155/2014/240403
Marta Macedoni-Lukšič, Ingrid Petrič, Bojan Cestnik, Tanja Urbančič. Developing a Deeper Understanding of Autism: Connecting Knowledge through Literature Mining. Autism Research and Treatment, vol. 2011, pp. 307152-307152 (2011). DOI: 10.1155/2011/307152
Saeed Hassanpour, Martin J O’Connor, Amar K Das. A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain. Journal of Biomedical Semantics, vol. 4, pp. 14-14 (2013). DOI: 10.1186/2041-1480-4-14
Jesse F Abelson, Kenneth Y Kwan, Brian J O'Roak, Danielle Y Baek, Althea A Stillman, Thomas M Morgan, Carol A Mathews, David L Pauls, Mladen-Roko Rašin, Murat Gunel, Nicole R Davis, A Gulhan Ercan-Sencicek, Danielle H Guez, John A Spertus, James F Leckman, Leon S Dure IV, Roger Kurlan, Harvey S Singer, Donald L Gilbert, Anita Farhi, Angeliki Louvi, Richard P Lifton, Nenad Sestan, Matthew W State. Sequence Variants in SLITRK1 Are Associated with Tourette's Syndrome. Science, vol. 310, pp. 317-320 (2005). DOI: 10.1126/SCIENCE.1116502