Authors: Xiangwen Liu, Joe Meehan, Weida Tong, Leihong Wu, Xiaowei Xu
DOI: 10.1186/S12911-020-1078-3
Keywords: Semantic similarity, Rich Text Format, Standard test image, Cosine similarity, Deep learning, Artificial neural network, Artificial intelligence, Pattern recognition, Computer science, Tesseract, Identification (information)
Abstract: Drug labels, or packaging inserts, play a significant role in all operations from production through drug distribution channels to the end consumer. An image of the label, also called the Display Panel, could be used to identify illegal, illicit, unapproved and potentially dangerous drugs. Due to the time-consuming process and high labor cost of investigation, an artificial intelligence-based deep learning model is necessary for fast and accurate identification of drugs. In addition to image-based technology, we take advantage of the rich text information on pharmaceutical package images. In this study, we developed the Drug Label Identification through Image and Text embedding model (DLI-IT) to model text-based patterns of historical data for the detection of suspicious drugs. In DLI-IT, we first trained a Connectionist Text Proposal Network (CTPN) to crop the raw image into sub-images based on the text. The texts from the cropped sub-images are recognized independently through the Tesseract OCR Engine and combined as one document for each raw image. Finally, we applied universal sentence embedding to transform these documents into vectors and find the most similar reference images to the test image through cosine similarity. We trained DLI-IT on 1749 opioid and 2365 non-opioid drug label images. The model was then tested on 300 external images, and the result demonstrated that our model achieves up to 88% precision in drug label identification, which outperforms the previous method by 35%. To conclude, by combining image and text embedding analysis under a deep learning framework, our DLI-IT approach achieved competitive performance in advancing drug label identification.
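The final retrieval step described in the abstract (turn each label's OCR text into a vector, then rank reference labels by cosine similarity to the test label) can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a toy bag-of-words stand-in for the universal sentence embedding, and the function names, reference IDs, and label strings are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (ASSUMPTION: stand-in for a sentence encoder)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_text, references, top_k=1):
    """Rank reference documents by cosine similarity to the query text.

    `references` maps a hypothetical reference-image ID to the combined
    OCR text of that image. Returns a list of (id, score) pairs.
    """
    query_vec = embed(query_text)
    scored = [
        (cosine_similarity(query_vec, embed(text)), ref_id)
        for ref_id, text in references.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(ref_id, score) for score, ref_id in scored[:top_k]]


# Made-up OCR documents, one per reference label image.
REFERENCES = {
    "ref_a": "oxycodone hydrochloride tablets 10 mg",
    "ref_b": "ibuprofen tablets 200 mg",
}

if __name__ == "__main__":
    query = "oxycodone hydrochloride tablets 5 mg"
    print(most_similar(query, REFERENCES, top_k=2))
```

In a real pipeline the `embed` step would be replaced by a trained sentence encoder producing dense vectors; the ranking logic stays the same because cosine similarity depends only on the angle between vectors, not on their length.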