Measuring Semantic Similarity of Bengali Texts with Parts-of-Speech Tags and Word-Level Semantics

Authors: Shajalal, Md. Atabuzzaman

DOI: 10.1109/ICCIT51783.2020.9392700

Abstract: Semantic textual similarity is essential for many applications in natural language processing, but measuring it is not an easy task. Sentences come in different types, and the diversity of sentence structure makes assessing similarity a formidable task. When two texts are lexicographically dissimilar but semantically similar, traditional lexical matching cannot return the actual degree of similarity. In addition, the lack of well-recognized language processing resources for Bengali makes the calculation of semantic similarity difficult. In this paper, we measure the semantic similarity of Bengali texts using word-level semantics and parts-of-speech tags. To assess the similarity, we exploit a parts-of-speech tagger and a pre-trained word-embedding model. The maximum word-to-word semantic similarity is then employed for words that belong to the identical parts-of-speech tag. We also introduce grammatical-role-level similarity in our proposed method. To validate the performance of the proposed method, we conducted experiments on a publicly available benchmark dataset. The results demonstrate that our method is effective and achieves state-of-the-art performance.
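The core idea in the abstract, taking the maximum word-to-word embedding similarity only among words that share a parts-of-speech tag, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the function names, the toy three-dimensional vectors, the English placeholder tokens standing in for Bengali words, and the choice to average the two directed scores. The grammatical-role-level component mentioned in the abstract is not modeled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def directed_score(src, tgt, emb):
    """Average, over words in src, of the best cosine match among
    tgt words carrying the same POS tag (0.0 when no such word exists)."""
    scores = []
    for word, tag in src:
        if word not in emb:
            continue  # skip out-of-vocabulary words
        candidates = [
            cosine(emb[word], emb[other])
            for other, other_tag in tgt
            if other_tag == tag and other in emb
        ]
        scores.append(max(candidates, default=0.0))
    return sum(scores) / len(scores) if scores else 0.0

def pos_restricted_similarity(a, b, emb):
    """Symmetric score: mean of the two directed scores.
    The averaging is an assumption; the abstract does not specify the aggregation."""
    return 0.5 * (directed_score(a, b, emb) + directed_score(b, a, emb))

# Hypothetical toy embeddings; a real setup would load a pre-trained model.
TOY_EMB = {
    "doctor": [0.9, 0.1, 0.0],
    "physician": [0.8, 0.2, 0.0],
    "treats": [0.1, 0.9, 0.0],
    "heals": [0.2, 0.8, 0.0],
    "river": [0.0, 0.1, 0.9],
    "flows": [0.0, 0.2, 0.8],
}

# Sentences as (token, POS tag) pairs, as a tagger would produce them.
S1 = [("doctor", "NOUN"), ("treats", "VERB")]
S2 = [("physician", "NOUN"), ("heals", "VERB")]
S3 = [("river", "NOUN"), ("flows", "VERB")]

if __name__ == "__main__":
    print(pos_restricted_similarity(S1, S2, TOY_EMB))
    print(pos_restricted_similarity(S1, S3, TOY_EMB))
```

In this toy setup the paraphrase pair S1 and S2 scores higher than the unrelated pair S1 and S3 even though no tokens overlap, which is the case where the abstract says lexical matching fails. Restricting candidates to the same tag keeps a noun from being matched against a verb that happens to be close in embedding space.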
