Authors: Arun K. Pujari, Ansh Mittal, Anshuman Padhi, Anshul Jain, Mukesh Jadon
Keywords:
Abstract: Word-embedding is a major machine learning technique for computational applications of languages. For a given corpus, the process of word-embedding is to embed each word onto a multi-dimensional space such that the semantic similarities between similar words are retained. While capturing the similarity encapsulated in the training corpus, a word-embedding inadvertently captures many other inherent features present in the corpus. One such feature is the bias arising out of stereotyping, which is present in almost all corpora no matter how extensively used and trusted they are. We study this aspect of word-embedding in the context of the Hindi language. We show that gender-neutral words in Hindi are mapped to vectors which are inclined towards one gender or the other in the embedding space. We propose a new algorithm for debiasing and demonstrate its efficacy for Hindi. Further, we build an SVM-based classifier that determines whether a word is to be classified as gender-neutral or otherwise. We corroborate our claim with experimental results on a large number of individual words. This work is the first ever result on debiasing for the Hindi language, and the approach can be applicable to any other language.