Debiasing Gender biased Hindi Words with Word-embedding

Authors: Arun K. Pujari, Ansh Mittal, Anshuman Padhi, Anshul Jain, Mukesh Jadon

DOI: 10.1145/3377713.3377792

Abstract: Word embedding is a major machine learning technique for computational applications of languages. For a given corpus, word embedding maps each word into a multi-dimensional space such that semantic similarities between similar words are retained. While the embedding retains the similarity encapsulated in the training corpus, it also inadvertently captures many other inherent features present in the corpus. One such feature is the bias arising from stereotyping, which appears in almost every corpus, no matter how extensively used and trusted it is. We study this aspect in the context of the Hindi language. We show that gender-neutral words are mapped to vectors that are inclined towards one gender or the other in the embedding space. We propose a new debiasing algorithm and demonstrate its efficacy. Further, we build an SVM-based classifier that determines whether a word is to be classified as neutral or otherwise. We corroborate our claim with experimental results on a large number of individual words. This work is the first result on debiasing for the Hindi language, and the approach can be applied to any language.
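The abstract does not spell out the proposed algorithm, so the sketch below does not reproduce it. It illustrates the general idea with the projection-based "neutralize" step from the hard-debiasing approach of Bolukbasi et al. (listed in the references): estimate a gender direction from a male/female anchor pair, measure how far a nominally neutral word leans along it, and subtract that component. The 3-dimensional vectors and the names `male_anchor`, `female_anchor`, and `neutral_word` are invented for illustration; real Hindi embeddings would have hundreds of dimensions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def gender_direction(male_vec, female_vec):
    """Unit vector pointing from the female anchor towards the male anchor."""
    return normalize([m - f for m, f in zip(male_vec, female_vec)])


def gender_bias(word_vec, direction):
    """Cosine between the word vector and the gender direction (signed)."""
    return dot(normalize(word_vec), direction)


def neutralize(word_vec, direction):
    """Remove the component of word_vec that lies along the gender direction."""
    proj = dot(word_vec, direction)
    return [w - proj * d for w, d in zip(word_vec, direction)]


# Toy vectors, made up for illustration only (assumption, not data from the paper).
toy = {
    "male_anchor": [0.9, 0.1, 0.3],
    "female_anchor": [-0.9, 0.1, 0.3],
    "neutral_word": [0.4, 0.7, 0.2],
}

g = gender_direction(toy["male_anchor"], toy["female_anchor"])
before = gender_bias(toy["neutral_word"], g)
after_vec = neutralize(toy["neutral_word"], g)
after = dot(after_vec, g)

print("bias before:", round(before, 4))
print("bias after:", round(after, 4))
```

A single anchor pair keeps the sketch short. In practice the gender direction is usually estimated from several definitional pairs, and a classifier such as the SVM mentioned in the abstract could decide which words should be neutralized at all.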

References (12)
Richard Socher, Will Y. Zou, Christopher D. Manning, Daniel Cer. Bilingual Word Embeddings for Phrase-Based Machine Translation. Empirical Methods in Natural Language Processing, pp. 1393-1398 (2013).
Ilya Sutskever, Tomas Mikolov, Greg S Corrado, Kai Chen, Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. Neural Information Processing Systems, vol. 26, pp. 3111-3119 (2013).
Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, Ming Zhou. Sentiment Embeddings with Applications to Sentiment Analysis. IEEE Transactions on Knowledge and Data Engineering, vol. 28, pp. 496-509 (2016). DOI: 10.1109/TKDE.2015.2489653
Jeffrey Pennington, Richard Socher, Christopher Manning. Glove: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing, pp. 1532-1543 (2014). DOI: 10.3115/V1/D14-1162
Adam Kalai, Venkatesh Saligrama, Kai-Wei Chang, Tolga Bolukbasi, James Zou. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Neural Information Processing Systems, vol. 29, pp. 4356-4364 (2016).
Jacob Goldberger, Ido Dagan, Oren Melamud. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. Conference on Computational Natural Language Learning, pp. 51-61 (2016). DOI: 10.18653/V1/K16-1006
Nathaniel Swinger, Maria De-Arteaga, Neil Thomas Heffernan IV, Mark DM Leiserson, Adam Tauman Kalai. What are the Biases in My Word Embedding. National Conference on Artificial Intelligence, pp. 305-311 (2019). DOI: 10.1145/3306618.3314270
Gertjan van Noord, Ivan Titov, Simon Šuster. Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders. arXiv: Computation and Language (2016).
Thomas Manzini, Lim Yao Chong, Alan W Black, Yulia Tsvetkov. Black is to Criminal as Caucasian is to Police: Detecting and Removing Multiclass Bias in Word Embeddings. North American Chapter of the Association for Computational Linguistics, pp. 615-621 (2019). DOI: 10.18653/V1/N19-1062
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. Deep contextualized word representations. North American Chapter of the Association for Computational Linguistics, vol. 1, pp. 2227-2237 (2018). DOI: 10.18653/V1/N18-1202