Multimodal Word Sense Disambiguation in Creative Practice

Authors: Akshat Gupta, Manuel Ladron de Guevara, Daragh Byrne, Christopher George, Ramesh Krishnamurti

Keywords: Relevance (information retrieval), Natural language processing, Word (computer architecture), Ambiguity, Product design, Context (language use), Resource (project management), Computer science, Artificial intelligence, Sentence

Abstract: Language is ambiguous; many terms and expressions can convey the same idea. This is especially true in creative practice, where ideas and design intents are highly subjective. We present a dataset, Ambiguous Descriptions of Art Images (ADARI), of contemporary workpieces, which aims to provide a foundational resource for subjective image description and multimodal word disambiguation in the context of creative practice. The dataset contains a total of 240k images labeled with 260k descriptive sentences. It is additionally organized into the sub-domains of architecture, art, design, fashion, furniture, product design, and technology. In subjective image description, labels are not deterministic: for example, the ambiguous label "dynamic" might correspond to hundreds of different images. To understand this complexity, we analyze the ambiguity and relevance of text with respect to images using the state-of-the-art pre-trained BERT model for sentence classification. We provide a baseline for multi-label classification tasks and demonstrate the potential of multimodal approaches for understanding ambiguity in design intentions. We hope that the ADARI dataset and baselines constitute a first step towards ...
