Authors: Shih-Fu Chang, Di Lu, Manling Li, Heng Ji, Alireza Zareian
DOI:
Keywords: Modalities, Argument (linguistics), Event (computing), Benchmark (computing), Embedding space, Computer science, Task (project management), Annotation, Embedding, Multimedia
Abstract: We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction, respectively. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
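To illustrate the core idea of a weakly supervised common embedding space, the sketch below projects text and image feature vectors into a shared space and scores alignment with a max-margin (triplet) loss. This is a minimal, generic illustration, not the paper's actual WASE architecture: the dimensions, random projections, and `alignment_loss` helper are all hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: text features, image features, common space.
D_TEXT, D_IMG, D_COMMON = 8, 10, 4

# Projection matrices into the common space (random here; learned in practice).
W_text = rng.normal(size=(D_TEXT, D_COMMON))
W_img = rng.normal(size=(D_IMG, D_COMMON))

def embed(x, W):
    """Project a feature vector into the common space and L2-normalize it."""
    z = x @ W
    return z / np.linalg.norm(z)

def alignment_loss(text_feat, img_pos, img_neg, margin=0.2):
    """Max-margin alignment: a weakly aligned image should score higher
    (cosine similarity) against the text embedding than a random negative."""
    t = embed(text_feat, W_text)
    pos_sim = t @ embed(img_pos, W_img)
    neg_sim = t @ embed(img_neg, W_img)
    return max(0.0, margin - pos_sim + neg_sim)

# Usage: one text mention, one weakly aligned image, one negative image.
loss = alignment_loss(rng.normal(size=D_TEXT),
                      rng.normal(size=D_IMG),
                      rng.normal(size=D_IMG))
print(loss)
```

Training on such a loss pulls weakly paired text and images together in the common space without requiring explicit cross-media annotation, which is the general principle the abstract describes.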