Authors: Shi-Yong Neo, Sheng Gao, Qi Tian, Rui Shi, Tat-Seng Chua
DOI:
Keywords:
Abstract: This paper describes the details of our systems for the feature extraction and search tasks of TRECVID-2004. For the feature extraction task, we emphasize the use of a visual auto-concept annotation technique, fused with text and specialized detectors, to induce concepts in videos. For the search task, our emphasis is two-fold. First, we employ query-specific models; second, we use multi-modality features, including text, annotated concepts, OCR output, and shot-class detectors, to perform the search. Our pipeline is similar to that employed in text-based definition question-answering approaches. We first perform query analysis to categorize each query into one of the categories: {PERSON, SPORTS, FINANCE, WEATHER, DISASTER, GENERAL}. From these categories, we derive a number of constraints on the search process, including: (a) the types of features to use or emphasize; (b) the key concept terms to use; and (c) the video classes, such as sports or anchor person, to exclude from the results. The results on the 60 hours of test videos from the TRECVID 2004 evaluation demonstrate that our approaches are effective.
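The abstract's query-analysis step (categorize a query, then derive category-specific search constraints) can be sketched as follows. This is a minimal illustrative sketch only: the keyword lists, constraint values, and function names are assumptions for exposition, not the authors' actual query-specific models.

```python
# Hypothetical sketch of the query-analysis pipeline described in the
# abstract: assign a text query to one of six categories, then look up
# the search constraints for that category. All keyword lists and
# constraint values below are illustrative assumptions.

CATEGORY_KEYWORDS = {
    "PERSON": ["who", "person", "face"],
    "SPORTS": ["game", "match", "score", "hockey"],
    "FINANCE": ["stock", "market", "dow"],
    "WEATHER": ["weather", "forecast", "storm"],
    "DISASTER": ["fire", "flood", "earthquake"],
}

# Per-category constraints: which features to emphasize and which
# video classes (e.g. sports, anchor person) to exclude from results.
CONSTRAINTS = {
    "PERSON": {"emphasize": ["face", "ocr"], "exclude_classes": ["sports"]},
    "SPORTS": {"emphasize": ["motion"], "exclude_classes": ["anchor_person"]},
    "GENERAL": {"emphasize": ["text"], "exclude_classes": []},
}

def categorize(query: str) -> str:
    """Assign a query to the first category whose keywords match,
    falling back to GENERAL."""
    words = query.lower().split()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in words for k in keywords):
            return category
    return "GENERAL"

def constraints_for(query: str) -> dict:
    """Look up the search constraints for the query's category,
    falling back to the GENERAL constraints."""
    return CONSTRAINTS.get(categorize(query), CONSTRAINTS["GENERAL"])
```

For example, a query mentioning "hockey" would be routed to the SPORTS category and inherit its constraints, while an unmatched query falls through to GENERAL.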