SlideImages: A Dataset for Educational Image Classification

Authors: David Morris, Eric Müller-Budack, Ralph Ewerth

DOI: 10.1007/978-3-030-45442-5_36

Abstract: In the past few years, convolutional neural networks (CNNs) have achieved impressive results in computer vision tasks, which however mainly focus on photos with natural scene content. Besides, non-sensor derived images such as illustrations, data visualizations, figures, etc. are typically used to convey complex information or to explore large datasets. However, this kind of image has received little attention in computer vision. CNNs and similar techniques use large volumes of training data. Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. We have reserved all the actual educational images as test data in order to ensure that the approaches using this dataset generalize well to new educational images, and potentially other domains. Furthermore, we present a baseline system using a standard deep neural architecture and discuss dealing with the challenge of limited training data.
