An AI-Based Methodology for the Automatic Classification of a Multiclass Ebook Collection Using Information From the Tables of Contents

Authors: Eleni Giannopoulou, Nikolas Mitrou

DOI: 10.1109/ACCESS.2020.3041651

Abstract: Book recommendation to support professors and students in the identification of relevant sources is of significant importance for both universities and digital libraries and, hence, motivates the development of such a system. This paper aims at automatically classifying a multiclass corpus that was created from the ebooks of the Springer collection, which is available through the Hellenic Academic Libraries' subscription, by utilizing an unsupervised neural network (NN) (self-organizing maps, SOM) and two deep NN (DNN) architectures, namely, long short-term memory (LSTM) and a convolutional NN (CNN) combined with LSTM (CNN+LSTM), under various configuration scenarios. The vector construction leverages information extracted from the table of contents (ToC) of each book, using the TF-IDF weighting scheme (for the first case) and the Keras tokenizer (for the second). Extensive experiments were conducted for various configurations of the preprocessing steps, the NN setup, and the vocabulary size to assess their impact on the classifier's performance. Furthermore, we show that majority voting is more suitable for selecting the dominant label of a specified node. The experimental analysis showed the feasibility of developing a system supporting related recommendations based on a detailed thematic description (e.g., an abstract or a book) rather than a few keywords. In the experiments, the subsystem that utilized the DNN performed best, with F1-scores of 67% for 26 categories and 80% for 5 general categories, whereas the SOM realizes less than 5% … cases.
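As a rough illustration of the SOM branch described in the abstract, the following is a minimal standard-library Python sketch: TF-IDF vectors are built from ToC text, an online self-organizing map is trained on them, and each node is labeled by majority vote over the training documents that map to it. All function names are hypothetical. The tokenization (lowercased whitespace split), the weighting (raw term frequency times log(N/df), L2-normalized), the 3 x 3 grid, the learning-rate and Gaussian-neighborhood schedules, the toy ToC strings, and the fallback to the nearest labeled node are assumptions for illustration, not the authors' configuration. The DNN branch (Keras tokenizer, LSTM, CNN+LSTM) is not covered.

```python
# Sketch only: tokenisation, TF-IDF variant, grid size and schedules are
# assumptions for illustration, not the paper's settings.
import math
import random
from collections import Counter


def fit_tfidf(docs):
    """Learn a vocabulary and idf = log(N / df) from whitespace-tokenised text."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    df = Counter(t for toks in tokenised for t in set(toks))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Raw term frequency times idf, L2-normalised; unknown words are ignored."""
    tf = Counter(doc.lower().split())
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
    return [x / norm for x in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Index of the SOM node whose weight vector is closest to v."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], v))


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Online SOM with a Gaussian neighbourhood and linearly decaying rates."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            br, bc = coords[best_matching_unit(weights, v)]
            for i, (r, c) in enumerate(coords):
                # Pull every node towards v, weighted by its grid distance to the BMU.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (x - w) for w, x in zip(weights[i], v)]
    return weights


def majority_vote(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def label_nodes(weights, vectors, labels):
    """Label each node by majority vote over the training documents mapped to it."""
    votes = {}
    for v, y in zip(vectors, labels):
        votes.setdefault(best_matching_unit(weights, v), []).append(y)
    return {node: majority_vote(ys) for node, ys in votes.items()}


def predict(weights, node_labels, v):
    """Label of the closest node that received at least one training document."""
    node = min(node_labels, key=lambda i: sq_dist(weights[i], v))
    return node_labels[node]


if __name__ == "__main__":
    # Hypothetical toy ToC strings and categories, not data from the paper.
    toc_docs = [
        "linear algebra matrices eigenvalues vector spaces",
        "calculus limits derivatives integrals series",
        "neural networks backpropagation convolution training",
        "machine learning classification regression training",
    ]
    toc_labels = ["Mathematics", "Mathematics", "Computer Science", "Computer Science"]
    vocab, idf = fit_tfidf(toc_docs)
    X = [tfidf_vector(d, vocab, idf) for d in toc_docs]
    W = train_som(X)
    nodes = label_nodes(W, X, toc_labels)
    print(predict(W, nodes, tfidf_vector("eigenvalues of matrices", vocab, idf)))
```

Restricting `predict` to labeled nodes is one simple way to handle SOM nodes that attract no training document; it is an assumption of this sketch rather than a detail given in the abstract.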
