Authors: Rajendra Prasath, Sudeshna Sarkar
DOI:
Keywords: Feature vector, Feature generation, Structure mining, Computer science, Human knowledge, Information retrieval, Context (language use), Bag-of-words model, Hyperlink, Text categorization
Abstract: We propose an unsupervised feature generation algorithm using the repositories of human knowledge for effective text categorization. The conventional bag-of-words (BOW) model depends on the presence or absence of keywords to classify documents. To understand the actual context behind these keywords, we use concepts and hyperlinks from external sources through content and structure mining of Wikipedia. Then, the features are clustered to generate cluster vectors, with which the input documents are mapped into a high-dimensional space, and classification is performed. The simulation results show that the proposed approach identifies associated features in the collection and yields improved accuracy.
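
The contrast the abstract draws between keyword presence and concept-based cluster vectors can be illustrated with a toy sketch. This is not the authors' algorithm: the keyword-to-concept links, the two hand-fixed clusters, and the helper names `bow_vector` and `cluster_vector` are all hypothetical, chosen only to show how two documents with no shared keywords might still land on the same cluster features. The abstract describes a high-dimensional space; the sketch uses two dimensions for readability.

```python
from collections import Counter

# Hypothetical keyword -> concept links, standing in for what content and
# structure mining of Wikipedia might produce (not taken from the paper).
CONCEPT_LINKS = {
    "python": ["Programming language", "Software"],
    "java": ["Programming language", "Software"],
    "guitar": ["Musical instrument", "Music"],
    "piano": ["Musical instrument", "Music"],
}

# Hypothetical concept clusters. The abstract says features are clustered
# automatically; here the clusters are fixed by hand for illustration.
CLUSTERS = [
    {"Programming language", "Software"},
    {"Musical instrument", "Music"},
]


def bow_vector(tokens, vocabulary):
    """Binary bag-of-words: 1 if the keyword is present, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


def cluster_vector(tokens):
    """Count, per cluster, how many linked concepts the tokens activate."""
    counts = Counter()
    for token in tokens:
        for concept in CONCEPT_LINKS.get(token, []):
            for idx, cluster in enumerate(CLUSTERS):
                if concept in cluster:
                    counts[idx] += 1
    return [counts[i] for i in range(len(CLUSTERS))]


vocabulary = sorted(CONCEPT_LINKS)
doc_a = ["python", "tutorial"]
doc_b = ["java", "course"]

bow_a, bow_b = bow_vector(doc_a, vocabulary), bow_vector(doc_b, vocabulary)
vec_a, vec_b = cluster_vector(doc_a), cluster_vector(doc_b)

# Keyword overlap between the two documents under binary BOW.
bow_overlap = sum(a * b for a, b in zip(bow_a, bow_b))
print(bow_a, bow_b, bow_overlap)
print(vec_a, vec_b)
```

In this toy setup the two documents share no keyword, so their BOW overlap is zero, yet both map to the same cluster vector because "python" and "java" link to the same concepts. A classifier trained on such vectors would then see the documents as related, which is the intuition the abstract appeals to.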