Authors: Andrej Bratko, Bogdan Filipič
DOI: 10.1016/J.IPM.2005.06.003
Keywords:
Abstract: This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these approaches with three text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.