A flexible structured-based representation for XML document mining

Authors: Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier

Abstract: This paper reports on the INRIA group's approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only, or both the structure and the content. Our approach consists of representing XML documents by the set of their sub-paths, defined according to some criteria (length, root beginning, leaf ending). By considering those sub-paths as words, we can use standard methods for vocabulary reduction and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds, which can work with distinct groups of independent modalities put in separate variables. This is useful in our model since embedded sub-paths are not independent: we split potentially dependent paths into separate variables, resulting in each of them containing independent paths. Experiments with the INEX collections show good results for the structure-only collections, but our approach could not scale well for large structure-and-content collections.
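The sub-path representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `root_paths` and `sub_paths` and the parameters `min_len`, `max_len`, and `root_only` are hypothetical, and only the length and root-beginning criteria are shown (the leaf-ending criterion is left out). Each document becomes a bag of tag sub-paths that can then be treated like words.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def root_paths(elem, prefix=()):
    """Yield the tag path from the root to every element of the tree."""
    path = prefix + (elem.tag,)
    yield path
    for child in elem:
        yield from root_paths(child, path)


def sub_paths(xml_text, min_len=1, max_len=3, root_only=False):
    """Count contiguous tag sub-paths of bounded length in one document.

    Hypothetical helper: root_only=True keeps only the sub-paths that
    begin at the document root.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for path in root_paths(root):
        # Enumerate the sub-paths that end at the current element.
        for length in range(min_len, min(max_len, len(path)) + 1):
            start = len(path) - length
            if root_only and start != 0:
                continue
            counts["/".join(path[start:])] += 1
    return counts


doc = "<article><title/><sec><p/><p/></sec></article>"
bag = sub_paths(doc, min_len=1, max_len=2)
root_bag = sub_paths(doc, min_len=1, max_len=3, root_only=True)
```

Here `bag` counts every sub-path of one or two tags, while `root_bag` keeps only the paths anchored at the root. Bags of this kind could be fed to any standard vocabulary reduction or clustering step, such as the k-means method the abstract mentions.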
