Authors: Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier
DOI: 10.1007/978-3-540-34963-1_34
Keywords:
Abstract: This paper reports on the INRIA group's approach to XML mining while participating in the INEX Mining track 2005. We use a flexible representation of XML documents that allows taking into account either the structure only or both the structure and the content. Our approach consists of representing documents by a set of their sub-paths, defined according to some criteria (length, beginning at the root, ending at a leaf). By considering those sub-paths as words, we can use standard methods for vocabulary reduction and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds, which can work with distinct groups of independent modalities put in separate variables. This is useful for our model since embedded sub-paths are not independent: we split potentially dependent paths into separate variables, so that each variable contains only independent paths. Experiments on the INEX collections show good results for the structure-only collections, but the approach could not scale well to large structure-and-content collections.
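To make the sub-path representation concrete, here is a minimal sketch, assuming documents are parsed with Python's standard `xml.etree.ElementTree` and that a sub-path is a contiguous sequence of tag names along a root-to-node path. The function names `iter_nodes` and `sub_paths`, the parameters `max_len`, `from_root` and `to_leaf`, and the toy document are hypothetical illustrations of the three criteria named in the abstract (length, root beginning, leaf ending); this is not the authors' implementation, and it covers the structure-only case.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def iter_nodes(elem, prefix=()):
    """Yield (tag path from the root to this element, is_leaf) for every element."""
    path = prefix + (elem.tag,)
    yield path, len(elem) == 0  # len(Element) is its number of child elements
    for child in elem:
        yield from iter_nodes(child, path)


def sub_paths(xml_text, max_len=None, from_root=False, to_leaf=False):
    """Return a bag of sub-paths, each joined with '/' so it can be used as a 'word'.

    Hypothetical criteria:
      max_len   -- keep only sub-paths with at most this many tags (None = no limit)
      from_root -- keep only sub-paths that begin at the document root
      to_leaf   -- keep only sub-paths that end at a leaf element
    Each sub-path occurrence is counted once, at the element where it ends.
    """
    root = ET.fromstring(xml_text)
    bag = Counter()
    for path, is_leaf in iter_nodes(root):
        if to_leaf and not is_leaf:
            continue
        n = len(path)
        starts = [0] if from_root else range(n)
        for i in starts:
            if max_len is not None and n - i > max_len:
                continue
            bag["/".join(path[i:])] += 1
    return bag


doc = "<article><title/><body><sec><p/><p/></sec></body></article>"
print(sub_paths(doc, from_root=True, to_leaf=True))
print(sub_paths(doc, max_len=2, to_leaf=True))
```

The resulting `Counter` objects behave like bag-of-words vectors, so they could be passed to any standard vocabulary-reduction and clustering pipeline; how sub-paths are grouped into separate variables for dynamic clouds is not shown here.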