Authors: D. J. Cook, L. B. Holder
DOI: 10.1613/JAIR.43
Keywords: Mathematics, Minimum description length, Measure (mathematics), Substructure, Component (UML), Domain (software engineering), Molecule mining, Variety (cybernetics), Data mining, Closeness
Abstract: The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain.