XBeGene: Scalable XML Documents Generator by Example Based on Real Data

Authors: Manami Harazaki, Joe Tekli, Shohei Yokoyama, Naoki Fukuta, Richard Chbeir

DOI: 10.1007/978-3-642-28807-4_63

Keywords: XML Encryption, Streaming XML, Information retrieval, XML framework, Database, XML database, Computer science, Document Structure Description, XML Schema Editor, Efficient XML Interchange, XML validation

Abstract: XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, these are predefined by system experts and might not be suitable for evaluating every algorithm. Tools for generating synthetic XML documents offer an alternative solution, promoting flexibility and adaptability in generating document collections. Nonetheless, the usefulness of existing XML generators remains rather limited due to the restricted levels of expressiveness allowed to users. In this paper, we develop a novel XML By example Generator (XBeGene) for producing synthetic XML data which closely reflect the user’s requirements. Inspired by the query-by-example paradigm in information retrieval, our generator i) allows the user to provide her own sample XML documents as input, ii) analyzes the structure, occurrence frequencies, and content distributions of each XML element in the user’s input documents, and iii) produces synthetic XML documents which closely concur, in both structural and content features, with the user’s input data. The size of each synthetic document, as well as that of the entire document collection, is also specified by the user. Clustering experiments demonstrate a high correlation between the user’s requirements and the characteristics of the generated synthetic XML data, while timing results confirm our approach’s scalability to large-scale document collections.
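
The three steps named in the abstract (sample input, per-element analysis, generation) can be illustrated with a minimal Python sketch. It is not the XBeGene implementation, which the abstract does not detail. It assumes a deliberately simple model: for each element tag, record every observed sequence of child tags and every observed text value, then build new documents by sampling from those observations. The names `learn_model`, `generate`, `generate_collection`, and the `max_depth` guard are hypothetical, and text is only emitted for elements that are generated without children.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict


def learn_model(samples):
    """Analyze sample XML strings: per tag, record child sequences and text values."""
    model = {
        "roots": [],                      # observed root tags
        "children": defaultdict(list),    # tag -> list of observed child-tag sequences
        "texts": defaultdict(list),       # tag -> list of observed text values
    }
    for xml_text in samples:
        root = ET.fromstring(xml_text)
        model["roots"].append(root.tag)
        for elem in root.iter():
            # One observation of this element's children, in document order.
            model["children"][elem.tag].append([child.tag for child in elem])
            text = (elem.text or "").strip()
            if text:
                model["texts"][elem.tag].append(text)
    return model


def generate(model, rng, max_depth=10):
    """Build one synthetic document by sampling from the learned observations."""
    def build(tag, depth):
        elem = ET.Element(tag)
        # Sampling an observed child sequence reproduces the empirical
        # structure and occurrence frequencies; max_depth stops recursion
        # for recursive element definitions (an assumption of this sketch).
        child_tags = rng.choice(model["children"][tag]) if depth < max_depth else []
        if child_tags:
            for child_tag in child_tags:
                elem.append(build(child_tag, depth + 1))
        elif model["texts"][tag]:
            # Sample content from the values seen for this tag.
            elem.text = rng.choice(model["texts"][tag])
        return elem

    return build(rng.choice(model["roots"]), 0)


def generate_collection(model, n_docs, seed=0):
    """Generate a user-specified number of documents, serialized as strings."""
    rng = random.Random(seed)
    return [ET.tostring(generate(model, rng), encoding="unicode") for _ in range(n_docs)]
```

Calling `generate_collection(learn_model(samples), n_docs=100)` on a handful of sample strings returns 100 serialized documents whose tags and text values all come from the samples. Controlling the size of each individual document, which the abstract also mentions, is left out of this sketch.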

References (21)
Denilson Barbosa, Kelly A. Lyons, John Keenleyside, Alberto O. Mendelzon. ToXgene: An extensible template-based data generator for XML. International Workshop on the Web and Databases, pp. 49-54 (2002).
Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang. Generating Synthetic Complex-Structured XML Data. International Workshop on the Web and Databases, pp. 79-84 (2001).
Joe Tekli, Richard Chbeir, Kokou Yetongnon. Extensible User-Based XML Grammar Matching. Conceptual Modeling - ER 2009, vol. 5829, pp. 294-314 (2009). DOI: 10.1007/978-3-642-04840-1_23
Sven Helmer. Measuring the structural similarity of semistructured documents using entropy. Very Large Data Bases, pp. 1022-1032 (2007).
Joe Tekli, Richard Chbeir, Kokou Yetongnon. A Hybrid Approach for XML Similarity. Conference on Current Trends in Theory and Practice of Informatics, pp. 783-795 (2007). DOI: 10.1007/978-3-540-69507-3_68
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams. Database Theory - ICDT 2005, pp. 398-412 (2004). DOI: 10.1007/978-3-540-30570-5_27
Laurent Candillier, Isabelle Tellier, Fabien Torre. Transforming XML trees for efficient classification and clustering. INEX'05: Proceedings of the 4th International Conference on Initiative for the Evaluation of XML Retrieval, pp. 469-480 (2005). DOI: 10.1007/978-3-540-34963-1_36
H. V. Jagadish, Andrew Nierman. Evaluating Structural Similarity in XML Documents. International Workshop on the Web and Databases, pp. 61-66 (2002).
Kanda Runapongsa, Jignesh M. Patel, H. V. Jagadish, Yun Chen, Shurug Al-Khalifa. The Michigan benchmark: towards XML query performance diagnostics. Information Systems, vol. 31, pp. 73-97 (2006). DOI: 10.1016/J.IS.2004.09.004
Elisa Bertino, Giovanna Guerrini, Marco Mesiti. A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications. Information Systems, vol. 29, pp. 23-46 (2004). DOI: 10.1016/S0306-4379(03)00031-0