XBeGene: Scalable XML Documents Generator by Example Based on Real Data

Authors: Manami Harazaki, Joe Tekli, Shohei Yokoyama, Naoki Fukuta, Richard Chbeir

Keywords: XML Encryption, Streaming XML, Information retrieval, XML framework, Database, XML database, Computer science, Document Structure Description, XML Schema Editor, Efficient XML Interchange, XML validation

Abstract: XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, these are predefined by system experts and might not be suitable for every algorithm. Tools for generating synthetic XML documents underline an alternative solution, promoting flexibility and adaptability in document collections. Nonetheless, the usefulness of existing XML generators remains rather limited due to the restricted levels of expressiveness allowed to users. In this paper, we develop a novel XML By example Generator (XBeGene) for producing synthetic XML data which closely reflect the user's requirements. Inspired by the query-by-example paradigm in information retrieval, our generator i) allows the user to provide her own sample XML documents as input, ii) analyzes the structure, occurrence frequencies, and content distributions of each XML element in the user's input documents, and iii) produces synthetic XML documents which closely concur, in both structural and content features, with the user's input data. The size of each synthetic document, as well as that of the entire document collection, is also specified by the user. Clustering experiments demonstrate high correlation between the user's requirements and the characteristics of the generated data, while timing results confirm our approach's scalability to large-scale document collections.
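The three steps named in the abstract (take sample documents, analyze structure and content per element, produce similar documents) can be illustrated with a small Python sketch. This is not the XBeGene algorithm, whose details are not given here; it is a simplified stand-in under stated assumptions. The function names `analyze` and `generate`, the `max_depth` parameter, and the sample documents are all hypothetical. The sketch records, for each element tag, the child-tag lists and text values seen in the samples, then samples from those observations so that frequent patterns are reproduced more often. It does not model user-specified document or collection sizes.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict


def analyze(samples):
    """Collect root tags plus, per element tag, observed child-tag lists and text values."""
    roots = []
    child_lists = defaultdict(list)  # tag -> one child-tag list per occurrence
    texts = defaultdict(list)        # tag -> observed non-empty text values
    for xml_text in samples:
        root = ET.fromstring(xml_text)
        roots.append(root.tag)
        for elem in root.iter():
            child_lists[elem.tag].append([child.tag for child in elem])
            text = (elem.text or "").strip()
            if text:
                texts[elem.tag].append(text)
    return roots, child_lists, texts


def generate(model, rng, max_depth=10):
    """Build one synthetic document by sampling from the observed patterns."""
    roots, child_lists, texts = model

    def build(tag, depth):
        elem = ET.Element(tag)
        # Picking uniformly from the raw observations preserves their frequencies.
        children = rng.choice(child_lists[tag]) if depth < max_depth else []
        for child_tag in children:
            elem.append(build(child_tag, depth + 1))
        # Simplification (assumption): only leaf elements receive text.
        if not children and texts[tag]:
            elem.text = rng.choice(texts[tag])
        return elem

    return build(rng.choice(roots), 0)


# Hypothetical sample input, standing in for the user's own documents.
samples = [
    "<library><book><title>A</title><year>2001</year></book></library>",
    "<library><book><title>B</title></book>"
    "<book><title>C</title><year>2005</year></book></library>",
]
model = analyze(samples)
doc = generate(model, random.Random(0))
print(ET.tostring(doc, encoding="unicode"))
```

Sampling directly from the list of observations is the simplest way to mirror empirical frequencies; a fuller generator would presumably fit explicit distributions and enforce target sizes.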

References (21)

Denilson Barbosa, Kelly A. Lyons, John Keenleyside, Alberto O. Mendelzon. ToXgene: An extensible template-based data generator for XML. International Workshop on the Web and Databases, pp. 49-54 (2002)

Ashraf Aboulnaga, Jeffrey F. Naughton, Chun Zhang. Generating Synthetic Complex-Structured XML Data. International Workshop on the Web and Databases, pp. 79-84 (2001)

Joe Tekli, Richard Chbeir, Kokou Yetongnon. Extensible User-Based XML Grammar Matching. Conceptual Modeling - ER 2009, vol. 5829, pp. 294-314 (2009). doi:10.1007/978-3-642-04840-1_23

Sven Helmer. Measuring the structural similarity of semistructured documents using entropy. Very Large Data Bases, pp. 1022-1032 (2007)

Joe Tekli, Richard Chbeir, Kokou Yetongnon. A Hybrid Approach for XML Similarity. Conference on Current Trends in Theory and Practice of Informatics, pp. 783-795 (2007). doi:10.1007/978-3-540-69507-3_68

Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams. Database Theory - ICDT 2005, pp. 398-412 (2004). doi:10.1007/978-3-540-30570-5_27

Laurent Candillier, Isabelle Tellier, Fabien Torre. Transforming XML trees for efficient classification and clustering. INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval, pp. 469-480 (2005). doi:10.1007/978-3-540-34963-1_36

H. V. Jagadish, Andrew Nierman. Evaluating Structural Similarity in XML Documents. International Workshop on the Web and Databases, pp. 61-66 (2002)

Kanda Runapongsa, Jignesh M. Patel, H. V. Jagadish, Yun Chen, Shurug Al-Khalifa. The Michigan benchmark: towards XML query performance diagnostics. Information Systems, vol. 31, pp. 73-97 (2006). doi:10.1016/J.IS.2004.09.004

Elisa Bertino, Giovanna Guerrini, Marco Mesiti. A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications. Information Systems, vol. 29, pp. 23-46 (2004). doi:10.1016/S0306-4379(03)00031-0

Similar articles (2)

XQuery Testing from XML Schema Based Random Test Cases

Automatic property-based testing and path validation of XQuery programs
