Authors: Manami Harazaki, Joe Tekli, Shohei Yokoyama, Naoki Fukuta, Richard Chbeir
DOI: 10.1007/978-3-642-28807-4_63
Keywords: XML Encryption, Streaming XML, Information retrieval, XML framework, Database, XML database, Computer science, Document Structure Description, XML Schema Editor, Efficient XML Interchange, XML validation
Abstract: XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, these are predefined by system experts and might not be suitable for evaluating every algorithm. Tools for generating synthetic XML documents offer an alternative solution, promoting flexibility and adaptability in building document collections. Nonetheless, the usefulness of existing XML generators remains rather limited due to the restricted level of expressiveness they allow users. In this paper, we develop a novel XML By example Generator (XBeGene) for producing synthetic XML data that closely reflect the user's requirements. Inspired by the query-by-example paradigm in information retrieval, our generator i) allows the user to provide her own sample XML documents as input, ii) analyzes the structure, occurrence frequencies, and content distributions of each XML element in the input documents, and iii) produces synthetic XML documents that closely concur, in both structural and content features, with the user's input data. The size of each synthetic document, as well as that of the entire document collection, can also be specified by the user. Clustering experiments demonstrate a high correlation between the user's requirements and the characteristics of the generated data, while timing results confirm our approach's scalability to large-scale document collections.
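The by-example idea (profile sample documents, then sample new documents from the profile) can be illustrated with a small Python sketch. This is not XBeGene's algorithm: the function names `profile` and `generate`, the choice to sample each child tag's count independently from the observed per-instance counts, the reuse of observed text values as "content distribution", and the `max_depth` cap are all illustrative assumptions. Size control is reduced here to the number of documents the caller asks for.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def profile(samples):
    """Gather per-tag statistics from sample XML strings (hypothetical helper).

    Returns (root_tags, children, texts):
      root_tags: Counter of root element tags
      children:  tag -> list of Counters, one per element instance,
                 counting how often each child tag occurred under it
      texts:     tag -> list of observed non-empty text values
    """
    root_tags = Counter()
    children = defaultdict(list)
    texts = defaultdict(list)

    def walk(elem):
        # Record this instance's child-tag occurrence counts.
        children[elem.tag].append(Counter(child.tag for child in elem))
        if elem.text and elem.text.strip():
            texts[elem.tag].append(elem.text.strip())
        for child in elem:
            walk(child)

    for xml in samples:
        root = ET.fromstring(xml)
        root_tags[root.tag] += 1
        walk(root)
    return root_tags, children, texts


def generate(stats, rng, max_depth=10):
    """Build one synthetic document by sampling from profiled statistics.

    Simplification (assumption): each child tag's count is drawn independently
    from the counts seen across instances of the parent tag, so correlations
    between sibling tags are not preserved.
    """
    root_tags, children, texts = stats
    tags, weights = zip(*root_tags.items())
    root_tag = rng.choices(tags, weights=weights)[0]

    def build(tag, depth):
        elem = ET.Element(tag)
        if texts[tag]:
            # Content: reuse a value observed for this tag in the samples.
            elem.text = rng.choice(texts[tag])
        if depth >= max_depth:  # guard against recursive structures
            return elem
        instances = children[tag]
        # Emit child tags in first-seen order across all instances.
        order = []
        for counts in instances:
            for child_tag in counts:
                if child_tag not in order:
                    order.append(child_tag)
        for child_tag in order:
            # Counter returns 0 for tags absent from the chosen instance.
            n = rng.choice(instances)[child_tag]
            for _ in range(n):
                elem.append(build(child_tag, depth + 1))
        return elem

    return build(root_tag, 0)


if __name__ == "__main__":
    samples = [
        "<lib><book><title>A</title><year>2001</year></book></lib>",
        "<lib><book><title>B</title></book>"
        "<book><title>C</title><year>2010</year></book></lib>",
    ]
    stats = profile(samples)
    rng = random.Random(42)
    for _ in range(3):  # collection size chosen by the user
        print(ET.tostring(generate(stats, rng), encoding="unicode"))
```

Because counts are sampled from what the samples actually contain, a generated `lib` here holds one or two `book` elements, each with exactly one `title` and zero or one `year`, mirroring the input's occurrence frequencies without copying any single document.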