Analysis of grammatical evolutionary approaches to regular expression induction

Authors: Antonio Gonzalez-Pardo, David Camacho

DOI: 10.1109/CEC.2011.5949679

Abstract: Regular expressions, or regexes, have traditionally been used as a pattern matching tool to search for structures in a set of objects, such as files, text documents, or folders. Pattern matching can be used to look for files whose name contains a given string, for files that contain a specific string within them, or simply to extract documents. It is very popular to apply regexes to detect and extract patterns that represent phone numbers, URLs, email addresses, etc. This kind of information is characterized by a well-defined structure. Nevertheless, regexes are not frequently used because their high complexity, in both syntax and grammatical rules, makes them difficult to understand. For this reason, the development of programs able to automatically generate and evaluate regexes becomes a valuable task. This work analyzes the performance of different evolutionary approaches to the generation of URL patterns. Four types of grammars are evaluated: a context-free grammar, a grammar with a penalized fitness function, an extensible grammar, and a Christiansen grammar. For the considered problem, the experimental results show that the best performance of the system, measured by cumulative success rate, is achieved using grammars.
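
As an illustration of the general idea of grammar-guided regex induction described in the abstract, the following is a minimal sketch, not the authors' system. The toy grammar `GRAMMAR`, the codon-to-rule mapping in `derive`, and the scoring in `fitness` (full-string matching, with the share of matched negatives subtracted) are all assumptions made for this example; the paper's actual grammars, fitness function, and penalization scheme are not given in this record.

```python
import re

# Hypothetical toy context-free grammar (an assumption for illustration):
# each non-terminal maps to a list of alternative productions.
GRAMMAR = {
    "<regex>": [["<term>"], ["<term>", "<regex>"]],
    "<term>": [["<atom>"], ["<atom>", "+"]],
    "<atom>": [["\\w"], ["\\."], ["/"], [":"]],
}


def derive(codons, start="<regex>", max_wraps=2):
    """Map a list of integer codons to a regex string.

    Uses a leftmost derivation: each codon, taken modulo the number of
    productions of the current non-terminal, selects one production.
    Codons are reused (wrapped) up to max_wraps times; None is returned
    if the derivation has not finished within that budget.
    """
    symbols = [start]
    out = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:
            out.append(sym)  # terminal symbol: emit it
            continue
        if used >= limit:
            return None  # derivation did not terminate within budget
        rules = GRAMMAR[sym]
        choice = rules[codons[used % len(codons)] % len(rules)]
        used += 1
        symbols = list(choice) + symbols
    return "".join(out)


def fitness(pattern, positives, negatives):
    """Score a candidate regex on labelled example strings.

    Assumed scoring: fraction of positives fully matched minus
    fraction of negatives fully matched. Invalid candidates score 0.
    """
    if pattern is None:
        return 0.0
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0
    hits = sum(1 for s in positives if compiled.fullmatch(s))
    false_hits = sum(1 for s in negatives if compiled.fullmatch(s))
    return hits / len(positives) - false_hits / len(negatives)


candidate = derive([0, 1, 0])
score = fitness(candidate, ["abc", "x1"], ["a.b"])
```

In a full evolutionary loop, a population of codon lists would be mutated and recombined, with `fitness` guiding selection; that loop is omitted here to keep the sketch short.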
