Authors: Uday Kamath, Kenneth De Jong, Amarda Shehu
DOI: 10.1371/JOURNAL.PONE.0099982
Keywords: Artificial intelligence, Sequence, Biology, Pattern recognition (psychology), Kernel method, Feature (machine learning), Sequence analysis, Machine learning, Construct (python library), Evolutionary algorithm, Bioinformatics, Set (abstract data type)
Abstract: Background: Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. Methodology: We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences, which state-of-the-art work in machine learning shows to be challenging and to involve complex combinations of features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. Results: To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist in wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at http://www.cs.gmu.edu/~ashehu/?q=OurTools.
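The abstract describes the two-stage construct-then-select process only at a high level. The following is a minimal sketch of that general pattern under stated assumptions, not the EFFECT implementation: stage 1 here simply enumerates k-mers (whereas the abstract says EFFECT uses evolutionary algorithms in both stages), and stage 2 is a plain genetic algorithm over bit masks. The toy data, the planted motif "GATA", the "any selected k-mer present" classifier, the size penalty, and every function name are hypothetical.

```python
import random


def make_toy_data(n=20, length=12, motif="GATA", seed=1):
    """Hypothetical toy set: n positives with a planted motif, n motif-free negatives."""
    rng = random.Random(seed)
    seqs, labels = [], []
    while len(seqs) < 2 * n:
        s = "".join(rng.choice("ACGT") for _ in range(length))
        if motif in s:
            continue  # keep the random background free of the motif
        if len(seqs) % 2 == 0:  # even slots become positives: plant the motif
            pos = rng.randrange(length - len(motif) + 1)
            s = s[:pos] + motif + s[pos + len(motif):]
            labels.append(True)
        else:
            labels.append(False)
        seqs.append(s)
    return seqs, labels


def construct_candidates(seqs, ks=(3, 4)):
    """Stage 1 (simplified stand-in): every k-mer that occurs in the sequences."""
    feats = set()
    for s in seqs:
        for k in ks:
            for i in range(len(s) - k + 1):
                feats.add(s[i:i + k])
    return sorted(feats)


def predict(seq, selected):
    """Hypothetical classifier: positive if the sequence contains any selected k-mer."""
    return any(f in seq for f in selected)


def fitness(mask, feats, seqs, labels, penalty=0.01):
    """Training accuracy minus a small per-feature penalty favouring compact subsets."""
    selected = [f for f, bit in zip(feats, mask) if bit]
    correct = sum(predict(s, selected) == y for s, y in zip(seqs, labels))
    return correct / len(seqs) - penalty * len(selected)


def select_features(feats, seqs, labels, pop_size=30, generations=40,
                    p_init=0.1, p_mut=0.02, seed=0):
    """Stage 2: generational GA with one bit per candidate feature."""
    rng = random.Random(seed)
    n = len(feats)
    pop = [[rng.random() < p_init for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(m, feats, seqs, labels), m) for m in pop]

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            (fa, a), (fb, b) = rng.sample(scored, 2)
            return a if fa >= fb else b

        children = [max(scored, key=lambda t: t[0])[1]]  # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: fitness(m, feats, seqs, labels))
    return [f for f, bit in zip(feats, best) if bit], fitness(best, feats, seqs, labels)


if __name__ == "__main__":
    seqs, labels = make_toy_data()
    feats = construct_candidates(seqs)
    selected, score = select_features(feats, seqs, labels)
    print(len(feats), "candidates ->", selected, round(score, 3))
```

Elitism makes the best fitness non-decreasing across generations, and the size penalty pushes the search toward small, inspectable subsets, which loosely mirrors the abstract's point that the selected features can be read directly by biologists.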