MRCRAIG: MapReduce and Ensemble Classifiers for Parallelizing Data Classification Problems

Authors: SJSU ScholarWorks, Sami Khuri, Glenn Jahnke

Abstract: In this paper, a novel technique for parallelizing data-classification problems is applied to finding genes in sequences of DNA. The technique involves various ensemble classification methods such as Bagging and Select Best. It then distributes the classifier training and prediction using MapReduce. A sequence voting algorithm is evaluated with the Bagging method, as well as compared against the Select Best method.
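The combination described in the abstract (Bagging, a map step that runs each classifier, and a reduce step that votes) can be illustrated with a minimal sketch. It is not the thesis's implementation: the per-symbol lookup "classifier" (`train_stub`), the labels `"gene"` and `"other"`, and the plain per-position majority vote are assumptions made for illustration, and the map step runs sequentially here rather than on a MapReduce cluster.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Bagging resampling step: draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def train_stub(sample):
    """Hypothetical stand-in for training: map each symbol to its most common label."""
    per_symbol = {}
    for symbol, label in sample:
        per_symbol.setdefault(symbol, Counter())[label] += 1
    table = {s: c.most_common(1)[0][0] for s, c in per_symbol.items()}
    # Symbols missing from this bootstrap sample fall back to the overall majority label.
    fallback = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda symbol: table.get(symbol, fallback)


def map_predict(classifier, sequence):
    """Map step: one classifier emits a (position, label) pair per sequence element."""
    return [(pos, classifier(symbol)) for pos, symbol in enumerate(sequence)]


def reduce_votes(pairs):
    """Reduce step: group the emitted pairs by position and take the majority label."""
    votes = {}
    for pos, label in pairs:
        votes.setdefault(pos, Counter())[label] += 1
    return [votes[pos].most_common(1)[0][0] for pos in sorted(votes)]


def bagged_predict(train, sequence, n_classifiers=5, seed=0):
    """Train n_classifiers on bootstrap samples, then vote on each position."""
    rng = random.Random(seed)
    classifiers = [train_stub(bootstrap_sample(train, rng))
                   for _ in range(n_classifiers)]
    pairs = []
    for clf in classifiers:  # on a cluster, each iteration would be a separate map task
        pairs.extend(map_predict(clf, sequence))
    return reduce_votes(pairs)
```

Calling `bagged_predict(train, "ACGT")` with a list of `(symbol, label)` training pairs returns one voted label per position. Because each classifier is trained and applied independently, the map tasks need no communication until the vote, which is the property that makes this kind of ensemble straightforward to distribute.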

References (11)
Michel Dumontier, Christopher WV Hogue. NBLAST: a cluster variant of BLAST for NxN comparisons. BMC Bioinformatics, vol. 3, pp. 13-13 (2002). 10.1186/1471-2105-3-13
Axel Bernal, Koby Crammer, Artemis Hatzigeorgiou, Fernando Pereira. Global discriminative learning for higher-accuracy computational gene prediction. PLOS Computational Biology, vol. 3 (2005). 10.1371/JOURNAL.PCBI.0030054
Remzi H Arpaci-Dusseau, Eric Anderson, Noah Treuhaft, David E Culler, Joseph M Hellerstein, David Patterson, Kathy Yelick. Cluster I/O with River: making the fast case common. Workshop on I/O in Parallel and Distributed Systems, pp. 10-22 (1999). 10.1145/301816.301823
Joe Armstrong. Making reliable distributed systems in the presence of software errors. Mikroelektronik och informationsteknik (2003).
John D. Lafferty, Andrew McCallum, Fernando C. N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. International Conference on Machine Learning, pp. 282-289 (2001).
Zhong-Hui Hu, Yuan-Gui Li, Yun-Ze Cai, Xiao-Ming Xu. An empirical comparison of ensemble classification algorithms with support vector machines. International Conference on Machine Learning and Cybernetics, vol. 6, pp. 3520-3523 (2004). 10.1109/ICMLC.2004.1380399
Evan Keibler, Michael R Brent. Eval: A software package for analysis of genome annotations. BMC Bioinformatics, vol. 4, pp. 50-50 (2003). 10.1186/1471-2105-4-50
Robert E. Schapire. The Boosting Approach to Machine Learning: An Overview. Springer, New York, NY, pp. 149-171 (2003). 10.1007/978-0-387-21579-2_9
Jeffrey Dean, Sanjay Ghemawat. MapReduce. Communications of the ACM, vol. 51, pp. 107-113 (2008). 10.1145/1327452.1327492