Data mining the PIMA dataset using rough set theory with a special emphasis on rule reduction

Authors: Kenneth Revert, Aurangzeb Khan

DOI: 10.1109/INMIC.2004.1492899

Keywords: Dominance-based rough set approach, Reduction (complexity), Computer science, Genetic algorithm, Test set, Decision table, Rough set, Biological database, Set (abstract data type), Artificial intelligence, Machine learning, Data mining

Abstract: This paper describes how rough set theory can be utilized as a tool for analyzing relatively complex decision tables like the Pima Indian Diabetes Database (PIDD). We applied Rosetta, a public domain implementation of rough sets, to PIDD in order to determine whether we could generate a predictive rule set with the highest accuracy and the fewest number of rules. Having a reduced rule set is advantageous: it provides a focus on the salient attributes and makes application in clinical practice more efficient (and more likely). In this paper, we report the use of a genetic algorithm based approach to the classification of diabetes that achieved a success rate on the test set of 83%. This compares highly favorably with other reported results, which ranged from 65% to 75%. In addition, we were able to achieve this with fewer than 100 rules. The high accuracy and low rule count support the use of rough sets for data mining biological databases.
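
As an illustration of the rough set idea the abstract relies on, the following minimal sketch computes the lower and upper approximations of a decision class on a toy decision table. Everything in it is an assumption made for illustration: the rows, the attribute names (`glucose`, `bmi`), the discretized values, and the helper functions are hypothetical, are not taken from PIDD, and do not reproduce Rosetta or the genetic algorithm reducer used in the paper.

```python
from collections import defaultdict

# Hypothetical, hand-made decision table (NOT taken from PIDD).
# Each row is (condition attribute values, decision label).
ROWS = [
    ({"glucose": "high", "bmi": "high"}, "pos"),
    ({"glucose": "high", "bmi": "high"}, "pos"),
    ({"glucose": "high", "bmi": "low"}, "pos"),
    ({"glucose": "high", "bmi": "low"}, "neg"),
    ({"glucose": "low", "bmi": "high"}, "neg"),
    ({"glucose": "low", "bmi": "low"}, "neg"),
]


def indiscernibility_classes(rows, attrs):
    """Group row indices that agree on every attribute in attrs."""
    classes = defaultdict(set)
    for i, (cond, _) in enumerate(rows):
        classes[tuple(cond[a] for a in attrs)].add(i)
    return list(classes.values())


def approximations(rows, attrs, decision):
    """Return (lower, upper) approximations of the given decision class."""
    target = {i for i, (_, d) in enumerate(rows) if d == decision}
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


# Approximate the "pos" class using both attributes, then glucose alone.
lower, upper = approximations(ROWS, ["glucose", "bmi"], "pos")
lower_g, upper_g = approximations(ROWS, ["glucose"], "pos")
```

Rows in the lower approximation can be classified with certainty from the chosen attributes, while rows in the upper approximation but not the lower form the boundary region. In this toy table, dropping `bmi` empties the lower approximation, which is the kind of loss an attribute or rule reduction step has to avoid.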
