Author: Emanuel Rocha Woiski
DOI: 10.1007/978-3-319-55852-3_10
Keywords:
Abstract: The remaining useful life (RUL) of a piece of equipment or a system is a prognostic value that depends on data gathered from multiple and diverse sources. Moreover, when the problem is treated, for the sake of the present study, as a binary classification problem, the probability of failure of any system is usually very much smaller than the probability of the same system being in normal operating conditions. The imbalanced outcome (far more 'normal' than 'failure' states) at any time results from the combined values of a large set of features, some related to one another, some redundant, and most quite noisy. Anticipating the requirements for a robust framework, it is argued that these difficulties can be dealt with by using Python libraries. In this chapter, DOROTHEA, a dataset from the UCI library with a hundred thousand sparse, anonymized features (i.e. with unrecognizable labels) and imbalanced classes, is analyzed. For that, in an IPython (Jupyter) notebook, pandas is used to import the data set, then some exploratory analysis and feature engineering are performed, and several estimators (classifiers) obtained from scikit-learn are applied. It is demonstrated that global accuracy does not work for this case, since the minority class is easily overlooked by the algorithms. Therefore, receiver operating characteristic (ROC) and Precision-Recall curves, with their respective areas under the curve (AUCs), are evaluated for each estimator or ensemble, together with some simple statistics, and three hybrid feature selection methods, that is, mixes of filter, embedded and wrapper selection strategies, are compared.
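The point about global accuracy can be illustrated with a minimal sketch. It is not the chapter's code and it does not use DOROTHEA: the data are a synthetic stand-in from scikit-learn's `make_classification`, and the class ratio (about 5% 'failure'), the logistic regression estimator and all variable names are assumptions chosen for illustration. The summary of the Precision-Recall curve used here is average precision, which is one common estimate of the area under that curve.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: NOT the DOROTHEA data set).
# Roughly 95% of samples are class 0 ("normal") and 5% are class 1 ("failure").
X, y = make_classification(
    n_samples=4000, n_features=50, n_informative=5, n_redundant=10,
    n_clusters_per_class=1, class_sep=1.5, weights=[0.95, 0.05],
    flip_y=0.0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# Baseline that ignores the features and always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
baseline_roc_auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])

# An actual estimator, scored with ranking-based metrics instead of accuracy.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)
model_scores = model.predict_proba(X_test)[:, 1]
model_roc_auc = roc_auc_score(y_test, model_scores)
model_pr_auc = average_precision_score(y_test, model_scores)

# Fraction of positives: the average precision expected from random ranking.
prevalence = y_test.mean()

print(f"baseline accuracy: {baseline_accuracy:.3f}")
print(f"baseline ROC AUC:  {baseline_roc_auc:.3f}")
print(f"model ROC AUC:     {model_roc_auc:.3f}")
print(f"model avg. prec.:  {model_pr_auc:.3f} (prevalence {prevalence:.3f})")
```

In this sketch the majority-class baseline reaches high accuracy while never detecting a single failure, and its ROC AUC stays at the chance level of 0.5. The threshold-free metrics separate it from a classifier that actually ranks the minority class, which is the kind of comparison the abstract describes.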