Authors: Michael V. Lombardo, Isotta Landi, Veronica Mandelli
DOI:
Keywords:
Abstract: Determining the number of clusters that best partitions a dataset can be a challenging task because of 1) the lack of a priori information within an unsupervised learning framework; and 2) the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to determine the best clustering solutions. Statistical software, both in R and Python, usually relies on internal validation metrics, such as the silhouette index, to select the number of clusters that best fits the data. Meanwhile, open-source software solutions that easily implement relative clustering techniques are lacking. Internal validation methods exploit characteristics of the data itself to produce a result, whereas relative approaches attempt to leverage the unknown underlying distribution of data points, looking for a replicable and generalizable solution. The implementation of relative validation methods can further the theory of clustering by enriching the already available methods that can be used to investigate clustering results in different situations and for different data distributions. This work aims at contributing to this effort by developing a stability-based method that selects the best clustering solution as the one that replicates, via supervised learning, on unseen subsets of data. The package works with multiple clustering and classification algorithms, hence allowing the assessment of the stability of different clustering mechanisms.
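The idea of selecting the solution that replicates, via supervised learning, on unseen data can be illustrated with a minimal sketch. It is built on scikit-learn and SciPy and does not use the reval API; the function names (`matched_error`, `stability_error`), the choice of k-means and k-nearest neighbors, the single 50/50 split, and the synthetic blobs are all assumptions made for illustration. A full stability-based procedure would typically repeat the split and normalize the error, which this sketch omits.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def matched_error(y_ref, y_pred, k):
    """Misclassification rate after the best one-to-one relabeling of y_pred."""
    # Contingency table: overlap[i, j] counts points with y_ref == i and y_pred == j.
    overlap = np.zeros((k, k), dtype=int)
    np.add.at(overlap, (y_ref, y_pred), 1)
    # Hungarian algorithm: choose the label permutation that maximizes agreement,
    # because cluster labels are arbitrary and differ between independent runs.
    rows, cols = linear_sum_assignment(-overlap)
    return 1.0 - overlap[rows, cols].sum() / len(y_ref)


def stability_error(X, k, seed=0):
    """Hypothetical helper: how poorly a k-cluster solution replicates on held-out data."""
    X_tr, X_ts = train_test_split(X, test_size=0.5, random_state=seed)

    # 1) Cluster the training half and treat the cluster labels as classes.
    tr_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_tr)

    # 2) Train a classifier to reproduce those labels.
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, tr_labels)

    # 3) Cluster the held-out half independently.
    ts_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_ts)

    # 4) Compare the classifier's predictions with the held-out clustering.
    return matched_error(ts_labels, clf.predict(X_ts), k)


# Synthetic data with three well-separated groups (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=0.6,
    random_state=0,
)

errors = {k: stability_error(X, k) for k in range(2, 6)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

A low error for a given number of clusters means the partition learned on one subset is recovered on another, which is the sense of "replicable" used in the abstract. Swapping `KMeans` or `KNeighborsClassifier` for other estimators would mirror the abstract's point about comparing the stability of different clustering mechanisms.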